Your question doesn't have a single, straightforward answer, which is probably why you're confused. It all depends on how you plan to use the images and what their purpose is in your study. Let's consider several points:
1- To avoid topographic effects (parallax) in the correlation, the images you're going to correlate need to be well co-registered with each other and, in addition, well registered with the DEM. If there is a mis-registration between the DEM and the ortho-images, the two images you're trying to analyze won't register well. A good way to ensure that the ortho-images and the topography are well co-registered is to optimize the GCPs of the first image against the shaded topography. However, this operation can be sensitive and doesn't always work well if the DEM is poorly resolved. As a rule of thumb, if the resolution ratio between the 1A image and the DEM is more than 3-4, this step won't work very well unless you have sharp topographic features that are well spread across your image. Correlating a 15 m 1A ASTER image with a 90 m SRTM DEM (a ratio of 6) may therefore not always work well. If the GCP optimization doesn't work (no convergence or large mean residuals), you can select GCPs manually, "very carefully", skip the optimization, and produce the ortho ASTER image this way. This corresponds to the traditional, manual, and tedious way of processing ortho-images.
2- Because correlation between the shaded DEM and the ASTER images may not work well (due to the absence of sharp topographic features and/or the resolution difference), it is indeed possible to use a panchromatic (Pan) Landsat image as the first master image. However, Landsat images are not always accurately georeferenced. If the Landsat image you use as a master is not well registered with the DEM you want to use to rectify the ASTER images, optimizing the GCPs against the Landsat image becomes useless, as it won't achieve its primary goal: ensuring proper registration of the ASTER images with the DEM to minimize parallax artifacts. In this approach, you're using the Landsat image as a proxy for the ground truth.
3- Note that ASTER absolute metadata are not always poor (unlike most older SPOT 1-4 data). Hence, it is possible that, even without using any GCPs, your image will register well with the topography (or at least within a decent uncertainty).
4- If your goal is only to correlate ASTER images with each other, then I don't really think you need to use a Landsat image. You're certainly better off processing your images using GCPs taken on a shaded DEM, and optimizing them if the optimization works. However, if your goal is to produce a time series that includes both a Landsat image and ASTER images, then I would suggest you indeed try to register the ASTER images to the Landsat image to avoid a global mis-registration in the time series.
5- If you're using a shaded DEM to optimize your GCPs, don't forget that only the master image is optimized this way. The second ortho should be optimized with respect to the first ortho to minimize registration errors in your time series.
6- If the sand dunes are moving quickly and you're using only one DEM made at a given date (e.g., SRTM), you will necessarily have topographic effects biasing your correlation. If 3D structures are moving substantially (by several pixels) and you don't have access to one DEM per image, with all the DEMs well registered to each other, then you should try to select images acquired close to nadir (within a couple of degrees of vertical at most) so that they are not sensitive to topographic changes. As a rule of thumb, most ASTER images sharing an identical footprint will satisfy this condition.
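To get a feel for the numbers behind points 1 and 6, here is a rough Python sketch. It is not part of any processing tool: the function names are made up for illustration, and it assumes the usual flat-terrain approximation in which a height error dh seen at an off-nadir angle theta shifts a point horizontally by about dh * tan(theta). The 10 m height change and the 2 degree angles are made-up example values, not measurements.

```python
import math

def resolution_ratio(dem_gsd_m, image_gsd_m):
    """Ratio between DEM and image ground sampling distances (hypothetical helper)."""
    return dem_gsd_m / image_gsd_m

def parallax_shift_m(height_error_m, off_nadir_deg):
    """Approximate horizontal shift caused by a height error at a given
    off-nadir angle, using shift = dh * tan(theta) (flat-terrain assumption)."""
    return height_error_m * math.tan(math.radians(off_nadir_deg))

def relative_parallax_px(height_error_m, angle_1_deg, angle_2_deg, pixel_size_m):
    """Apparent offset, in pixels, between two orthos of the same point when
    both use the same (wrong) DEM. Angles are signed and assumed to lie in
    the same plane (e.g. both cross-track)."""
    shift = (parallax_shift_m(height_error_m, angle_1_deg)
             - parallax_shift_m(height_error_m, angle_2_deg))
    return abs(shift) / pixel_size_m

if __name__ == "__main__":
    # Point 1: 90 m SRTM vs 15 m ASTER, compared with the 3-4 rule of thumb.
    print("resolution ratio:", resolution_ratio(90.0, 15.0))
    # Point 6: a dune that changed height by 10 m, seen at +2 and -2 degrees.
    print("offset (px):", relative_parallax_px(10.0, 2.0, -2.0, 15.0))
```

With identical viewing angles the relative offset cancels out, which is why images sharing the same footprint are the safest choice. With near-nadir angles of a couple of degrees, even a 10 m height change stays well below a tenth of a 15 m pixel in this approximation.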
Hope this helps,