C and C++ Source-Level Optimizations
Chapter 2
25112
Rev. 3.06
September 2005
Software Optimization Guide for AMD64 Processors
2.8 Unnecessary Store-to-Load Dependencies
A store-to-load dependency exists when data is stored to memory, only to be read back shortly 
thereafter. For details, see “Store-to-Load Forwarding Restrictions” on page 100. The 
AMD Athlon™ 64 and AMD Opteron™ processors contain hardware to accelerate such store-to-load 
dependencies, allowing the load to obtain the store data before it has been written to memory. 
However, it is still faster to avoid such dependencies altogether and keep the data in an internal 
register. 
Avoiding store-to-load dependencies is especially important if they are part of a long dependency 
chain, as may occur in a recurrence computation. If the dependency occurs while operating on arrays, 
many compilers are unable to optimize the code in a way that avoids the store-to-load dependency. In 
some instances the language definition may prohibit the compiler from using code transformations 
that would remove the store-to-load dependency. Therefore, the programmer should remove the 
dependency manually, for example, by introducing a temporary variable that can be kept in a 
register, as in the following example. Doing so can result in a significant performance increase.
Listing 3. Avoid
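double x[VECLEN], y[VECLEN], z[VECLEN];
unsigned int k;
/* Each iteration loads x[k-1], the value stored by the previous iteration. */
for (k = 1; k < VECLEN; k++) {
   x[k] = x[k-1] + y[k];
}
/* The same store-to-load dependency, inside a multiply-subtract recurrence. */
for (k = 1; k < VECLEN; k++) {
   x[k] = z[k] * (y[k] - x[k-1]);
}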
Listing 4. Preferred
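double x[VECLEN], y[VECLEN], z[VECLEN];
unsigned int k;
double t;   /* carries the previous element so it can stay in a register */
t = x[0];
for (k = 1; k < VECLEN; k++) {
   t = t + y[k];
   x[k] = t;   /* store only; no later load depends on this store */
}
t = x[0];
for (k = 1; k < VECLEN; k++) {
   t = z[k] * (y[k] - t);
   x[k] = t;
}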
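
The two listings are fragments. As a minimal sketch of how the transformation might be packaged and checked, the following C code wraps each recurrence in a function and confirms that the "avoid" and "preferred" forms produce the same results. The function names (sum_avoid, sum_preferred, scaled_avoid, scaled_preferred, check_recurrences) and the test values are hypothetical and are not part of the guide. The check uses small integer-valued inputs so that every intermediate result is exactly representable. It demonstrates equivalence only and makes no performance claim.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrappers around the listings above; names are illustrative. */

/* "Avoid" form: x[k-1] is loaded right after the previous iteration stored it. */
void sum_avoid(double *x, const double *y, size_t n)
{
    for (size_t k = 1; k < n; k++) {
        x[k] = x[k-1] + y[k];
    }
}

/* "Preferred" form: t carries the previous element between iterations. */
void sum_preferred(double *x, const double *y, size_t n)
{
    if (n == 0) {
        return;                 /* nothing to read; avoids touching x[0] */
    }
    double t = x[0];
    for (size_t k = 1; k < n; k++) {
        t = t + y[k];
        x[k] = t;
    }
}

/* "Avoid" form of the second recurrence. */
void scaled_avoid(double *x, const double *y, const double *z, size_t n)
{
    for (size_t k = 1; k < n; k++) {
        x[k] = z[k] * (y[k] - x[k-1]);
    }
}

/* "Preferred" form of the second recurrence. */
void scaled_preferred(double *x, const double *y, const double *z, size_t n)
{
    if (n == 0) {
        return;
    }
    double t = x[0];
    for (size_t k = 1; k < n; k++) {
        t = z[k] * (y[k] - t);
        x[k] = t;
    }
}

/* Runs both forms on the same input and asserts element-wise equality. */
void check_recurrences(void)
{
    enum { N = 8 };
    double y[N], z[N], a[N], b[N];

    for (size_t k = 0; k < N; k++) {
        y[k] = (double)k;       /* integer-valued, so all sums are exact */
        z[k] = 2.0;
        a[k] = 0.0;
        b[k] = 0.0;
    }
    a[0] = 1.0;
    b[0] = 1.0;

    sum_avoid(a, y, N);
    sum_preferred(b, y, N);
    for (size_t k = 0; k < N; k++) {
        assert(a[k] == b[k]);
    }

    scaled_avoid(a, y, z, N);
    scaled_preferred(b, y, z, N);
    for (size_t k = 0; k < N; k++) {
        assert(a[k] == b[k]);
    }
}
```

Calling check_recurrences() from a test driver exercises both pairs. Whether the preferred form is actually faster depends on the compiler and processor, so it is worth inspecting the generated code or measuring.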