AMD Typewriter x86 User Manual
26
Explicitly Extract Common Subexpressions
AMD Athlon™ Processor x86 Code Optimization
22007E/0—November 1999
lead to unexpected results. Fortunately, in the vast majority of
cases, the final result will differ only in the least significant
bits.
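The low-order-bit differences mentioned above come from floating-point addition not being associative. The following is a minimal sketch of that effect, not part of the original guide: the function names `sum_left`, `sum_right`, and `nearly_equal` are hypothetical, and the tolerance convention (a small multiple of `DBL_EPSILON`) is an assumption chosen for illustration.

```c
#include <assert.h>
#include <float.h>

/* Left-to-right grouping: (x + y) + z. */
double sum_left(double x, double y, double z)
{
    return (x + y) + z;
}

/* Alternative grouping: x + (y + z). */
double sum_right(double x, double y, double z)
{
    return x + (y + z);
}

/* Hypothetical helper: returns 1 when x and y differ by at most
   `ulps` multiples of DBL_EPSILON relative to the larger magnitude. */
int nearly_equal(double x, double y, int ulps)
{
    double diff = x - y;
    double mag_x = x < 0.0 ? -x : x;
    double mag_y = y < 0.0 ? -y : y;
    double scale = mag_x > mag_y ? mag_x : mag_y;

    assert(ulps > 0);
    if (diff < 0.0)
        diff = -diff;
    return diff <= ulps * DBL_EPSILON * scale;
}
```

Calling `sum_left(0.1, 0.2, 0.3)` and `sum_right(0.1, 0.2, 0.3)` on an IEEE 754 double implementation commonly yields two values that are not bit-identical but still satisfy `nearly_equal` with a small `ulps`. The exact outcome depends on the precision the compiler uses for intermediate results.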
Example 1 (Avoid):
double a[100],sum;
int i;
sum = 0.0;
for (i=0; i<100; i++) {
sum += a[i];
}
Example 2 (Preferred):
double a[100],sum1,sum2,sum3,sum4,sum;
int i;
sum1 = 0.0;
sum2 = 0.0;
sum3 = 0.0;
sum4 = 0.0;
for (i=0; i<100; i+=4) {
sum1 += a[i];
sum2 += a[i+1];
sum3 += a[i+2];
sum4 += a[i+3];
}
sum = (sum4+sum3)+(sum1+sum2);
Notice that the 4-way unrolling was chosen to exploit the 4-stage
fully pipelined floating-point adder. Each stage of the floating-
point adder is occupied on every clock cycle, ensuring maximal
sustained utilization.
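Example 2 relies on the element count (100) being a multiple of 4. The sketch below shows one way the same four-accumulator pattern might be generalized to an arbitrary length by adding a cleanup loop. The function name `sum_unrolled4` and its signature are hypothetical, not taken from this guide, and no performance figures are implied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums n doubles with four independent
   accumulators so that consecutive additions do not depend on
   each other. */
double sum_unrolled4(const double *a, size_t n)
{
    double sum1 = 0.0, sum2 = 0.0, sum3 = 0.0, sum4 = 0.0;
    size_t n4 = n - (n % 4);   /* largest multiple of 4 not above n */
    size_t i;

    assert(a != NULL || n == 0);

    /* Main loop: four independent dependency chains. */
    for (i = 0; i < n4; i += 4) {
        sum1 += a[i];
        sum2 += a[i + 1];
        sum3 += a[i + 2];
        sum4 += a[i + 3];
    }

    /* Cleanup loop: at most three leftover elements. */
    for (; i < n; i++) {
        sum1 += a[i];
    }

    /* Combine the partial sums in the same grouping as Example 2. */
    return (sum4 + sum3) + (sum1 + sum2);
}
```

Folding the leftover elements into `sum1` keeps the main loop free of bounds checks. As with Example 2, the result may differ from a strictly sequential sum in the least significant bits.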
Explicitly Extract Common Subexpressions
In certain situations, C compilers are unable to extract common
subexpressions from floating-point expressions due to the
guarantee against reordering of such expressions in the ANSI
standard. Specifically, the compiler cannot re-arrange the
computation according to algebraic equivalencies before
extracting common subexpressions. In such cases, the
programmer should manually extract the common
subexpression. It should be noted that re-arranging the
expression may result in different computational results due to
the lack of associativity of floating-point operations, but the
results usually differ in only the least significant bits.
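As an illustration of manual extraction, the sketch below computes two expressions that both involve the quotient b/d. The function names `scale_direct` and `scale_extracted` and the temporary `t` are hypothetical and chosen only for this example. In the direct form, the first expression is evaluated as (b*c)/d, so a compiler that preserves evaluation order finds no shared subexpression; the extracted form computes b/d once.

```c
#include <assert.h>
#include <stddef.h>

/* Direct form: evaluated as (b * c) / d and (b / d) * a, so the
   quotient b / d is not a common subexpression as written. */
void scale_direct(double a, double b, double c, double d,
                  double *e, double *f)
{
    assert(e != NULL && f != NULL);
    *e = b * c / d;
    *f = b / d * a;
}

/* Extracted form: the programmer computes b / d once and reuses it,
   replacing two divisions with one. */
void scale_extracted(double a, double b, double c, double d,
                     double *e, double *f)
{
    double t;

    assert(e != NULL && f != NULL);
    t = b / d;      /* common subexpression, extracted by hand */
    *e = c * t;
    *f = a * t;
}
```

For inputs where every intermediate value is exactly representable, both forms return identical results. In general `*e` may differ in the least significant bits, because (b*c)/d and c*(b/d) round at different points.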