AMD Typewriter x86 User Manual
Unrolling Loops
AMD Athlon™ Processor x86 Code Optimization
22007E/0—November 1999
no faster than three iterations in 10 cycles, or 6/10
floating-point adds per cycle, or 1.4 times as fast as the original
loop.
Deriving Loop Control For Partially Unrolled Loops
A frequently used loop construct is a counting loop. In a typical
case, the loop count starts at some lower bound lo, increases by
some fixed, positive increment inc for each iteration of the
loop, and may not exceed some upper bound hi. The following
example shows how to partially unroll such a loop by an
unrolling factor of fac, and how to derive the loop control for
the partially unrolled version of the loop.
Example 1 (rolled loop):
for (k = lo; k <= hi; k += inc) {
   x[k] = ...
}
Example 2 (partially unrolled loop):
for (k = lo; k <= (hi - (fac-1)*inc); k += fac*inc) {
   x[k] = ...
   x[k+inc] = ...
   ...
   x[k+(fac-1)*inc] = ...
}

/* handle end cases */
for (k = k; k <= hi; k += inc) {
   x[k] = ...
}
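
The loop control above can be made concrete by fixing the unrolling factor. The following is a minimal sketch, assuming fac = 4 and a loop body that simply adds 1.0 to each visited element. The function names add_one_rolled and add_one_unrolled4 and the choice of loop body are hypothetical and used only for illustration. The sketch also assumes that hi - 3*inc does not overflow the int range.

```c
#include <assert.h>

/* Rolled reference loop, following Example 1.
   The body (adding 1.0) is an illustrative assumption. */
static void add_one_rolled(double *x, int lo, int hi, int inc)
{
    assert(inc > 0);              /* the text requires a positive increment */
    for (int k = lo; k <= hi; k += inc) {
        x[k] += 1.0;
    }
}

/* Same loop partially unrolled with fac = 4, following Example 2. */
static void add_one_unrolled4(double *x, int lo, int hi, int inc)
{
    int k;

    assert(inc > 0);

    /* The main loop runs only while all four indices k, k+inc, k+2*inc,
       and k+3*inc stay at or below hi, hence the bound hi - (4-1)*inc. */
    for (k = lo; k <= hi - 3 * inc; k += 4 * inc) {
        x[k]           += 1.0;
        x[k + inc]     += 1.0;
        x[k + 2 * inc] += 1.0;
        x[k + 3 * inc] += 1.0;
    }

    /* Handle end cases: at most three iterations remain, and k already
       holds the first index the main loop did not process. */
    for (; k <= hi; k += inc) {
        x[k] += 1.0;
    }
}
```

With lo = 1, hi = 29, and inc = 3, the rolled loop visits the ten indices 1, 4, ..., 28. The unrolled main loop covers the first eight of them in two passes, and the end-case loop covers the remaining two (25 and 28).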