AMD Typewriter x86 User Manual

Unrolling Loops
AMD Athlon™ Processor x86 Code Optimization 
22007E/0—November 1999
no faster than three iterations in 10 cycles, or 6/10
floating-point adds per cycle, or 1.4 times as fast as the original
loop.
Deriving Loop Control For Partially Unrolled Loops
A frequently used loop construct is a counting loop. In a typical
case, the loop count starts at some lower bound lo, increases by
some fixed, positive increment inc for each iteration of the
loop, and may not exceed some upper bound hi. The following
example shows how to partially unroll such a loop by an
unrolling factor of fac, and how to derive the loop control for
the partially unrolled version of the loop.
Example 1 (rolled loop):  
 for (k = lo; k <= hi; k += inc) {
    x[k] = 
    ...
 }
Example 2 (partially unrolled loop):  
 for (k = lo; k <= (hi - (fac-1)*inc); k += fac*inc) {
    x[k] =
    ...
    x[k+inc] =
    ...
    ...
    x[k+(fac-1)*inc] =
    ...
 }  
 /* handle end cases */
 for (k = k; k <= hi; k += inc) {
    x[k] =
    ...
 }
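
The loop control in Example 2 follows from one requirement: the last
element touched in an unrolled pass, x[k+(fac-1)*inc], may not lie
beyond hi, which gives the test k <= hi - (fac-1)*inc. The sketch
below instantiates the pattern for fac = 4. It is an illustration
only: the function names, the constant FAC, and the loop body
(adding 1.0 to each element) are assumptions, not part of the
original examples.

```c
#include <assert.h>

/* Hypothetical unrolling factor, fixed at compile time for this sketch. */
enum { FAC = 4 };

/* Rolled reference loop (shape of Example 1).
   The body "x[k] += 1.0" is an assumed placeholder. */
void add_one_rolled(double *x, long lo, long hi, long inc)
{
    assert(inc > 0);  /* the text requires a positive increment */
    for (long k = lo; k <= hi; k += inc) {
        x[k] += 1.0;
    }
}

/* Partially unrolled loop (shape of Example 2) with fac = FAC = 4. */
void add_one_unrolled4(double *x, long lo, long hi, long inc)
{
    long k;
    assert(inc > 0);

    /* Main loop: enter only if k + (FAC-1)*inc is still <= hi,
       so all four accesses in the body stay within the bound. */
    for (k = lo; k <= hi - (FAC - 1) * inc; k += FAC * inc) {
        x[k]           += 1.0;
        x[k + inc]     += 1.0;
        x[k + 2 * inc] += 1.0;
        x[k + 3 * inc] += 1.0;
    }

    /* Handle end cases: at most FAC-1 iterations remain. */
    for (; k <= hi; k += inc) {
        x[k] += 1.0;
    }
}
```

Calling both functions with the same lo, hi, and inc on identical
arrays should leave the arrays equal. The subtraction in the main
loop test assumes that hi - (FAC-1)*inc does not overflow the index
type.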