22007E/0—November 1999
AMD Athlon™ Processor x86 Code Optimization 
Chapter 7: Scheduling Optimizations
This chapter describes how to code instructions for efficient
scheduling. Guidelines are listed in order of importance.
Schedule Instructions According to their Latency
The AMD Athlon™ processor can execute up to three x86
instructions per cycle, and different x86 instructions can have
different latencies. Although the processor schedules
instructions flexibly, for maximum performance schedule
instructions, especially FPU and 3DNow!™ instructions,
according to their latency, so that dependent instructions do
not have to wait for the results of longer-latency instructions.
Unrolling Loops
Complete Loop Unrolling
Take advantage of the large 64-Kbyte instruction cache of the
AMD Athlon processor: unroll loops to expose more parallelism
and to reduce loop overhead, which remains a cost even when
branches are predicted correctly. Complete