AMD Typewriter x86 User Manual
AMD Athlon™ Processor x86 Code Optimization
22007E/0—November 1999
Example 1 (Avoid):
FLD   QWORD PTR [foo]
FIMUL DWORD PTR [bar]
FIADD DWORD PTR [baz]
Example 2 (Preferred):
FILD  DWORD PTR [bar]
FILD  DWORD PTR [baz]
FLD   QWORD PTR [foo]
FMULP ST(2), ST
FADDP ST(1), ST
Align Branch Targets in Program Hot Spots
In program hot spots (i.e., innermost loops in the absence of
profiling data), place branch targets at or near the beginning of
16-byte aligned code windows. This technique helps to
maximize the number of instructions that are filled into the
instruction-byte queue while preserving I-cache space in
branch-intensive code.
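To make "at or near the beginning of a 16-byte aligned code window" concrete in terms of addresses, the following C sketch computes a branch target's offset within its window and the number of padding bytes an assembler alignment directive (for example, ALIGN 16 in MASM) would need to emit before the target. The function names and any sample addresses are hypothetical illustrations, not part of the original guide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: offset of an address inside its 16-byte code window.
   An offset of 0 means the branch target starts the window. */
uint32_t offset_in_window(uint32_t addr)
{
    return addr & 15u;
}

/* Hypothetical helper: padding bytes needed to move `addr` up to the next
   multiple of `boundary` (0 if already aligned).
   Assumption: `boundary` is a nonzero power of two, such as 16. */
uint32_t pad_to_boundary(uint32_t addr, uint32_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (boundary - (addr & (boundary - 1))) & (boundary - 1);
}
```

For instance, a loop label that would otherwise fall at address 1003h sits 3 bytes into its window, so 13 bytes of padding would place it at 1010h, the start of the next window. Padding costs code size, which is one reason to confine this technique to hot spots.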
Use Short Instruction Lengths
Assemblers and compilers should generate the tightest code
possible to optimize use of the I-cache and increase average
decode rate. Wherever possible, use instructions with shorter
lengths. Using shorter instructions increases the number of
instructions that can fit into the instruction-byte queue. For
example, use 8-bit displacements as opposed to 32-bit
displacements. In addition, use the single-byte format of simple
integer instructions whenever possible, as opposed to the
2-byte opcode ModR/M format.
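The size difference between encodings can be sketched in C as follows. This is a minimal illustration, assuming 32-bit code and the standard x86 encodings of ADD with an immediate operand and of conditional jumps. The function names are hypothetical and not taken from the guide.

```c
#include <stdint.h>

/* Hypothetical helper: length in bytes of the shortest encoding of
   ADD r32, imm in 32-bit code. `is_eax` is nonzero when the
   destination register is EAX. */
unsigned add_r32_imm_length(int is_eax, int32_t imm)
{
    if (imm >= -128 && imm <= 127)
        return 3;   /* 83 /0 ib: opcode + ModR/M + sign-extended imm8 */
    if (is_eax)
        return 5;   /* 05 id: single-byte opcode form + imm32 */
    return 6;       /* 81 /0 id: opcode + ModR/M + imm32 */
}

/* Hypothetical helper: length in bytes of the shortest conditional
   jump encoding for a given signed displacement. */
unsigned jcc_length(int32_t disp)
{
    if (disp >= -128 && disp <= 127)
        return 2;   /* 7x cb: 1-byte opcode + 8-bit displacement */
    return 6;       /* 0F 8x cd: 2-byte opcode + 32-bit displacement */
}
```

Under these assumptions, adding 12345678h to EAX needs 5 bytes rather than 6, adding -5 to EBX needs 3 bytes rather than 6, and a conditional jump over 5 bytes needs 2 bytes rather than 6.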
Example 1 (Avoid):
81 C0 78 56 34 12   add eax, 12345678h   ;uses 2-byte opcode
                                         ; form (with ModR/M)
81 C3 FB FF FF FF   add ebx, -5          ;uses 32-bit
                                         ; immediate
0F 84 05 00 00 00   jz  $label1          ;uses 2-byte opcode,
                                         ; 32-bit immediate