AMD Typewriter x86 User Manual
AMD Athlon™ Processor x86 Code Optimization
22007E/0—November 1999
Select DirectPath Over VectorPath Instructions
Use DirectPath instructions rather than VectorPath
instructions. DirectPath instructions are optimized to decode
and execute efficiently by minimizing the number of operations
per x86 instruction, which includes ‘register ← register op
memory’ as well as ‘register ← register op register’ forms of
instructions. Up to three DirectPath instructions can be
decoded per cycle. VectorPath instructions will block the
decoding of DirectPath instructions.
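As an illustration of the two instruction forms named above, the
following fragment shows one instance of each. The registers and
the address expression are arbitrary choices made for this sketch
and do not come from the guide.

```asm
; 'register <- register op memory' form (load-execute)
ADD   EAX, [EBX]      ; EAX = EAX + dword at address EBX

; 'register <- register op register' form
ADD   EAX, ECX        ; EAX = EAX + ECX
```

Both are single x86 instructions of the simple kind this section
recommends. Appendix G remains the authoritative source for
whether a specific instruction is DirectPath or VectorPath.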
The vast majority of instructions used by a compiler have
been implemented as DirectPath instructions in the
AMD Athlon processor. Assembly writers must still take into
consideration the usage of DirectPath versus VectorPath
instructions.
See Appendix F, “Instruction Dispatch and Execution
Resources” on page 187 and Appendix G, “DirectPath versus
VectorPath Instructions” on page 219 for tables of DirectPath
and VectorPath instructions.
Load-Execute Instruction Usage
Use Load-Execute Integer Instructions
Most load-execute integer instructions are DirectPath
decodable and can be decoded at the rate of three per cycle.
Splitting a load-execute integer instruction into two separate
instructions (a load instruction and a “reg, reg” instruction)
reduces decoding bandwidth and increases register pressure,
which results in lower performance. The split-instruction form
can be used to avoid scheduler stalls for longer executing
instructions and to explicitly schedule the load and execute
operations.
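The trade-off described above can be sketched with a pair of
equivalent sequences. The register names and the [ESI] operand
are assumptions chosen for illustration only.

```asm
; Load-execute form: a single instruction, no temporary register
ADD   EAX, [ESI]      ; EAX = EAX + dword at address ESI

; Split form: same result, but two instructions to decode and
; one extra register (EDX) tied up to hold the loaded value
MOV   EDX, [ESI]      ; explicit load
ADD   EAX, EDX        ; "reg, reg" operation
```

The split form leaves room to place other independent
instructions between the MOV and the ADD, which is the explicit
scheduling case mentioned in the text.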