Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Chapter 4: Instruction-Decoding Optimizations
4.9 Alternatives to SHLD Instruction
Optimization
Where register pressure is low, replace the SHLD instruction with alternative code using ADD and 
ADC, or SHR and LEA.
Application
This optimization applies to:
32-bit software
64-bit software
Rationale
Using alternative code in place of SHLD achieves lower overall latency and requires fewer execution 
resources. The 32-bit and 64-bit forms of ADD, ADC, SHR, and LEA are DirectPath instructions, 
while SHLD is a VectorPath instruction. Use of the replacement code optimizes decode bandwidth 
because it potentially enables the simultaneous decoding of a third DirectPath instruction. However, 
the replacement code may increase register pressure because it destroys the contents of one register 
(reg2 in the following examples) whereas the register is preserved by SHLD.
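Example 1
Replace this instruction:
    shld reg1, reg2, 1
with this code sequence:
    add reg2, reg2    ; shift reg2 left by 1; CF receives its former top bit
    adc reg1, reg1    ; shift reg1 left by 1 and bring CF into bit 0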
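The equivalence in Example 1 can be checked with a small C model. This is only a sketch under stated assumptions: it models 32-bit registers as uint32_t, models the carry flag as an explicit variable, and the function names (shld_by_1, add_adc_by_1, check_example_1) are hypothetical names chosen for illustration. It says nothing about decode paths or latency.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of "shld reg1, reg2, 1" on 32-bit registers:
   reg1 is shifted left by 1 and bit 0 is filled with the top bit
   of reg2. reg2 itself is not modified. */
uint32_t shld_by_1(uint32_t reg1, uint32_t reg2)
{
    return (reg1 << 1) | (reg2 >> 31);
}

/* Model of the replacement sequence. reg2 is passed by pointer
   because the sequence overwrites it. */
uint32_t add_adc_by_1(uint32_t reg1, uint32_t *reg2)
{
    uint32_t carry = *reg2 >> 31;   /* CF produced by add reg2, reg2 */
    *reg2 = *reg2 + *reg2;          /* add reg2, reg2 (wraps mod 2^32) */
    return reg1 + reg1 + carry;     /* adc reg1, reg1 */
}

/* Hypothetical self-check over a few sample values. */
void check_example_1(void)
{
    const uint32_t samples[] = { 0u, 1u, 0x7FFFFFFFu, 0x80000000u,
                                 0xDEADBEEFu, 0xFFFFFFFFu };
    const int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            uint32_t r2 = samples[j];
            assert(add_adc_by_1(samples[i], &r2)
                   == shld_by_1(samples[i], samples[j]));
            /* The replacement destroys reg2; SHLD would have kept it. */
            assert(r2 == (uint32_t)(samples[j] << 1));
        }
    }
}
```

Calling check_example_1() from a test harness exercises both models on every pair of sample values. The second assertion makes the register-pressure cost visible: after the replacement sequence, reg2 no longer holds its original value.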
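Example 2
Replace this instruction:
    shld reg1, reg2, 2
with this code sequence (the shift count of 30 assumes 32-bit registers):
    shr reg2, 30               ; leave only the top 2 bits of reg2, in bits 1:0
    lea reg1, [reg1*4+reg2]    ; reg1 = (reg1 << 2) + those 2 bits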
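Example 2 can be modeled the same way. The sketch below again assumes 32-bit registers represented as uint32_t; the function names (shld_by_2, shr_lea_by_2, check_example_2) are hypothetical. It relies on the fact that reg1*4 always has its two low bits clear, so adding a value in the range 0 to 3 gives the same result as OR-ing it in.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of "shld reg1, reg2, 2" on 32-bit registers:
   reg1 is shifted left by 2 and bits 1:0 are filled with the top
   two bits of reg2. reg2 itself is not modified. */
uint32_t shld_by_2(uint32_t reg1, uint32_t reg2)
{
    return (reg1 << 2) | (reg2 >> 30);
}

/* Model of the replacement sequence. reg2 is passed by pointer
   because the sequence overwrites it. */
uint32_t shr_lea_by_2(uint32_t reg1, uint32_t *reg2)
{
    *reg2 >>= 30;                /* shr reg2, 30 */
    return reg1 * 4u + *reg2;    /* lea reg1, [reg1*4+reg2] */
}

/* Hypothetical self-check over a few sample values. */
void check_example_2(void)
{
    const uint32_t samples[] = { 0u, 1u, 0x3FFFFFFFu, 0x40000000u,
                                 0x80000000u, 0xC0000000u,
                                 0xDEADBEEFu, 0xFFFFFFFFu };
    const int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            uint32_t r2 = samples[j];
            assert(shr_lea_by_2(samples[i], &r2)
                   == shld_by_2(samples[i], samples[j]));
            /* reg2 now holds only its former top two bits. */
            assert(r2 == (samples[j] >> 30));
        }
    }
}
```

For 64-bit operands, the analogous model would use uint64_t and a shift count of 62. That variant is an extrapolation from the 32-bit case and is not spelled out in the example above.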