User Manual for AMD 250

Appendix C
Instruction Latencies
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
C.7 SSE Instructions
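Table 18. SSE Instructions

| Syntax                      | Encoding: prefix byte | Encoding: first byte | Encoding: 2nd byte | Encoding: ModRM byte | Decode type | FPU pipe(s) | Latency | Note |
|-----------------------------|-----------------------|----------------------|--------------------|----------------------|-------------|-------------|---------|------|
| ADDPS xmmreg1, xmmreg2      |                       | 0Fh                  | 58h                | 11-xxx-xxx           | Double      | FADD        | 5       |      |
| ADDPS xmmreg, mem128        |                       | 0Fh                  | 58h                | mm-xxx-xxx           | Double      | FADD        | 7       |      |
| ADDSS xmmreg1, xmmreg2      | F3h                   | 0Fh                  | 58h                | 11-xxx-xxx           | DirectPath  | FADD        |         |      |
| ADDSS xmmreg, mem128        | F3h                   | 0Fh                  | 58h                | mm-xxx-xxx           | DirectPath  | FADD        | 6       |      |
| ANDNPS xmmreg1, xmmreg2     |                       | 0Fh                  | 55h                | 11-xxx-xxx           | Double      | FMUL        | 3       | 1    |
| ANDNPS xmmreg, mem128       |                       | 0Fh                  | 55h                | mm-xxx-xxx           | Double      | FMUL        | 5       |      |
| ANDPS xmmreg1, xmmreg2      |                       | 0Fh                  | 54h                | 11-xxx-xxx           | Double      | FMUL        | 3       | 1    |
| ANDPS xmmreg, mem128        |                       | 0Fh                  | 54h                | mm-xxx-xxx           | Double      | FMUL        | 5       |      |
| CMPPS xmmreg1, xmmreg2, imm8 |                      | 0Fh                  | C2h                | 11-xxx-xxx           | Double      | FADD        | 3       | 1    |
| CMPPS xmmreg, mem128, imm8  |                       | 0Fh                  | C2h                | mm-xxx-xxx           | Double      | FADD        | 5       |      |
| CMPSS xmmreg1, xmmreg2, imm8 | F3h                  | 0Fh                  | C2h                | 11-xxx-xxx           | DirectPath  | FADD        |         |      |
| CMPSS xmmreg, mem32, imm8   | F3h                   | 0Fh                  | C2h                | mm-xxx-xxx           | DirectPath  | FADD        | 4       |      |
| COMISS xmmreg1, xmmreg2     |                       | 0Fh                  | 2Fh                | 11-xxx-xxx           | VectorPath  |             |         |      |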
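
The encoding columns can be read as a recipe for the instruction bytes: optional prefix byte, the two opcode bytes, then a ModRM byte. The following is a minimal sketch of that reading for the register-to-register forms, not code from the guide. It assumes the usual x86 ModRM layout (2-bit mod, 3-bit reg, 3-bit r/m, with mod = 11 selecting a register operand) and that the reg field holds the destination register; the helper names `modrm` and `encode_reg_reg` are hypothetical, and only xmm0 through xmm7 are handled (no REX prefix).

```python
# Sketch: assemble the bytes of a register-to-register SSE instruction
# from the encoding columns of Table 18. Helper names are illustrative.

def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack a ModRM byte as mod (2 bits) - reg (3 bits) - r/m (3 bits)."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("ModRM field out of range")
    return (mod << 6) | (reg << 3) | rm

def encode_reg_reg(opcode_bytes, dst: int, src: int) -> bytes:
    """Encode `op xmm<dst>, xmm<src>`; mod = 11 is the table's 11-xxx-xxx form."""
    return bytes(opcode_bytes) + bytes([modrm(0b11, dst, src)])

# Prefix and opcode bytes as listed in the table.
ADDPS = (0x0F, 0x58)
ADDSS = (0xF3, 0x0F, 0x58)

addps_bytes = encode_reg_reg(ADDPS, 1, 2)  # ADDPS xmm1, xmm2
addss_bytes = encode_reg_reg(ADDSS, 1, 2)  # ADDSS xmm1, xmm2

print(addps_bytes.hex())  # 0f58ca
print(addss_bytes.hex())  # f30f58ca
```

The memory forms (mm-xxx-xxx) additionally need displacement and possibly SIB bytes, which this sketch deliberately leaves out.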
Notes:
1. The low half of the result is available one cycle earlier than listed.
2. The second latency value indicates when the low half of the result becomes available.
3. The high half of the result is available one cycle earlier than listed.
4. The latency listed is the absolute minimum, while average latencies may be higher and are a function of internal pipeline conditions.
5. For the PREFETCHNTA/T0/T1/T2 instructions, the mem8 value refers to an address in the 64-byte line to be prefetched.
6. The 8-clock latency is only visible to younger stores that need to do an external write. The 2-clock latency is visible to the other stores and instructions.
7. This is the execution latency for the instruction. The time to complete the external write depends on the memory speed and the hardware implementation.
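
One way to use the latency column is to add up the listed values along a chain of dependent instructions. The sketch below does only that, as a rough estimate under stated assumptions: it uses the register-form latencies that appear in Table 18, applies them as listed without the one-cycle adjustment of note 1, and ignores decode and scheduling effects, so real cycle counts may differ. The names `LATENCY` and `chain_estimate` are hypothetical.

```python
# Sketch: rough serial-latency estimate for a chain of dependent SSE
# instructions, using register-form latencies listed in Table 18.
# Values are applied as listed; note 1 and pipeline effects are ignored.

LATENCY = {
    "ADDPS": 5,
    "ANDNPS": 3,
    "ANDPS": 3,
    "CMPPS": 3,
}

def chain_estimate(ops):
    """Sum the listed latencies, assuming each op waits for the previous result."""
    return sum(LATENCY[op] for op in ops)

# Example chain: two dependent adds followed by a mask.
chain = ["ADDPS", "ADDPS", "ANDPS"]
total = chain_estimate(chain)
print(total)  # 13
```

Independent instructions can overlap in the pipeline, so this sum is only meaningful when each instruction really consumes the result of the one before it.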