User Manual for AMD 250

Software Optimization Guide for AMD64 Processors
25112, Rev. 3.06, September 2005

Appendix C: Instruction Latencies

C.4 x87 Floating-Point Instructions

Table 15. x87 Floating-Point Instructions

                        Encoding
Syntax                  First   Second  ModRM        Decode       FPU          Latency  Note
                        byte    byte    byte         type         pipe(s)
F2XM1                   D9h             11-110-000   VectorPath   -            65
FABS                    D9h             11-100-001   DirectPath   FMUL         2
FADD ST, ST(i)          D8h             11-000-xxx   DirectPath   FADD         4
FADD [mem32real]        D8h             mm-000-xxx   DirectPath   FADD         6
FADD ST(i), ST          DCh             11-000-xxx   DirectPath   FADD         4
FADD [mem64real]        DCh             mm-000-xxx   DirectPath   FADD         6
FADDP ST(i), ST         DEh             11-000-xxx   DirectPath   FADD         4
FBLD [mem80]            DFh             mm-100-xxx   VectorPath   -            87
FBSTP [mem80]           DFh             mm-110-xxx   VectorPath   -            172
FCHS                    D9h             11-100-000   DirectPath   FMUL         2
FCLEX                   DBh     E2h     11-100-010   VectorPath   -            ~
FCMOVB ST(0), ST(i)     DAh             11-000-xxx   VectorPath   -            15
FCMOVBE ST(0), ST(i)    DAh             11-010-xxx   VectorPath   -            15
FCMOVE ST(0), ST(i)     DAh             11-001-xxx   VectorPath   -            15
FCMOVNB ST(0), ST(i)    DBh             11-000-xxx   VectorPath   -            15
FCMOVNBE ST(0), ST(i)   DBh             11-010-xxx   VectorPath   -            15
FCMOVNE ST(0), ST(i)    DBh             11-001-xxx   VectorPath   -            15
FCMOVNU ST(0), ST(i)    DBh             11-011-xxx   VectorPath   -            15
FCMOVU ST(0), ST(i)     DAh             11-011-xxx   VectorPath   -            15
FCOM ST(i)              D8h             11-010-xxx   DirectPath   FADD         2
FCOM [mem32real]        D8h             mm-010-xxx   DirectPath   FADD         4
FCOM [mem64real]        DCh             mm-010-xxx   DirectPath   FADD         4
FCOMI ST, ST(i)         DBh             11-110-xxx   VectorPath   FADD         3

Notes:
1. The last three bits of the ModRM byte select the stack entry ST(i).
2. These instructions have an effective latency as shown. However, these instructions generate an internal NOP with a latency of two cycles but no related dependencies. These internal NOPs can be executed at a rate of three per cycle and can use any of the three execution resources.
3. This is a VectorPath decoded operation that uses one execution pipe (one ROP).
4. There is additional latency associated with this instruction. “e” represents the difference between the exponents of the divisor and the dividend. If “s” is the number of normalization shifts performed on the result, then n = (s+1)/2, where 0 <= n <= 32.
5. The latency provided for this operation is the best-case latency.
6. The three latency numbers represent the latency values for precision control settings of single precision, double precision, and extended precision, respectively.
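
Note 1 and the ModRM column of Table 15 can be illustrated with a short sketch. It splits a ModRM byte into its 2-bit mod, 3-bit reg, and 3-bit r/m fields, prints them in the table's dash-separated notation, and reads the stack index i from the low three bits. The function names (decode_modrm, format_modrm, stack_index) are hypothetical and do not come from the guide. The sketch also assumes that "mm" in the table stands for a memory-form mod value (anything other than 11), which this page does not define.

```python
def decode_modrm(modrm: int) -> dict:
    """Split a ModRM byte into mod (bits 7-6), reg (bits 5-3), and rm (bits 2-0)."""
    if not 0 <= modrm <= 0xFF:
        raise ValueError("ModRM must fit in one byte")
    return {
        "mod": (modrm >> 6) & 0b11,
        "reg": (modrm >> 3) & 0b111,
        "rm": modrm & 0b111,
    }


def format_modrm(modrm: int) -> str:
    """Render a ModRM byte in the table's mod-reg-rm notation, e.g. 11-000-011."""
    f = decode_modrm(modrm)
    return f"{f['mod']:02b}-{f['reg']:03b}-{f['rm']:03b}"


def stack_index(modrm: int) -> int:
    """Return i for ST(i); only meaningful for the register form (mod == 11)."""
    f = decode_modrm(modrm)
    if f["mod"] != 0b11:
        # Assumption: memory forms ("mm" rows) address memory, not a stack entry.
        raise ValueError("memory form: the low three bits do not select ST(i)")
    return f["rm"]


# The byte pair D8h C3h fits the FADD ST, ST(i) row (D8h, 11-000-xxx) with i = 3.
print(format_modrm(0xC3), stack_index(0xC3))
```

For the fixed-operand rows the same split reproduces the listed pattern directly: E1h, the second byte of FABS, formats as 11-100-001.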