Справочник Пользователя для AMD 250

Appendix C

Instruction Latencies

307

Software Optimization Guide for AMD64 Processors

25112

Rev. 3.06

September 2005

C.4

x87 Floating-Point Instructions

Table 15.

x87 Floating-Point Instructions

Syntax

Encoding

Decode
type

FPU
pipe(s)

Latency Note

First
byte

Second
byte

ModRM byte

F2XM1

D9h

11-110-000

VectorPath -

FABS

D9h

11-100-001

DirectPath

FMUL

FADD ST, ST(i)

D8h

11-000-xxx

DirectPath

FADD

FADD [mem32real]

D8h

mm-000-xxx

DirectPath

FADD

FADD ST(i), ST

DCh

11-000-xxx

DirectPath

FADD

FADD [mem64real]

DCh

mm-000-xxx

DirectPath

FADD

FADDP ST(i), ST

DEh

11-000-xxx

DirectPath

FADD

FBLD [mem80]

DFh

mm-100-xxx

VectorPath -

FBSTP [mem80]

DFh

mm-110-xxx

VectorPath -

172

FCHS

D9h

11-100-000

DirectPath

FMUL

FCLEX

DBh

E2h

11-100-010

VectorPath -

FCMOVB ST(0), ST(i)

DAh

11-000-xxx

VectorPath -

FCMOVBE ST(0), ST(i)

DAh

11-010-xxx

VectorPath -

FCMOVE ST(0), ST(i)

DAh

11-001-xxx

VectorPath -

FCMOVNB ST(0), ST(i)

DBh

11-000-xxx

VectorPath -

FCMOVNBE ST(0), ST(i)

DBh

11-010-xxx

VectorPath -

FCMOVNE ST(0), ST(i)

DBh

11-001-xxx

VectorPath -

FCMOVNU ST(0), ST(i)

DBh

11-011-xxx

VectorPath -

FCMOVU ST(0), ST(i)

DAh

11-011-xxx

VectorPath -

FCOM ST(i)

D8h

11-010-xxx

DirectPath

FADD

FCOM [mem32real]

D8h

mm-010-xxx

DirectPath

FADD

FCOM [mem64real]

DCh

mm-010-xxx

DirectPath

FADD

FCOMI ST, ST(i)

DBh

11-110-xxx

VectorPath FADD

Notes:

1. The last three bits of the ModRM byte select the stack entry ST(i).
2. These instructions have an effective latency as shown. However, these instructions generate an internal NOP

with a latency of two cycles but no related dependencies. These internal NOPs can be executed at a rate of
three per cycle and can use any of the three execution resources.

3. This is a VectorPath decoded operation that uses one execution pipe (one ROP).
4. There is additional latency associated with this instruction. “e” represents the difference between the exponents

of the divisor and the dividend. If “s” is the number of normalization shifts performed on the result, then
n = (s+1)/2 where (0 <= n <= 32).

5. The latency provided for this operation is the best-case latency.
6. The three latency numbers represent the latency values for precision control settings of single precision, double

precision, and extended precision, respectively.