User Guide for AMD 250

Appendix C Instruction Latencies

Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005

C.7 SSE Instructions

Table 18. SSE Instructions

                              ---------- Encoding -------------
Syntax                        Prefix  First  2nd    ModRM       Decode      FPU      Latency  Note
                              byte    byte   byte   byte        type        pipe(s)
----------------------------  ------  -----  -----  ----------  ----------  -------  -------  ----
ADDPS xmmreg1, xmmreg2                0Fh    58h    11-xxx-xxx  Double      FADD
ADDPS xmmreg, mem128                  0Fh    58h    mm-xxx-xxx  Double      FADD
ADDSS xmmreg1, xmmreg2        F3h     0Fh    58h    11-xxx-xxx  DirectPath  FADD
ADDSS xmmreg, mem128          F3h     0Fh    58h    mm-xxx-xxx  DirectPath  FADD
ANDNPS xmmreg1, xmmreg2               0Fh    55h    11-xxx-xxx  Double      FMUL     3        1
ANDNPS xmmreg, mem128                 0Fh    55h    mm-xxx-xxx  Double      FMUL
ANDPS xmmreg1, xmmreg2                0Fh    54h    11-xxx-xxx  Double      FMUL     3        1
ANDPS xmmreg, mem128                  0Fh    54h    mm-xxx-xxx  Double      FMUL
CMPPS xmmreg1, xmmreg2, imm8          0Fh    C2h    11-xxx-xxx  Double      FADD     3        1
CMPPS xmmreg, mem128, imm8            0Fh    C2h    mm-xxx-xxx  Double      FADD
CMPSS xmmreg1, xmmreg2, imm8  F3h     0Fh    C2h    11-xxx-xxx  DirectPath  FADD
CMPSS xmmreg, mem32, imm8     F3h     0Fh    C2h    mm-xxx-xxx  DirectPath  FADD
COMISS xmmreg1, xmmreg2               0Fh    2Fh    11-xxx-xxx  VectorPath
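The encoding columns of Table 18 can be read as a recipe for the instruction bytes: optional prefix byte, first byte, second byte, then a ModRM byte. The following is a minimal sketch of that reading for the register-to-register forms (ModRM 11-xxx-xxx). The helper names `modrm` and `encode_reg_reg` are hypothetical, and the sketch assumes the usual x86 convention that the first xxx field holds the destination register number and the second holds the source; it handles registers 0 through 7 only and ignores REX prefixes.

```python
from typing import Optional


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the ModRM fields (mod: 2 bits, reg: 3 bits, rm: 3 bits) into one byte."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("ModRM field out of range")
    return (mod << 6) | (reg << 3) | rm


def encode_reg_reg(second_byte: int, dst: int, src: int,
                   prefix: Optional[int] = None, first_byte: int = 0x0F) -> bytes:
    """Encode a register-to-register form from the table's encoding columns.

    Assumption: dst goes in the ModRM reg field and src in the rm field.
    """
    out = bytearray()
    if prefix is not None:          # e.g. F3h for the scalar (SS) forms
        out.append(prefix)
    out.append(first_byte)          # 0Fh for every row in this table
    out.append(second_byte)         # e.g. 58h for ADDPS/ADDSS
    out.append(modrm(0b11, dst, src))  # mod = 11 selects the register form
    return bytes(out)


addps = encode_reg_reg(0x58, dst=1, src=2)               # ADDPS xmm1, xmm2
addss = encode_reg_reg(0x58, dst=1, src=2, prefix=0xF3)  # ADDSS xmm1, xmm2
print(" ".join(f"{b:02X}" for b in addps))  # 0F 58 CA
print(" ".join(f"{b:02X}" for b in addss))  # F3 0F 58 CA
```

The memory forms (mm-xxx-xxx) are left out because their ModRM byte may be followed by SIB and displacement bytes that the table does not list.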

Notes:

1. The low half of the result is available one cycle earlier than listed.
2. The second latency value indicates when the low half of the result becomes available.
3. The high half of the result is available one cycle earlier than listed.
4. The latency listed is the absolute minimum, while average latencies may be higher and are a function of internal pipeline conditions.
5. For the PREFETCHNTA/T0/T1/T2 instructions, the mem8 value refers to an address in the 64-byte line to be prefetched.
6. The 8-clock latency is only visible to younger stores that need to do an external write. The 2-clock latency is visible to the other stores and instructions.
7. This is the execution latency for the instruction. The time to complete the external write depends on the memory speed and the hardware implementation.
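Notes 1 through 3 adjust the listed latency separately for the low and high halves of a 128-bit result. As a rough illustration, the sketch below turns a table entry into a (low half, high half) pair of cycle counts. The function name `half_ready_cycles` is hypothetical, and for Note 2 the sketch assumes that the first listed value applies to the high half, which the note itself does not state.

```python
from typing import Optional, Tuple


def half_ready_cycles(listed: int, note: Optional[int] = None,
                      second: Optional[int] = None) -> Tuple[int, int]:
    """Return (low_half, high_half) availability in cycles for one table entry.

    listed: the latency value from the table
    note:   the table's note number (1, 2, or 3), or None if no note applies
    second: the second latency value, required only for Note 2
    """
    if note == 1:                 # low half is one cycle earlier than listed
        return listed - 1, listed
    if note == 2:                 # second value gives the low half
        if second is None:
            raise ValueError("Note 2 needs the second latency value")
        return second, listed     # assumption: first value is the high half
    if note == 3:                 # high half is one cycle earlier than listed
        return listed, listed - 1
    return listed, listed         # no half-specific adjustment


# ANDPS xmmreg1, xmmreg2 is listed with latency 3 and Note 1.
print(half_ready_cycles(3, note=1))  # (2, 3)
```

Per Note 4, these figures are minimums; observed latencies may be higher depending on pipeline conditions.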