AMD 250 User Manual

Chapter 9

Optimizing with SIMD Instructions


Software Optimization Guide for AMD64 Processors, 25112 Rev. 3.06, September 2005

• The statement movlps xmm1, mem64 marks the lower half of XMM1 as FPS (floating-point
single-precision) but leaves the upper half of XMM1 unchanged. If XMM1 is later used in any
instruction that uses the full 128 bits of XMM1, there can be a performance penalty if the top half
is not also in FPS format. Examples of instructions that expect the full 128 bits of XMM1 to be in
FPS format are MOVAPS, ANDPS, ANDNPS, and ORPS. For more information on XMM-register
data types, see “Half-Register Operations” on page 356.
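The load semantics in this caveat can be sketched in C with the SSE intrinsic `_mm_loadl_pi`, which is documented to correspond to MOVLPS. This is a minimal sketch, assuming an x86-64 compiler such as GCC or Clang with SSE enabled (the default on x86-64); the compiler is free to emit an equivalent instruction sequence, so only hand-written assembly guarantees a literal MOVLPS. The helper names `load_low_pair` and `demo_movlps` are hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadl_pi, _mm_set_ps, _mm_storeu_ps */

/* MOVLPS-style load (hypothetical helper): replace the low two floats of
   'reg' with p[0] and p[1]; the high two floats of 'reg' are left unchanged. */
static inline __m128 load_low_pair(__m128 reg, const float *p)
{
    /* The intrinsic's prototype takes an __m64 pointer naming the 8-byte
       memory operand; MOVLPS itself has no alignment requirement. */
    return _mm_loadl_pi(reg, (const __m64 *)p);
}

/* Hypothetical demo: start from {10, 20, 30, 40} in lanes 0..3, then load
   {1, 2} into the low half and write all four lanes to out[]. */
static void demo_movlps(float out[4])
{
    /* 8-byte alignment only keeps the C-level __m64 pointer cast well formed. */
    _Alignas(8) const float mem[2] = {1.0f, 2.0f};
    __m128 reg = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f); /* arguments run lane 3 down to lane 0 */
    reg = load_low_pair(reg, mem);
    _mm_storeu_ps(out, reg);                             /* out = {1, 2, 30, 40} */
}
```

After the load, lanes 2 and 3 still hold 30 and 40 from the earlier value of the register. A later full-width instruction such as ANDPS or ORPS operates on those untouched lanes as well, which is why the caveat cares about what format the upper half is in.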

Rationale: Double Precision

Unlike MOVSD and MOVQ, which clear the upper 64 bits of an XMM register when they load 64 bits of
floating-point data into its lower 64 bits, MOVLPD leaves the upper 64 bits unchanged, so no
clearing is required. Using MOVLPD instead can significantly increase performance in SSE2 scalar
floating-point-intensive code that is processor-bound rather than memory-bound.
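The difference between the two loads can be sketched with the SSE2 intrinsics `_mm_load_sd` (documented as MOVSD, upper lane zeroed) and `_mm_loadl_pd` (documented as MOVLPD, upper lane kept). This is a minimal C sketch, assuming an x86-64 compiler with SSE2; the helper names are hypothetical, and because the compiler may pick a different but equivalent instruction, the sketch shows the data flow rather than guaranteeing which instruction is emitted or how fast it runs.

```c
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_load_sd, _mm_loadl_pd, _mm_mul_sd, ... */

/* MOVSD-style load: low double = *p, high double cleared to 0.0. */
static inline __m128d load_scalar_movsd(const double *p)
{
    return _mm_load_sd(p);
}

/* MOVLPD-style load: low double = *p, high double kept from 'reg'. */
static inline __m128d load_scalar_movlpd(__m128d reg, const double *p)
{
    return _mm_loadl_pd(reg, p);
}

/* Hypothetical scalar SSE2 dot product that loads every operand with a
   MOVLPD-style load. Only the low lane is meaningful; the scalar _sd
   operations never read the high lanes. */
static double dot_scalar_movlpd(const double *a, const double *b, size_t n)
{
    __m128d acc = _mm_setzero_pd();
    __m128d x = _mm_setzero_pd();   /* high lane stays 0.0 across all loads */
    __m128d y = _mm_setzero_pd();
    for (size_t i = 0; i < n; ++i) {
        x = load_scalar_movlpd(x, &a[i]);
        y = load_scalar_movlpd(y, &b[i]);
        acc = _mm_add_sd(acc, _mm_mul_sd(x, y));  /* low lane: acc += a[i] * b[i] */
    }
    return _mm_cvtsd_f64(acc);      /* extract the low double */
}
```

For example, with a = {1, 2, 3} and b = {4, 5, 6}, `dot_scalar_movlpd(a, b, 3)` returns 32.0. Because each MOVLPD-style load reuses the previous register contents, the upper lane of `x` and `y` is always a double written by earlier double-precision code.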

Consider the following caveat when using the MOVLPD instruction:

• The statement movlpd xmm1, mem64 marks the lower half of XMM1 as FPD (floating-point
double-precision) but leaves the upper half of XMM1 unchanged. If XMM1 is later used in any
instruction that uses the full 128 bits of XMM1, there can be a performance penalty if the top half
is not also in FPD format. Examples of instructions that expect the full 128 bits of XMM1 to be in
FPD format are ANDPD, ANDNPD, and ORPD. For more information on XMM-register data
types, see “Half-Register Operations” on page 356.
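A common place where this caveat shows up is the ANDPD absolute-value idiom, which clears the sign bit of both lanes at once. The minimal C sketch below, assuming an x86-64 compiler with SSE2, loads a scalar with `_mm_loadl_pd` (MOVLPD) and then applies `_mm_and_pd` (ANDPD) to the whole register. The helper names `abs_pd` and `fabs_via_movlpd` are hypothetical, and the compiler may emit an equivalent instruction sequence instead of these exact instructions.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadl_pd, _mm_and_pd, _mm_set1_epi64x, _mm_castsi128_pd */

/* Clears the sign bit of both double lanes with one full-width ANDPD. */
static inline __m128d abs_pd(__m128d v)
{
    const __m128d mask = _mm_castsi128_pd(_mm_set1_epi64x(0x7FFFFFFFFFFFFFFFLL));
    return _mm_and_pd(v, mask);
}

/* |*p| computed as scalar code might: a MOVLPD-style load into 'reg',
   then ANDPD. ANDPD reads all 128 bits, including the high lane left over
   in 'reg', so per the caveat that lane should already hold double-precision
   data (for example, a result of earlier double-precision arithmetic). */
static double fabs_via_movlpd(__m128d reg, const double *p)
{
    __m128d v = _mm_loadl_pd(reg, p);   /* low lane = *p, high lane unchanged */
    return _mm_cvtsd_f64(abs_pd(v));    /* only the low lane is returned */
}
```

With x = -3.5, `fabs_via_movlpd(_mm_set_pd(-1.0, 0.0), &x)` returns 3.5; the leftover high lane (-1.0) is also masked by ANDPD and then discarded.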