AMD 250 User Manual

Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Chapter 9: Optimizing with SIMD Instructions (page 197)
The statement movlps xmm1, mem64 marks the lower half of XMM1 as FPS (floating-point
single-precision) but leaves the upper half of XMM1 unchanged. If XMM1 is later used in any 
instruction that uses the full 128 bits of XMM1, there can be a performance penalty if the top half 
is not also in FPS format. Examples of instructions that expect the full 128 bits of XMM1 to be in 
FPS format are MOVAPS, ANDPS, ANDNPS, and ORPS. For more information on XMM-register
data types, see “Half-Register Operations” on page 356.
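
As a rough illustration of the single-precision caveat, the sketch below fills both halves of a register with half-register loads before a full-width logical operation. It is written in C with SSE intrinsics and assumes an x86-64 compiler. The function name force_negative4 is hypothetical. The intrinsics _mm_loadl_pi, _mm_loadh_pi, and _mm_or_ps usually compile to MOVLPS, MOVHPS, and ORPS, but the compiler is free to choose other instructions, so inspect the generated assembly before relying on this.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: load two floats into each half of one register
   and force all four values negative. Both halves are written by
   single-precision half-register loads before the full-width OR. */
static __m128 force_negative4(const float *lo2, const float *hi2)
{
    __m128 v = _mm_setzero_ps();
    v = _mm_loadl_pi(v, (const __m64 *)lo2);  /* MOVLPS-style: low 64 bits  */
    v = _mm_loadh_pi(v, (const __m64 *)hi2);  /* MOVHPS-style: high 64 bits */
    return _mm_or_ps(v, _mm_set1_ps(-0.0f));  /* ORPS-style: set sign bits  */
}
```

Whether pairing the two loads actually avoids a penalty depends on the processor's register data-type tracking described in this guide; the sketch only shows the instruction pattern.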
Rationale—Double Precision
The MOVLPD instruction does not necessitate clearing the upper 64 bits of an XMM register, as the 
MOVSD/MOVQ instructions do, upon loading 64 bits of floating-point data into the lower 64 bits of 
the XMM register. Using the MOVLPD instruction can significantly increase performance on 
processor-limited SSE2 scalar floating-point-intensive code.
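
The difference between the two kinds of load can be sketched in C with SSE2 intrinsics, assuming an x86-64 compiler. The helper names are hypothetical. _mm_loadl_pd typically maps to MOVLPD and _mm_load_sd to MOVSD, although the compiler may select a different encoding.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Load one double into the low half of acc and keep the high half.
   Typically compiles to MOVLPD xmm, mem64. */
static __m128d load_low_keep_high(__m128d acc, const double *p)
{
    return _mm_loadl_pd(acc, p);
}

/* Load one double into the low half and zero the high half.
   Typically compiles to MOVSD xmm, mem64. */
static __m128d load_low_zero_high(const double *p)
{
    return _mm_load_sd(p);
}

/* Hypothetical helper: copy both lanes to memory for inspection. */
static void store_lanes(__m128d v, double out[2])
{
    _mm_storeu_pd(out, v);
}
```

Starting from a register that holds (low = 1.0, high = 7.0), loading 3.5 with the first helper leaves the high lane at 7.0, while the second helper returns a high lane of 0.0.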
Consider the following caveat when using the MOVLPD instruction:
The statement movlpd xmm1, mem64 marks the lower half of XMM1 as FPD (floating-point
double-precision) but leaves the upper half of XMM1 unchanged. If XMM1 is later used in any 
instruction that uses the full 128 bits of XMM1, there can be a performance penalty if the top half 
is not also in FPD format. Examples of instructions that expect the full 128 bits of XMM1 to be in 
FPD format are ANDPD, ANDNPD, and ORPD. For more information on XMM-register data 
types, see “Half-Register Operations” on page 356.
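
One way to respect the double-precision caveat is to write both halves of the register with double-precision loads before any full-width operation. The sketch below assumes an x86-64 compiler with SSE2 intrinsics. The function name abs_pair is hypothetical. _mm_loadl_pd, _mm_loadh_pd, and _mm_andnot_pd usually correspond to MOVLPD, MOVHPD, and ANDNPD, but that mapping is not guaranteed.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: gather two doubles from unrelated addresses and
   return their absolute values. Both halves are written by
   double-precision loads before the full-width ANDNPD-style operation. */
static __m128d abs_pair(const double *lo, const double *hi)
{
    __m128d v = _mm_setzero_pd();
    v = _mm_loadl_pd(v, lo);                 /* MOVLPD-style: low 64 bits  */
    v = _mm_loadh_pd(v, hi);                 /* MOVHPD-style: high 64 bits */
    const __m128d sign = _mm_set1_pd(-0.0);  /* only the sign bits set     */
    return _mm_andnot_pd(sign, v);           /* (~sign) & v clears signs   */
}
```

If only the low element is ever used, the full-width operation and the second load are unnecessary; the pattern matters when the register later feeds an instruction such as ANDPD, ANDNPD, or ORPD.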