Chapter 9
Optimizing with SIMD Instructions
Software Optimization Guide for AMD64 Processors
25112
Rev. 3.06
September 2005
9.4 Use MOVAPD and MOVAPS Instead of MOVUPD and MOVUPS
Optimization
For best performance, use the aligned versions of these instructions (MOVAPD and MOVAPS) when one of the operands is a memory location.
Application
This optimization applies to:
32-bit software
64-bit software
Rationale
Both MOVUPS and MOVUPD are VectorPath instructions when one of the operands is a memory
location. It is better to use MOVAPS and MOVAPD, since both are DirectPath Double decode
types. Misaligned memory accesses also reduce the available memory bandwidth, and SSE and SSE2
instructions have shorter latencies when operating on aligned memory operands. Aligning data on
16-byte boundaries allows you to use the aligned load instructions (MOVAPS, MOVAPD, and
MOVDQA), which move through the floating-point unit with shorter latencies and reduce the
possibility of stalling addition or multiplication instructions that are dependent on the load data.
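
The following C sketch illustrates one way to apply this recommendation from compiled code. It is not taken from this guide. It assumes an x86 compiler with SSE enabled and uses the standard intrinsics _mm_load_ps and _mm_store_ps, which compilers typically translate to MOVAPS, and _mm_loadu_ps and _mm_storeu_ps, which typically become MOVUPS. The function names (is_aligned16, add_floats, and so on) are hypothetical. Because the aligned forms raise a general-protection exception when the address is not a multiple of 16, the sketch checks alignment first and uses the unaligned forms only as a fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>   /* SSE intrinsics: _mm_load_ps, _mm_loadu_ps, ... */

/* Hypothetical helper: nonzero if p lies on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* dst[i] = a[i] + b[i]. All three pointers must be 16-byte aligned,
   so the aligned load/store forms (typically MOVAPS) can be used. */
static void add_floats_aligned(float *dst, const float *a,
                               const float *b, size_t n)
{
    size_t i;
    assert(is_aligned16(dst) && is_aligned16(a) && is_aligned16(b));
    for (i = 0; i + 4 <= n; i += 4) {
        __m128 va = _mm_load_ps(a + i);              /* aligned load  */
        __m128 vb = _mm_load_ps(b + i);              /* aligned load  */
        _mm_store_ps(dst + i, _mm_add_ps(va, vb));   /* aligned store */
    }
    for (; i < n; i++)                               /* scalar tail   */
        dst[i] = a[i] + b[i];
}

/* Same operation with no alignment requirement
   (unaligned forms, typically MOVUPS). */
static void add_floats_unaligned(float *dst, const float *a,
                                 const float *b, size_t n)
{
    size_t i;
    for (i = 0; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Take the aligned path whenever every buffer allows it. */
static void add_floats(float *dst, const float *a,
                       const float *b, size_t n)
{
    if (is_aligned16(dst) && is_aligned16(a) && is_aligned16(b))
        add_floats_aligned(dst, a, b, n);
    else
        add_floats_unaligned(dst, a, b, n);
}
```

To reach the aligned path, callers need buffers on 16-byte boundaries. In C11 this can be arranged with _Alignas(16) for static or automatic arrays, or with aligned_alloc(16, size) for heap memory. The runtime check is a design choice for code that cannot guarantee alignment; where alignment is guaranteed by construction, calling the aligned routine directly avoids the branch.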