User Manual for AMD 250
Chapter 9
Optimizing with SIMD Instructions
Software Optimization Guide for AMD64 Processors
25112
Rev. 3.06
September 2005
9.1 Ensure All Packed Floating-Point Data are Aligned
Optimization
Align all packed floating-point data on 16-byte boundaries.
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
Misaligned memory accesses reduce the available memory bandwidth, and SSE and SSE2 instructions
have shorter latencies when operating on aligned memory operands.
Aligning data on 16-byte boundaries allows you to use the aligned load instructions (MOVAPS,
MOVAPD, and MOVDQA), which move through the floating-point unit with shorter latencies and
reduce the possibility of stalling addition or multiplication instructions that are dependent on the load
data.
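As an illustration of the recommendation above, the following C sketch shows one way to obtain 16-byte-aligned single-precision data and process it with aligned loads and stores. It is not taken from this guide. The helper names (is_aligned16, alloc_floats16, add_packed) are hypothetical, and the sketch assumes a C11 compiler that provides aligned_alloc and, on x86 targets, the <xmmintrin.h> SSE intrinsics. The _mm_load_ps intrinsic requires a 16-byte-aligned address and typically compiles to MOVAPS. A scalar fallback is included so the code still builds where SSE is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics; _mm_load_ps is the aligned load */
#define HAVE_SSE 1
#endif

/* Hypothetical helper: nonzero if p lies on a 16-byte boundary. */
static int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: allocate n floats on a 16-byte boundary.
 * C11 aligned_alloc expects the size to be a multiple of the alignment,
 * so the byte count is rounded up to the next multiple of 16.
 * Release the memory with free(). */
static float *alloc_floats16(size_t n) {
    size_t bytes = (n * sizeof(float) + 15u) & ~(size_t)15u;
    return aligned_alloc(16, bytes);
}

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n).
 * Assumes n is a multiple of 4 and all pointers are 16-byte aligned. */
static void add_packed(float *dst, const float *a, const float *b, size_t n) {
    assert(is_aligned16(dst) && is_aligned16(a) && is_aligned16(b));
    assert(n % 4 == 0);
#ifdef HAVE_SSE
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);             /* aligned load */
        __m128 vb = _mm_load_ps(b + i);             /* aligned load */
        _mm_store_ps(dst + i, _mm_add_ps(va, vb));  /* aligned store */
    }
#else
    /* Scalar fallback for targets without SSE. */
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
#endif
}
```

Statically allocated or stack arrays can be placed on the same boundary with the C11 alignment specifier, for example alignas(16) float v[8]; after including <stdalign.h>. The assertions in add_packed are a debugging aid: passing a misaligned pointer to an aligned load instruction raises a fault, so checking alignment at the entry point makes such errors easier to find.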