AMD 250 User Manual

Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06, September 2005
Chapter 5: Cache and Memory Optimizations
5.16 Interleave Loads and Stores
In code that both loads and stores data, such as a copy routine, the order in which the loads and stores are issued can affect performance.
Application
This optimization applies to:
32-bit software
64-bit software
Rationale
When using SSE and SSE2 instructions to perform loads and stores, it is best to interleave them in the following pattern: Load, Store, Load, Store, Load, Store, and so on. This enables the processor to maximize the load/store bandwidth.
If using MMX loads and stores in 32-bit mode, the loads and stores should be arranged in the following pattern: Load, Load, Store, Store, Load, Load, Store, Store, and so on.
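As a minimal sketch of the MMX ordering (not taken from the guide itself), the fragment below copies two quadwords in 32-bit mode. The register roles are assumptions chosen for illustration: edx holds the source address, ecx the destination address, and eax a quadword index.

```asm
; Hypothetical register roles: edx = source, ecx = destination, eax = quadword index
movq    mm0,[edx+eax*8]       ; Load
movq    mm1,[edx+eax*8+8]     ; Load
movq    [ecx+eax*8],mm0       ; Store
movq    [ecx+eax*8+8],mm1     ; Store
```

Code that uses the MMX registers should execute emms before any later x87 floating-point instructions.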
Example
The following example illustrates a sequence of 128-bit loads and stores:
movdqa     xmm0,[rdx+r8*8]           ; Load
movntdq    [rcx+r8*8],xmm0           ; Store
movdqa     xmm1,[rdx+r8*8+16]        ; Load
movntdq    [rcx+r8*8+16],xmm1        ; Store
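To show how this sequence might sit inside a complete copy loop, here is a hedged sketch rather than the guide's own routine. It assumes rdx and rcx point to the ends of the source and destination buffers, that both buffers are 16-byte aligned (movdqa and movntdq require aligned addresses), and that r8 holds the negated quadword count, a nonzero multiple of 4. The label copy_loop and the loop bookkeeping are illustrative.

```asm
; Assumptions (not from the guide): rdx = end of source buffer,
; rcx = end of destination buffer, both 16-byte aligned;
; r8 = -(number of quadwords to copy), a nonzero multiple of 4.
copy_loop:
    movdqa   xmm0,[rdx+r8*8]        ; Load
    movntdq  [rcx+r8*8],xmm0        ; Store
    movdqa   xmm1,[rdx+r8*8+16]     ; Load
    movntdq  [rcx+r8*8+16],xmm1     ; Store
    add      r8,4                   ; advance 4 quadwords (32 bytes)
    jnz      copy_loop              ; repeat until the index reaches zero
    sfence                          ; order the streaming stores ahead of later stores
```

Counting a negative index up toward zero lets the add instruction set the zero flag for the loop branch, so no separate compare is needed.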