Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Chapter 5: Cache and Memory Optimizations
5.1 Memory-Size Mismatches
Optimization
Avoid memory-size mismatches when different instructions operate on the same data. When one 
instruction stores and another instruction subsequently loads the same data, keep their operands 
aligned and keep the loads/stores of each operand the same size. 
Application
This optimization applies to:
32-bit software
64-bit software
Examples—Store-to-Load-Forwarding Stalls
Each of the following code examples results in a store-to-load-forwarding stall, because a QWORD load reads data that was written by two narrower DWORD stores:
64-bit (Avoid)
foo DQ ?                   ; Assume foo is 8-byte aligned.
...
mov DWORD PTR foo, eax     ; Store a DWORD to foo.
mov DWORD PTR foo+4, ebx   ; Now store to foo+4.
mov rcx, QWORD PTR foo     ; Load a QWORD from foo.
32-bit (Avoid)
foo DQ ?                   ; Assume foo is 4-byte aligned.
...
mov DWORD PTR foo, eax     ; Store a DWORD in foo.
mov DWORD PTR foo+4, edx   ; Store a DWORD in foo+4.
fld QWORD PTR foo          ; Load a QWORD from foo.
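One way to keep store and load sizes matched in cases like the two above is to combine the halves in a register first and write them with a single QWORD store. The C sketch below contrasts the two access patterns. It is an illustration only: the names `foo_buf`, `store_halves_load_qword`, and `store_qword_load_qword` are hypothetical, a little-endian target is assumed, and an optimizing compiler may emit different memory accesses than the source suggests.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte, naturally aligned stand-in for `foo`. */
static uint64_t foo_buf;

/* Mismatched pattern (avoid): two 4-byte stores, then one 8-byte load. */
uint64_t store_halves_load_qword(uint32_t lo, uint32_t hi)
{
    unsigned char *p = (unsigned char *)&foo_buf;
    uint64_t v;

    memcpy(p, &lo, sizeof lo);      /* 4-byte store to foo   */
    memcpy(p + 4, &hi, sizeof hi);  /* 4-byte store to foo+4 */
    memcpy(&v, p, sizeof v);        /* 8-byte load from foo  */
    return v;
}

/* Matched pattern: combine in a register, then one 8-byte store
 * followed by one 8-byte load of the same operand. */
uint64_t store_qword_load_qword(uint32_t lo, uint32_t hi)
{
    uint64_t combined = ((uint64_t)hi << 32) | lo;

    foo_buf = combined;             /* 8-byte store to foo  */
    return foo_buf;                 /* 8-byte load from foo */
}
```

Both functions return the same value on a little-endian machine; only the sizes of the stores that feed the final load differ.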
Avoid
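mov  foo, eax              ; Store a DWORD to foo.
mov  foo+4, edx            ; Store a DWORD to foo+4.
...
movq mm0, foo              ; Load a QWORD from foo (wider than either store).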
Preferred
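mov       foo, eax         ; Store a DWORD to foo.
mov       foo+4, edx       ; Store a DWORD to foo+4.
...
movd      mm0, foo         ; Load the DWORD at foo (same size as the first store).
punpckldq mm0, foo+4       ; Unpack the DWORD at foo+4 into the upper half of MM0.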
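The same idea as the Preferred sequence above can be sketched in C: each half is read back with a load of the same size as the store that wrote it, and the 64-bit value is assembled in a register. This is a minimal sketch rather than a translation of the MMX code; `foo_halves` and `store_and_reload` are hypothetical names, and a compiler is free to keep the values in registers and skip the memory accesses entirely.

```c
#include <stdint.h>

/* Hypothetical stand-in for `foo`: two adjacent 32-bit halves. */
static uint32_t foo_halves[2];

uint64_t store_and_reload(uint32_t lo, uint32_t hi)
{
    foo_halves[0] = lo;              /* 32-bit store, like mov foo, eax   */
    foo_halves[1] = hi;              /* 32-bit store, like mov foo+4, edx */

    /* Each 32-bit load matches the size of the store that produced it. */
    uint64_t low  = foo_halves[0];
    uint64_t high = foo_halves[1];

    /* Assemble the 64-bit value in a register instead of reloading it
     * from memory with a single wider load. */
    return (high << 32) | low;
}
```

For example, `store_and_reload(0x11223344u, 0xAABBCCDDu)` yields `0xAABBCCDD11223344`.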