Intel 253666-024US User Manual

3-524 Vol. 2A

LDDQU—Load Unaligned Integer 128 Bits

INSTRUCTION SET REFERENCE, A-M

Description

This instruction is functionally similar to MOVDQU xmm, m128 for loading from
memory: 16 bytes of data starting at the address specified by the source memory
operand (second operand) are fetched from memory and placed in the destination
register (first operand). The source operand need not be aligned on a 16-byte
boundary. Depending on the implementation, up to 32 bytes may be loaded from
memory.
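As a minimal sketch, assuming GCC or Clang on x86 and a CPU that supports SSE3, the load described above can be issued from C through the `_mm_lddqu_si128` intrinsic declared in `<pmmintrin.h>`. The helper name `load16_lddqu` is hypothetical, and the non-x86 branch is only a plain-copy stand-in so the file still compiles on other targets.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <pmmintrin.h> /* SSE3 intrinsics, including _mm_lddqu_si128 */

/* Load 16 bytes from a possibly unaligned address with LDDQU and copy
   them to out[]. The target attribute lets GCC/Clang emit SSE3 code for
   this function without compiling the whole file with -msse3. */
__attribute__((target("sse3")))
static void load16_lddqu(const uint8_t *src, uint8_t out[16])
{
    __m128i v = _mm_lddqu_si128((const __m128i *)src); /* LDDQU xmm, m128 */
    _mm_storeu_si128((__m128i *)out, v);              /* unaligned store */
}
#else
/* Non-x86 stand-in: a plain 16-byte copy with the same result. */
static void load16_lddqu(const uint8_t *src, uint8_t out[16])
{
    memcpy(out, src, 16);
}
#endif
```

Whatever the alignment of `src`, `out` should end up holding the same 16 bytes that a MOVDQU load (`_mm_loadu_si128`) would return.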
This instruction may improve performance relative to MOVDQU when the source operand
crosses a cache line boundary. If the data loaded by LDDQU is to be modified and
stored back to the same location, use MOVDQU or MOVDQA instead of LDDQU. To move a
double quadword to or from memory locations that are known to be aligned on 16-byte
boundaries, use the MOVDQA instruction.
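The read-modify-write guidance can be sketched as follows, assuming SSE2 intrinsics from `<emmintrin.h>` on GCC or Clang. The function `add1_bytes_inplace` is a hypothetical example: because it stores its result back to the bytes it loaded, it uses the aligned (MOVDQA) intrinsics when the address is 16-byte aligned and the unaligned (MOVDQU) ones otherwise, never LDDQU. A compiler may still fold an intrinsic load into a different instruction form.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <emmintrin.h> /* SSE2: MOVDQA / MOVDQU intrinsics */

/* Add 1 to each of the 16 bytes at p and store the result in place. */
__attribute__((target("sse2")))
static void add1_bytes_inplace(uint8_t *p)
{
    const __m128i one = _mm_set1_epi8(1);
    if (((uintptr_t)p & 15u) == 0) {
        /* Known 16-byte alignment: aligned load/store (MOVDQA). */
        __m128i v = _mm_load_si128((const __m128i *)p);
        _mm_store_si128((__m128i *)p, _mm_add_epi8(v, one));
    } else {
        /* Unaligned: MOVDQU load/store pair, not LDDQU. */
        __m128i v = _mm_loadu_si128((const __m128i *)p);
        _mm_storeu_si128((__m128i *)p, _mm_add_epi8(v, one));
    }
}
#else
/* Non-x86 stand-in with the same effect. */
static void add1_bytes_inplace(uint8_t *p)
{
    for (int i = 0; i < 16; i++)
        p[i] = (uint8_t)(p[i] + 1);
}
#endif
```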

Implementation Notes

• If the source is aligned to a 16-byte boundary, the 16 bytes may, depending
  on the implementation, be loaded more than once. For that reason, avoid LDDQU
  when accessing uncached or write-combining (WC) memory regions; for such
  regions, keep using MOVDQU.

• This instruction is a replacement for MOVDQU (load) in situations where cache
  line splits significantly affect performance. It should not be used where
  store-load forwarding is performance critical. If store-load forwarding
  performance is critical to the application, use MOVDQA store-load pairs when
  the data is 128-bit aligned, or MOVDQU store-load pairs when it is not.

• If the memory address is not aligned on a 16-byte boundary, some
  implementations may load up to 32 bytes and return 16 bytes in the
  destination. Some processor implementations may issue multiple loads to
  access the appropriate 16 bytes. Developers of multi-threaded or
  multi-processor software should be aware that on these processors the loads
  are performed non-atomically.
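The first two notes can be condensed into a selection rule. The sketch below is illustrative only: the 64-byte cache-line size, the `mem_kind` classification, and all function names are assumptions, and nothing here reads the real memory type of a page. It picks LDDQU only for write-back memory, only when the 16-byte access splits a cache line, and only when store-load forwarding is not critical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, typical of current x86 processors but
   not queried from the hardware here. */
#define CACHE_LINE_BYTES 64u

/* Hypothetical memory-type tag supplied by the caller. */
enum mem_kind { MEM_WRITE_BACK, MEM_UNCACHED, MEM_WRITE_COMBINING };

enum load_choice { USE_MOVDQU, USE_LDDQU };

/* True if a 16-byte access starting at addr straddles a cache-line boundary. */
static bool splits_cache_line(uintptr_t addr)
{
    return (addr % CACHE_LINE_BYTES) + 16u > CACHE_LINE_BYTES;
}

/* Selection rule distilled from the implementation notes:
   - UC or WC memory: LDDQU may load the same bytes more than once, so use MOVDQU;
   - store-load forwarding is critical: use MOVDQU (or MOVDQA pairs when the
     data is 16-byte aligned);
   - otherwise LDDQU only pays off when the load splits a cache line. */
static enum load_choice choose_load(uintptr_t addr, enum mem_kind kind,
                                    bool forwarding_critical)
{
    if (kind != MEM_WRITE_BACK || forwarding_critical)
        return USE_MOVDQU;
    return splits_cache_line(addr) ? USE_LDDQU : USE_MOVDQU;
}
```

For instance, a load at offset 60 within a line covers bytes 60 to 75 and so crosses into the next line, while a load at offset 48 ends exactly at byte 63 and does not.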

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional
registers (XMM8-XMM15).

Operation

xmm[127:0] = m128;
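A portable reference model of this operation, for testing or emulation, can be written with `memcpy`. The `xmm_reg` type and `lddqu_model` function are hypothetical; the model captures only the architectural result, not how many bytes the hardware actually fetches.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit register model: byte[0] holds bits 7:0
   (x86 is little-endian, so the lowest-addressed byte lands there). */
typedef struct { uint8_t byte[16]; } xmm_reg;

/* Architectural effect of LDDQU xmm, m128: xmm[127:0] = m128.
   Only the 16 addressed bytes reach the destination, even if an
   implementation fetched up to 32. memcpy imposes no alignment. */
static xmm_reg lddqu_model(const void *m128)
{
    xmm_reg r;
    memcpy(r.byte, m128, sizeof r.byte);
    return r;
}
```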

Opcode        Instruction       64-Bit Mode   Compat/Leg Mode   Description
F2 0F F0 /r   LDDQU xmm1, mem   Valid         Valid             Load unaligned data from mem and
                                                                return double quadword in xmm1.
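To show how the `/r` form and the REX.R prefix fit together, the following sketch encodes `LDDQU xmm, [base]` for 64-bit mode. It is deliberately limited: it assumes register-indirect addressing only, with no SIB byte and no displacement, and `encode_lddqu` is a hypothetical name, not part of any assembler API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode LDDQU xmm<xmm>, [<base>] as F2 [REX] 0F F0 ModRM (mod = 00).
   Base registers use the usual numbering: RAX=0, RCX=1, ..., R15=15.
   Bases whose low 3 bits are 100 (RSP/R12) need a SIB byte, and 101
   (RBP/R13) would mean RIP-relative/disp32 with mod = 00, so this
   sketch rejects both. Returns the number of bytes written (4 or 5). */
static size_t encode_lddqu(unsigned xmm, unsigned base, uint8_t out[5])
{
    assert(xmm < 16 && base < 16);
    assert((base & 7u) != 4u && (base & 7u) != 5u);

    size_t n = 0;
    out[n++] = 0xF2;                 /* mandatory prefix, must precede REX */
    uint8_t rex = 0x40;
    if (xmm & 8u)  rex |= 0x04;      /* REX.R: extends ModRM.reg to XMM8-XMM15 */
    if (base & 8u) rex |= 0x01;      /* REX.B: extends ModRM.rm to R8-R15 */
    if (rex != 0x40) out[n++] = rex; /* emit REX only when a bit is set */
    out[n++] = 0x0F;
    out[n++] = 0xF0;
    out[n++] = (uint8_t)(((xmm & 7u) << 3) | (base & 7u)); /* mod=00 reg rm */
    return n;
}
```

For example, `LDDQU xmm9, [rax]` comes out as F2 44 0F F0 08: the F2 prefix precedes REX, and REX.R supplies the fourth bit of register number 9, while `LDDQU xmm1, [rax]` needs no REX and encodes as F2 0F F0 08.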