Intel architecture ia-32 User Manual

Page of 636
Vol. 3A 7-5
MULTIPLE-PROCESSOR MANAGEMENT
7.1.2.2
Software Controlled Bus Locking
To explicitly force the LOCK semantics, software can use the LOCK prefix with the following
instructions when they are used to modify a memory location. An invalid-opcode exception
(#UD) is generated when the LOCK prefix is used with any other instruction or when no write
operation is made to memory (that is, when the destination operand is in a register).
The bit test and modify instructions (BTS, BTR, and BTC).
The exchange instructions (XADD, CMPXCHG, and CMPXCHG8B). 
The LOCK prefix is automatically assumed for XCHG instruction.
The following single-operand arithmetic and logical instructions: INC, DEC, NOT, and
NEG.
The following two-operand arithmetic and logical instructions: ADD, ADC, SUB, SBB,
AND, OR, and XOR.
A locked instruction is guaranteed to lock only the area of memory defined by the destination
operand, but may be interpreted by the system as a lock for a larger memory area.
Software should access semaphores (shared memory used for signalling between multiple
processors) using identical addresses and operand lengths. For example, if one processor
accesses a semaphore using a word access, other processors should not access the semaphore
using a byte access. Do not use semaphores on the WC memory type.
The integrity of a bus lock is not affected by the alignment of the memory field. The LOCK
semantics are followed for as many bus cycles as necessary to update the entire operand.
However, it is recommend that locked accesses be aligned on their natural boundaries for better
system performance:
Any boundary for an 8-bit access (locked or otherwise).
16-bit boundary for locked word accesses.
32-bit boundary for locked doubleword accesses.
64-bit boundary for locked quadword accesses.
Locked operations are atomic with respect to all other memory operations and all externally
visible events. Only instruction fetch and page table accesses can pass locked instructions.
Locked instructions can be used to synchronize data written by one processor and read by
another processor.
For the P6 family processors, locked operations serialize all outstanding load and store opera-
tions (that is, wait for them to complete). This rule is also true for the Pentium 4 and Intel Xeon
processors, with one exception. Load operations that reference weakly ordered memory types
(such as the WC memory type) may not be serialized.