Intel 253668-032US User Manual

Vol. 3   8-23
MULTIPLE-PROCESSOR MANAGEMENT
as the XCHG instruction or the LOCK prefix to ensure that a read-modify-write
operation on memory is carried out atomically. Locking operations typically
operate like I/O operations in that they wait for all previous instructions to
complete and for all buffered writes to drain to memory (see Section 8.1.2,
“Bus Locking”).
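
The atomic read-modify-write described above can be sketched in portable C11
rather than assembly. This is a minimal illustration, not a reference
implementation: the names demo_spinlock, demo_trylock, demo_unlock, and
demo_increment are hypothetical, and the mapping to XCHG or a LOCK-prefixed
instruction is what x86 compilers typically generate, not something the C
standard guarantees.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
typedef struct {
    atomic_int locked;
} demo_spinlock;

/* Atomically swap 1 into the lock word and inspect the old value.
 * On x86 this exchange is typically compiled to XCHG, which is
 * locked implicitly when it has a memory operand. */
static inline bool demo_trylock(demo_spinlock *lock)
{
    return atomic_exchange_explicit(&lock->locked, 1,
                                    memory_order_acquire) == 0;
}

/* Release the lock with an ordinary store that has release ordering. */
static inline void demo_unlock(demo_spinlock *lock)
{
    atomic_store_explicit(&lock->locked, 0, memory_order_release);
}

/* Atomic increment; on x86 typically a LOCK-prefixed XADD or ADD.
 * Returns the value after the increment. */
static inline int demo_increment(atomic_int *counter)
{
    return atomic_fetch_add_explicit(counter, 1, memory_order_seq_cst) + 1;
}
```

A caller would loop on demo_trylock until it returns true, update the shared
data, and then call demo_unlock.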
Program synchronization can also be carried out with serializing instructions (see 
Section 8.3). These instructions are typically used at critical procedure or task 
boundaries to force completion of all previous instructions before a jump to a new 
section of code or a context switch occurs. As with the I/O and locking
instructions, the processor waits until all previous instructions have been
completed and all buffered writes have been drained to memory before executing
the serializing instruction.
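
As a rough illustration of placing a serializing instruction at a code
boundary, the sketch below executes CPUID, which is one of the serializing
instructions. The helper name serialize_with_cpuid is hypothetical, the inline
assembly assumes a GCC- or Clang-compatible compiler targeting x86, and the
fallback branch is only a compiler-level and memory-ordering fence, not a true
serializing instruction.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Executes CPUID leaf 0 for its serializing side effect.
 * Returns EAX (the highest basic CPUID leaf) on x86, or 0 when the
 * fallback path is compiled in. */
static inline uint32_t serialize_with_cpuid(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t eax = 0, ebx, ecx = 0, edx;
    /* "volatile" keeps the compiler from deleting the instruction;
     * the "memory" clobber keeps it from moving memory accesses
     * across this point. */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)ecx;
    (void)edx;
    return eax;
#else
    /* Assumption: no x86 inline assembly available. This fence orders
     * memory accesses but does not serialize instruction execution. */
    atomic_thread_fence(memory_order_seq_cst);
    return 0;
#endif
}
```

Such a call would typically sit immediately before the jump to the new code
section or the context switch.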
The SFENCE, LFENCE, and MFENCE instructions provide a performance-efficient way
of ensuring load and store memory ordering between routines that produce
weakly-ordered results and routines that consume that data. The functions of
these instructions are as follows:
SFENCE — Serializes all store (write) operations that occurred prior to the 
SFENCE instruction in the program instruction stream, but does not affect load 
operations.
LFENCE — Serializes all load (read) operations that occurred prior to the LFENCE 
instruction in the program instruction stream, but does not affect store 
operations.
MFENCE — Serializes all store and load operations that occurred prior to the 
MFENCE instruction in the program instruction stream.
Note that the SFENCE, LFENCE, and MFENCE instructions provide a more efficient 
method of controlling memory ordering than the CPUID instruction.
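
The producer/consumer pattern mentioned above can be sketched with compiler
intrinsics. The intrinsics _mm_stream_si32 and _mm_sfence (with _mm_lfence and
_mm_mfence as the counterparts for the other two instructions) come from
<emmintrin.h> on compilers that support SSE2; the function names produce and
consume and the ready flag are hypothetical. When SSE2 is not available, the
sketch falls back to ordinary stores and a C11 release fence, which is an
assumption made only so that the example compiles everywhere.

```c
#include <stdatomic.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_stream_si32, _mm_sfence */
#endif

/* Producer: fill buf, then publish it by setting *ready. */
static void produce(int *buf, size_t n, int value, atomic_int *ready)
{
#if defined(__SSE2__)
    /* Non-temporal stores are weakly ordered. */
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(&buf[i], value + (int)i);
    /* SFENCE: all earlier stores become globally visible before any
     * store that follows the fence. */
    _mm_sfence();
#else
    for (size_t i = 0; i < n; i++)
        buf[i] = value + (int)i;
    atomic_thread_fence(memory_order_release);
#endif
    atomic_store_explicit(ready, 1, memory_order_release);
}

/* Consumer: returns 0 if the data is not yet published; otherwise
 * stores the sum of the buffer in *sum and returns 1. */
static int consume(const int *buf, size_t n, atomic_int *ready, long *sum)
{
    if (!atomic_load_explicit(ready, memory_order_acquire))
        return 0;
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    *sum = total;
    return 1;
}
```

The fence is placed once, after the whole batch of weakly-ordered stores,
rather than after each store, which is where the efficiency advantage over a
fully serializing instruction comes from.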
The MTRRs were introduced in the P6 family processors to define the cache
characteristics for specified areas of physical memory. The following are two
examples of how memory types set up with MTRRs can be used to strengthen or
weaken memory ordering for the Pentium 4, Intel Xeon, and P6 family processors:
The strong uncached (UC) memory type forces a strong-ordering model on 
memory accesses. Here, all reads and writes to the UC memory region appear on 
the bus and out-of-order or speculative accesses are not performed. This 
memory type can be applied to an address range dedicated to memory-mapped
I/O devices to force strong memory ordering.
For areas of memory where weak ordering is acceptable, the write-back (WB)
memory type can be chosen. Here, reads can be performed speculatively and 
writes can be buffered and combined. For this type of memory, cache locking is 
performed on atomic (locked) operations that do not split across cache lines, 
which helps to reduce the performance penalty associated with the use of the 
typical synchronization instructions, such as XCHG, that lock the bus during the 
entire read-modify-write operation. With the WB memory type, the XCHG 
instruction locks the cache instead of the bus if the memory access is contained 
within a cache line.
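
Whether a locked access is contained within a single cache line can be checked
with simple address arithmetic. The sketch below assumes a 64-byte cache line,
which is common on recent processors but is not stated in this section; the
names ASSUMED_CACHE_LINE, within_one_cache_line, and aligned_lock_word are
hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Query the processor (for example
 * through CPUID) if the real value matters. */
#define ASSUMED_CACHE_LINE 64u

/* True if the byte range [addr, addr + size) lies within one cache
 * line, so that a locked access to it would not split across lines. */
static inline bool within_one_cache_line(uintptr_t addr, size_t size)
{
    if (size == 0)
        return true;
    return (addr / ASSUMED_CACHE_LINE) ==
           ((addr + size - 1) / ASSUMED_CACHE_LINE);
}

/* Aligning a lock word to the line size guarantees that it never
 * straddles two lines. */
typedef struct {
    _Alignas(64) atomic_int value;
} aligned_lock_word;
```

For example, a 4-byte access at offset 60 stays within one line, while the
same access at offset 62 spans two lines and would not qualify for cache
locking.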