Intel architecture ia-32 User Manual

Page of 636
10-8 Vol. 3A
MEMORY CACHE CONTROL
10.3.1
Buffering of Write Combining Memory Locations
Writes to the WC memory type are not cached in the typical sense of the word cached. They are
retained in an internal write combining buffer (WC buffer) that is separate from the internal L1,
L2, and L3 caches and the store buffer. The WC buffer is not snooped and thus does not provide
data coherency. Buffering of writes to WC memory is done to allow software a small window
of time to supply more modified data to the WC buffer while remaining as non-intrusive to soft-
ware as possible. The buffering of writes to WC memory also causes data to be collapsed; that
is, multiple writes to the same memory location will leave the last data written in the location
and the other writes will be lost.
The size and structure of the WC buffer is not architecturally defined. For the Pentium 4 and
Intel Xeon processors, the WC buffer is made up of several 64-byte WC buffers. For the P6
family processors, the WC buffer is made up of several 32-byte WC buffers. 
When software begins writing to WC memory, the processor begins filling the WC buffers one
at a time. When one or more WC buffers has been filled, the processor has the option of evicting
the buffers to system memory. The protocol for evicting the WC buffers is implementation
dependent and should not be relied on by software for system memory coherency. When using
the WC memory type, software must be sensitive to the fact that the writing of data to system
memory is being delayed and must deliberately empty the WC buffers when system memory
coherency is required.
Once the processor has started to evict data from the WC buffer into system memory, it will
make a bus-transaction style decision based on how much of the buffer contains valid data. If
the buffer is full (for example, all bytes are valid) the processor will execute a burst-write trans-
action on the bus that will result in all 32 bytes (P6 family processors) or 64 bytes (Pentium 4
and Intel Xeon processor) being transmitted on the data bus in a single burst transaction. If one
or more of the WC buffer’s bytes are invalid (for example, have not been written by software)
then the processor will transmit the data to memory using “partial write” transactions (one chunk
at a time, where a “chunk” is 8 bytes). 
This will result in a maximum of 4 partial write transactions (for P6 family processors) or 8
partial write transactions (for the Pentium 4 and Intel Xeon processors) for one WC buffer of
data sent to memory. 
The WC memory type is weakly ordered by definition. Once the eviction of a WC buffer has
started, the data is subject to the weak ordering semantics of its definition. Ordering is not main-
tained between the successive allocation/deallocation of WC buffers (for example, writes to WC
buffer 1 followed by writes to WC buffer 2 may appear as buffer 2 followed by buffer 1 on the
system bus). When a WC buffer is evicted to memory as partial writes there is no guaranteed
ordering between successive partial writes (for example, a partial write for chunk 2 may appear
on the bus before the partial write for chunk 1 or vice versa).