Intel architecture ia-32 User Manual

Vol. 3A 10-9
MEMORY CACHE CONTROL
The only elements of WC propagation to the system bus that are guaranteed are those provided
by transaction atomicity. For example, with a P6 family processor, a completely full WC buffer
will always be propagated as a single 32-byte burst transaction using any chunk order. In a WC
buffer eviction where the data will be evicted as partials, all data contained in the same chunk
(0 mod 8 aligned) will be propagated simultaneously. Likewise, with a Pentium 4 or Intel Xeon
processor, a full WC buffer will always be propagated as a single burst transaction, using any
chunk order within a transaction. For partial buffer propagations, all data contained in the same
chunk will be propagated simultaneously.
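
The chunk rule above can be illustrated with a small software model. This is only a sketch: it assumes a 32-byte buffer divided into four 8-byte chunks, and the names wc_chunk_index, wc_dirty_chunks, and wc_buffer_full are hypothetical helpers, not part of any processor interface. It shows which 8-byte-aligned chunks hold written data and would therefore each be propagated as a unit on a partial eviction.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only. The 32-byte buffer size is an assumption of
   this sketch; the WC buffer size is not architecturally defined. */
enum { WC_CHUNK_BYTES = 8, WC_BUFFER_BYTES = 32 };

/* Index of the 8-byte-aligned chunk (0 mod 8) holding the byte at offset. */
static unsigned wc_chunk_index(unsigned offset)
{
    assert(offset < WC_BUFFER_BYTES);
    return offset / WC_CHUNK_BYTES;
}

/* byte_mask: bit i is set if byte i of the buffer has been written.
   Returns a mask in which bit c is set if chunk c holds written data.
   In this model, all written bytes of a set chunk travel together. */
static unsigned wc_dirty_chunks(uint32_t byte_mask)
{
    unsigned chunks = 0;
    for (unsigned c = 0; c < WC_BUFFER_BYTES / WC_CHUNK_BYTES; c++) {
        uint32_t chunk_bits = (uint32_t)0xFFu << (c * WC_CHUNK_BYTES);
        if (byte_mask & chunk_bits)
            chunks |= 1u << c;
    }
    return chunks;
}

/* A buffer is completely full when every byte has been written; only
   then does the model treat it as one burst transaction. */
static int wc_buffer_full(uint32_t byte_mask)
{
    return byte_mask == 0xFFFFFFFFu;
}
```

In this model, bytes at offsets 0 and 7 fall in the same chunk, while bytes at offsets 7 and 8 do not, so software cannot rely on the latter pair reaching the bus together.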
10.3.2 Choosing a Memory Type
The simplest system memory model does not use memory-mapped I/O with read or write side
effects, does not include a frame buffer, and uses the write-back memory type for all memory.
An I/O agent can perform direct memory access (DMA) to write-back memory and the cache
protocol maintains cache coherency.
A system can use strong uncacheable memory for other memory-mapped I/O, and should
always use strong uncacheable memory for memory-mapped I/O with read side effects.
Dual-ported memory can be considered a write side effect, making relatively prompt writes
desirable, because those writes cannot be observed at the other port until they reach the memory
agent. A system can use strong uncacheable, uncacheable, write-through, or write-combining
memory for frame buffers or dual-ported memory that contains pixel values displayed on a
screen. Frame buffer memory is typically large (a few megabytes) and is usually written more
than it is read by the processor. Using strong uncacheable memory for a frame buffer generates
very large amounts of bus traffic, because operations on the entire buffer are implemented using
partial writes rather than line writes. Using write-through memory for a frame buffer can
displace almost all other useful cached lines in the processor's L2 and L3 caches and L1 data
cache. Therefore, systems should use write-combining memory for frame buffers whenever
possible.
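
The recommendations above can be summarized as a small lookup routine. The following is a minimal sketch, not a firmware or operating-system interface: the enumerations and the function preferred_memory_type are hypothetical names, and the fallback to strong uncacheable for a frame buffer when write-combining is unavailable is an assumption of this sketch (the text permits several types in that case).

```c
#include <assert.h>

/* Hypothetical classification of a physical memory region. */
typedef enum {
    REGION_ORDINARY_MEMORY,        /* no side effects, may be a DMA target */
    REGION_MMIO_READ_SIDE_EFFECTS, /* memory-mapped I/O, reads have effects */
    REGION_MMIO_OTHER,             /* other memory-mapped I/O */
    REGION_FRAME_BUFFER            /* pixel data, written more than read */
} region_kind;

typedef enum {
    MT_STRONG_UNCACHEABLE,
    MT_WRITE_COMBINING,
    MT_WRITE_BACK
} memory_type;

/* Returns the memory type suggested by the guidance in this section.
   wc_available is nonzero if write-combining can be used for the region. */
static memory_type preferred_memory_type(region_kind kind, int wc_available)
{
    assert(kind >= REGION_ORDINARY_MEMORY && kind <= REGION_FRAME_BUFFER);
    switch (kind) {
    case REGION_ORDINARY_MEMORY:
        return MT_WRITE_BACK;
    case REGION_MMIO_READ_SIDE_EFFECTS:
    case REGION_MMIO_OTHER:
        return MT_STRONG_UNCACHEABLE;
    case REGION_FRAME_BUFFER:
        /* Assumption: fall back to strong uncacheable without WC. */
        return wc_available ? MT_WRITE_COMBINING : MT_STRONG_UNCACHEABLE;
    }
    return MT_STRONG_UNCACHEABLE; /* conservative default, not reached */
}
```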
Software can use page-level cache control to assign appropriate effective memory types when
software will not access data structures in ways that benefit from write-back caching. For
example, software may read a large data structure once and not access the structure again until
the structure is rewritten by another agent. Such a large data structure should be marked as
uncacheable, or reading it will evict cached lines that the processor will be referencing again. 
A similar example would be a write-only data structure that is written to (to export the data to
another agent), but never read by software. Such a structure can be marked as uncacheable,
because software never reads the values that it writes (though as uncacheable memory, it will be
written using partial writes, while as write-back memory, it will be written using line writes,
which may not occur until the other agent reads the structure and triggers implicit write-backs).
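
As an illustration of page-level cache control, the sketch below manipulates the PWT (bit 3) and PCD (bit 4) flags of a 32-bit page-table entry value. The helper names are hypothetical, and the code only transforms an integer; a real operating system must also write the entry into the page tables and invalidate the affected TLB entries. The effective memory type still depends on the MTRR and PAT settings, so setting PCD requests uncached access rather than guaranteeing a specific type.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT (1u << 0) /* P flag: entry maps a page */
#define PTE_PWT     (1u << 3) /* page-level write-through flag */
#define PTE_PCD     (1u << 4) /* page-level cache disable flag */

/* Request uncached access for the page mapped by a present entry. */
static uint32_t pte_set_uncacheable(uint32_t pte)
{
    assert(pte & PTE_PRESENT); /* flags are only meaningful if present */
    return pte | PTE_PCD | PTE_PWT;
}

/* Clear both flags so caching is governed by the remaining controls. */
static uint32_t pte_clear_cache_control(uint32_t pte)
{
    assert(pte & PTE_PRESENT);
    return pte & ~(PTE_PCD | PTE_PWT);
}

/* Nonzero if the entry has page-level caching disabled. */
static int pte_is_cache_disabled(uint32_t pte)
{
    return (pte & PTE_PCD) != 0;
}
```

A structure that is read once, or only written for export to another agent, would be a candidate for pte_set_uncacheable on each page it occupies.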
On the Pentium III, Pentium 4, and Intel Xeon processors, new instructions are provided that
give software greater control over the caching, prefetching, and the write-back characteristics of
data. These instructions allow software to use weakly ordered or processor-ordered memory
types to improve processor performance, while still forcing strong ordering on memory reads
and/or writes when necessary. They also allow software greater control over the caching of data.
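
One way software might combine weakly ordered writes with an explicit ordering point is sketched below. The example assumes a compiler that provides the SSE2 intrinsics header and uses _mm_stream_si32 (a non-temporal store, MOVNTI) followed by _mm_sfence (SFENCE); these two instructions are chosen here for illustration and are not named in the text above. The function name fill_streaming is hypothetical, and a plain-store fallback is included for builds without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_stream_si32; also provides _mm_sfence */
#endif

/* Fill count 32-bit elements with value using non-temporal stores where
   available, then fence so that the stores are ordered before any later
   stores issued by this processor. */
static void fill_streaming(int32_t *dst, size_t count, int32_t value)
{
    assert(dst != NULL || count == 0);
#if defined(__SSE2__)
    for (size_t i = 0; i < count; i++)
        _mm_stream_si32((int *)&dst[i], (int)value); /* weakly ordered */
    _mm_sfence(); /* force ordering of the preceding stores */
#else
    /* Assumption: without SSE2, fall back to ordinary stores. */
    for (size_t i = 0; i < count; i++)
        dst[i] = value;
#endif
}
```

The fence is placed once after the loop rather than after each store, so the individual writes may still be combined while the buffer as a whole is ordered before whatever follows.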