352
AGP Considerations
Appendix D
25112
Rev. 3.06
September 2005
Software Optimization Guide for AMD64 Processors
Figure 12. Northbridge Command Flow
D.5 Memory Optimizations for Graphics-Engine Programming Using the DMA Model
Historically (that is, with AGP 1.0 and AGP 2.0), AGP memory used for command DMA buffers was accessed by the processor through the AGP aperture space (this feature is referred to as host translation). This address space was mapped as write-combining because the AGP master did not snoop the processor's caches (that is, coherency was not enforced for AGP memory). Write-combining offered the best bandwidth in this situation because each write-combining buffer could be sent to system memory as a complete, full buffer rather than as a series of smaller partial writes. However, every write still had to reach system memory, which consumed memory bandwidth.
On current systems, however, coherency between an AGP master (making accesses through the AGP aperture) and the processor caches is maintained by the HyperTransport protocol and the MOESI (modified, owned, exclusive, shared, invalid) caching policy. Coherency support between an AGP master and the processor caches is enabled through a bit in the GART entry (Gart_entry.coh). The AGP miniport driver sets this bit as it maps entries into the GART. The video graphics miniport driver can verify that this feature is supported by checking the AGP 3.0-compliant register (AGPSTAT.ita_entry.coh) in the AGP bridge device.
Note: Coherency support is implemented in hardware in AMD Athlon 64 and AMD Opteron processors and is not specific to the AGP tunnel device, even though the support is indicated in the tunnel's AGP 3.0-compliant register (AGPSTAT.ita_entry.coh).
Because the processor caches are kept coherent with the AGP master, a key optimization for the DMA model on AMD Athlon 64 and AMD Opteron processors is that the AGP master can read data from the processor caches faster than it can read the same data from DDR memory, since the processor caches operate at higher clock frequencies. As processor clock
[Figure 12 diagram not reproduced. Labeled blocks: CPU 0 and CPU 1; System Request Queue (24-entry); Address MAP & GART; XBAR; Memory Command Queue (20-entry), to DCT; HyperTransport 0, 1, and 2 inputs and outputs with router buffers of 10, 12, and 16 entries; Victim Buffer (8-entry); Write Buffer (4-entry); Instruction MAB (2-entry); Data MAB (8-entry). All buffers are 64-bit command/address.]