AMD athlon 64 User Manual

Appendix A
40555
Rev. 3.00
June 2006
Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ 
ccNUMA Multiprocessor Systems
A.3 Why Is the No Crossfire Case Slower Than the Crossfire Case on a System under a Very High Background Load (Full Subscription)?
When the threads are firing at each other (crossfire) and all other free cores are running background 
threads at very high load, the system sees the following traffic pattern, where each node receives 
memory requests from the threads as described:
Node 0: 1 background thread and 1 foreground thread.
Node 1: 1 background thread and 1 foreground thread.
Node 3: 2 background threads.
Node 2: 2 background threads.
In the no crossfire case, the system sees the following traffic pattern:
Node 0: 1 background thread.
Node 1: 1 background thread and 1 foreground thread.
Node 3: 2 background threads and 1 foreground thread.
Node 2: 2 background threads.
The no crossfire case suffers from a greater load imbalance than the crossfire case, and node 3 
bears the worst of this imbalance.
Remember that each of the background threads requests data at a rate of 4 GB/s and each of the 
foreground threads requests data at a rate of 2.98 GB/s.
Measurements show a total memory access rate of 4.5 GB/s on node 3. Several buffer queues on 
node 3 are saturated and cannot absorb the data provided by the memory controller any faster.
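The load imbalance described above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the original measurements: it adds up the requested rates per node using the thread counts and per-thread rates quoted in the text. The names (CROSSFIRE, NO_CROSSFIRE, requested_load, imbalance) are hypothetical, and the model assumes that requested rates simply add up at the target node, ignoring link contention and queuing effects.

```python
# Requested rates taken from the text (GB/s per thread).
BACKGROUND_GBPS = 4.0
FOREGROUND_GBPS = 2.98

# (background threads, foreground threads) whose requests target each node's memory.
# Thread counts are copied from the two traffic patterns listed in the text.
CROSSFIRE = {0: (1, 1), 1: (1, 1), 2: (2, 0), 3: (2, 0)}
NO_CROSSFIRE = {0: (1, 0), 1: (1, 1), 2: (2, 0), 3: (2, 1)}


def requested_load(pattern):
    """Return the aggregate requested rate (GB/s) per node.

    Assumption: requested rates are additive at the target node.
    """
    return {
        node: bg * BACKGROUND_GBPS + fg * FOREGROUND_GBPS
        for node, (bg, fg) in pattern.items()
    }


def imbalance(load):
    """Return the gap (GB/s) between the most and least loaded nodes."""
    return max(load.values()) - min(load.values())


for name, pattern in (("crossfire", CROSSFIRE), ("no crossfire", NO_CROSSFIRE)):
    load = requested_load(pattern)
    per_node = ", ".join(f"node {n}: {gbps:.2f}" for n, gbps in sorted(load.items()))
    print(f"{name}: {per_node} (max - min = {imbalance(load):.2f} GB/s)")
```

Under these assumptions, the crossfire case requests between 6.98 and 8.00 GB/s from each node, while the no crossfire case requests only 4.00 GB/s from node 0 but about 10.98 GB/s from node 3. That is more than twice the 4.5 GB/s observed on node 3, which is consistent with the saturated buffer queues reported there.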
A.4 Why Is 0 Hop-0 Hop Case Slower Than the 0 Hop-1 Hop Case on an Idle System for Write-Only Threads?
When both write-only threads running on different cores of node 0 access data locally 
(0 hop-0 hop), significant demands are placed on the local memory on node 0.
Measurements show a total memory access rate of 4.5 GB/s on node 0. The memory on node 0 
cannot service requests for data any faster and is running at full capacity. Several buffer queues on 
node 0 are saturated and waiting for the memory requests to be serviced.
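A rough way to see why sharing one node's memory hurts is to compare throughput ceilings. The sketch below is illustrative only. It takes the 4.5 GB/s saturation figure from the text and assumes, without support from the measurements, that every node has the same memory capacity, that threads sharing a node split its capacity evenly, and that the interconnect is not the bottleneck for the 1 hop access. The function names and the reading of 0 hop-1 hop as "one thread writes to node 1's memory" are assumptions for illustration.

```python
# Observed saturation rate of node 0's memory, from the text (GB/s).
# Assumption: every node's memory saturates at the same rate.
NODE_CAPACITY_GBPS = 4.5


def aggregate_ceiling(target_nodes, capacity=NODE_CAPACITY_GBPS):
    """Upper bound on combined throughput for threads writing to the given nodes.

    target_nodes lists, per thread, the node whose memory that thread writes to.
    Threads that target the same node share that node's capacity.
    """
    return len(set(target_nodes)) * capacity


def per_thread_ceiling(target_nodes, capacity=NODE_CAPACITY_GBPS):
    """Upper bound per thread, assuming a node's capacity is split evenly."""
    return [capacity / target_nodes.count(node) for node in target_nodes]


zero_zero = [0, 0]  # 0 hop-0 hop: both threads write to node 0's memory
zero_one = [0, 1]   # 0 hop-1 hop: one thread writes to node 1's memory instead

print(aggregate_ceiling(zero_zero), per_thread_ceiling(zero_zero))
print(aggregate_ceiling(zero_one), per_thread_ceiling(zero_one))
```

In this simplified model, the 0 hop-0 hop placement caps the two threads at 4.5 GB/s combined, whereas spreading the writes over two nodes raises the ceiling to 9.0 GB/s. Real systems add remote-access latency and link limits, so the actual gain would be smaller than this bound suggests.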