Chapter 3
Analysis and Recommendations
31
Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™
ccNUMA Multiprocessor Systems
40555
Rev. 3.00
June 2006
However, as shown in Figure 11 on page 31, when both threads are write-only, the 0 hop-1 hop and 
0 hop-2 hop cases are faster than the 0 hop-0 hop case.
Figure 11. Both Write-Only Threads Running on Node 0 (Different Cores) on an Idle System
When a single thread reads locally, it generates a memory bandwidth load of 1.64 GB/s. Assuming a 
sustained memory bandwidth of 70% of the theoretical maximum of 6.4 GB/s (PC3200 DDR 
memory), or about 4.48 GB/s, the cumulative bandwidth demanded by two read-only threads 
(2 × 1.64 = 3.28 GB/s) does not exceed the sustained memory bandwidth on that node. Hence the 
local, or 0 hop-0 hop, case is the fastest. 
However, when a single thread writes locally, it generates a memory bandwidth load of 2.98 GB/s. 
This is because each write in this test case results in a cache line eviction and thus generates twice the 
memory traffic generated by a read. The cumulative memory bandwidth demanded by two write-only 
threads (2 × 2.98 = 5.96 GB/s) now exceeds the sustained memory bandwidth on that node, so the 
0 hop-0 hop case incurs the penalty of saturating that node's memory bandwidth. For a detailed 
analysis, refer to Section A.4 on page 42. 
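The bandwidth argument above reduces to a few multiplications. The following minimal Python sketch checks them using only the figures quoted in the text; the 70% sustained fraction is the text's own assumption, and the names `sustained_bandwidth` and `saturates` are illustrative, not part of any AMD tool.

```python
# Figures quoted in the text (GB/s).
THEORETICAL_PEAK_GBPS = 6.4   # PC3200 DDR memory, per node
SUSTAINED_FRACTION = 0.70     # assumed sustainable share of the theoretical peak
READ_LOAD_GBPS = 1.64         # one local read-only thread
WRITE_LOAD_GBPS = 2.98        # one local write-only thread (includes eviction traffic)


def sustained_bandwidth(peak: float = THEORETICAL_PEAK_GBPS,
                        fraction: float = SUSTAINED_FRACTION) -> float:
    """Sustained bandwidth assumed to be available on one node."""
    return peak * fraction


def saturates(per_thread_load: float, threads: int = 2) -> bool:
    """True if the combined demand of `threads` local threads exceeds the sustained bandwidth."""
    return per_thread_load * threads > sustained_bandwidth()


if __name__ == "__main__":
    limit = sustained_bandwidth()
    for name, load in (("read-only", READ_LOAD_GBPS), ("write-only", WRITE_LOAD_GBPS)):
        verdict = "saturated" if saturates(load) else "within budget"
        print(f"{name}: demand {2 * load:.2f} GB/s vs sustained {limit:.2f} GB/s -> {verdict}")
```

Under these assumptions the read-only pair (3.28 GB/s) stays within the 4.48 GB/s budget, while the write-only pair (5.96 GB/s) exceeds it, which is why only the write-only case benefits from spreading memory across nodes.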
It is useful to study whether this observation also holds under a variable background load.
One would expect that, if the memory bandwidth demanded of the remote node were increased, at 
some point the 0 hop-1 hop case would become as slow as, and perhaps slower than, the 
0 hop-0 hop case for the write-only threads.
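A first-order way to see where that expectation comes from is a toy model, assuming the remote node has the same 70% sustained bandwidth as above and that background traffic simply adds to the 2.98 GB/s demanded by the remote writer. The helper names `remote_headroom` and `saturation_threshold` are hypothetical. The model ignores the extra latency of the hop, so it only marks where the remote node itself starts to saturate, not where the measured crossover with the 0 hop-0 hop case will fall.

```python
# Toy model (assumption): background traffic on the remote node simply adds
# to the demand of the one write-only thread that targets that node.
SUSTAINED_GBPS = 6.4 * 0.70   # assumed sustained bandwidth per node, about 4.48 GB/s
WRITE_LOAD_GBPS = 2.98        # one write-only thread, as quoted in the text


def remote_headroom(background_gbps: float,
                    writer_load: float = WRITE_LOAD_GBPS,
                    sustained: float = SUSTAINED_GBPS) -> float:
    """Bandwidth left on the remote node; a negative value means it is oversubscribed."""
    return sustained - (writer_load + background_gbps)


def saturation_threshold(writer_load: float = WRITE_LOAD_GBPS,
                         sustained: float = SUSTAINED_GBPS) -> float:
    """Background load at which the remote node reaches its sustained limit."""
    return sustained - writer_load


if __name__ == "__main__":
    print(f"remote node saturates above {saturation_threshold():.2f} GB/s of background load")
    for background in (0.0, 1.0, 2.0, 3.0):
        print(f"background {background:.1f} GB/s -> headroom {remote_headroom(background):+.2f} GB/s")
```

In this simplified picture the remote node has roughly 1.5 GB/s of spare bandwidth; once the background load exceeds that, the remote write also competes for a saturated memory controller, and the advantage of the 0 hop-1 hop placement should begin to erode.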
The same two write-only threads as before run on node 0 in each of the following cases:
- Both threads access local memory.
- The first thread accesses local memory and the second thread accesses memory that is remote by one hop.
- The first thread accesses local memory and the second thread accesses memory that is remote by two hops.
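The hop distances in these cases depend on how the nodes are linked. The Python sketch below assumes a hypothetical four-node square topology in which node 0 links directly to nodes 1 and 2, and node 3 is two hops away; this is consistent with the hop counts labeled in the chart for this section, but the exact system topology is an assumption. The names `TOPOLOGY`, `hops`, and `CASES` are illustrative. It computes hop counts with a breadth-first search and lists the three cases.

```python
from collections import deque

# Hypothetical 4-node square topology: each node links to two neighbours,
# so node 3 is two hops from node 0.
TOPOLOGY = {0: (1, 2), 1: (0, 3), 2: (0, 3), 3: (1, 2)}


def hops(src: int, dst: int, links=TOPOLOGY) -> int:
    """Shortest hop count between two nodes, found by breadth-first search."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in links[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"node {dst} is unreachable from node {src}")


# Both threads run on node 0; each tuple is
# (memory node of the first thread, memory node of the second thread).
CASES = [(0, 0), (0, 1), (0, 3)]

if __name__ == "__main__":
    for mem1, mem2 in CASES:
        print(f"{hops(0, mem1)} hop-{hops(0, mem2)} hop")
```

Breadth-first search is used because every link is treated as equal cost, so the first time the destination is dequeued its distance is the minimum hop count.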
[Chart: Total Time for both threads (write-write). Bars, left to right: 0.0.w.0 + 0.1.w.0 (0 hops, 0 hops); 0.0.w.0 + 0.1.w.1 (0 hops, 1 hop); 0.0.w.0 + 0.1.w.2 (0 hops, 1 hop); 0.0.w.0 + 0.1.w.3 (0 hops, 2 hops). Data labels as extracted: 147%, 126%, 125%, 136%.]