This analogy clearly communicates the performance effects of queuing time versus latency. In a 
computer server, with many concurrent outstanding memory requests, we would gladly incur some 
additional latency (walking) to spread memory transactions (check-out processes) across multiple 
memory controllers (check-out lanes) because this greatly improves performance by reducing the 
queuing time.
However, if the number of customers at the remote queue increases to 20 or more, the customer would much rather wait in the local queue directly in front of him.
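As a back-of-the-envelope illustration (not part of the original analysis, and with assumed numbers), a single-queue M/M/1 model makes the effect of queuing time explicit. The mean time a request spends at one memory controller is

\[
T = \frac{1}{\mu - \lambda},
\]

where $\mu$ is the controller's service rate and $\lambda$ is the request arrival rate. With $\mu = 10$ and $\lambda = 9$ requests per unit time, $T = 1$. Splitting the same load across two controllers gives $\lambda = 4.5$ at each, so $T = 1/(10 - 4.5) \approx 0.18$; even after adding a fixed extra hop latency of, say, $0.3$, the total of roughly $0.48$ is still well below $1$. The gain comes almost entirely from reduced queuing, exactly as in the check-out analogy.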
The following example was extracted by mining the results of the synthetic test case. 
There are four cases illustrated in Figure 10. In each case two threads run on node 0 (on core 0 and core 1, respectively), and the system is otherwise idle:
• Both threads access memory on node 0.
• The first thread accesses memory on node 0. The second thread accesses memory on node 1, which is one hop away.
• The first thread accesses memory on node 0. The second thread accesses memory on node 2, which is one hop away.
• The first thread accesses memory on node 0. The second thread accesses memory on node 3, which is two hops away.
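By way of illustration, the following sketch shows one way such a two-thread placement case could be reproduced on Linux with pthreads and libnuma. It is not the harness used to produce Figure 10; the core numbers, node numbers, buffer size, and pass count are all assumptions made for the example.

/*
 * Illustrative sketch (not the original test harness): two reader threads
 * pinned to core 0 and core 1 on node 0.  Thread 0 reads a buffer placed on
 * node 0; thread 1 reads a buffer placed on MEM_NODE_T1 (0, 1, 2, or 3).
 *
 * Build: gcc -O2 numa_read.c -lnuma -lpthread
 */
#define _GNU_SOURCE
#include <numa.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_BYTES   (256UL * 1024 * 1024)   /* 256 MiB per thread (assumed) */
#define MEM_NODE_T1 1                       /* memory node for thread 1     */

struct arg { int core; int mem_node; };

static void *reader(void *p)
{
    struct arg *a = p;

    /* Pin this thread to its core (both cores are on node 0). */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(a->core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    /* Allocate the buffer on the requested memory node. */
    volatile char *buf = numa_alloc_onnode(BUF_BYTES, a->mem_node);
    if (!buf) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return NULL;
    }

    /* Read one byte per cache line; the first pass also commits the pages
     * on the chosen node, later passes are plain remote/local reads. */
    unsigned long sum = 0;
    for (int pass = 0; pass < 8; pass++)
        for (size_t i = 0; i < BUF_BYTES; i += 64)
            sum += buf[i];
    (void)sum;

    numa_free((void *)buf, BUF_BYTES);
    return NULL;
}

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available\n");
        return 1;
    }

    struct arg a0 = { .core = 0, .mem_node = 0 };           /* 0-hop case   */
    struct arg a1 = { .core = 1, .mem_node = MEM_NODE_T1 }; /* varied case  */

    pthread_t t0, t1;
    pthread_create(&t0, NULL, reader, &a0);
    pthread_create(&t1, NULL, reader, &a1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}

Timing the two joined threads (for example with clock_gettime() around the create/join pair) and varying MEM_NODE_T1 across 0, 1, 2, and 3 would correspond to the four cases listed above.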
As shown in Figure 10, synthetic tests indicate that when both threads are read-only, the 0 hop-0 hop 
case is faster than the 0 hop-1 hop and 0 hop-2 hop cases.
Figure 10.  Both Read-Only Threads Running on Node 0 (Different Cores) on an Idle System

[Bar chart: Total Time for both threads (read-read), relative. 0 hop / 0 hop: 102%; 0 hop / 1 hop (node 1): 108%; 0 hop / 1 hop (node 2): 107%; 0 hop / 2 hop (node 3): 118%.]