Chapter 3
Analysis and Recommendations
29
Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™
ccNUMA Multiprocessor Systems
40555
Rev. 3.00
June 2006
Figure 9. Crossfire 1 Hop-1 Hop Case vs. No Crossfire 1 Hop-1 Hop Case under a Very High Background Load (Full Subscription)
In the no crossfire case, the total memory bandwidth observed on the memory controller on node 3 is 
4.5 GB/s and several buffer queues on node 3 are saturated. For detailed analysis, refer to Section A.3 
on page 42.
Thus, although equal-hop cases generally take equal time, there can be exceptions to this rule if 
some resources in the system, such as HyperTransport link bandwidth and HyperTransport buffer 
capacity, are saturated.
3.4.2 Myth: Greater Hop Distance Always Means Slower Time
As a general rule, a 2 hop case will be slower than a 1 hop case, which, in turn, will be slower than a 
0 hop case, if the only change between the cases is thread and memory placement. 
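The hop count in this rule is the number of links a memory request crosses between the node running the thread and the node holding the memory. As a rough illustration, it can be computed with a breadth-first search. The four-node square topology below (`LINKS`) and the function name `hop_distance` are assumptions made for this sketch, not details taken from the text.

```python
from collections import deque

# Hypothetical 4-node square topology (an assumption for illustration):
# every node links to two neighbours, so diagonal pairs are 2 hops apart.
LINKS = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}


def hop_distance(thread_node, memory_node, links=LINKS):
    """Return the fewest links between the thread's node and the memory's node."""
    seen = {thread_node}
    queue = deque([(thread_node, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == memory_node:
            return hops
        for neighbour in links[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    raise ValueError("nodes are not connected")
```

Under this assumed topology, a thread on node 0 using memory on node 0 is a 0 hop case, memory on node 1 is a 1 hop case, and memory on node 3 is a 2 hop case.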
This rule does not always hold, however. The synthetic test demonstrates a given 0 hop-0 hop case 
that is slower than a 0 hop-1 hop case, and shows how saturating memory resources can cause this to occur.
Imagine yourself in the following situation: you are ready to check out at your favorite grocery store 
with a shopping cart full of groceries. Directly in front of you is a check-out lane with 20 customers 
standing in line but 50 feet to your left is another check-out lane with only two customers standing in 
line. Which would you go to? The check-out lane closest to your position has the lowest latency 
because you don't have far to travel. But the check-out lane 50 feet away has much greater latency 
because you have to walk 50 feet.
Clearly, most people would walk the 50 feet, accept the latency, and arrive at a check-out lane with only 
two customers instead of 20. Experience tells us that the time spent waiting to check out with 20 people 
ahead is far longer than the time needed to walk to the “remote” check-out lane and wait for only two 
people.
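The check-out analogy can be expressed as a small model: completion time is the time to reach the resource plus the time spent behind requests already queued there. The sketch below is only illustrative. The function name `total_time` and all the numbers (2 minutes of service per customer, 0.25 minutes to walk 50 feet) are assumptions, not measurements from the synthetic test.

```python
def total_time(travel_time, requests_ahead, service_time):
    """Travel time plus service of everyone queued ahead plus our own service."""
    return travel_time + (requests_ahead + 1) * service_time


# Illustrative values only (assumptions): minutes per customer and walking time.
near_lane = total_time(travel_time=0.0, requests_ahead=20, service_time=2.0)
far_lane = total_time(travel_time=0.25, requests_ahead=2, service_time=2.0)

# The "remote" lane wins whenever its extra travel time is smaller than
# the queueing time saved by joining the shorter line.
print(near_lane, far_lane)
```

In memory terms, `travel_time` stands in for the added hop latency and `requests_ahead` for the depth of the queues at a saturated memory controller or link.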
 
VERY HIGH: Total Time for both threads (write-write)
156%
216%
202%
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
2.2
2.4
0.0.w.0  1.0.w.1  (0 Hops)  (0 Hops)
0.0.w.1  1.0.w.3  (1 Hops)  (1 Hops)
0.0.w.1  1.0.w.0  (1 Hops)  (1 Hops)
0 Hop
0 Hop
1 Hop
1 Hop
NO 
Xfire
1 Hop
1 Hop 
Xfire