Analysis and Recommendations
Chapter 3
40555
Rev. 3.00
June 2006
Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ 
ccNUMA Multiprocessor Systems
Figure 15. Both Write-Only Threads Running on Node 0 (Different Cores) under Very High Background Load (High Subscription)
Under a very high background load, in the 0-hop/1-hop case, the total memory access rate on node 1 is 4.78 GB/s and several buffer queues on node 1 are saturated. For a detailed analysis, refer to Section A.5 on page 43.
Thus, a greater hop distance does not always mean a longer run time. Even so, developers are still advised to keep data local as much as possible. In the analogy used above, if the local queue has 20 customers and the remote queue has two, the customer would much rather have made the two-customer queue his local queue in the first place. In the synthetic case above, the fastest arrangement would keep the first thread on node 0 writing to node 0 memory and the second thread on node 1 writing to node 1 memory.
3.5 Locks
In general, it is good practice for user-level and kernel-level code to keep locks aligned to their natural boundaries. In some hardware implementations, locks that are not naturally aligned are handled with the mechanisms used for legacy memory-mapped I/O, so misaligned locks should be avoided whenever possible.
If a lock is aligned properly, it is treated as a faster cache lock. The significantly slower alternative to a cache lock is a bus lock, which should be avoided at all costs. Bus locks are very slow and force the processor to serialize many operations unrelated to the lock. Furthermore, a bus lock prevents the entire HyperTransport fabric from making forward progress until it completes. Cache locks, on the other hand, obtain their atomicity from the underlying cache coherence of the ccNUMA system and are much faster.
 
[Figure 15 chart, "Very High: Total Time for both threads (write-write)": total time (axis 0 to 1.8) for thread 0.0.w.0 (0 hops) paired with 0.1.w.0 (0 hops), 0.1.w.1 (1 hop), 0.1.w.2 (1 hop), and 0.1.w.3 (2 hops); bar labels read 147%, 158%, 158%, and 169%.]