Analysis and Recommendations
Chapter 3
Rev. 3.00
June 2006
Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ 
ccNUMA Multiprocessor Systems
Figure 5. Write-Only Thread Running on Node 0, Accessing Data from 0, 1 and 2 Hops Away on an Idle System
In this test case, a write access resembles a read access in the coherent HyperTransport™ link traffic and memory traffic it generates, with some key differences. A write access brings data into the cache much as a read does, and then modifies it in the cache. In this particular synthetic test case, however, the thread issues successive writes to sequential cache lines in a 64-MB array. This leads to a steady state in which each write access also causes a cache line eviction (write-back). The memory and HyperTransport traffic generated by a write-only thread is therefore almost twice that of a read-only thread. On our test bench, a thread doing local read-only accesses generates a memory bandwidth load of 1.64 GB/s, and a thread doing local write-only accesses generates a memory bandwidth load of 2.98 GB/s. As a result, not only do writes take longer than reads at any given hop distance, but they also slow down more quickly as hop distance increases.
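The "almost twice" relationship can be checked with a short calculation. The sketch below compares a simple traffic model (one line fill per read, one line fill plus one write-back per write) with the two bandwidth loads quoted above. The 64-byte cache line size and the names `bytes_moved_per_line`, `model_ratio` and `measured_ratio` are assumptions made for illustration only; they do not come from the test bench.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size for this illustration


def bytes_moved_per_line(write_only: bool) -> int:
    """Memory traffic per cache line in steady state under a simple model.

    A read-only thread fills each line once. A write-only thread fills the
    line and, in steady state, also evicts one dirty line (write-back).
    """
    fill = CACHE_LINE_BYTES
    write_back = CACHE_LINE_BYTES if write_only else 0
    return fill + write_back


# Simple model: a write moves twice as many bytes as a read.
model_ratio = bytes_moved_per_line(True) / bytes_moved_per_line(False)

# Measured memory bandwidth loads quoted in the text (GB/s).
READ_ONLY_GBPS = 1.64
WRITE_ONLY_GBPS = 2.98
measured_ratio = WRITE_ONLY_GBPS / READ_ONLY_GBPS

print(f"model ratio:    {model_ratio:.2f}")
print(f"measured ratio: {measured_ratio:.2f}")
```

Under this model the ratio is exactly 2, while the measured loads give a ratio of about 1.82, which is consistent with "almost twice".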
Keeping Data Local by Virtue of First Touch
To keep data local, observe the following principles.
As long as a thread initializes the data it needs (writes to it for the first time) and does not rely on any other thread to perform the initialization, a ccNUMA-aware OS keeps the data local to the node where the thread runs. This policy of keeping data local to the thread that first writes to it is known as the local allocation policy by virtue of first touch. This is the default policy used by a ccNUMA-aware OS.
A ccNUMA-aware OS ensures local allocation by taking a page fault at the time of the first touch to the data. When the page fault occurs, the OS maps the virtual pages associated with the data to zeroed-out physical pages. The data is now resident on the node where the first touch occurred, and any subsequent accesses to the data have to be serviced from that node.
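The first-touch pattern described above can be sketched as follows: each worker thread allocates its own buffer and writes to every page itself, so that the page faults occur on whichever processor that thread is running on. This is a minimal illustration, not the test bench code. The names `first_touch`, `worker` and `run` are hypothetical, thread pinning through `os.sched_setaffinity` is only attempted where the platform provides it (Linux), and the sketch does not verify which node the pages actually land on, since that depends on the OS and its allocation policy.

```python
import mmap
import os
import threading

PAGE_SIZE = mmap.PAGESIZE


def first_touch(buf: mmap.mmap) -> int:
    """Write one byte to every page of buf so the OS must back each page.

    Returns the number of pages touched.
    """
    pages = 0
    for offset in range(0, len(buf), PAGE_SIZE):
        buf[offset] = 0  # the first write to a page triggers the page fault
        pages += 1
    return pages


def worker(cpu: int, size: int, results: dict) -> None:
    """Pin the calling thread where supported, then allocate and touch its own data."""
    if hasattr(os, "sched_setaffinity"):
        try:
            # On Linux, pid 0 applies the mask to the calling thread.
            os.sched_setaffinity(0, {cpu})
        except OSError:
            pass  # CPU not usable by this process; continue unpinned
    buf = mmap.mmap(-1, size)  # anonymous mapping; pages are not backed yet
    results[cpu] = (buf, first_touch(buf))


def run(size: int = 4 * PAGE_SIZE) -> dict:
    """Start up to two workers, each initializing the data it will later use."""
    if hasattr(os, "sched_getaffinity"):
        cpus = sorted(os.sched_getaffinity(0))[:2]
    else:
        cpus = [0]  # fallback label only; no pinning happens on this platform
    results: dict = {}
    threads = [
        threading.Thread(target=worker, args=(cpu, size, results)) for cpu in cpus
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for cpu, (_, pages) in run().items():
        print(f"worker on cpu {cpu} touched {pages} pages")
```

The pattern to avoid is the opposite one, in which a single thread initializes every buffer before handing them out: under first touch, all pages would then be resident on that one thread's node, and the other threads would access their data remotely.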
[Figure 5 plot: bar chart of time for write for test cases 0.0.w.0 (0 hop), 0.0.w.1 (1 hop), 0.0.w.2 (1 hop) and 0.0.w.3 (2 hop); bar labels 113%, 127% and 149%.]