AMD Athlon 64 User Manual

Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™
ccNUMA Multiprocessor Systems
40555
Rev. 3.00
June 2006
Chapter 3
Analysis and Recommendations
This section lays out recommendations to developers. Several of these recommendations are 
accompanied by empirical results collected from test cases with analysis, as applicable. 
In addition to making recommendations for performance improvement, this section clarifies some of 
the common perceptions developers have about performance on AMD ccNUMA systems and, at the 
same time, reveals the impact of low-level system resources on performance. The extent of the impact 
of these resources on the performance of any given application depends on the nature of the 
application. The goal is to help developers think like the machine when interpreting 
counterintuitive behavior during performance tuning.
While all analysis and recommendations are framed in terms of threads, they can 
also be applied to processes.
3.1 Scheduling Threads
Scheduling multiple threads across nodes and cores of a system is complicated by a number of 
factors:
- Whether the system is idle.
- Whether multiple threads access independent data.
- Whether multiple threads access shared data.
3.1.1 Multiple Threads-Independent Data
When scheduling multiple threads that access independent data on an idle system, it is preferable 
to schedule the threads first to one idle core of each node until all nodes are exhausted, and then 
to schedule the other idle core of each node. In other words, schedule using node-major order first, 
followed by core-major order. This is the suggested policy for a ccNUMA-aware operating system on 
an AMD dual-core multiprocessor system.
For example, when scheduling threads that access independent data on the dual-core Quartet, 
scheduling the threads in the following order is recommended:
- Core 0 on node 0, node 1, node 2 and node 3, in any order
- Core 1 on node 0, node 1, node 2 and node 3, in any order
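The fill order described above can be sketched in a few lines of Python. This is only an illustration of the ordering, not an operating system scheduler: the function names node_major_order and place_threads are hypothetical, and the defaults of four nodes with two cores each are assumptions chosen to match the dual-core Quartet example.

```python
from itertools import product


def node_major_order(num_nodes, cores_per_node):
    """Return (node, core) slots in the order threads should fill them.

    The core index varies slowest, so every node receives one thread
    before any node receives a second one.
    """
    return [(node, core)
            for core, node in product(range(cores_per_node), range(num_nodes))]


def place_threads(num_threads, num_nodes=4, cores_per_node=2):
    """Map thread i to the i-th slot of the node-major order.

    Hypothetical helper for illustration; a real scheduler would also
    account for which cores are actually idle.
    """
    slots = node_major_order(num_nodes, cores_per_node)
    if num_threads > len(slots):
        raise ValueError("more threads than cores")
    return {thread: slots[thread] for thread in range(num_threads)}


if __name__ == "__main__":
    for thread, (node, core) in place_threads(6).items():
        print(f"thread {thread} -> node {node}, core {core}")
```

Within each pass the nodes may be visited in any order; the sketch simply uses ascending node numbers.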
The two cores on each node of the dual-core AMD Opteron™ processor share the Northbridge 
resources, which include the memory controller and the physical memory that is connected to that 
node. The main motivation for this recommendation is to avoid overloading the resources on a single 
node while leaving the resources on the rest of the system unused; in other words, to balance the load.
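To make the load-balancing argument concrete, the following sketch counts how many threads share each node's Northbridge under two placements of four threads. The names spread and packed, and the 4-node, 2-core topology, are assumptions for illustration; the counts say nothing about actual bandwidth, only about how many threads contend for one memory controller.

```python
from collections import Counter

NUM_NODES = 4        # assumed topology: four nodes
CORES_PER_NODE = 2   # assumed: dual-core processors
NUM_THREADS = 4


def threads_per_node(placement):
    """Count the threads that share each node's memory controller."""
    return Counter(node for node, _core in placement)


# Recommended: one thread per node before any node gets a second thread.
spread = [(i % NUM_NODES, i // NUM_NODES) for i in range(NUM_THREADS)]

# Pattern to avoid: fill both cores of a node before moving to the next node.
packed = [(i // CORES_PER_NODE, i % CORES_PER_NODE) for i in range(NUM_THREADS)]

spread_load = threads_per_node(spread)
packed_load = threads_per_node(packed)

print("spread:", dict(spread_load))
print("packed:", dict(packed_load))
```

With four threads, the spread placement leaves each memory controller serving one thread, while the packed placement has two controllers serving two threads each and two sitting idle.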