AMD Athlon 64 User Manual

Chapter 3

Analysis and Recommendations

Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™

ccNUMA Multiprocessor Systems

40555

Rev. 3.00

June 2006

This section lays out recommendations to developers. Several of these recommendations are
accompanied by empirical results collected from test cases with analysis, as applicable.

In addition to making recommendations for performance improvement, this section clarifies some of
the common perceptions developers have about performance on AMD ccNUMA systems and, at the
same time, reveals the impact of low level system resources on performance. The extent of the impact
of these resources on the performance of any given application depends on the nature of the
application. The goal is to help developers think like the machine when interpreting
counterintuitive behavior during performance tuning.

While all analysis and recommendations are made with reference to the context of threads, they can
also be applied to processes.

3.1 Scheduling Threads

Scheduling multiple threads across nodes and cores of a system is complicated by a number of
factors:

• Whether the system is idle.

• Whether multiple threads access independent data.

• Whether multiple threads access shared data.

3.1.1 Multiple Threads: Independent Data

When scheduling multiple threads that access independent data on an idle system, it is preferable
first to schedule one thread on an idle core of each node until every node has been used, and only
then to schedule threads on the other idle core of each node. In other words, schedule using node
major order first, followed by core major order. This is the suggested policy for a ccNUMA-aware
operating system on an AMD dual-core multiprocessor system.

For example, when scheduling threads that access independent data on the dual-core Quartet,
the following order is recommended:

• Core 0 on node 0, node 1, node 2, and node 3, in any order

• Core 1 on node 0, node 1, node 2, and node 3, in any order

The two cores on each node of the dual-core AMD Opteron™ processor share the Northbridge
resources, which include the memory controller and the physical memory that is connected to that
node. The main motivation for this recommendation is to avoid overloading the resources on a single
node while leaving the resources on the rest of the system unused; in other words, to balance the load.
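
The recommended placement order can be illustrated with a minimal Python sketch. It is not taken from this guide: the function names placement_order and assign_threads are hypothetical, and the (node, core) pairs are logical labels only. Mapping them to the CPU numbers the operating system reports, and actually pinning threads to them, is system specific and is not shown here.

```python
from typing import List, Tuple


def placement_order(num_nodes: int, cores_per_node: int) -> List[Tuple[int, int]]:
    """Return (node, core) slots in the order in which threads should fill them.

    Every node receives a thread on core 0 before any node receives a
    second thread on core 1, which spreads the load across all
    Northbridges and memory controllers first.
    """
    return [
        (node, core)
        for core in range(cores_per_node)  # outer loop: one pass per core index
        for node in range(num_nodes)       # inner loop: visit every node in that pass
    ]


def assign_threads(num_threads: int, num_nodes: int,
                   cores_per_node: int) -> List[Tuple[int, int]]:
    """Pick a (node, core) slot for each of num_threads independent threads."""
    slots = placement_order(num_nodes, cores_per_node)
    if num_threads > len(slots):
        raise ValueError("more threads than cores; oversubscription is not modeled")
    return slots[:num_threads]


if __name__ == "__main__":
    # Assumed topology: four nodes with two cores each, as in the
    # dual-core Quartet example above.
    for thread_id, (node, core) in enumerate(assign_threads(5, 4, 2)):
        print(f"thread {thread_id} -> node {node}, core {core}")
```

With five threads on the assumed four-node, dual-core topology, the first four threads each land on core 0 of a different node, and only the fifth thread shares a node by taking core 1 of node 0.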