Cache and Memory Optimizations

Chapter 5

25112

Rev. 3.06

September 2005

Software Optimization Guide for AMD64 Processors

5.3 Cache-Coherent Nonuniform Memory Access (ccNUMA)

Optimization

For applications with multiple threads, use OS functions to run each thread on a particular node, and
let that thread allocate the memory it requires so that the memory it uses is local to that node. In the
Microsoft Windows environment, call SetThreadAffinityMask() with a mask of that node's processors
to run a thread on a particular node.
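
The pin-then-allocate pattern described above might be sketched as follows. This is a minimal illustration, not code from this guide: the helper names (first_allowed_cpu, pin_current_thread_to_cpu, alloc_local_buffer) are hypothetical, the Linux branch substitutes sched_setaffinity() for SetThreadAffinityMask(), and it assumes the operating system places a page on the node of the thread that first touches it, which should be verified for the target system.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes cpu_set_t and sched_setaffinity on Linux */
#endif
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#endif

/* Hypothetical helper: lowest-numbered CPU this process may run on, or -1. */
static int first_allowed_cpu(void)
{
#ifdef _WIN32
    DWORD_PTR process_mask = 0, system_mask = 0;
    if (!GetProcessAffinityMask(GetCurrentProcess(), &process_mask, &system_mask))
        return -1;
    for (int cpu = 0; cpu < (int)(sizeof(DWORD_PTR) * 8); cpu++)
        if (process_mask & ((DWORD_PTR)1 << cpu))
            return cpu;
    return -1;
#else
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
#endif
}

/* Hypothetical helper: restrict the calling thread to one logical CPU.
 * Returns 0 on success, -1 on failure. Which CPUs belong to which node
 * is system-specific and must be determined separately. */
static int pin_current_thread_to_cpu(int cpu)
{
#ifdef _WIN32
    if (cpu < 0 || cpu >= (int)(sizeof(DWORD_PTR) * 8))
        return -1;
    DWORD_PTR mask = (DWORD_PTR)1 << cpu;
    /* SetThreadAffinityMask returns the previous mask, or 0 on failure. */
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0 ? 0 : -1;
#else
    cpu_set_t set;
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* A pid of 0 applies the mask to the calling thread. */
    return sched_setaffinity(0, sizeof(set), &set) == 0 ? 0 : -1;
#endif
}

/* Hypothetical helper: allocate a buffer and touch every byte from the
 * calling (already pinned) thread. Assumption: the OS backs pages with
 * memory from the node of the first-touching thread. */
static void *alloc_local_buffer(size_t bytes)
{
    void *p = malloc(bytes);
    if (p != NULL)
        memset(p, 0, bytes);
    return p;
}
```

A worker thread would call pin_current_thread_to_cpu() first and alloc_local_buffer() second, so that the pages are touched only after the thread is already running on the intended node. Allocating in a parent thread and handing the buffer to workers would defeat the purpose under this assumption.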

Make sure the operating system is properly configured to support ccNUMA. All versions of Microsoft
Windows XP for AMD64 and Windows Server for AMD64 support ccNUMA without any changes.
The 32-bit versions of Windows Server 2003, Enterprise Edition and Windows Server 2003,
Datacenter Edition require the /PAE boot parameter to support ccNUMA.

For 64-bit Linux, a separate kernel that supports ccNUMA may be available; if so, select that kernel.
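
One way to confirm that the running operating system actually exposes the ccNUMA topology is to ask it how many nodes it sees. The sketch below is an illustration under stated assumptions rather than a procedure from this guide: numa_node_count is a hypothetical helper, the Windows branch relies on GetNumaHighestNodeNumber(), and the Linux branch assumes the sysfs layout /sys/devices/system/node/nodeN, which may be absent on kernels built without NUMA support.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <ctype.h>
#include <dirent.h>
#include <string.h>
#endif

/* Hypothetical helper: number of NUMA nodes the OS reports,
 * or -1 if the information is unavailable. */
static int numa_node_count(void)
{
#ifdef _WIN32
    ULONG highest = 0;
    if (!GetNumaHighestNodeNumber(&highest))
        return -1;
    return (int)highest + 1;    /* node numbers start at 0 */
#else
    /* Assumption: each node appears as a directory named node<N>. */
    DIR *dir = opendir("/sys/devices/system/node");
    struct dirent *entry;
    int count = 0;

    if (dir == NULL)
        return -1;
    while ((entry = readdir(dir)) != NULL) {
        if (strncmp(entry->d_name, "node", 4) == 0 &&
            isdigit((unsigned char)entry->d_name[4]))
            count++;
    }
    closedir(dir);
    return count > 0 ? count : -1;
#endif
}
```

A result of 1 on a multiprocessor AMD64 system, or -1 on Linux, suggests that the kernel or boot configuration is not exposing the nodes, and that node-local placement is unlikely to take effect.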

Application

This optimization applies to:

• 32-bit software

• 64-bit software

Rationale

Most multiprocessor systems available today employ a symmetric multiprocessing (SMP)
architecture. Processors on an SMP platform generally share a common or centralized memory bus
and see identical memory access latencies regardless of processor position. Because the processors
use the same bus and memory, system performance may suffer when increased demand on the
single memory bus creates a bottleneck. Figure 1 shows a simplified block diagram
for a two-processor SMP system.