Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems

Chapter 4: Conclusions
Data placement tools can also be useful when a thread needs more data than the amount of
physical memory available on a node. Certain operating systems also allow data migration through these tools or APIs.
Using this feature, data can be migrated from the node where it was first touched to the node where it
is subsequently accessed. This migration has a cost, so it should not be performed
frequently. For additional details on the tools and APIs offered by various operating systems for thread and memory
placement, refer to Section A.7 on page 44.
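As an illustration, the sketch below uses the Linux libnuma interface (one OS-specific API of the kind referenced above; link with -lnuma) to place a buffer on one node and later migrate its pages to another node. The node numbers and buffer size are illustrative only, and, as noted, migration should remain an infrequent operation.

    /* Illustrative sketch: allocate a buffer on node 0, then migrate its pages
     * to node 1 once a thread there becomes the main consumer. */
    #include <numa.h>
    #include <numaif.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA is not available on this system\n");
            return 1;
        }

        long   page_size = sysconf(_SC_PAGESIZE);
        size_t nbytes    = 64 * page_size;        /* small illustrative buffer */

        /* Placement: back the buffer with memory on node 0 and fault it in. */
        char *buf = numa_alloc_onnode(nbytes, 0);
        if (buf == NULL) {
            perror("numa_alloc_onnode");
            return 1;
        }
        for (size_t i = 0; i < nbytes; i++)
            buf[i] = 0;

        /* Later, migrate the pages to node 1, for example because the consuming
         * thread has been placed there. This is costly and should be done rarely. */
        unsigned long npages = nbytes / page_size;
        void **pages  = malloc(npages * sizeof(void *));
        int   *dest   = malloc(npages * sizeof(int));
        int   *status = malloc(npages * sizeof(int));
        for (unsigned long i = 0; i < npages; i++) {
            pages[i] = buf + i * page_size;
            dest[i]  = 1;
        }
        if (numa_move_pages(0 /* current process */, npages, pages, dest, status,
                            MPOL_MF_MOVE) < 0)
            perror("numa_move_pages");

        numa_free(buf, nbytes);
        free(pages);
        free(dest);
        free(status);
        return 0;
    }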
It is recommended to avoid sharing data that resides within a single cache line between threads
running on different cores (false sharing).
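In practice this usually means padding and aligning per-thread data so that no two threads write to the same line. The sketch below assumes a 64-byte cache line (the line size of AMD Athlon 64 and AMD Opteron processors) and GCC-style alignment attributes; the structure names are illustrative.

    #include <stdint.h>

    #define CACHE_LINE_SIZE 64

    /* Problematic layout: adjacent per-thread counters share cache lines, so
     * updates from cores on different nodes force the lines to migrate
     * repeatedly between caches. */
    struct counters_packed {
        uint64_t count[8];
    };

    /* Preferred layout: each counter is padded out to a full cache line and the
     * array is line-aligned, so every thread owns its line exclusively. */
    struct counter_padded {
        uint64_t count;
        char     pad[CACHE_LINE_SIZE - sizeof(uint64_t)];
    } __attribute__((aligned(CACHE_LINE_SIZE)));

    struct counter_padded per_thread_counters[8];  /* one element per thread */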
Advanced developers may also encounter surprising cases when experimenting with the thread and
data placement tools and APIs. Sometimes, when comparing workloads that are symmetrical in every
respect except for the thread and data placement used, the expected symmetry in performance is not observed.
Such cases can usually be explained by examining the underlying system and identifying resources
that are saturated by an imbalanced load.
The buffer queues constitute one such resource. The lengths of these queues are configured by the
BIOS, within hardware-specific limits that are specified in the BIOS and Kernel Developer's Guide
for the particular processor. Following AMD recommendations, the BIOS allocates these buffers on a
link-by-link basis to optimize for the most common workloads.
In general, certain pathological access patterns should be avoided when possible: several nodes
accessing data on a single node, or the crossfire scenario, can saturate underlying resources such as
HyperTransport™ link bandwidth and the HyperTransport buffer queues. AMD makes event profiling
tools available that developers can use to determine whether their application exhibits such behavior.
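One common way to relieve such a hot spot, sketched below under the same Linux/libnuma assumption as earlier, is to interleave the pages of a large, widely read structure across all nodes so that memory and HyperTransport traffic is spread instead of being concentrated on the node that first touched the data; the size is illustrative.

    #include <numa.h>
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA is not available on this system\n");
            return 1;
        }

        size_t nbytes = 256UL * 1024 * 1024;   /* large read-mostly table */

        /* Pages are distributed round-robin across all allowed nodes. */
        void *table = numa_alloc_interleaved(nbytes);
        if (table == NULL) {
            perror("numa_alloc_interleaved");
            return 1;
        }

        /* ... populate the table and share it among threads on all nodes ... */

        numa_free(table, nbytes);
        return 0;
    }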
AMD very strongly recommends keeping user-level and kernel-level locks aligned to their natural 
boundaries.
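In C this can be done by wrapping the lock in a structure whose alignment is at least its natural boundary, commonly a full cache line so that the lock also does not share its line with unrelated hot data. The 64-byte figure and the GCC-style attribute below are assumptions used for illustration.

    #include <pthread.h>

    #define CACHE_LINE_SIZE 64

    /* The aligned attribute both aligns the lock and rounds the structure size
     * up to a full line, so neighbouring objects cannot share its cache line. */
    struct aligned_lock {
        pthread_mutex_t mutex;
    } __attribute__((aligned(CACHE_LINE_SIZE)));

    static struct aligned_lock queue_lock = { .mutex = PTHREAD_MUTEX_INITIALIZER };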
Some compilers for AMD multiprocessor systems provide additional hooks that allow automatic
parallelization of otherwise serial programs. These compilers also support extensions to the OpenMP
directives that OpenMP programs can use to improve performance.
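The vendor-specific extensions vary by compiler, but the baseline OpenMP usage looks like the minimal sketch below (standard OpenMP only, built with an OpenMP-capable compiler, for example gcc -fopenmp); no particular vendor extension is assumed.

    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N], b[N];
        double sum = 0.0;

        for (int i = 0; i < N; i++)
            a[i] = b[i] = (double)i;

        /* The iterations are divided among the threads; each thread keeps a
         * private partial sum that is combined when the loop finishes. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i] * b[i];

        printf("dot product = %f\n", sum);
        return 0;
    }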
While all the previous conclusions are stated in the context of threads, they can also be applied to 
processes.
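For completeness, the sketch below applies the same placement idea to processes rather than threads, again assuming Linux and libnuma: each forked worker restricts its execution and memory allocation to a single node. The work function is a placeholder.

    #include <numa.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void do_work(int node)
    {
        /* placeholder for the real per-process work */
        printf("worker bound to node %d\n", node);
    }

    int main(void)
    {
        if (numa_available() < 0)
            return 1;

        int node_count = numa_max_node() + 1;
        for (int node = 0; node < node_count; node++) {
            pid_t pid = fork();
            if (pid == 0) {
                numa_run_on_node(node);    /* run only on this node's cores  */
                numa_set_preferred(node);  /* allocate memory from this node */
                do_work(node);
                _exit(0);
            }
        }
        while (wait(NULL) > 0)
            ;   /* reap all workers */
        return 0;
    }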