7.11.6.5 Guidelines for Scheduling Threads on Logical Processors Sharing Execution Resources
Because the logical processors in a processor core share execution resources, the order in which threads are dispatched to logical processors for execution can affect the overall efficiency of a system. The following guidelines are recommended for scheduling threads for execution:
• Dispatch threads to one logical processor per processor core before dispatching threads to the other logical processor sharing execution resources in the same processor core.
• In an MP system with two or more physical packages, distribute threads over all the physical processors rather than concentrating them in one or two physical processors.
• Use processor affinity to assign a thread to a specific processor core or package, depending on the cache-sharing topology. This practice increases the chance that the processor’s caches will contain some of the thread’s code and data when the thread is dispatched for execution after being suspended (a sketch of this practice follows the list).
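For illustration, the following is a minimal sketch of assigning a thread to a specific logical processor, assuming a Linux system and the GNU-specific pthread_setaffinity_np() interface; the choice of logical processor 3 is arbitrary and would, in practice, be derived from the cache-sharing topology reported by CPUID.

/* Sketch: pin a worker thread to one logical processor.
 * Assumes Linux and the GNU extension pthread_setaffinity_np();
 * logical processor 3 is an arbitrary example value.
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg)
{
    /* Thread body; runs only on the logical processor selected below. */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    cpu_set_t set;

    pthread_create(&tid, NULL, worker, NULL);

    CPU_ZERO(&set);
    CPU_SET(3, &set);                         /* example: logical processor 3 */
    if (pthread_setaffinity_np(tid, sizeof(set), &set) != 0)
        fprintf(stderr, "could not set affinity\n");

    pthread_join(tid, NULL);
    return 0;
}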
7.11.6.6 Eliminate Execution-Based Timing Loops
Intel discourages the use of timing loops that depend on a processor’s execution speed to
measure time. There are several reasons:
• Timing loops cause problems when they are calibrated on an IA-32 processor running at one clock speed and then executed on a processor running at another clock speed.
• Routines for calibrating execution-based timing loops produce unpredictable results when run on an IA-32 processor supporting Hyper-Threading Technology. This is due to the sharing of execution resources between the logical processors within a physical package.
To avoid the problems described, timing loop routines must use a timing mechanism for the loop
that does not depend on the execution speed of the logical processors in the system. The
following sources are generally available:
• A high-resolution system timer (for example, an Intel 8254).
• A high-resolution timer within the processor (such as the local APIC timer or the time-stamp counter).
For additional information, see the IA-32 Intel® Architecture Optimization Reference Manual.
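As an illustration, the following is a minimal sketch of a delay routine keyed to the time-stamp counter rather than to execution speed, assuming the __rdtsc() and _mm_pause() intrinsics available in GCC/Clang via <x86intrin.h>; tsc_ticks_per_usec is a hypothetical constant that would have to be obtained from a wall-clock source or from CPUID, not from an execution-based calibration loop.

/* Sketch: a delay keyed to the time-stamp counter, not to execution speed. */
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc() and _mm_pause() intrinsics (GCC/Clang) */

static const uint64_t tsc_ticks_per_usec = 3000;  /* assumed: 3 GHz invariant TSC */

static void delay_usec(uint64_t usec)
{
    uint64_t start = __rdtsc();
    uint64_t ticks = usec * tsc_ticks_per_usec;

    /* Spin until the requested number of TSC ticks has elapsed. The
     * elapsed time does not depend on how fast the loop body executes
     * or on whether the sibling logical processor is busy.
     */
    while (__rdtsc() - start < ticks)
        _mm_pause();     /* spin-wait hint to the processor */
}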
7.11.6.7 Place Locks and Semaphores in Aligned, 128-Byte Blocks of Memory
When software uses locks or semaphores to synchronize processes, threads, or other code sections, Intel recommends that only one lock or semaphore be present within a cache line. In an Intel Xeon processor MP (which has 128-byte wide cache lines), following this recommendation means that each lock or semaphore should be contained in a 128-byte block of memory that begins on a 128-byte boundary. The practice minimizes the bus traffic required to service locks.
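As an illustration, the following is a minimal sketch of one way to follow this recommendation in C11, assuming the alignas specifier from <stdalign.h>; the structure and array names are illustrative only.

/* Sketch: each lock occupies its own aligned 128-byte block, so no two
 * locks share a cache-line sector.
 */
#include <stdalign.h>
#include <stdint.h>

typedef struct {
    alignas(128) volatile uint32_t value;    /* the lock word itself        */
    uint8_t pad[128 - sizeof(uint32_t)];     /* fill the rest of the block  */
} aligned_lock_t;

/* Each element of an array of such locks begins on its own 128-byte
 * boundary, so servicing one lock generates no bus traffic for another.
 */
static aligned_lock_t table_locks[16];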