User Manual for AMD 250

Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06, September 2005
Chapter 5: Cache and Memory Optimizations
5.6 Prefetch Instructions
Optimization
Where appropriate, use one of the prefetch instructions to increase the effective bandwidth of the 
AMD Athlon 64 and AMD Opteron processors.
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
Prefetch instructions take advantage of the high bus bandwidth of the AMD Athlon 64 and 
AMD Opteron processors to hide latencies when fetching data from system memory. A prefetch 
instruction initiates a read request of a specified address and reads the entire cache line that contains 
that address.
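
As a rough illustration of the idea above, the following C sketch sums an array sequentially while requesting data a few cache lines ahead of the current position. It is a minimal sketch under stated assumptions, not code from this guide: `__builtin_prefetch` is a GCC/Clang builtin (the compiler chooses the actual prefetch instruction), the function name `sum_with_prefetch` is hypothetical, and both the 64-byte line size and the prefetch distance are assumed values that would need tuning by measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, not taken from the text above; tune both by measurement. */
#define CACHE_LINE_BYTES 64        /* assumed cache-line size in bytes */
#define PREFETCH_LINES_AHEAD 4     /* hypothetical prefetch distance, in lines */

#define DOUBLES_PER_LINE (CACHE_LINE_BYTES / sizeof(double))

/* Sums n doubles, issuing a read prefetch a few cache lines ahead. */
double sum_with_prefetch(const double *data, size_t n)
{
    assert(data != NULL || n == 0);

    const size_t ahead = PREFETCH_LINES_AHEAD * DOUBLES_PER_LINE;
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        /* Roughly once per cache line, request the line that lies `ahead`
           elements further on. One prefetch is enough per line because the
           whole line containing the address is read. The bounds check keeps
           the computed address inside the array. */
        if (i % DOUBLES_PER_LINE == 0 && i + ahead < n) {
            /* Arguments: address, 0 = prefetch for reading, 3 = keep in cache. */
            __builtin_prefetch(&data[i + ahead], 0, 3);
        }
        sum += data[i];
    }
    return sum;
}
```

The prefetch is only a hint, so the function returns the same result with or without it; whether it actually helps depends on the access pattern and the chosen distance.
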
AMD Athlon 64 and AMD Opteron processors perform three types of prefetches:

| Prefetch type | Description |
| --- | --- |
| Load | Reads the data into the L1 data cache; the data is later evicted to the L2 cache. The following instructions perform load prefetches: PREFETCH, PREFETCHT0, PREFETCHT1, and PREFETCHT2. |
| Store | Reads the data into the L1 data cache and marks the data as modified; the data is later evicted to the L2 cache. The PREFETCHW instruction performs a store prefetch. |
| Nontemporal | The PREFETCHNTA instruction performs a nontemporal prefetch. The data is read into the L1 data cache; to avoid cache pollution, when a PREFETCHNTA misses in the L2 cache and reads from memory, the data is never evicted to the L2 cache. When a PREFETCHNTA hits in the L2 cache, the data is evicted back to the L2 cache. AMD Athlon 64 and AMD Opteron processors prior to Revision E read data into one way of the L1 cache when the PREFETCHNTA instruction was used. Revision E processors read PREFETCHNTA data into both ways of the L1 cache. |

The prefetch instructions can be used anywhere, in any type of code. The use of prefetch instructions 
is not affected by the values of Control Register 0 (CR0) bits, such as CR0.EM and CR0.TS.
Prefetching versus Preloading
In code that makes irregular memory accesses rather than sequential accesses, an ordinary MOV 
instruction is the best way to load data. But in situations where sequential addresses are read, prefetch 