Справочник Пользователя для AMD 250

104

Cache and Memory Optimizations

Chapter 5

25112

Rev. 3.06

September 2005

Software Optimization Guide for AMD64 Processors

5.6

Prefetch Instructions

Optimization

Where appropriate, use one of the prefetch instructions to increase the effective bandwidth of the
AMD Athlon 64 and AMD Opteron processors.

Application

This optimization applies to:

•

32-bit software

•

64-bit software

Rationale

Prefetch instructions take advantage of the high bus bandwidth of the AMD Athlon 64 and
AMD Opteron processors to hide latencies when fetching data from system memory. A prefetch
instruction initiates a read request of a specified address and reads the entire cache line that contains
that address.

AMD Athlon 64 and AMD Opteron processors perform three types of prefetches:

The prefetch instructions can be used anywhere, in any type of code. The use of prefetch instructions
is not affected by the values of Control Register 0 (CR0) bits, such as CR0.EM and CR0.TS.

Prefetching versus Preloading

In code that makes irregular memory accesses rather than sequential accesses, an ordinary MOV
instruction is the best way to load data. But in situations where sequential addresses are read, prefetch

Prefetch type

Description

Load

Reads the data into the L1 data cache; the data is later evicted to the L2 cache. The
following instructions perform load prefetches: PREFETCH, PREFETCHT0,
PREFETCHT1, and PREFETCHT2.

Store

Reads the data into the L1 data cache and marks the data as modified; the data is
later evicted to the L2 cache. The PREFETCHW instruction performs a store prefetch.

Nontemporal

The PREFETCHNTA instruction performs a nontemporal prefetch. The data is read
into the L1 data cache; to avoid cache pollution, when a PREFETCHNTA misses in
the L2 cache and reads from memory, the data is never evicted to the L2 cache. When
a PREFETCHNTA hits in the L2 cache, the data is evicted back to the L2 cache. AMD
Athlon 64 and AMD Opteron processors prior to Revision E read data into one way of
the L1 cache when the PREFETCHNTA instruction was used. Revision E processors
read PREFETCHNTA data into both ways of the L1 cache.