AMD amd64 architecture User Manual

106

128-Bit Media and Scientific Programming

AMD64 Technology

24592—Rev. 3.15—November 2009

4.2 Capabilities

The 128-bit media instructions are designed to support media and scientific applications. The vector
operands used by these instructions allow applications to operate in parallel on multiple elements of
vectors. The elements can be integers (from bytes to quadwords) or floating-point values (either
single-precision or double-precision). Arithmetic operations produce signed, unsigned, and/or
saturating results.
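
As an illustration of what a saturating result means for one element, the following is a minimal scalar sketch in portable C. It does not use the 128-bit media instructions; it only models a single 8-bit element. The function names (add_wrap_u8, add_sat_u8, add_sat_s8) are hypothetical and chosen for this example.

```c
#include <stdint.h>

/* Wraparound (modular) add of two unsigned bytes: 200 + 100 wraps to 44. */
static uint8_t add_wrap_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Unsigned saturating add: any sum above 255 is clamped to 255. */
static uint8_t add_sat_u8(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}

/* Signed saturating add: the sum is clamped to the range [-128, 127]. */
static int8_t add_sat_s8(int8_t a, int8_t b) {
    int sum = (int)a + (int)b;
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}
```

Saturation is often the preferred behavior for pixel and audio data, because an overflowing sum stays at the brightest or loudest representable value instead of wrapping to a small one.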

Several types of vector move instructions and, in 64-bit mode, twice the legacy number of XMM
registers (16 in total) can eliminate substantial memory-access overhead and significantly improve
performance.

4.2.1 Types of Applications

Typical media applications well-suited to the 128-bit media programming model include a broad range
of audio, video, and graphics programs. For example, music synthesis, speech synthesis, speech
recognition, audio and video compression (encoding) and decompression (decoding), 2D and 3D
graphics, streaming video (up to high-definition TV), and digital signal processing (DSP) kernels are
all likely to experience higher performance using 128-bit media instructions than using other types of
instructions in the AMD64 architecture.

Such applications commonly use small-sized integer or single-precision floating-point data elements
in repetitive loops, in which the typical operations are inherently parallel. For example, 8-bit and 16-bit
data elements are commonly used for pixel information in graphics applications, in which each of the
RGBA pixel components (red, green, blue, and alpha) is represented by an 8-bit or 16-bit integer.
16-bit data elements are also commonly used for audio sampling.

The 128-bit media instructions allow multiple data elements like these to be packed into 128-bit vector
operands located in XMM registers or memory. The instructions operate in parallel on each of the
elements in these vectors. For example, 16 elements of 8-bit data can be packed into a 128-bit vector
operand, so that a single instruction operates simultaneously on 16 pairs of byte elements, taking one
element of each pair from each of its two source operands.
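
The packing described above can be modeled in portable C as a sketch. The type vec128_u8 and the function packed_add_u8 are hypothetical names for this illustration, not part of the architecture or of any library. The loop stands in for what a single packed byte-add instruction would do across all 16 element pairs at once, and wraparound arithmetic is assumed here for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16  /* 128 bits / 8 bits per element */

/* Hypothetical model of a 128-bit operand holding 16 packed bytes. */
typedef struct {
    uint8_t lane[LANES];
} vec128_u8;

/* Element-wise add of two packed operands. Lane i of the result depends
 * only on lane i of each source, so all 16 additions are independent;
 * that independence is what lets hardware perform them in parallel. */
static vec128_u8 packed_add_u8(vec128_u8 a, vec128_u8 b) {
    vec128_u8 r;
    for (size_t i = 0; i < LANES; i++) {
        r.lane[i] = (uint8_t)(a.lane[i] + b.lane[i]);  /* wraps modulo 256 */
    }
    return r;
}
```

Because the model type is exactly 16 bytes, one value of it corresponds to the contents of one XMM register or one 128-bit memory operand.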

The 128-bit media instructions also support a broad spectrum of scientific applications. For example,
their ability to operate in parallel on double-precision floating-point vector elements makes them
well-suited to computations such as solving dense systems of linear equations, including matrix and vector-space
operations with real and complex numbers. In professional CAD applications, for example, high-
performance physical-modeling algorithms can be implemented to simulate processes such as heat
transfer or fluid dynamics.
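
A 128-bit operand holds two double-precision elements, so a vector kernel typically processes its data two elements per step. The following portable C sketch models that structure for the update y = a*x + y, a common building block of linear-algebra routines. The names vec128_f64, packed_scale_add, and axpy_f64 are hypothetical, and the scalar code only models the data flow; whether a compiler maps it onto 128-bit media instructions is not guaranteed.

```c
#include <stddef.h>

/* Hypothetical model of a 128-bit operand holding two packed doubles. */
typedef struct {
    double lane[2];
} vec128_f64;

/* Element-wise a*x + y on one pair of packed operands. */
static vec128_f64 packed_scale_add(double a, vec128_f64 x, vec128_f64 y) {
    vec128_f64 r;
    r.lane[0] = a * x.lane[0] + y.lane[0];
    r.lane[1] = a * x.lane[1] + y.lane[1];
    return r;
}

/* y[i] = a * x[i] + y[i] for i in [0, n), two elements per step. */
static void axpy_f64(size_t n, double a, const double *x, double *y) {
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        vec128_f64 vx = {{ x[i], x[i + 1] }};
        vec128_f64 vy = {{ y[i], y[i + 1] }};
        vec128_f64 r = packed_scale_add(a, vx, vy);
        y[i] = r.lane[0];
        y[i + 1] = r.lane[1];
    }
    /* Scalar tail for an odd element count. */
    for (; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

The two-element layout also fits one double-precision complex number (real and imaginary parts) in a single operand.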

4.2.2 Integer Vector Operations

Most of the 128-bit media arithmetic instructions perform parallel operations on pairs of vectors.
Vector operations are also called packed or SIMD (single-instruction, multiple-data) operations. They
take vector operands consisting of multiple elements, and all elements are operated on in parallel.
Figure 4-1 on page 107 shows an example of parallel operations on pairs of 16 byte-sized integers in