User Manual for AMD 250

C and C++ Source-Level Optimizations

Chapter 2

25112

Rev. 3.06

September 2005

Software Optimization Guide for AMD64 Processors

2.26 Fast Floating-Point-to-Integer Conversion

Optimization

Use the 3DNow! PF2ID instruction, which performs truncating conversion, to accomplish rapid
floating-point-to-integer conversion if the floating-point operand is of type float.

Application

This optimization applies to 32-bit software.

Rationale

Floating-point-to-integer conversion in C programs is typically a very slow operation. The semantics
of C and C++ demand that the conversion use truncation. If the floating-point operand is of type
float, and the compiler supports 3DNow! code generation, then the 3DNow! PF2ID instruction,
which performs truncating conversion, can be utilized by the compiler to accomplish rapid
floating-point-to-integer conversion.
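
As a minimal illustration of the truncation semantics described above, the following C sketch converts an array of float values with a plain cast. The function name convert_floats is hypothetical and is not taken from this guide. Whether a given compiler emits PF2ID for the cast depends on its 3DNow! support, which this sketch does not show.

```c
#include <stddef.h>

/* Hypothetical helper: converts n float values to int with a plain cast.
   C requires the cast to discard the fractional part (truncate toward zero),
   so 2.7f becomes 2 and -2.7f becomes -2. */
static void convert_floats(const float *src, int *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = (int)src[i];
    }
}
```

The cast is only defined when the truncated value fits in an int, so callers are assumed to pass values in that range.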

Note: The PF2ID instruction does not provide conversion compliant with the IEEE-754 standard.
Some operands of type float (IEEE-754 single precision), such as NaNs, infinities, and
denormals, are either unsupported or not handled in compliance with the IEEE-754 standard
by 3DNow! technology.

For double precision operands, the usual way to accomplish truncating conversion involves the
following algorithm:

1. Save the current x87 rounding mode (this is usually round to nearest even).

2. Set the x87 rounding mode to truncation.

3. Load the floating-point source operand and store the integer result.

4. Restore the original x87 rounding mode.

This algorithm is typically implemented through the C run-time library function ftol. While the
AMD Athlon 64 and AMD Opteron processors have special hardware optimizations to speed up the
changing of x87 rounding modes, and therefore ftol, calls to ftol may still tend to be slow.

For situations where very fast floating-point-to-integer conversion is required, the conversion code in
Listing 24 on page 53 may be helpful. This code uses the current rounding mode instead of truncation
when performing the conversion. Therefore, the result may differ by 1 from the ftol result. The
replacement code adds the “magic number” 2^52 + 2^51 to the source operand, then stores the double
precision result to memory and retrieves the lower doubleword of the stored result. Adding the magic
number shifts the original argument to the right inside the double precision mantissa, placing the
binary point of the sum immediately to the right of the least-significant mantissa bit. Extracting the
lower doubleword of the sum then delivers the integral portion of the original argument.
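
The magic-number technique described above can be sketched in C as follows. This is not a reproduction of Listing 24. The function name double_to_int_magic is hypothetical, and the constant 6755399441055744.0 (2^52 + 2^51) is an assumption chosen so that the sum stays in the range where one unit in the last place equals 1, for both positive and negative arguments. The sketch also assumes IEEE-754 double precision and an argument that fits in a signed 32-bit integer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: converts d to a 32-bit integer using the current
   rounding mode (round to nearest even by default), not truncation. */
static int32_t double_to_int_magic(double d)
{
    /* Assumed magic number: 2^52 + 2^51. The volatile store forces the sum
       to be rounded to double precision in memory. */
    volatile double sum = d + 6755399441055744.0;
    double stored = sum;
    uint64_t bits;
    uint32_t low;
    int32_t result;

    memcpy(&bits, &stored, sizeof bits);   /* read the stored bit pattern */
    low = (uint32_t)bits;                  /* keep the lower doubleword */
    memcpy(&result, &low, sizeof result);  /* reinterpret as a signed value */
    return result;
}
```

Under the default rounding mode, an argument of 1.9 yields 2 where a truncating conversion would yield 1, which is the difference of 1 noted in the text.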