User Guide for AMD 250

Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Chapter 2: C and C++ Source-Level Optimizations
2.26 Fast Floating-Point-to-Integer Conversion
Optimization
Use the 3DNow! PF2ID instruction, which performs truncating conversion, to accomplish rapid
floating-point-to-integer conversion if the floating-point operand is of type float.
Application
This optimization applies to 32-bit software.
Rationale
Floating-point-to-integer conversion in C programs is typically a very slow operation. The semantics
of C and C++ demand that the conversion use truncation. If the floating-point operand is of type
float, and the compiler supports 3DNow! code generation, then the 3DNow! PF2ID instruction,
which performs truncating conversion, can be utilized by the compiler to accomplish rapid
floating-point-to-integer conversion.
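As a small illustration of the truncation requirement, the sketch below shows only the language semantics of a cast, not PF2ID code generation. The function name float_to_int_cast is hypothetical and does not come from the guide.

```c
/* Hypothetical helper: a plain C cast from float to int.
   The language requires the fractional part to be discarded,
   that is, the value is truncated toward zero. */
int float_to_int_cast(float f)
{
    return (int)f;
}
```

With this helper, 2.7f converts to 2 and -2.7f converts to -2, whereas rounding to nearest would give 3 and -3.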
Note: The PF2ID instruction does not provide conversion compliant with the IEEE-754 standard.
Some operands of type float (IEEE-754 single precision), such as NaNs, infinities, and
denormals, are either unsupported or not handled in compliance with the IEEE-754 standard
by 3DNow! technology.
For double precision operands, the usual way to accomplish truncating conversion involves the 
following algorithm:
1. Save the current x87 rounding mode (this is usually round to nearest, with ties going to even).
2. Set the x87 rounding mode to truncation.
3. Load the floating-point source operand and store the integer result.
4. Restore the original x87 rounding mode.
This algorithm is typically implemented through the C run-time library function ftol. Although the
AMD Athlon 64 and AMD Opteron processors have special hardware optimizations that speed up
changes of the x87 rounding mode, and therefore ftol, calls to ftol can still be slow.
For situations where very fast floating-point-to-integer conversion is required, the conversion code in
Listing 24 on page 53 may be helpful. This code uses the current rounding mode instead of truncation
when performing the conversion. Therefore, the result may differ by 1 from the ftol result. The
replacement code adds the "magic number" 2^52 + 2^51 to the source operand, then stores the double
precision result to memory and retrieves the lower doubleword of the stored result. Adding the magic
number shifts the original argument to the right inside the double precision mantissa, placing the
binary point of the sum immediately to the right of the least-significant mantissa bit. Extracting the
lower doubleword of the sum then delivers the integral portion of the original argument.
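The magic-number technique described above can be sketched in C as follows. This is not Listing 24 itself, which is not reproduced here. It is a hedged illustration that assumes IEEE-754 double precision, the default round-to-nearest mode, and an argument whose rounded value fits in a 32-bit signed integer. The name fast_double_to_int32 is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: convert a double to int32 using the current
   rounding mode by adding 2^52 + 2^51 and reading the low 32 bits. */
int32_t fast_double_to_int32(double x)
{
    const double magic = 6755399441055744.0;  /* 2^52 + 2^51 */

    /* volatile forces the sum to be stored as a 64-bit double, so any
       extra x87 precision is rounded away before the bits are read. */
    volatile double stored = x + magic;
    double sum = stored;

    uint64_t bits;
    memcpy(&bits, &sum, sizeof bits);         /* raw IEEE-754 bit pattern */

    uint32_t low = (uint32_t)bits;            /* lower doubleword of the sum */
    int32_t result;
    memcpy(&result, &low, sizeof result);     /* reinterpret as two's complement */
    return result;
}
```

Under round to nearest, 2.7 yields 3 here where a truncating conversion yields 2, which is the difference of 1 mentioned in the text. The 2^51 term keeps the sum in the range between 2^52 and 2^53 for negative arguments as well, so the low doubleword holds the two's-complement form of a negative result.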