AMD 250 User Manual

Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Chapter 2: C and C++ Source-Level Optimizations
2.25 Accelerating Floating-Point Division and Square Root
Optimization
In applications that make heavy use of single-precision division and square root operations, it is 
recommended that you port the code to SSE or 3DNow!™ inline assembly or use a compiler that can 
generate SSE or 3DNow! technology code. If neither of these methods is possible, the 
precision-control (PC) bits of the x87 FPU control word can be set to single precision to improve 
performance. (The processor defaults to double-extended precision. See AMD64 Architecture 
Programmer's Manual Volume 1: Application Programming (order# 24592) for details on the FPU 
control register.)
Application
This optimization applies to 32-bit software.
Rationale
Division and square root have a much longer latency than other floating-point operations, even though 
the AMD Athlon 64 and AMD Opteron processors provide significant acceleration of these two 
operations. In some application programs, these operations occur so often as to seriously impact 
performance. If code has hot spots that use single-precision arithmetic only (that is, all computation 
involves data of type float) and for some reason cannot be ported to 3DNow! code, the following 
technique may be used to improve performance.
The x87 FPU has a precision-control field as part of the FPU control word. The precision-control 
setting determines rounding precision of instruction results and affects the basic arithmetic 
operations, including division and the extraction of square root. Division and square root on the 
AMD Athlon 64 and AMD Opteron processors are only computed to the number of bits necessary for 
the currently selected precision. Setting precision control to single precision (versus the Win32 
default of double precision) lowers the latency of those operations.
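To make the precision-control field concrete, the following is a minimal sketch that decodes it from a 16-bit control word value. It assumes the standard x87 layout, in which the PC field occupies bits 8 and 9 of the control word (00b for single, 10b for double, and 11b for double-extended precision). The macro and function names are hypothetical and chosen only for illustration. The code works on plain integers and does not touch the FPU.

```c
#include <stdint.h>

/* Assumed layout: the precision-control (PC) field is bits 8-9
   of the x87 FPU control word. Names below are hypothetical. */
#define X87_PC_SHIFT 8
#define X87_PC_MASK  (0x3u << X87_PC_SHIFT)

/* Return the number of significand bits selected by the PC field,
   or 0 for the reserved encoding 01b. */
int x87_pc_significand_bits(uint16_t control_word)
{
    switch ((control_word & X87_PC_MASK) >> X87_PC_SHIFT) {
    case 0x0: return 24;  /* single precision */
    case 0x2: return 53;  /* double precision */
    case 0x3: return 64;  /* double-extended precision */
    default:  return 0;   /* 01b is reserved */
    }
}

/* Return a copy of control_word with the PC field cleared to 00b
   (single precision); all other bits are left unchanged. */
uint16_t x87_with_single_precision(uint16_t control_word)
{
    return (uint16_t)(control_word & ~X87_PC_MASK);
}
```

For example, a control word of 0x027F decodes to 53 significand bits (double precision), and clearing its PC field yields 0x007F, which selects single precision.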
The Microsoft® Visual C environment provides functions to manipulate the FPU control word and 
thus the precision control. Note that these functions are not very fast, so insert changes of precision 
control where they create little overhead, such as outside a computation-intensive loop. Otherwise, the 
overhead created by the function calls outweighs the benefit of reducing the latencies of divide and 
square-root operations. For more information on this topic, see AMD64 Architecture Programmer's 
Manual Volume 1: Application Programming (order# 24592).
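The placement advice can be sketched as follows: the control word is changed once before the loop and restored once after it, so the cost of the calls is not paid on every iteration. This is an illustration under stated assumptions, not a definitive implementation. It uses the Microsoft C runtime function _controlfp with the _PC_24 and _MCW_PC constants from <float.h>, guarded so that the calls are compiled only for 32-bit x86 builds with the Microsoft compiler. On any other target the guard leaves a plain loop. The function name divide_all and the macro USE_X87_PC are hypothetical. Whether the change helps depends on the compiler actually emitting x87 code for the division.

```c
#include <float.h>
#include <stddef.h>

/* Assumption: precision control is only changed on 32-bit x86
   builds with the Microsoft compiler; elsewhere this is a no-op. */
#if defined(_MSC_VER) && defined(_M_IX86)
#define USE_X87_PC 1
#else
#define USE_X87_PC 0
#endif

/* Hypothetical helper: divide every element of v by d in place. */
void divide_all(float *v, size_t n, float d)
{
    size_t i;
#if USE_X87_PC
    /* Save the current control word, then select single precision.
       Both calls happen once, outside the loop. */
    unsigned int saved_cw = _controlfp(0, 0);
    _controlfp(_PC_24, _MCW_PC);
#endif

    for (i = 0; i < n; i++) {
        v[i] = v[i] / d;  /* hot loop: single-precision division only */
    }

#if USE_X87_PC
    /* Restore the caller's precision-control setting. */
    _controlfp(saved_cw, _MCW_PC);
#endif
}
```

Restoring only the bits selected by _MCW_PC leaves any other control-word settings made by the caller untouched.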
The following example shows how to set the precision control to single precision and later restore the 
original settings in the Microsoft Visual C environment.