User Manual for AMD 250

C and C++ Source-Level Optimizations
Chapter 2
25112
Rev. 3.06
September 2005
Software Optimization Guide for AMD64 Processors
2.27 Speeding Up Branches Based on Comparisons Between Floats
Optimization
Store operands of type float into a memory location and use integer comparison with the memory 
location to perform fast branches in cases where compilers do not support fast floating-point 
comparison instructions or 3DNow! code generation.
Application
This optimization applies to 32-bit software.
Rationale
Branches based on floating-point comparisons are often slow. The AMD Athlon 64 and 
AMD Opteron processors support the FCOMI, FUCOMI, FCOMIP, and FUCOMIP instructions that 
allow implementation of fast branches based on comparisons between operands of type double or 
type float. However, many compilers do not support generating these instructions. Likewise, 
floating-point comparisons between operands of type float can be accomplished quickly by using 
the 3DNow! PFCMP instruction if the compiler supports 3DNow! code generation.
Many compilers only implement branches based on floating-point comparisons by using FCOM or 
FCOMP to compare the floating-point operands, followed by FSTSW AX in order to transfer the x87 
condition-code flags into EAX. The subsequent branch is then based on the contents of the EAX 
register. Although the AMD Athlon 64 and AMD Opteron processors have acceleration hardware to 
speed up the FSTSW instruction, this process is still fairly slow.
Branches Dependent on Integer Comparisons Are Fast
One alternative for branches dependent upon the outcome of the comparison of operands of type 
float is to store the operand(s) into a memory location and then perform an integer comparison with 
that memory location. Branches dependent on integer comparisons are very fast. It should be noted 
that the replacement code uses a load dependent on an immediately prior store. If the store is not 
doubleword-aligned, no store-to-load forwarding takes place, and the branch is still slow. Also, if 
there is a lot of activity in the load-store queue, forwarding of the store data may be somewhat 
delayed, thus negating some of the advantages of using the replacement code. It is recommended that 
you experiment with the replacement code to test whether it actually provides a performance increase 
in the code at hand.
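The idea of storing a float and comparing its bit pattern as an integer can be sketched as follows 
for comparisons against zero. This is a minimal illustration, not the guide's own listing: the helper 
names (float_bits_u, float_bits_s, is_negative, and so on) are hypothetical, the code assumes that 
float is a 32-bit IEEE-754 single-precision value, and it assumes that the operand is never a NaN 
(a NaN can give a different answer than the floating-point comparison would).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the bit pattern of f as an unsigned integer.
   memcpy avoids the aliasing problems of a pointer cast; compilers
   typically reduce it to a plain 32-bit store and load. */
static uint32_t float_bits_u(float f)
{
    uint32_t u;
    assert(sizeof f == sizeof u);   /* assumption: float is 32 bits wide */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: return the same bit pattern as a signed integer. */
static int32_t float_bits_s(float f)
{
    int32_t i;
    assert(sizeof f == sizeof i);
    memcpy(&i, &f, sizeof i);
    return i;
}

/* f < 0.0f: sign bit set plus at least one other bit, so -0.0f is excluded. */
static int is_negative(float f)    { return float_bits_u(f) > 0x80000000u; }

/* f <= 0.0f: sign bit set (including -0.0f) or all bits clear (+0.0f). */
static int is_nonpositive(float f) { return float_bits_s(f) <= 0; }

/* f > 0.0f: sign bit clear and at least one other bit set. */
static int is_positive(float f)    { return float_bits_s(f) > 0; }

/* f >= 0.0f: sign bit clear, or exactly the -0.0f pattern. */
static int is_nonnegative(float f) { return float_bits_u(f) <= 0x80000000u; }
```

A branch such as if (f < 0.0f) would then be written as if (is_negative(f)). Whether this is 
actually faster depends on the compiler and on store-to-load forwarding succeeding, so it should be 
measured in the code at hand.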
The replacement code works well for comparisons against zero, including correct behavior when 
encountering a negative zero as allowed by the IEEE-754 standard. It also works well for comparing