AMD Athlon™ Processor x86 Code Optimization 
22007E/0—November 1999
Minimize Floating-Point-to-Integer Conversions
C++, C, and Fortran define floating-point-to-integer conversions
as truncating. This creates a problem because the active
rounding mode in an application is typically round-to-nearest-
even. The classical way to do a double-to-int conversion
therefore works as follows:
Example 1 (Fast):
FLD    QWORD PTR [X]         ;load double to be converted
FSTCW  [SAVE_CW]             ;save current FPU control word
MOVZX  EAX, WORD PTR[SAVE_CW];retrieve control word
OR     EAX, 0C00h            ;rounding control field = truncate
MOV    WORD PTR [NEW_CW], AX ;new FPU control word
FLDCW  [NEW_CW]              ;load new FPU control word
FISTP  DWORD PTR [I]         ;do double->int conversion
FLDCW  [SAVE_CW]             ;restore original control word
The AMD Athlon processor contains special acceleration
hardware to execute such code as quickly as possible. In most
situations, the above code is therefore the fastest way to
perform floating-point-to-integer conversion and the conversion
is compliant both with programming language standards and
the IEEE-754 standard.
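As an aside to Example 1, the following C sketch may help show why the rounding-control field has to be changed at all. It contrasts the truncating conversion that the language defines with a hand-written round-to-nearest-even conversion, which is the result FISTP would produce if the control word were left in its usual mode. The function names are hypothetical, and the second function is only an illustration of the rounding rule, assuming x lies well inside the range of int.

```c
/* Illustrative helpers; the names are hypothetical, not from the guide. */

/* C defines the cast as truncation toward zero: the result Example 1
   obtains by forcing the rounding-control field to "truncate". */
int to_int_trunc(double x)
{
    return (int)x;
}

/* Round-to-nearest-even, computed by hand: the result FISTP would give
   under the usual rounding mode if the control word were left alone.
   Assumes x lies well inside the range of int. */
int to_int_nearest_even(double x)
{
    int t = (int)x;                         /* integer part, toward zero   */
    double frac = x - (double)t;            /* remainder, same sign as x   */
    double mag = frac < 0.0 ? -frac : frac; /* magnitude of the remainder  */
    int away = x < 0.0 ? -1 : 1;            /* one step away from zero     */

    if (mag > 0.5)
        return t + away;                    /* nearer to the next integer  */
    if (mag == 0.5 && (t % 2 != 0))
        return t + away;                    /* tie: pick the even neighbor */
    return t;
}
```

For an input of 2.7 the first function returns 2 and the second returns 3, which is exactly the discrepancy the control-word change in Example 1 removes.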
According to the recommendations for inlining (see “Always
Inline Functions with Fewer than 25 Machine Instructions” on
page 72), the code in Example 1 should not be put into a separate
subroutine (e.g., ftol). It should rather be inlined into the main
code.
In some programs, floating-point numbers are converted to an
integer and the result is immediately converted back to
floating-point. In such cases, the FRNDINT instruction should
be used for maximum performance instead of FISTP in
Example 1. FRNDINT delivers the integral result directly to an FPU
register in floating-point form, which is faster than first using
FISTP to store the integer result and then converting it back to
floating-point with FILD.
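At the source level, the pattern described here can be recognized as a cast to an integer type followed at once by a cast back. The C sketch below shows that round trip next to a version that keeps the value in floating-point form throughout, using the standard modf function from math.h. The helper names are hypothetical, and the sketch does not assume that any particular compiler emits FRNDINT for the second function; it only illustrates the two ways of obtaining the same truncated value.

```c
#include <math.h>

/* Illustrative helpers; the names are hypothetical, not from the guide. */

/* Round trip through an integer: the FISTP-then-FILD pattern.
   Only meaningful while x fits in an int. */
double integral_part_via_int(double x)
{
    int i = (int)x;        /* double -> int, truncating toward zero */
    return (double)i;      /* int -> double                         */
}

/* Integral part kept in floating-point form throughout. modf() splits
   x into a fractional part (returned) and an integral part (stored). */
double integral_part_in_fp(double x)
{
    double whole;
    (void)modf(x, &whole); /* the fractional part is not needed here */
    return whole;
}
```

Both functions return 2.0 for an input of 2.75 and -2.0 for -2.75; the second never leaves the floating-point domain.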
If there are multiple, consecutive floating-point-to-integer
conversions, the cost of FLDCW operations should be
minimized by saving the current FPU control word, forcing the