AMD Typewriter x86 User Manual

Minimize Floating-Point-to-Integer Conversions


22007E/0—November 1999

AMD Athlon™ Processor x86 Code Optimization

FPU into truncating mode, and performing all of the
conversions before restoring the original control word.

The speed of the above code is somewhat dependent on the
nature of the code surrounding it. For applications in which the
speed of floating-point-to-integer conversions is extremely
critical for application performance, experiment with either of
the following substitutions, which may or may not be faster than
the code above.

The first substitution simulates a truncating
floating-point-to-integer conversion, provided that there are no
NaNs, infinities, or overflows. This conversion is therefore not
IEEE-754 compliant. The code works properly only if the current
FPU rounding mode is round-to-nearest-even, which is usually the
case.

Example 2 (Potentially faster).

FLD   QWORD PTR [X]      ;load double to be converted
FST   DWORD PTR [TX]     ;store X because sign(X) is needed
FIST  DWORD PTR [I]      ;store rndint(X) as default result
FISUB DWORD PTR [I]      ;compute DIFF = X - rndint(X)
FSTP  DWORD PTR [DIFF]   ;store DIFF as we need sign(DIFF)
MOV   EAX, [TX]          ;X
MOV   EDX, [DIFF]        ;DIFF
TEST  EDX, EDX           ;DIFF == 0 ?
JZ    $DONE              ;default result is OK, done
XOR   EDX, EAX           ;need correction if sign(X) != sign(DIFF)
SAR   EAX, 31            ;(X<0) ? 0xFFFFFFFF : 0
SAR   EDX, 31            ;sign(X)!=sign(DIFF) ? 0xFFFFFFFF : 0
LEA   EAX, [EAX+EAX+1]   ;(X<0) ? 0xFFFFFFFF : 1
AND   EDX, EAX           ;correction: -1, 0, 1
SUB   [I], EDX           ;trunc(X)=rndint(X)-correction
$DONE:

The second substitution simulates a truncating
floating-point-to-integer conversion using only integer
instructions and therefore works correctly independent of the
FPU's current rounding mode. It does not handle NaNs, infinities,
or overflows according to the IEEE-754 standard. Note that the
first instruction of this code may cause a store-to-load
forwarding (STLF) size mismatch, resulting in performance
degradation, if the variable to be converted has been stored
recently.