Replace Branches with Computation in 3DNow!™ Code
22007E/0—November 1999
AMD Athlon™ Processor x86 Code Optimization
Example 2 (Preferred):
; r = (x < y) ? b : a
;
; in: mm0 a
; mm1 b
; mm2 x
; mm3 y
; out: mm1 r
PCMPGTD MM3, MM2 ; y > x ? 0xffffffff : 0
PAND MM1, MM3 ; y > x ? b : 0
PANDN MM3, MM0 ; y > x ? 0 : a
POR MM1, MM3 ; r = y > x ? b : a
Sample Code Translated into 3DNow!™ Code
The following examples use scalar code translated into 3DNow!
code. Using 3DNow! SIMD instructions for scalar code is not
recommended, because the advantage of 3DNow! instructions lies
in their SIMD nature: each 3DNow! floating-point instruction
operates on two packed single-precision values at once. These examples are
meant to demonstrate general techniques for translating source
code with branches into branchless 3DNow! code. Scalar source
code was chosen to keep the examples simple. These techniques
work in an identical fashion for vector code.
Each example shows the C code and the resulting 3DNow! code.
Example 1:
C code:
float x,y,z;
if (x < y) {
    z += 1.0;
}
else {
    z -= 1.0;
}
3DNow! code:
;in:  MM0 = x
;     MM1 = y
;     MM2 = z
;out: MM0 = z
MOVQ    MM3, MM0 ;save x
MOVQ    MM4, one ;1.0
PFCMPGE MM0, MM1 ;x < y ? 0 : 0xffffffff
PSLLD   MM0, 31  ;x < y ? 0 : 0x80000000
PXOR    MM0, MM4 ;x < y ? 1.0 : -1.0
PFADD   MM0, MM2 ;x < y ? z+1.0 : z-1.0