Replace Branches with Computation in 3DNow!™ Code
AMD Athlon™ Processor x86 Code Optimization 
22007E/0—November 1999
Branches negatively impact the performance of 3DNow! code.
Branches can operate only on one data item at a time, i.e., they
are inherently scalar and inhibit the SIMD processing that
makes 3DNow! code superior. Also, branches based on 3DNow!
comparisons require data to be passed to the integer units,
which requires either transport through memory, or the use of
“MOVD reg, MMreg” instructions. If the body of the branch is
small, one can achieve higher performance by replacing the
branch with computation. The computation simulates
predicated execution or conditional moves. The principal tools
for this are the following instructions: PCMPGT, PFCMPGT,
PFCMPGE, PFMIN, PFMAX, PAND, PANDN, POR, PXOR.
Muxing Constructs
The most important construct for avoiding branches in
3DNow!™ and MMX™ code is a 2-way muxing construct that is
equivalent to the ternary operator “?:” in C and C++. It is
implemented using the PCMP/PFCMP, PAND, PANDN, and
POR instructions. To maximize performance, it is important to
apply the PAND and PANDN instructions in the proper order.
Example 1 (Avoid):  
; r = (x < y) ? b : a
;
; in:  mm0  a
;      mm1  b
;      mm2  x
;      mm3  y
; out: mm1  r
PCMPGTD  MM3, MM2   ; y > x ? 0xffffffff : 0
MOVQ     MM4, MM3   ; duplicate mask
PANDN    MM3, MM0   ; y > x ? 0 : a
PAND     MM1, MM4   ; y > x ? b : 0
POR      MM1, MM3   ; r = y > x ? b : a
Because the use of PANDN destroys the mask created by PCMP,
the mask needs to be saved, which requires an additional
register. This adds an instruction, lengthens the dependency
chain, and increases register pressure. Therefore 2-way muxing
constructs should be written as follows.