AMD Typewriter x86 User Manual

22007E/0—November 1999
AMD Athlon™ Processor x86 Code Optimization 
Newton-Raphson Reciprocal Square Root
The general Newton-Raphson reciprocal square root recurrence
is:

$$Z_{i+1} = \frac{1}{2} \cdot Z_i \cdot (3 - b \cdot Z_i^2)$$
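As a minimal illustration of the recurrence, the following C sketch applies it in single precision. The function names `rsqrt_step` and `rsqrt_newton`, and the idea of passing the starting guess as a parameter, are assumptions made for this example and do not come from this guide.

```c
#include <assert.h>

/* One Newton-Raphson step for 1/sqrt(b):
 * z_next = 1/2 * z * (3 - b * z * z). */
static float rsqrt_step(float b, float z)
{
    return 0.5f * z * (3.0f - b * z * z);
}

/* Hypothetical helper: apply `iters` steps starting from the guess z0.
 * The guess must be positive and reasonably close to 1/sqrt(b) for the
 * iteration to converge. */
static float rsqrt_newton(float b, float z0, int iters)
{
    assert(b > 0.0f && z0 > 0.0f);
    float z = z0;
    for (int i = 0; i < iters; ++i)
        z = rsqrt_step(b, z);
    return z;
}
```

For b = 2 and a crude guess of 0.5, the first step yields 0.625 and a few further steps approach 1/sqrt(2). Because each step roughly doubles the number of correct bits, a better starting value directly reduces the number of iterations needed.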
To reduce the number of iterations, the initial approximation
is read from a table. The 3DNow! reciprocal square root
approximation is accurate to at least 15 bits. Accordingly, to
obtain a single-precision 24-bit reciprocal square root of an
input operand b, one Newton-Raphson iteration is required,
using the following sequence of 3DNow! instructions:
X0 = PFRSQRT(b)
X1 = PFMUL(X0, X0)
X2 = PFRSQIT1(b, X1)
X3 = PFRCPIT2(X2, X0)
X4 = PFMUL(b, X3)
The 24-bit final reciprocal square root value is X3. In the
AMD Athlon processor 3DNow! implementation, the estimate
contains the correct round-to-nearest value for approximately
87% of all arguments. The remaining arguments differ from the
correct round-to-nearest value by one unit-in-the-last-place. The
square root (X4) is formed in the last step by multiplying by the
input operand b.
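The arithmetic behind the five-step sequence can be sketched as a scalar C model. This is only an approximation of the data flow, under stated assumptions: PFRSQIT1 is modeled as computing (3 - b*X1)/2 and PFRCPIT2 as the final multiply by X0, while the real instructions operate on packed operands with hardware-specific internal precision. The names `truncate_estimate`, `rsqrt_model_result`, and `model_rsqrt_sequence` are hypothetical, and the table estimate X0 is supplied by the caller because the table contents are not given here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: clear the low 8 fraction bits of a float so that
 * about 15 fraction bits remain, imitating a limited-precision estimate. */
static float truncate_estimate(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFFFF00u;
    memcpy(&x, &bits, sizeof x);
    return x;
}

typedef struct {
    float recip_sqrt; /* X3 in the sequence */
    float root;       /* X4 in the sequence */
} rsqrt_model_result;

/* Scalar model of the sequence; x0 plays the role of PFRSQRT(b). */
static rsqrt_model_result model_rsqrt_sequence(float b, float x0)
{
    assert(b > 0.0f && x0 > 0.0f);
    float x1 = x0 * x0;                /* X1 = PFMUL(X0, X0)              */
    float x2 = 0.5f * (3.0f - b * x1); /* X2 = PFRSQIT1(b, X1), assumed   */
    float x3 = x2 * x0;                /* X3 = PFRCPIT2(X2, X0), assumed  */
    float x4 = b * x3;                 /* X4 = PFMUL(b, X3)               */
    rsqrt_model_result r = { x3, x4 };
    return r;
}
```

A caller could pass `truncate_estimate(0.70710678f)` as the estimate for b = 2 and compare the two results against 1/sqrt(2) and sqrt(2). The model does not reproduce the rounding behavior quoted above; it only shows that one iteration on a roughly 15-bit estimate is enough to reach single-precision accuracy.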
Use MMX™ PMADDWD Instruction to Perform Two 32-Bit Multiplies in Parallel
The MMX PMADDWD instruction can be used to perform two
signed 16x16→32-bit multiplies in parallel, with much higher
performance than can be achieved using the IMUL instruction.
The PMADDWD instruction is designed to perform four signed
16x16→32-bit multiplies and accumulate the results pairwise.
By forcing one of the two products in each pair to zero, just
two multiplies remain. The following example shows how to
multiply 16-bit signed numbers a, b, c, d into signed 32-bit
products a×c and b×d: