AMD Typewriter x86 User Manual

AMD Athlon™ Processor x86 Code Optimization 
22007E/0—November 1999
Use 3DNow!™ Instructions for Fast Square Root and Reciprocal Square Root
3DNow! instructions can be used to compute a very fast, highly
accurate square root and reciprocal square root.
Optimized 15-Bit Precision Square Root
This square root operation can be executed in only 7 cycles,
assuming a program hides the latency of the first MOVD
instruction within preceding code. The reciprocal square root
operation requires four fewer cycles than the square root
operation, since it omits the final PFMUL.
Example:  
MOVD      MM0, [MEM]   ;         0 | a
PFRSQRT   MM1, MM0     ; 1/sqrt(a) | 1/sqrt(a) (approximate)
PUNPCKLDQ MM0, MM0     ;         a | a         (MMX instr.)
PFMUL     MM0, MM1     ;   sqrt(a) | sqrt(a)
Optimized 24-Bit Precision Square Root
This square root operation can be executed in only 19 cycles,
assuming a program hides the latency of the first MOVD
instruction within preceding code. The reciprocal square root
operation requires four fewer cycles than the square root
operation, since it omits the final PFMUL.
Example:  
MOVD      MM0, [MEM]   ;         0 | a
PFRSQRT   MM1, MM0     ; 1/sqrt(a) | 1/sqrt(a)  (approx.)
MOVQ      MM2, MM1     ;   X_0 = 1/(sqrt a)     (approx.)
PFMUL     MM1, MM1     ; X_0 * X_0 | X_0 * X_0  (step 1)
PUNPCKLDQ MM0, MM0     ;         a | a          (MMX instr)
PFRSQIT1  MM1, MM0     ;    (intermediate)      (step 2)
PFRCPIT2  MM1, MM2     ; 1/sqrt(a) | 1/sqrt(a)  (step 3)
PFMUL     MM0, MM1     ;   sqrt(a) | sqrt(a)