AMD Athlon™ Processor x86 Code Optimization 
22007E/0—November 1999
FEMMS instruction is supported for backward compatibility
with AMD-K6 family processors, and is aliased to the EMMS
instruction. 
3DNow! and MMX instructions are designed to be used
concurrently with no switching issues. Likewise, enhanced
3DNow! instructions can be used simultaneously with MMX
instructions. However, x87 and 3DNow! instructions share the
same architectural registers, so they cannot simply be
interleaved: the register file must be cleared with FEMMS/EMMS
whenever code switches between them.
Use 3DNow!™ Instructions for Fast Division
3DNow! instructions can compute a reciprocal or quotient both
quickly and accurately: the sequences below reach 14-bit
precision in seven cycles and full 24-bit precision in 15 cycles.
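A short derivation shows why a single refinement step is enough for full single precision. It assumes, as AMD's 3DNow! documentation describes, that PFRCP returns an estimate x_0 of 1/w accurate to about 14 bits and that PFRCPIT1 followed by PFRCPIT2 carries out one Newton-Raphson step for the reciprocal. The exact intermediate values the two instructions produce are not modeled here.

```latex
\begin{aligned}
x_1 &= x_0\,(2 - w\,x_0) \\
1 - w\,x_1 &= 1 - 2w\,x_0 + w^2 x_0^2 = (1 - w\,x_0)^2 \\
|1 - w\,x_0| \le 2^{-14} &\;\Longrightarrow\; |1 - w\,x_1| \le 2^{-28}
\end{aligned}
```

Since 1 - w x is the relative error of x as an approximation of 1/w, each step squares that error, so about 14 correct bits become about 28. That exceeds the 24-bit significand of a single-precision float, and the quotient is then formed as y x_1.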
Optimized 14-Bit Precision Divide
This divide operation executes with a total latency of seven
cycles, assuming that the program hides the latency of the first
MOVD/MOVQ instructions within preceding code.
Example:
MOVD       MM0, [MEM]   ;   0 | W
PFRCP      MM0, MM0     ; 1/W | 1/W  (approximate)
MOVQ       MM2, [MEM]   ;   Y | X
PFMUL      MM2, MM0     ; Y/W | X/W
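The 14-bit sequence can be modeled in portable C to see what its accuracy means in practice. This is a sketch under stated assumptions, not the hardware behavior: the names `recip_approx14` and `div2_14bit` are hypothetical, the two-element array stands in for the two 32-bit lanes of an MMX register, and `recip_approx14` imitates PFRCP by rounding the float 1/w to a 14-bit significand. The real instruction computes its estimate differently, so its low-order bits will not match.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PFRCP: compute 1/w in float, then round the
 * significand to 14 bits (1 implicit + 13 stored fraction bits).
 * Assumes a finite, normal result; the real instruction differs in its
 * low-order bits. */
static float recip_approx14(float w)
{
    float r;
    uint32_t u;

    assert(w != 0.0f);             /* zero divisor is not modeled */
    r = 1.0f / w;
    memcpy(&u, &r, sizeof u);      /* reinterpret the IEEE-754 bits */
    u += 1u << 9;                  /* round magnitude to nearest ...   */
    u &= ~((1u << 10) - 1u);       /* ... and drop 10 fraction bits    */
    memcpy(&r, &u, sizeof r);
    return r;
}

/* Mirrors MOVD / PFRCP / MOVQ / PFMUL: xy[0] is the low lane (X) and
 * xy[1] the high lane (Y); both are multiplied by one reciprocal. */
static void div2_14bit(const float xy[2], float w, float out[2])
{
    float r = recip_approx14(w);   /* PFRCP: 1/W | 1/W (approximate) */
    out[0] = xy[0] * r;            /* PFMUL low lane:  X/W */
    out[1] = xy[1] * r;            /* PFMUL high lane: Y/W */
}
```

With w = 3 and {X, Y} = {1, 10}, both quotients carry a relative error of about 3e-5, inside the 2^-14 (about 6.1e-5) bound a 14-bit reciprocal implies. Multiplying by one shared reciprocal is what lets a single PFRCP serve both lanes.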
Optimized Full 24-Bit Precision Divide
This divide operation executes with a total latency of 15 cycles,
assuming that the program hides the latency of the first
MOVD/MOVQ instructions within preceding code.
Example:
MOVD       MM0, [W]     ;   0 | W
PFRCP      MM1, MM0     ; 1/W | 1/W  (approximate)
PUNPCKLDQ  MM0, MM0     ;   W | W    (MMX instr.)
PFRCPIT1   MM0, MM1     ; 1/W | 1/W  (refine)
MOVQ       MM2, [X_Y]   ;   Y | X
PFRCPIT2   MM0, MM1     ; 1/W | 1/W  (final)
PFMUL      MM2, MM0     ; Y/W | X/W
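The full-precision sequence adds one refinement step before the multiply. The sketch below is a model under stated assumptions, with hypothetical names: `recip_seed14` imitates PFRCP by cutting the float 1/w to a 14-bit significand, and `recip_refined` applies one Newton-Raphson step x1 = x0(2 - w x0) in place of the PFRCPIT1/PFRCPIT2 pair. The step is evaluated in double so the sketch's own rounding stays negligible; it does not reproduce the hardware's intermediate values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical PFRCP stand-in: 1/w rounded to a 14-bit significand. */
static float recip_seed14(float w)
{
    float r;
    uint32_t u;

    assert(w != 0.0f);             /* zero divisor is not modeled */
    r = 1.0f / w;
    memcpy(&u, &r, sizeof u);
    u += 1u << 9;                  /* round magnitude to nearest ... */
    u &= ~((1u << 10) - 1u);       /* ... keeping 13 fraction bits   */
    memcpy(&r, &u, sizeof r);
    return r;
}

/* One Newton-Raphson step standing in for PFRCPIT1 + PFRCPIT2:
 * x1 = x0 * (2 - w * x0) = x0 + x0 * (1 - w * x0). */
static float recip_refined(float w)
{
    double x0 = recip_seed14(w);
    double e = 1.0 - (double)w * x0;   /* residual, at most about 2^-14 */
    return (float)(x0 + x0 * e);       /* error about e*e, then float rounding */
}

/* Mirrors the 24-bit sequence: both lanes scaled by the refined 1/W. */
static void div2_24bit(const float xy[2], float w, float out[2])
{
    float r = recip_refined(w);    /* PFRCP, PFRCPIT1, PFRCPIT2 */
    out[0] = xy[0] * r;            /* PFMUL low lane:  X/W */
    out[1] = xy[1] * r;            /* PFMUL high lane: Y/W */
}
```

For w = 3 the seed's relative error of about 3e-5 drops to about 3e-8 after the step, which is the error of the correctly rounded single-precision value of 1/3. Writing the step as x0 + x0 * e keeps the small correction term separate, which is the usual way to apply a Newton-Raphson update without losing the bits it adds.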