Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Chapter 9: Optimizing with SIMD Instructions
9.14 Finding the Floating-Point Absolute Value of Operands of SSE, SSE2, and 3DNow!™ Instructions
Optimization
Use instructions that perform AND operations (PAND, ANDPS, and ANDPD) to determine the absolute value of floating-point operands of SSE, SSE2, and 3DNow!™ instructions.
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
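Clearing the sign bit of each element with a bitwise AND leaves the exponent and significand unchanged, so the result is the absolute value. The MMX PAND instruction has a latency of 2 cycles, whereas the SSE and SSE2 AND instructions (ANDPS and ANDPD, respectively) have latencies of 3 cycles. The following examples, each a separate fragment, illustrate how to clear the sign bits: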
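; 3DNow! (two single-precision floats in an MMX register)
absmask DQ 7FFFFFFF7FFFFFFFh
pand mm0, [absmask]      ; Clear the sign bits of both floats in MM0.

; SSE (four single-precision floats; absmask must be 16-byte aligned)
absmask DQ 7FFFFFFF7FFFFFFFh, 7FFFFFFF7FFFFFFFh
andps xmm0, [absmask]    ; Clear the sign bits of all four floats in XMM0.

; SSE2 (two double-precision floats; absmask must be 16-byte aligned)
absmask DQ 7FFFFFFFFFFFFFFFh, 7FFFFFFFFFFFFFFFh
andpd xmm0, [absmask]    ; Clear the sign bits of both doubles in XMM0.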
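
The masks above work because the sign is the most significant bit of each element. The following is a minimal scalar sketch in C of what a single lane of ANDPS or ANDPD computes. It is not taken from the guide: it assumes that float and double use the IEEE 754 32-bit and 64-bit formats, and the function names abs_by_mask_f32 and abs_by_mask_f64 are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float and double are IEEE 754 binary32 and binary64. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* One ANDPS lane: AND the float's bit pattern with 7FFFFFFFh (clears bit 31). */
static float abs_by_mask_f32(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* reinterpret the float as raw bits */
    bits &= UINT32_C(0x7FFFFFFF);        /* clear the sign bit only */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* One ANDPD lane: AND the double's bit pattern with 7FFFFFFFFFFFFFFFh (clears bit 63). */
static double abs_by_mask_f64(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);      /* reinterpret the double as raw bits */
    bits &= UINT64_C(0x7FFFFFFFFFFFFFFF);
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

For example, abs_by_mask_f32(-1.5f) returns 1.5f and abs_by_mask_f64(-3.0) returns 3.0. Positive inputs pass through unchanged because their sign bit is already clear. The SIMD instructions apply the same AND to every element of the register at once.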