22007E/0—November 1999
AMD Athlon™ Processor x86 Code Optimization 
The following code fragment uses the 3DNow! PAVGUSB instruction to
perform averaging between the source macroblock and destination
macroblock:
Example 2 (Preferred):  
        MOV      EAX, DWORD PTR Src_MB
        MOV      EDI, DWORD PTR Dst_MB
        MOV      EDX, DWORD PTR SrcStride
        MOV      EBX, DWORD PTR DstStride
        MOV      ECX, 16
L1:
        MOVQ     MM0, [EAX]      ;MM0=QWORD1
        MOVQ     MM1, [EAX+8]    ;MM1=QWORD2
        PAVGUSB  MM0, [EDI]      ;(QWORD1 + QWORD3)/2 with adjustment
        PAVGUSB  MM1, [EDI+8]    ;(QWORD2 + QWORD4)/2 with adjustment
        ADD      EAX, EDX
        MOVQ     [EDI], MM0
        MOVQ     [EDI+8], MM1
        ADD      EDI, EBX
        LOOP     L1
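
As a cross-check on what the PAVGUSB loop computes, the following is a minimal scalar sketch in C, not part of the original listing. It assumes the usual PAVGUSB definition, in which each unsigned byte lane becomes (a + b + 1) >> 1, so the "adjustment" in the comments is the rounding term. The function names avg_round_u8 and average_macroblock are hypothetical, and the strides are assumed to be in bytes, as in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one PAVGUSB byte lane: rounded average of two
 * unsigned bytes (hypothetical helper, not from the manual). */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    /* Widen first so that a + b + 1 cannot wrap around at 8 bits. */
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Average a 16x16 source macroblock into the destination macroblock.
 * Strides are byte offsets between rows, like SrcStride and DstStride. */
static void average_macroblock(const uint8_t *src, ptrdiff_t src_stride,
                               uint8_t *dst, ptrdiff_t dst_stride)
{
    assert(src != NULL && dst != NULL);
    for (int row = 0; row < 16; ++row) {       /* MOV ECX, 16 ... LOOP L1 */
        for (int col = 0; col < 16; ++col) {   /* two 8-byte PAVGUSB per row */
            dst[col] = avg_round_u8(src[col], dst[col]);
        }
        src += src_stride;                     /* ADD EAX, EDX */
        dst += dst_stride;                     /* ADD EDI, EBX */
    }
}
```

A model like this is mainly useful as a reference for validating the output of the 3DNow! loop on test data. It makes no attempt at speed.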
Stream of Packed Unsigned Bytes
The following code is an example of how to process a stream of
packed unsigned bytes (like RGBA information) with faster
3DNow! instructions.
Example:  
outside loop:
        PXOR       MM0, MM0
inside loop:
        MOVD       MM1, [VAR]    ;            0 | v[3],v[2],v[1],v[0]
        PUNPCKLBW  MM1, MM0      ;0,v[3],0,v[2] | 0,v[1],0,v[0]
        MOVQ       MM2, MM1      ;0,v[3],0,v[2] | 0,v[1],0,v[0]
        PUNPCKLWD  MM1, MM0      ;   0,0,0,v[1] | 0,0,0,v[0]
        PUNPCKHWD  MM2, MM0      ;   0,0,0,v[3] | 0,0,0,v[2]
        PI2FD      MM1, MM1      ;  float(v[1]) | float(v[0])
        PI2FD      MM2, MM2      ;  float(v[3]) | float(v[2])
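
The net effect of the unpack-and-convert sequence can be sketched in scalar C as follows. This is an illustrative model under stated assumptions, not code from the original text: the function name bytes_to_floats is hypothetical, and the input length is assumed to be a multiple of four, matching the four bytes loaded by each MOVD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert count unsigned bytes to floats, four per iteration, following
 * the MOVD / PUNPCKxx / PI2FD sequence (hypothetical helper). */
static void bytes_to_floats(const uint8_t *v, float *out, size_t count)
{
    assert(count % 4 == 0);
    for (size_t i = 0; i < count; i += 4) {
        /* Unpacking against the zeroed MM0 zero-extends each byte,
         * first to 16 bits (PUNPCKLBW), then to 32 bits
         * (PUNPCKLWD for v[0], v[1]; PUNPCKHWD for v[2], v[3]). */
        int32_t d0 = v[i];
        int32_t d1 = v[i + 1];
        int32_t d2 = v[i + 2];
        int32_t d3 = v[i + 3];

        /* PI2FD converts signed 32-bit integers to single precision;
         * values in 0..255 are always non-negative and convert exactly. */
        out[i]     = (float)d0;
        out[i + 1] = (float)d1;
        out[i + 2] = (float)d2;
        out[i + 3] = (float)d3;
    }
}
```

Because the zero-extended values never have the sign bit set, the signed conversion performed by PI2FD gives the same result as an unsigned conversion would for this data.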