AMD Typewriter x86 User's Manual
22007E/0—November 1999
AMD Athlon™ Processor x86 Code Optimization
The following code fragment uses the 3DNow! PAVGUSB
instruction to perform averaging between the source
macroblock and destination macroblock:
Example 2 (Preferred):
    MOV      EAX, DWORD PTR Src_MB
    MOV      EDI, DWORD PTR Dst_MB
    MOV      EDX, DWORD PTR SrcStride
    MOV      EBX, DWORD PTR DstStride
    MOV      ECX, 16
L1:
    MOVQ     MM0, [EAX]      ;MM0=QWORD1
    MOVQ     MM1, [EAX+8]    ;MM1=QWORD2
    PAVGUSB  MM0, [EDI]      ;(QWORD1 + QWORD3)/2 with adjustment
    PAVGUSB  MM1, [EDI+8]    ;(QWORD2 + QWORD4)/2 with adjustment
    ADD      EAX, EDX
    MOVQ     [EDI], MM0
    MOVQ     [EDI+8], MM1
    ADD      EDI, EBX
    LOOP     L1
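
For checking the result of the loop above, a scalar reference in C can be useful. The following is a minimal sketch, not part of the original listing. It assumes a 16x16 macroblock of unsigned bytes and takes the "adjustment" in the comments to be the PAVGUSB rounding, (a + b + 1) >> 1 per byte. The names avg_round_u8 and average_macroblock_16x16 are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounded average of two unsigned bytes: (a + b + 1) >> 1.
   This models the per-byte result of PAVGUSB. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Hypothetical helper: average a 16x16 source macroblock into the
   destination macroblock in place, one row per iteration, mirroring
   the 16-iteration loop of the assembly version. Strides are in bytes. */
static void average_macroblock_16x16(const uint8_t *src, size_t src_stride,
                                     uint8_t *dst, size_t dst_stride)
{
    for (size_t row = 0; row < 16; ++row) {
        /* The assembly handles these 16 bytes as two quadwords. */
        for (size_t col = 0; col < 16; ++col)
            dst[col] = avg_round_u8(src[col], dst[col]);
        src += src_stride;  /* ADD EAX, EDX */
        dst += dst_stride;  /* ADD EDI, EBX */
    }
}
```

Comparing the output of this routine against the 3DNow! loop on the same input is one way to confirm that the strides and the rounding behave as expected.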
Stream of Packed Unsigned Bytes
The following code is an example of how to process a stream of
packed unsigned bytes (like RGBA information) with faster
3DNow! instructions.
Example:
outside loop:
    PXOR       MM0, MM0
inside loop:
    MOVD       MM1, [VAR]    ; 0 | v[3],v[2],v[1],v[0]
    PUNPCKLBW  MM1, MM0      ; 0,v[3],0,v[2] | 0,v[1],0,v[0]
    MOVQ       MM2, MM1      ; 0,v[3],0,v[2] | 0,v[1],0,v[0]
    PUNPCKLWD  MM1, MM0      ; 0,0,0,v[1] | 0,0,0,v[0]
    PUNPCKHWD  MM2, MM0      ; 0,0,0,v[3] | 0,0,0,v[2]
    PI2FD      MM1, MM1      ; float(v[1]) | float(v[0])
    PI2FD      MM2, MM2      ; float(v[3]) | float(v[2])
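
The net effect of the sequence above can be written as a scalar C reference. This is a minimal sketch for illustration only: each of the four packed bytes is zero-extended to 32 bits (the unpack steps against the zeroed MM0) and then converted to single-precision float (PI2FD). The name unpack_u8x4_to_float is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: widen four packed unsigned bytes to four floats,
   so that out[i] == (float)v[i]. In the assembly version, out[0] and
   out[1] end up in MM1 and out[2] and out[3] end up in MM2. */
static void unpack_u8x4_to_float(const uint8_t v[4], float out[4])
{
    for (int i = 0; i < 4; ++i) {
        /* Zero-extension: what PUNPCKLBW followed by PUNPCKLWD or
           PUNPCKHWD achieves when the second operand is all zeros. */
        uint32_t widened = v[i];
        /* PI2FD converts a signed 32-bit integer to float; the widened
           value is at most 255, so the sign bit is never set. */
        out[i] = (float)(int32_t)widened;
    }
}
```

Because every input value fits in 8 bits, the conversion is exact, which makes this reference convenient for bit-for-bit comparison with the 3DNow! output.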