AMD Athlon™ Processor x86 Code Optimization 
22007E/0—November 1999
Complex Number Arithmetic
Complex numbers have a “real” part and an “imaginary” part.
Multiplying complex numbers (for example, 3 + 4i) is an integral
part of many algorithms, such as the Discrete Fourier Transform
(DFT) and complex FIR filters. Complex number multiplication is
shown below:
(src0.real + src0.imag*i) * (src1.real + src1.imag*i) = result
result = result.real + result.imag*i
result.real <= src0.real*src1.real - src0.imag*src1.imag
result.imag <= src0.real*src1.imag + src0.imag*src1.real
Example:  
(1+2i) * (3+4i) => result.real + result.imag*i
result.real <= 1*3 - 2*4 = -5
result.imag <= 1*4 + 2*3 = 10
result = -5 + 10i
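
The formulas above can be sketched as a scalar reference routine in C. This is only an illustrative baseline, not code from this guide: the `complex_f32` type and the `complex_mul` function are hypothetical names introduced here, and single-precision floats are assumed.

```c
/* Hypothetical type for illustration: one complex number stored as
 * two single-precision floats. */
typedef struct {
    float real;
    float imag;
} complex_f32;

/* Scalar reference multiply: four multiplies, one subtract, one add.
 * The imaginary unit is implied by the field position, so the product
 * of the two imaginary parts enters result.real with a minus sign. */
static complex_f32 complex_mul(complex_f32 src0, complex_f32 src1)
{
    complex_f32 result;
    result.real = src0.real * src1.real - src0.imag * src1.imag;
    result.imag = src0.real * src1.imag + src0.imag * src1.real;
    return result;
}
```

Calling `complex_mul` with (1, 2) and (3, 4) reproduces the worked example: the real part is 1*3 - 2*4 and the imaginary part is 1*4 + 2*3.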
Assuming that complex numbers are represented as two-element
vectors [v.real, v.imag], one can see the need for
swapping the elements of src1 to perform the multiplies for
result.imag, and the need for a mixed positive/negative
accumulation to complete the parallel computation of
result.real and result.imag.
PSWAPD performs the swapping of elements for src1 and
PFPNACC performs the mixed positive/negative accumulation
to complete the computation. The code example below
summarizes the computation of a complex number multiply. 
Example:  
;MM0 = s0.imag | s0.real          ;reg_hi | reg_lo
;MM1 = s1.imag | s1.real
PSWAPD  MM2, MM0     ;MM2 = s0.real | s0.imag
PFMUL   MM0, MM1     ;MM0 = s0.imag*s1.imag | s0.real*s1.real
PFMUL   MM1, MM2     ;MM1 = s0.real*s1.imag | s0.imag*s1.real
PFPNACC MM0, MM1     ;MM0 = res.imag | res.real
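
To make the data flow of the four-instruction sequence easier to follow, the sketch below models each instruction in plain C. It is a behavioral model only, not 3DNow! code: the `mm_reg` type and the helper names are hypothetical. It assumes that PFPNACC writes (destination low - destination high) to the low half of the result and (source low + source high) to the high half, which matches the register comments in the example.

```c
/* Hypothetical model of a 64-bit MMX register holding two floats. */
typedef struct {
    float lo;   /* bits 31..0  (real part in this layout)      */
    float hi;   /* bits 63..32 (imaginary part in this layout) */
} mm_reg;

/* PSWAPD dst, src: dst receives src with its two halves exchanged.
 * src itself is left unchanged, so this also acts as a copy. */
static mm_reg pswapd(mm_reg src)
{
    mm_reg dst = { src.hi, src.lo };
    return dst;
}

/* PFMUL dst, src: element-wise multiply of the two halves. */
static mm_reg pfmul(mm_reg dst, mm_reg src)
{
    mm_reg r = { dst.lo * src.lo, dst.hi * src.hi };
    return r;
}

/* PFPNACC dst, src (assumed semantics, see lead-in):
 * low  = dst.lo - dst.hi   (negative accumulate of the destination)
 * high = src.lo + src.hi   (positive accumulate of the source)     */
static mm_reg pfpnacc(mm_reg dst, mm_reg src)
{
    mm_reg r = { dst.lo - dst.hi, src.lo + src.hi };
    return r;
}

/* Mirrors the assembly sequence one instruction per line. */
static mm_reg complex_mul_model(mm_reg s0, mm_reg s1)
{
    mm_reg mm0 = s0;             /* MM0 = s0.imag | s0.real */
    mm_reg mm1 = s1;             /* MM1 = s1.imag | s1.real */
    mm_reg mm2 = pswapd(mm0);    /* MM2 = s0.real | s0.imag */
    mm0 = pfmul(mm0, mm1);       /* MM0 = s0.imag*s1.imag | s0.real*s1.real */
    mm1 = pfmul(mm1, mm2);       /* MM1 = s0.real*s1.imag | s0.imag*s1.real */
    mm0 = pfpnacc(mm0, mm1);     /* MM0 = res.imag | res.real */
    return mm0;
}
```

With s0 = 1 + 2i and s1 = 3 + 4i, the intermediate values are MM2 = (lo 2, hi 1), MM0 = (lo 3, hi 8), and MM1 = (lo 6, hi 4) before the final accumulate.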
PSWAPD supports independent source and result operands, which
enables it to also perform a copy function. In the above
example, this eliminates the need for a separate “MOVQ MM2,
MM0” instruction.