22007E/0—November 1999
AMD Athlon™ Processor x86 Code Optimization 
quadword alignment), so that quadword operands might be
misaligned, even if this technique is used and the compiler does
allocate variables in the order they are declared.
The following example demonstrates the reordering of local
variable declarations:
Original ordering (Avoid):  
short   ga, gu, gi;
long    foo, bar;
double  x, y, z[3];
char    a, b;
float   baz;
Improved ordering (Preferred):  
double  z[3];
double  x, y;
long    foo, bar;
float   baz;
short   ga, gu, gi;
char    a, b;
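The effect of the reordering can be checked with a small model. The sketch below is illustrative only: it assumes the compiler places locals back to back in declaration order from a quadword-aligned base (which, as noted above, is not guaranteed), and it assumes 32-bit x86 type sizes (short 2, long 4, float 4, double 8, char 1 bytes). The struct var type and the count_misaligned helper are hypothetical names introduced for this example, and placing the two char variables last in the improved ordering is likewise an assumption.

```c
#include <stddef.h>

/* Hypothetical model of one local variable: total size and natural
   alignment, both in bytes (assumed 32-bit x86 sizes). */
struct var {
    const char *name;
    size_t size;
    size_t align;
};

/* Lay the variables out back to back, starting at offset 0 of an
   assumed quadword-aligned base, and count those whose offset is not
   a multiple of their natural alignment. */
static size_t count_misaligned(const struct var *v, size_t n)
{
    size_t offset = 0;
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (offset % v[i].align != 0)
            bad++;
        offset += v[i].size;
    }
    return bad;
}

/* Original ordering (Avoid) */
static const struct var original_order[] = {
    {"ga", 2, 2}, {"gu", 2, 2}, {"gi", 2, 2},
    {"foo", 4, 4}, {"bar", 4, 4},
    {"x", 8, 8}, {"y", 8, 8}, {"z", 24, 8},
    {"a", 1, 1}, {"b", 1, 1},
    {"baz", 4, 4},
};

/* Improved ordering (Preferred): largest base type first.
   a and b are placed last here as the smallest type. */
static const struct var improved_order[] = {
    {"z", 24, 8}, {"x", 8, 8}, {"y", 8, 8},
    {"foo", 4, 4}, {"bar", 4, 4},
    {"baz", 4, 4},
    {"ga", 2, 2}, {"gu", 2, 2}, {"gi", 2, 2},
    {"a", 1, 1}, {"b", 1, 1},
};
```

Under these assumptions, count_misaligned reports five misaligned variables for the original ordering (foo, bar, x, y, and z) and none for the improved ordering. Sorting by decreasing size keeps every offset a multiple of the size of the type that follows.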
Accelerating Floating-Point Divides and Square Roots
Divides and square roots have a much longer latency than other
floating-point operations, even though the AMD Athlon
processor provides significant acceleration of these two
operations. In some codes, these operations occur so often as to
seriously impact performance. In these cases, it is
recommended to port the code to 3DNow! inline assembly or to
use a compiler that can generate 3DNow! code. If code has hot
spots that use single-precision arithmetic only (i.e., all
computation involves data of type float) and for some reason
cannot be ported to 3DNow!, the following technique may be
used to improve performance.
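As a rough sketch of how such a technique might look in C: the precision-control field described in the next paragraph occupies bits 8 and 9 of the x87 control word, and the value 00b selects single precision (24-bit significand). The code below assumes GCC or Clang inline assembly on an x86 target. All helper names (x87_cw_with_single_precision, x87_read_cw, x87_write_cw, divide_all) are hypothetical and are not taken from this guide. It is not a definitive implementation, and it only has an effect when the compiler emits x87 instructions for the arithmetic (for example, in typical 32-bit builds).

```c
#include <stdint.h>

#define X87_PC_MASK   0x0300u  /* precision-control field, bits 8-9 */
#define X87_PC_SINGLE 0x0000u  /* 00b: round results to 24-bit significand */

/* Pure helper: return the control word with precision control set to
   single precision and all other bits unchanged. */
static uint16_t x87_cw_with_single_precision(uint16_t cw)
{
    return (uint16_t)((cw & ~X87_PC_MASK) | X87_PC_SINGLE);
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))

/* Read the current x87 control word (FNSTCW stores it to memory). */
static uint16_t x87_read_cw(void)
{
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m"(cw));
    return cw;
}

/* Load a new x87 control word (FLDCW reads it from memory). */
static void x87_write_cw(uint16_t cw)
{
    __asm__ volatile ("fldcw %0" : : "m"(cw));
}

/* Hypothetical single-precision hot spot: lower the precision control
   around the divides, then restore the caller's setting. */
static void divide_all(float *v, int n, float d)
{
    uint16_t saved = x87_read_cw();
    x87_write_cw(x87_cw_with_single_precision(saved));
    for (int i = 0; i < n; i++)
        v[i] /= d;
    x87_write_cw(saved);
}

#endif
```

Saving and restoring the control word around the hot spot keeps the change local, so code elsewhere that relies on double or extended precision is not affected.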
The x87 FPU has a precision-control field as part of the FPU
control word. The precision-control setting determines what
precision results get rounded to. It affects the basic arithmetic
operations, including divides and square roots. AMD Athlon
and AMD-K6® family processors implement divide and square
root in such fashion as to only compute the number of bits