can execute out-of-order. In addition, a particular integer pipe can execute two micro-ops from 
different macro-ops (one in the ALU and one in the AGU) at the same time. See Figure 7 on 
page 256.
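For illustration only (a hedged sketch; the registers and instructions below are arbitrary choices, not taken from this guide), the ALU of one pipe could execute the micro-op of one macro-op while the AGU of the same pipe generates the address for an unrelated load in the same cycle:

    add  eax, ecx               ; ALU micro-op from one macro-op
    mov  edx, DWORD PTR [esi]   ; AGU (address-generation) micro-op from a
                                ;   different macro-op may share the same pipe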
Each of the three ALUs performs general-purpose logic functions, arithmetic functions, conditional
functions, divide step functions, status flag multiplexing, and branch resolutions. The AGUs calculate 
the logical addresses for loads, stores, and LEAs. A load and store unit reads and writes data to and 
from the L1 data cache. The integer scheduler sends a completion status to the ICU when the 
outstanding micro-ops for a given macro-op are executed.
All integer operations can be handled within any of the three ALUs with the exception of multiplies. 
Multiplies are handled by a pipelined multiplier that is attached to the pipeline at pipe 0, as shown in 
Figure 7. Multiplies always issue to integer pipe 0, and the issue logic creates results bus bubbles for 
the multiplier in integer pipes 0 and 1 by preventing non-multiply micro-ops from issuing at the 
appropriate time.
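As a rough sketch of the kind of instruction mix this affects (register choices are arbitrary and the snippet is illustrative, not from this guide):

    imul eax, ecx        ; multiply: always issues to integer pipe 0
    add  ebx, edx        ; independent ALU op: can issue to pipe 1 or 2
    lea  esi, [edi+8]    ; independent AGU op (LEA): address generation

In the cycles when the multiplier returns its results, non-multiply micro-ops are held back in pipes 0 and 1, as described above.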
A.13 Floating-Point Scheduler
The floating-point logic of the AMD Athlon 64 and AMD Opteron processors is a high-performance, 
fully pipelined, superscalar, out-of-order execution unit. It is capable of accepting three macro-ops 
per cycle from any mixture of the following types of instructions:
•  x87 floating-point
•  3DNow! technology
•  MMX technology
•  SSE
•  SSE2
The floating-point scheduler handles register renaming and has a dedicated 36-entry scheduler buffer 
organized as 12 lines of three macro-ops each. It also performs data superforwarding, micro-op issue, 
and out-of-order execution. The floating-point scheduler communicates with the ICU to retire a 
macro-op, to manage comparison results from the FCOMI instruction, and to back out results from a 
branch misprediction.
Superforwarding is a performance optimization. It allows a floating-point operation that depends on a register to be scheduled sooner when that register is waiting to be filled by a pure load from memory. Instead of waiting for the first instruction to write its load-data to the register and
then waiting for the second instruction to read it, the load-data can be provided directly to the 
dependent instruction, much like regular forwarding between FPU-only operations. The result from 
the load is said to be "superforwarded" to the floating-point operation. In the following example, the 
FADD can be scheduled to execute as soon as the load operation fetches its data rather than having to 
wait and read it out of the register file.
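The example itself is not reproduced in this extract; the following is a minimal sketch of the pattern described, where some_float is a placeholder memory operand (not from the original document):

    fld  QWORD PTR [some_float]  ; pure load from memory into ST(0)
    fadd st(1), st(0)            ; dependent FADD: the load data can be
                                 ;   superforwarded to the adder instead of
                                 ;   being read back from the register file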