Mercury 7410 User Manual

Mercury can configure systems with hundreds of compute
nodes, communicating over the second-generation RACE++
switch fabric interconnect. Merging RACE++ and AltiVec
technology provides embedded computers with unprecedented
computational power.
AltiVec Vector Processing Unit
The AltiVec vector processing unit operates on 128 bits of
data concurrently with the other PowerPC execution units.
AltiVec instructions may be interleaved with other PowerPC
instructions without any penalty such as a context switch. The
128-bit wide execution unit can be used to operate on four
32-bit floating-point numbers, four 32-bit integers, eight 16-bit
integers, or sixteen 8-bit integers simultaneously.
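The lane layouts listed above can be illustrated with a minimal Python sketch. It only models how the same 128 bits can be viewed as different element types; it is not AltiVec code, and the helper name lane_views is hypothetical.

```python
import struct

VECTOR_BYTES = 16  # one 128-bit vector register


def lane_views(raw: bytes) -> dict:
    """Reinterpret one 128-bit value as each lane layout named in the text.

    `lane_views` is a hypothetical helper for illustration only.
    """
    if len(raw) != VECTOR_BYTES:
        raise ValueError("expected exactly 16 bytes")
    # ">" selects big-endian byte order, the usual PowerPC convention.
    return {
        "4 x float32": struct.unpack(">4f", raw),
        "4 x int32": struct.unpack(">4i", raw),
        "8 x int16": struct.unpack(">8h", raw),
        "16 x int8": struct.unpack(">16b", raw),
    }


# Pack four single-precision floats, then view the same bits in other layouts.
raw = struct.pack(">4f", 1.0, 2.0, 3.0, 4.0)
views = lane_views(raw)
```

Each view has a different lane count, but every view covers the same 16 bytes, which is why one instruction can process four, eight, or sixteen elements depending on the element width.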
AltiVec instructions are carried out by one of two AltiVec
sub-units. The vector arithmetic logic unit (ALU) handles the
vector fixed-point and vector floating-point operations. Two
floating-point operations per element are possible in a single
cycle with the vector multiply-add instruction and the vector
negative multiply-subtract instruction.
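As a rough illustration of the two instructions named above, the following Python sketch models their per-lane arithmetic on four single-precision elements. The function names mirror the AltiVec mnemonics vmaddfp and vnmsubfp, but this is a software model under simplifying assumptions: it computes in double precision and rounds once to single precision, and it does not reproduce the hardware's exact rounding or exception behavior.

```python
import struct

LANES = 4                       # four 32-bit floats per 128-bit vector
FLOPS_PER_LANE = 2              # one multiply plus one add (or subtract)
FLOPS_PER_INSTRUCTION = LANES * FLOPS_PER_LANE


def _f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def vmaddfp(a, c, b):
    """Model of vector multiply-add: each lane computes a*c + b."""
    return tuple(_f32(x * y + z) for x, y, z in zip(a, c, b))


def vnmsubfp(a, c, b):
    """Model of vector negative multiply-subtract: each lane computes -(a*c - b)."""
    return tuple(_f32(-(x * y - z)) for x, y, z in zip(a, c, b))


a = (1.0, 2.0, 3.0, 4.0)
c = (2.0, 2.0, 2.0, 2.0)
b = (0.5, 0.5, 0.5, 0.5)
madd = vmaddfp(a, c, b)     # (2.5, 4.5, 6.5, 8.5)
nmsub = vnmsubfp(a, c, b)   # (-1.5, -3.5, -5.5, -7.5)
```

Counting a multiply and an add in each of the four lanes gives eight floating-point operations per instruction in this model.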
The Permute sub-unit incorporates a crossbar network to
perform 16 individual byte moves in a single cycle. This 
capability can be used for simple tasks such as converting the
endianness of data or for more complicated tasks such as
byte interleaving, dynamic address alignment, or accelerating
small look-up tables.
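The byte-move capability described above can be sketched in Python as a model of a permute operation: each of 16 control bytes selects one byte from the 32-byte concatenation of two source vectors. The function vperm below follows the commonly documented behavior of the AltiVec instruction of the same name, but it is only a software model, and the two control-vector helpers are hypothetical names introduced for illustration.

```python
def vperm(a: bytes, b: bytes, control: bytes) -> bytes:
    """Model of a 16-byte permute: pick each result byte from a + b."""
    if not (len(a) == len(b) == len(control) == 16):
        raise ValueError("all operands must be 16 bytes")
    table = a + b  # 32 candidate source bytes
    # Only the low 5 bits of each control byte are used as the index.
    return bytes(table[sel & 0x1F] for sel in control)


def swap32_control() -> bytes:
    """Hypothetical helper: control vector that byte-reverses each 32-bit word."""
    return bytes(word * 4 + (3 - i) for word in range(4) for i in range(4))


def interleave_control() -> bytes:
    """Hypothetical helper: alternate the first 8 bytes of a and of b."""
    out = []
    for i in range(8):
        out += [i, 16 + i]
    return bytes(out)


data = bytes(range(16))
swapped = vperm(data, data, swap32_control())

left = bytes(range(16))
right = bytes(range(100, 116))
mixed = vperm(left, right, interleave_control())
```

The same operation covers both use cases from the text: the endianness conversion and the byte interleave differ only in the control vector.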
PowerPC RISC Architecture
In addition to the AltiVec execution unit, the MPC7410 
contains a floating-point unit and two integer units that can
operate concurrently with the AltiVec unit. Data and
instructions are fed through two on-chip, 32-Kbyte, eight-way
set-associative caches that enhance performance of both
vector and scalar code.
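To make the cache organization above concrete, the following Python sketch derives the number of sets and shows how an address would split into tag, set index, and byte offset. The 32-byte line size is an assumption not stated in this document, and the simple modulo indexing is a textbook model rather than a description of the MPC7410's actual lookup logic.

```python
CACHE_BYTES = 32 * 1024   # 32 Kbytes per cache, from the text
WAYS = 8                  # eight-way set-associative, from the text
LINE_BYTES = 32           # ASSUMPTION: line size is not given in this document

# Each set holds WAYS lines, so the set count follows from the total size.
SETS = CACHE_BYTES // (WAYS * LINE_BYTES)


def split_address(addr: int):
    """Split an address into (tag, set index, byte offset) for this geometry."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % SETS
    tag = addr // (LINE_BYTES * SETS)
    return tag, index, offset


tag, index, offset = split_address(0x12345)

# Addresses that differ by SETS * LINE_BYTES land in the same set
# and compete for its eight ways.
same_set = split_address(0x12345 + SETS * LINE_BYTES)[1] == index
```

Under the assumed line size, up to eight lines that map to the same set can be resident at once before one must be evicted.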
Each PowerPC 7410 compute node (CN) also includes a fully
pipelined backside L2 cache operating at 250 MHz. This
high-performance cache system provides quick access to data
that was previously loaded from memory but does not fit in
the on-chip caches.
Compute Node ASIC
The CN ASIC, included in each compute node, acts as both
a memory controller and a network interface to the
RACE++ switch fabric interconnect. The CN ASIC includes
an enhanced DMA controller, a high-performance memory
system with error checking and correcting, metering logic,
and a RACE++ interface. By combining memory control
and network interface into a single chip, Mercury's compute
node provides the highest performance with the lowest power
consumption and highest reliability.
High-Performance Memory System
Mercury's high-performance memory subsystem allows the
memory to reach the intrinsic limits of its performance 
capability with:
125-MHz Synchronous DRAM
Prefetch Buffers: bring sequential data to the ASIC before
the processor explicitly requests it. These prefetch buffers
greatly improve the performance of the CN in vector
operations such as those used in DSP applications.
FIFO Buffers: efficiently overlap accesses to SDRAM from
the local processor and the RACEway interconnect.
The PowerPC CN contains error-correcting circuitry for
improved data integrity. One-bit errors are corrected on the
fly, and multi-bit errors generate an interrupt error condition.
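The correct-one, detect-more behavior described above can be illustrated with a small single-error-correcting, double-error-detecting (SECDED) Hamming code in Python. This document does not say which code the CN uses, so the 4-data-bit layout below is a teaching example chosen for brevity, and the function names are hypothetical. Note that this toy code reliably detects only double-bit errors, not every multi-bit error.

```python
def encode(nibble: int) -> list:
    """Encode 4 data bits into 8 bits: Hamming(7,4) plus an overall parity bit."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                       # index 0 = overall parity, 1..7 = Hamming positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # check bit for positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # check bit for positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # check bit for positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # make the total number of 1s even
    return code


def decode(received: list):
    """Return (nibble, status); nibble is None when the error is uncorrectable."""
    code = list(received)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos              # XOR of set positions is 0 for a valid word
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        code[syndrome] ^= 1              # single-bit error; syndrome 0 means the parity bit
        status = "corrected"
    else:
        return None, "uncorrectable"     # even parity with nonzero syndrome: two bits flipped
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


word = encode(0b1011)
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(word)
two_flips[2] ^= 1
two_flips[6] ^= 1
```

In this model, the "corrected" path corresponds to fixing a one-bit error on the fly, while the "uncorrectable" status is where a real memory controller would raise its error interrupt.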
Enhanced DMA Controller
Each CN has an advanced DMA controller to support
RACEway transfers at 267 MB/s with chaining and striding.
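Chaining and striding can be pictured with the following Python sketch, in which a linked list of descriptors gathers fixed-size blocks separated by a constant stride. The Descriptor fields and function names are hypothetical and do not reflect the CN ASIC's actual descriptor format. The timing helper treats 267 MB/s as a peak rate and assumes 1 MB means 10**6 bytes.

```python
from dataclasses import dataclass
from typing import Optional

PEAK_BYTES_PER_SECOND = 267e6  # 267 MB/s from the text; ASSUMES 1 MB = 10**6 bytes


@dataclass
class Descriptor:
    """Hypothetical DMA descriptor; field names are illustrative only."""
    src_offset: int                       # where the first block starts
    block_bytes: int                      # size of each contiguous block
    stride_bytes: int                     # distance between block starts
    block_count: int                      # number of blocks to gather
    next: Optional["Descriptor"] = None   # chaining: next descriptor to run


def run_chain(memory: bytes, head: Optional[Descriptor]) -> bytes:
    """Walk the descriptor chain and gather the strided blocks in order."""
    out = bytearray()
    desc = head
    while desc is not None:
        for i in range(desc.block_count):
            start = desc.src_offset + i * desc.stride_bytes
            out += memory[start:start + desc.block_bytes]
        desc = desc.next
    return bytes(out)


def min_transfer_seconds(n_bytes: int) -> float:
    """Lower bound on transfer time at the peak rate, ignoring setup overhead."""
    return n_bytes / PEAK_BYTES_PER_SECOND


# Treat 64 bytes as an 8 x 8 row-major byte matrix.
memory = bytes(range(64))
first_row = Descriptor(src_offset=0, block_bytes=4, stride_bytes=0, block_count=1)
column_two = Descriptor(src_offset=2, block_bytes=1, stride_bytes=8,
                        block_count=8, next=first_row)
gathered = run_chain(memory, column_two)
```

Here one chained request pulls out a matrix column (a strided access) followed by part of a row, without the processor issuing a separate transfer for each block.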
Figure: MPC7410 Data and Instruction Flow
Figure: Compute Node ASIC Architecture