Mercury 7410 User Manual

Mercury can configure systems with hundreds of compute
nodes, communicating over the second-generation RACE++
switch fabric interconnect. Merging RACE++ and AltiVec
technology provides embedded computers with unprecedented
computational power.

AltiVec Vector Processing Unit

The AltiVec vector processing unit operates on 128 bits of
data concurrently with the other PowerPC execution units.
AltiVec instructions may be interleaved with other PowerPC
instructions without any penalty such as a context switch. The
128-bit wide execution unit can be used to operate on four
floating-point numbers, four 32-bit integers, eight 16-bit
integers, or sixteen 8-bit integers simultaneously.
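
The element counts above follow from dividing the 128-bit
register width by the element width. The sketch below shows
that arithmetic and emulates one lane-wise integer add in plain
Python. The function names are hypothetical, and wrap-around
(modulo) addition is chosen here only for illustration; this is
not AltiVec code.

```python
VECTOR_BITS = 128  # register width stated in the text


def lanes(element_bits: int) -> int:
    """Return how many elements of the given width fit in one 128-bit vector."""
    if element_bits <= 0 or VECTOR_BITS % element_bits != 0:
        raise ValueError("element width must evenly divide 128 bits")
    return VECTOR_BITS // element_bits


def vector_add(a, b, element_bits):
    """Emulate a lane-wise integer add that wraps around within each lane."""
    n = lanes(element_bits)
    if len(a) != n or len(b) != n:
        raise ValueError(f"expected {n} elements per operand")
    mask = (1 << element_bits) - 1  # keep each result inside its own lane
    return [(x + y) & mask for x, y in zip(a, b)]


if __name__ == "__main__":
    for width in (32, 16, 8):
        print(f"{width}-bit elements: {lanes(width)} lanes")
    print(vector_add([250] * 16, [10] * 16, 8))
```

Each lane is masked independently, so an overflow in one element
never carries into its neighbor.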

AltiVec instructions are carried out by one of two AltiVec
sub-units. The Vector arithmetic logic unit handles the
vector fixed-point and vector floating-point operations. Two
floating-point operations are possible in a single cycle with the
vector multiply-add instruction and the vector negative
multiply-subtract instruction.
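
As a rough illustration of what the two instructions compute,
the sketch below emulates them lane by lane on four-element
lists: multiply-add as a*b + c and negative multiply-subtract as
-(a*b - c). The function names and argument order are
hypothetical and chosen for readability, and Python floats are
double precision, so results may differ in rounding from the
hardware's single-precision lanes.

```python
def vmadd(a, b, c):
    """Lane-wise multiply-add: a[i] * b[i] + c[i] for each lane."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def vnmsub(a, b, c):
    """Lane-wise negative multiply-subtract: -(a[i] * b[i] - c[i])."""
    return [-(x * y - z) for x, y, z in zip(a, b, c)]


def ops_per_instruction(lane_count=4, ops_per_lane=2):
    """Count one multiply and one add per lane (simple arithmetic, not a benchmark)."""
    return lane_count * ops_per_lane


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [2.0, 2.0, 2.0, 2.0]
    c = [0.5, 0.5, 0.5, 0.5]
    print(vmadd(a, b, c))
    print(vnmsub(a, b, c))
    print(ops_per_instruction())
```

Counting a multiply and an add in each of the four
floating-point lanes gives eight operations per instruction.
That figure is derived from the numbers in the text, not a
measured rate.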

The Permute sub-unit incorporates a crossbar network to
perform 16 individual byte moves in a single cycle. This
capability can be used for simple tasks such as converting the
"endian-ness" of data or for more complicated tasks such as
byte interleaving, dynamic address alignment, or accelerating
small look-up tables.
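
The byte moves described above can be modeled as a table-driven
selection: a 16-byte control vector picks each output byte from
the 32 bytes formed by two source vectors. The sketch below is a
software model under that assumption, with hypothetical names
(vperm, SWAP32, INTERLEAVE); it shows an endian swap of 32-bit
words and a byte interleave.

```python
def vperm(a: bytes, b: bytes, control: bytes) -> bytes:
    """Select 16 output bytes from the 32-byte concatenation a + b.

    Only the low 5 bits of each control byte are used as the source index.
    """
    if not (len(a) == len(b) == len(control) == 16):
        raise ValueError("all operands must be exactly 16 bytes")
    source = a + b
    return bytes(source[c & 0x1F] for c in control)


# Control vector that reverses the bytes inside each 32-bit word.
SWAP32 = bytes([3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12])

# Control vector that interleaves the first 8 bytes of a with those of b.
INTERLEAVE = bytes(i // 2 + (16 if i % 2 else 0) for i in range(16))


if __name__ == "__main__":
    data = bytes(range(16))
    other = bytes(range(100, 116))
    print(list(vperm(data, data, SWAP32)))
    print(list(vperm(data, other, INTERLEAVE)))
```

Because the control vector is ordinary data, the same operation
covers alignment shifts and small table look-ups by changing
only the indices.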

PowerPC RISC Architecture

In addition to the AltiVec execution unit, the MPC7410
contains a floating-point unit and two integer units that can
operate concurrently with the AltiVec unit. Data and
instructions are fed through two on-chip, 32-Kbyte, eight-way
set-associative caches that enhance performance of both
vector and scalar code.
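
To make the cache organization concrete, the sketch below
derives the number of sets in a 32-Kbyte, eight-way
set-associative cache and splits an address into tag, set index,
and byte offset. The 32-byte line size is an assumption that
does not appear in this document, and the function names are
hypothetical.

```python
CACHE_BYTES = 32 * 1024  # from the text: 32 Kbyte per cache
WAYS = 8                 # from the text: eight-way set-associative
LINE_BYTES = 32          # ASSUMPTION: line size is not given in this document


def num_sets(cache_bytes=CACHE_BYTES, ways=WAYS, line_bytes=LINE_BYTES):
    """Number of sets = capacity / (ways * line size)."""
    return cache_bytes // (ways * line_bytes)


def split_address(addr: int):
    """Split a byte address into (tag, set index, byte offset within the line)."""
    sets = num_sets()
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % sets
    tag = addr // (LINE_BYTES * sets)
    return tag, index, offset


if __name__ == "__main__":
    print(num_sets())
    print(split_address(0x1234))
```

With these figures, addresses 4 Kbytes apart map to the same
set, and up to eight such lines can be resident at once before
one must be replaced.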

Each PowerPC 7410 CN also includes a fully pipelined
backside L2 cache operating at 250 MHz. This high-performance
cache system provides quick access to data previously loaded
from memory but too large to fit into the on-chip cache.

Compute Node ASIC

The CN ASIC, included in each compute node, acts as both
a memory controller and a network interface to the
RACE++ switch fabric interconnect. The CN ASIC includes
an enhanced DMA controller, a high-performance memory
system with error checking and correcting, metering logic,
and a RACE++ interface. By combining memory control
and network interface into a single chip, Mercury's compute
node provides the highest performance with the lowest power
consumption and highest reliability.

High-Performance Memory System

Mercury's high-performance memory subsystem allows the
memory to reach the intrinsic limits of its performance
capability with:

- 125-MHz Synchronous DRAM
- Prefetch Buffers: bring sequential data into the ASIC
  before the processor explicitly requests it. These prefetch
  buffers greatly improve the performance of the CN in vector
  operations such as those used in DSP applications.
- FIFO Buffers: efficiently overlap accesses to SDRAM from
  the local processor and the RACEway interconnect.

The PowerPC CN contains error-correcting circuitry for
improved data integrity. One-bit errors are corrected on the
fly, and multi-bit errors generate an interrupt error condition.
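
The correct-one, detect-more behavior described above is
characteristic of a single-error-correcting, double-error-
detecting (SECDED) code. The document does not say which code
the CN uses, so the sketch below is only a toy illustration: an
extended Hamming code protecting 4 data bits with 4 check bits.
Real memory systems typically protect much wider words, and the
function names here are hypothetical.

```python
def encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    bits = [0] * 8  # index 0: overall parity; indices 1..7: Hamming positions
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # parity over positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # parity over positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # parity over positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) % 2            # overall parity over positions 1..7
    return sum(b << i for i, b in enumerate(bits))


def decode(word: int):
    """Return (nibble, status); nibble is None when the error is uncorrectable."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    parity_error = sum(bits) % 2 == 1
    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        bits[syndrome] ^= 1  # one-bit error: syndrome names the flipped position
        status = "corrected"
    else:
        return None, "uncorrectable"  # two-bit error: detected, not fixable
    nibble = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return nibble, status


if __name__ == "__main__":
    word = encode(0b1011)
    print(decode(word))
    print(decode(word ^ 0b00100000))  # flip one bit
    print(decode(word ^ 0b00100100))  # flip two bits
```

In this model the "uncorrectable" status plays the role of the
interrupt error condition. A SECDED code guarantees detection of
two-bit errors only; errors of three or more bits may be
misreported.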

Enhanced DMA Controller

Each CN has an advanced DMA controller to support
RACEway transfers at 267 MB/s with chaining and striding.
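
Chaining and striding can be pictured as a linked list of
transfer descriptors, each of which gathers fixed-size elements
spaced a constant number of bytes apart. The sketch below is a
software model only: the Descriptor fields and function names
are hypothetical and do not reflect the CN ASIC's actual
descriptor format. The transfer-time helper assumes 1 MB =
10^6 bytes and a sustained 267 MB/s rate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    """Hypothetical DMA descriptor; field layout is an illustration only."""
    src: int          # byte offset of the first element in source memory
    elem_bytes: int   # size of each element in bytes
    count: int        # number of elements to move
    stride: int       # bytes between the starts of successive elements
    next: Optional["Descriptor"] = None  # chaining link to the next descriptor


def run_chain(memory: bytes, head: Optional[Descriptor]) -> bytes:
    """Walk the descriptor chain and gather the strided elements in order."""
    out = bytearray()
    d = head
    while d is not None:
        for i in range(d.count):
            start = d.src + i * d.stride
            out += memory[start:start + d.elem_bytes]
        d = d.next
    return bytes(out)


def transfer_seconds(num_bytes: int, rate_mb_per_s: float = 267.0) -> float:
    """Ideal transfer time at the stated rate, ignoring setup overhead."""
    return num_bytes / (rate_mb_per_s * 1e6)


if __name__ == "__main__":
    memory = bytes(range(64))
    tail = Descriptor(src=32, elem_bytes=4, count=2, stride=4)
    head = Descriptor(src=0, elem_bytes=2, count=3, stride=8, next=tail)
    print(list(run_chain(memory, head)))
    print(transfer_seconds(1_000_000))
```

A stride larger than the element size skips data between
elements, which is how a column of a row-major matrix could be
moved in one descriptor; chaining lets several such transfers
run back to back without processor intervention.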

[Figure: MPC7410 Data and Instruction Flow]

[Figure: Compute Node ASIC Architecture]