THE AMD ATHLON PROCESSOR
Last updated: 11/22/99

A quote from the AMD Athlon Processor Architecture:
"The AMD Athlon features a superpipelined, nine-issue superscalar microarchitecture
optimized for high clock frequency. The AMD Athlon has a large dual-ported
128KB split-L1 cache (64KB instruction cache + 64KB data cache); a two-way,
2048-entry branch prediction table; multiple parallel x86 instruction decoders;
and multiple integer and floating point schedulers for independent superscalar,
out-of-order, speculative execution of instructions. These elements are
packed into an aggressive processing pipeline that includes 10-stage integer
and 15-stage floating point pipelines."
What in the world does all of this mumbo jumbo mean? Let's put this silicon
genie back into its bottle and describe it briefly, in more understandable
language.
The Athlon is a seventh-generation X86
processor. That means it executes CPU instructions written
for the X86 series of CPUs, and it is architecturally (and factually) a
more powerful chip than its predecessors: there is more under the hood.
The first X86 generation was the Intel 8086
(and 8088), introduced circa 1978. It was followed by the 80286, 80386, 80486, "80586" (Pentium,
K5, etc.), "80686" (Pentium II, III, K6, K6-2, K6-3, etc.), and "80786" (Athlon). The
8086 had 29,000 transistors; the Pentium II has 7.5 million, and the Athlon
has 22 million. In short, the Athlon is the first seventh-generation
X86 processor.
The Athlon has three X86 instruction decoders. An
instruction set is a processor's language. An instruction tells the
processor what data to operate on and what to do with it. An X86 instruction
varies in length from one to 15 bytes. A byte is eight bits; a bit
is a logical one or zero. Each bit is represented by one of two voltage
levels, that is, by one of the two states of a transistorized electronic switch
(on or off, like a light switch; a 1 or 0 in the binary number system, which,
in turn, can be used to represent characters, decimal numbers, instructions,
etc.). The decoders convert X86 instructions into fixed-length MacroOPs,
the language of the Athlon. In short, these decoders translate X86
instructions into Athlon instructions.
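To make the decoders' job concrete, here is a toy Python decoder for a handful of real 32-bit x86 encodings (NOP, RET, MOV r32,imm32 and register-to-register ADD). It shows that an instruction's length is only known after examining its opcode, and it turns each instruction into a fixed-size record. The `FixedOp` record and the `decode` function are hypothetical illustrations, not the Athlon's actual MacroOP format or decoding logic, which are far more involved.

```python
# Toy decoder for a few 32-bit x86 opcodes. FixedOp is a hypothetical
# fixed-size record for illustration; it is NOT the Athlon's MacroOP format.
from typing import List, NamedTuple

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


class FixedOp(NamedTuple):
    op: str      # operation name
    dst: str     # destination register ("" if none)
    src: str     # source register or immediate ("" if none)
    length: int  # byte length of the x86 instruction it came from


def _need(code: bytes, i: int, n: int) -> None:
    # Refuse to read past the end of a truncated instruction.
    if i + n > len(code):
        raise ValueError(f"truncated instruction at offset {i}")


def decode(code: bytes) -> List[FixedOp]:
    ops: List[FixedOp] = []
    i = 0
    while i < len(code):
        b = code[i]
        if b == 0x90:                          # NOP: 1 byte
            ops.append(FixedOp("nop", "", "", 1))
            i += 1
        elif b == 0xC3:                        # RET: 1 byte
            ops.append(FixedOp("ret", "", "", 1))
            i += 1
        elif 0xB8 <= b <= 0xBF:                # MOV r32, imm32: opcode + 4-byte immediate
            _need(code, i, 5)
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            ops.append(FixedOp("mov", REGS[b - 0xB8], hex(imm), 5))
            i += 5
        elif b == 0x01:                        # ADD r/m32, r32: opcode + ModRM byte
            _need(code, i, 2)
            modrm = code[i + 1]
            if modrm >> 6 != 0b11:             # only the register-to-register form
                raise ValueError("memory operands are not handled in this sketch")
            dst = REGS[modrm & 0b111]          # r/m field
            src = REGS[(modrm >> 3) & 0b111]   # reg field
            ops.append(FixedOp("add", dst, src, 2))
            i += 2
        else:
            raise ValueError(f"opcode {b:#04x} is not handled in this sketch")
    return ops


# mov eax, 5 ; mov ebx, 7 ; add eax, ebx ; nop ; ret
program = bytes([0xB8, 5, 0, 0, 0, 0xBB, 7, 0, 0, 0, 0x01, 0xD8, 0x90, 0xC3])
for op in decode(program):
    print(op)
```

The 14 bytes above decode into five records of identical shape, with original lengths of 5, 5, 2, 1 and 1 bytes.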
The Athlon has an Instruction Control Unit
(ICU). Up to three MacroOPs (Athlon instructions) are sent from
the decoders to the ICU per CPU cycle. The ICU buffers and manages
the MacroOPs and sends them to the processor's execution unit schedulers. In
short, the ICU is a managed buffer between the decoders and the schedulers.
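A minimal sketch of the "managed buffer" idea, assuming it can be modeled as a bounded first-in, first-out queue: it accepts at most three MacroOPs per cycle (the figure from the text) and pushes back on the decoders when it is full. The `ManagedBuffer` class and its capacity are hypothetical; a real ICU does considerably more bookkeeping than this.

```python
from collections import deque
from typing import Deque, List

DECODE_WIDTH = 3  # up to three MacroOPs arrive per cycle, per the article


class ManagedBuffer:
    """Hypothetical model: a bounded FIFO between the decoders and the schedulers."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # illustrative size, not an AMD figure
        self.entries: Deque[str] = deque()

    def accept(self, macro_ops: List[str]) -> int:
        """Take up to DECODE_WIDTH ops this cycle and return how many fit.

        Ops that do not fit must be offered again next cycle (the decoders stall).
        """
        room = self.capacity - len(self.entries)
        n = min(len(macro_ops), DECODE_WIDTH, room)
        self.entries.extend(macro_ops[:n])
        return n

    def dispatch(self, max_ops: int) -> List[str]:
        """Send up to max_ops of the oldest buffered ops on to the schedulers."""
        out: List[str] = []
        while self.entries and len(out) < max_ops:
            out.append(self.entries.popleft())
        return out


buf = ManagedBuffer(capacity=4)
print(buf.accept(["op1", "op2", "op3", "op4"]))  # 3: limited by decode width
print(buf.accept(["op4", "op5"]))                # 1: only one slot left
print(buf.dispatch(2))                           # ['op1', 'op2']
```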
The Athlon has two execution unit schedulers. The
first schedules integer and address calculation MacroOPs; the second
schedules MMX, 3DNow!, and X87 MacroOPs. In short, the Athlon has two
execution schedulers, which manage the execution pipelines.
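The split between the two schedulers can be sketched as a single per-cycle issue step. The pipeline counts (three integer, three address calculation, three MMX/3DNow!/X87) are the article's; the oldest-first selection policy and the assumption that every listed op is already ready to run are simplifications made for this hypothetical `issue_one_cycle` function, not a description of AMD's schedulers, which work out of order.

```python
from typing import Dict, List, Tuple

# Pipelines available per kind of MacroOP, from the article.
PIPES = {"int": 3, "addr": 3, "fp": 3}
# Which of the two schedulers owns each kind (fp = MMX, 3DNow!, X87).
SCHEDULER_OF = {"int": "integer", "addr": "integer", "fp": "fp"}


def issue_one_cycle(ready: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Choose which MacroOPs start executing this cycle.

    `ready` holds (name, kind) pairs, oldest first. This sketch assumes every
    op's inputs are available and picks the oldest ops that still have a free
    pipeline of the right kind.
    """
    used = {kind: 0 for kind in PIPES}
    issued: Dict[str, List[str]] = {"integer": [], "fp": []}
    for name, kind in ready:
        if used[kind] < PIPES[kind]:
            used[kind] += 1
            issued[SCHEDULER_OF[kind]].append(name)
    return issued


ready = [("i1", "int"), ("i2", "int"), ("i3", "int"), ("i4", "int"),
         ("a1", "addr"), ("f1", "fp")]
print(issue_one_cycle(ready))
# {'integer': ['i1', 'i2', 'i3', 'a1'], 'fp': ['f1']}  (i4 waits a cycle)
```

With enough work of every kind, at most 3 + 3 + 3 = 9 ops start per cycle, which is where the "nine-issue" in the quote comes from.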
The Athlon has nine independent execution
pipelines. There are three 10-stage integer pipelines, three address calculation
pipelines, and three 15-stage MMX, 3DNow!, and X87 floating-point pipelines. The
last three do the floating-point number crunching (along with the MMX and
3DNow! work) that, before the 80486, was done by a separate math coprocessor
chip (the X87), and they account for a lot of the zip in graphics (games),
spreadsheet recalcs, etc. They are also where the fancy language ("...three-issue,
superscalar floating-point capability is based on three pipelined, out-of-order
floating-point execution units...") comes into play. Let's call
it "magic" and be done with it. In short, the Athlon
has nine execution pipelines that can process data simultaneously. Three
of them are independent floating-point units (FPUs), which together can
deliver as many as four 32-bit single-precision floating-point results
in a single CPU clock cycle.
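The "four single-precision results per cycle" figure follows from packed (SIMD) arithmetic: a 64-bit MMX/3DNow! register holds two 32-bit floats, so one packed instruction produces two results. This Python sketch models the lane-wise behavior of the 3DNow! PFADD and PFMUL instructions using `struct`, and, for the throughput arithmetic, assumes that one packed add and one packed multiply can start in the same cycle. The 500 MHz clock in the example is hypothetical.

```python
import struct
from typing import Tuple


def pack2(x: float, y: float) -> int:
    """Pack two single-precision floats into one 64-bit value, low lane first."""
    return struct.unpack("<Q", struct.pack("<2f", x, y))[0]


def unpack2(v: int) -> Tuple[float, float]:
    """Split a 64-bit value back into its two single-precision lanes."""
    return struct.unpack("<2f", struct.pack("<Q", v))


def pfadd(a: int, b: int) -> int:
    """Lane-wise add, modeled on the 3DNow! PFADD instruction."""
    (a0, a1), (b0, b1) = unpack2(a), unpack2(b)
    return pack2(a0 + b0, a1 + b1)


def pfmul(a: int, b: int) -> int:
    """Lane-wise multiply, modeled on the 3DNow! PFMUL instruction."""
    (a0, a1), (b0, b1) = unpack2(a), unpack2(b)
    return pack2(a0 * b0, a1 * b1)


# Assumption: one packed add and one packed multiply start in the same cycle.
LANES = 2
PACKED_OPS_PER_CYCLE = 2
RESULTS_PER_CYCLE = LANES * PACKED_OPS_PER_CYCLE  # 4, the article's figure


def peak_results_per_second(clock_hz: float) -> float:
    return RESULTS_PER_CYCLE * clock_hz


print(unpack2(pfadd(pack2(1.5, 2.0), pack2(0.5, 3.0))))  # (2.0, 5.0)
print(peak_results_per_second(500e6))  # 2000000000.0 at a hypothetical 500 MHz
```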
The Athlon has sophisticated, dynamic
branch prediction logic. 'It has a two-way, 2048-entry branch
prediction table to store information used to predict the direction of
conditional branches. CALL/RET instruction pairs are optimized by storing
the return address of each CALL within a nested series of subroutines.
A return address is supplied as the predicted target address of the corresponding
RET instruction.' A conditional branch is a fork in the program; the
processor guesses which way it will go so that its pipelines can keep working
instead of waiting for the answer. In short, the Athlon has advanced branch
prediction logic.
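The two mechanisms in the quote can be sketched as a table of 2-bit saturating counters plus a return-address stack for CALL/RET pairs. The 2-bit counter is a common textbook predictor, swapped in here because the article does not describe the Athlon's exact scheme; only the 2048-entry size comes from the quote. The "two-way" organization is not modeled, and the stack depth and branch address are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address (textbook scheme)."""

    def __init__(self, entries: int = 2048) -> None:
        self.entries = entries
        # 0-1 predict "not taken", 2-3 predict "taken"; start weakly not taken.
        self.counters = [1] * entries

    def _index(self, pc: int) -> int:
        return pc % self.entries

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


class ReturnStack:
    """CALL pushes its return address; RET pops it as the predicted target."""

    def __init__(self, depth: int) -> None:
        # depth is illustrative; when full, the oldest entry is dropped.
        self.stack: Deque[int] = deque(maxlen=depth)

    def on_call(self, return_address: int) -> None:
        self.stack.append(return_address)

    def predict_return(self) -> Optional[int]:
        return self.stack.pop() if self.stack else None


# A loop branch at a made-up address, taken 9 times and then falling through.
bp = TwoBitPredictor()
hits = 0
for taken in [True] * 9 + [False]:
    hits += bp.predict(0x400) == taken
    bp.update(0x400, taken)
print(hits)  # 8 of 10 predictions correct
```

The predictor misses only the first iteration (while it warms up) and the final exit, which is why loop-heavy code benefits so much from this kind of logic.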
The Athlon implements Enhanced 3DNow!™
- '21 original 3DNow! instructions with superscalar SIMD
- 19 new instructions to enable improved integer math
calculations for speech or video encoding and improved data movement for
Internet plug-ins and other streaming applications
- 5 new DSP instructions to improve soft modem, soft ADSL,
Dolby Digital surround sound, and MP3 applications.'
In short, the Athlon can handle video, 3D, sound, etc. better.
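As an example of the integer math aimed at video encoding, here is a Python model of PSADBW, which, as far as I know, is one of the 19 new integer instructions. It sums the absolute differences of eight unsigned byte pairs packed into two 64-bit values, a core step when a video encoder searches for matching blocks between frames. The pixel values are made up, and the model reproduces only the arithmetic, not the instruction's register handling.

```python
def psadbw(a: int, b: int) -> int:
    """Sum of absolute differences of the eight unsigned bytes in two 64-bit values.

    Models the arithmetic of PSADBW; the hardware places this 16-bit sum in
    the low word of the destination register.
    """
    total = 0
    for lane in range(8):
        x = (a >> (8 * lane)) & 0xFF
        y = (b >> (8 * lane)) & 0xFF
        total += abs(x - y)
    return total


# Compare eight pixels from the current frame with a candidate row from a
# reference frame: a smaller sum means a closer match.
current = 0x0102030405060708
candidate = 0x0807060504030201
print(psadbw(current, candidate))  # 32
```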
The Athlon has a high-performance memory
cache architecture. The L1 cache, closest to the CPU innards, consists
of two separate 64 KByte caches, one each for data and instructions. The
next layer of cache, L2, is 512 KBytes. The processor's L2 cache controller
can support up to eight MBytes of L2 cache. In
short, the Athlon has 128 KBytes of L1 cache and 512 KBytes of L2 cache
built-in.
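How a cache locates data can be illustrated with the usual split of an address into tag, set index and byte offset. The 64 KByte size is the article's; the 64-byte line and two-way associativity are assumptions for this sketch, and the example address is arbitrary.

```python
from typing import Tuple

# 64 KB comes from the article; line size and associativity are assumptions.
CACHE_BYTES = 64 * 1024
LINE_BYTES = 64
WAYS = 2

NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)  # 512 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1      # 6 bits pick a byte within a line
INDEX_BITS = NUM_SETS.bit_length() - 1         # 9 bits pick the set


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset) for the cache above."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_address(0x12345678)
print(hex(tag), index, offset)  # 0x2468 345 56
```

The set index selects one of 512 sets; the stored tags of the two lines in that set are compared with the address's tag to decide whether the access is a hit.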