Last updated: 11/22/99


A quote from the AMD Athlon Processor Architecture:

"The AMD Athlon features a superpipelined, nine-issue superscalar microarchitecture optimized for high clock frequency. The AMD Athlon has a large dual-ported 128KB split-L1 cache (64KB instruction cache + 64KB data cache); a two-way, 2048-entry branch prediction table; multiple parallel x86 instruction decoders; and multiple integer and floating point schedulers for independent superscalar, out-of-order, speculative execution of instructions. These elements are packed into an aggressive processing pipeline that includes 10-stage integer and 15-stage floating point pipelines."

What in the world does all of this mumbo jumbo mean? Let's put this silicon genie back into its bottle and try to describe it in brief and more understandable language.

The Athlon is a seventh generation X86 processor. That means it will execute CPU instructions written for the X86 series of CPUs, and it is architecturally (and factually) a more powerful chip than its predecessors: there is more under the hood.

The first X86 generation was the Intel 8086 (and 8088) introduced circa 1978, followed by the 80286, 80386, 80486, "80586" (Pentium, K5, etc.), "80686" (Pentium II, III, K6, K6-2, K6-3, etc.), and "80786" (Athlon).  The 8086 had 29,000 transistors; the Pentium II has 7.5 million, and the Athlon has 22 million.  In short, the Athlon is the first seventh generation X86 processor.

The Athlon has three X86 instruction decoders. An instruction set is a processor's language. An instruction tells the processor what data to operate on and what to do with it. An X86 instruction varies in length from one to 15 bytes. A byte is eight bits; a bit is a logical one or zero. A logical one or zero is represented by one of two voltage levels, or one of the two states of a transistorized electronic switch (on or off like a light switch; a 1 or 0 in the binary number system, which, in turn, can be used to represent characters, decimal numbers, instructions, etc.). The decoders convert variable-length X86 instructions into fixed-length MacroOPs, the language of the Athlon. In short, these decoders decode X86 instructions into Athlon instructions.
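To make the variable-length versus fixed-length idea concrete, here is a minimal Python sketch. It assumes a tiny hand-picked subset of real 32-bit X86 encodings (NOP, RET, MOV EAX with a 32-bit immediate, ADD EAX with a 32-bit immediate) and decodes them into a made-up fixed-shape record. The record layout and the names `OPCODES`, `decode`, and `macro_ops` are hypothetical, for illustration only; they are not AMD's actual MacroOP format.

```python
# Toy illustration only: a tiny subset of real 32-bit x86 encodings,
# decoded into a made-up fixed-shape record. The record layout is
# hypothetical; it is not AMD's actual MacroOP format.

# opcode byte -> (mnemonic, total instruction length in bytes)
OPCODES = {
    0x90: ("NOP", 1),             # no operation
    0xC3: ("RET", 1),             # near return
    0xB8: ("MOV EAX, imm32", 5),  # opcode byte + 4-byte immediate
    0x05: ("ADD EAX, imm32", 5),  # opcode byte + 4-byte immediate
}


def decode(code: bytes):
    """Split a variable-length byte stream into fixed-shape records."""
    ops = []
    pos = 0
    while pos < len(code):
        mnemonic, length = OPCODES[code[pos]]
        # Bytes after the opcode form a little-endian immediate (0 if none).
        operand = int.from_bytes(code[pos + 1:pos + length], "little")
        ops.append((mnemonic, length, operand))
        pos += length
    return ops


# A byte is eight bits: the letter "A" is stored as 01000001.
bits_of_A = format(ord("A"), "08b")

# MOV EAX, 1 ; ADD EAX, 2 ; NOP ; RET  (5 + 5 + 1 + 1 = 12 bytes)
program = bytes([0xB8, 1, 0, 0, 0, 0x05, 2, 0, 0, 0, 0x90, 0xC3])
macro_ops = decode(program)
```

The four instructions occupy 5, 5, 1, and 1 bytes in the input, but every record that comes out has the same three fields. That uniformity is what makes the later stages easier to build.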

The Athlon has an Instruction Control Unit (ICU). Up to three MacroOPs (Athlon instructions) are sent from the decoders to the ICU per CPU cycle. The ICU buffers and manages the MacroOPs and sends them to the processor's execution unit schedulers. In short, the ICU is a managed buffer between the decoders and schedulers.
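The "managed buffer" idea can be sketched as a simple first-in, first-out queue. This is a toy model, not the real ICU: the class name, its methods, and the capacity of 4 are assumptions chosen for illustration. Only the limit of three MacroOPs per cycle comes from the description above.

```python
from collections import deque

DECODE_WIDTH = 3  # from the text: up to three MacroOPs per CPU cycle


class InstructionControlUnit:
    """Toy FIFO buffer between decoders and schedulers (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity  # illustrative size, not the real one
        self.buffer = deque()

    def accept(self, decoded):
        """Take at most DECODE_WIDTH MacroOPs this cycle, limited by free space."""
        room = self.capacity - len(self.buffer)
        taken = decoded[:min(DECODE_WIDTH, room)]
        self.buffer.extend(taken)
        return len(taken)

    def dispatch(self, count):
        """Hand the oldest MacroOPs to the schedulers, oldest first."""
        n = min(count, len(self.buffer))
        return [self.buffer.popleft() for _ in range(n)]


icu = InstructionControlUnit(capacity=4)
stream = ["op0", "op1", "op2", "op3", "op4", "op5"]
first = icu.accept(stream)        # three MacroOPs fit in one cycle
second = icu.accept(stream[3:])   # buffer nearly full: only one more fits
sent = icu.dispatch(2)            # the two oldest leave for the schedulers
```

When the buffer fills up, the decoders have to wait, and when it has entries, the schedulers always get the oldest ones first.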

The Athlon has two execution unit schedulers. The first one schedules integer and address calculation MacroOPs. The second schedules MMX, 3DNow!, and X87 MacroOPs. In short, the Athlon has two execution schedulers which manage the execution pipelines.
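The split between the two schedulers amounts to sorting MacroOPs by type. The sketch below assumes each MacroOP carries a simple "kind" label; the labels, the `route` function, and the sample operation names are all hypothetical, since the real classification logic is not described here.

```python
# Hypothetical MacroOP kinds; the real classification is not given in the text.
INTEGER_KINDS = {"integer", "address"}
FLOAT_KINDS = {"mmx", "3dnow", "x87"}


def route(macro_ops):
    """Sort (name, kind) pairs into the two scheduler queues."""
    integer_queue, float_queue = [], []
    for name, kind in macro_ops:
        if kind in INTEGER_KINDS:
            integer_queue.append(name)
        elif kind in FLOAT_KINDS:
            float_queue.append(name)
        else:
            raise ValueError(f"unknown MacroOP kind: {kind}")
    return integer_queue, float_queue


ops = [
    ("add", "integer"),
    ("fmul", "x87"),
    ("lea", "address"),
    ("pfadd", "3dnow"),
    ("paddb", "mmx"),
]
integer_queue, float_queue = route(ops)
```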

The Athlon has nine independent execution pipelines: three 10-stage integer pipelines, three address calculation pipelines, and three 15-stage MMX, 3DNow!, and X87 floating-point pipelines. The last three essentially do the floating-point number crunching that, in the days before the 80486 (and before MMX and 3DNow!), was done by a separate math coprocessor chip (the X87), and they account for a lot of the zip in graphics (games), spreadsheet recalcs, etc. This is also where the fancy language ("...three-issue, superscalar floating-point capability is based on three pipelined, out-of-order floating-point execution units..") comes into play. Let's call it "magic" and be done with it. In short, the Athlon has nine execution pipelines which can process data simultaneously. Three of them are independent floating point units (FPUs), which together can deliver as many as four 32-bit, single-precision floating-point results in a single CPU clock cycle.
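The "four results per clock" figure turns into a peak throughput number with one multiplication. In the sketch below, the 600 MHz clock is an illustrative assumption, not a figure from this article, and the result is a theoretical ceiling rather than what real programs achieve.

```python
RESULTS_PER_CYCLE = 4  # from the text: up to four single-precision results per clock

# Pipeline counts as listed above.
PIPELINES = {"integer": 3, "address": 3, "floating_point": 3}
total_pipelines = sum(PIPELINES.values())


def peak_single_precision_flops(clock_hz):
    """Theoretical peak: results per cycle times cycles per second."""
    return RESULTS_PER_CYCLE * clock_hz


example_clock_hz = 600_000_000  # illustrative clock speed (assumption)
peak = peak_single_precision_flops(example_clock_hz)
peak_gflops = peak / 1e9
```

At the assumed 600 MHz, that works out to 2.4 billion single-precision results per second at best.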

The Athlon has sophisticated, dynamic branch prediction logic. 'It has a two-way, 2048-entry branch prediction table to store information used to predict the direction of conditional branches. CALL/RET instruction pairs are optimized by storing the return address of each CALL within a nested series of subroutines. A return address is supplied as the predicted target address of the corresponding RET instruction.' In short, the Athlon has advanced branch prediction logic.
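Both ideas in that quote can be sketched in a few lines of Python. Two assumptions are made loudly here: the table below uses the textbook two-bit saturating counter scheme, which the quote does not confirm is what the Athlon uses, and it ignores the "two-way" organization entirely. The class names and the sample addresses are hypothetical. Only the 2048-entry size and the CALL/RET pairing come from the quote.

```python
TABLE_ENTRIES = 2048  # from the quoted text


class BranchPredictor:
    """Textbook 2-bit saturating counters, one per entry (an assumption)."""

    def __init__(self):
        # Counter values 0-1 predict "not taken"; 2-3 predict "taken".
        self.counters = [1] * TABLE_ENTRIES

    def _index(self, address):
        return address % TABLE_ENTRIES

    def predict(self, address):
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


class ReturnAddressStack:
    """Each CALL pushes its return address; each RET pops the prediction."""

    def __init__(self):
        self.stack = []

    def on_call(self, return_address):
        self.stack.append(return_address)

    def predict_return(self):
        return self.stack.pop() if self.stack else None


# A loop branch that is taken three times, falls through once, then is taken again.
predictor = BranchPredictor()
predictions = []
for outcome in [True, True, True, False, True]:
    predictions.append(predictor.predict(0x4000))
    predictor.update(0x4000, outcome)

# Two nested subroutine calls return in the opposite order they were made.
ras = ReturnAddressStack()
ras.on_call(0x1005)
ras.on_call(0x2005)
inner_return = ras.predict_return()
outer_return = ras.predict_return()
```

After one wrong guess the counter learns the branch is usually taken, and a single fall-through does not flip the prediction back. The return stack hands back the inner subroutine's return address first, matching how nested calls unwind.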

The Athlon implements Enhanced 3DNow!™:

  • '21 original 3DNow! instructions with superscalar SIMD
  • 19 new instructions to enable improved integer math calculations for speech or video encoding and improved data movement for Internet plug-ins and other streaming applications
  • 5 new DSP instructions to improve soft modem, soft ADSL, Dolby Digital surround sound, and MP3 applications..'

In short, the Athlon can do video, 3D, sound, etc. better.

The Athlon has a high-performance memory cache architecture. The L1 cache, closest to the CPU innards, comprises two separate 64 KByte caches, one each for data and instructions. The next layer of cache, L2, is 512 KBytes. The processor's cache controller can support up to eight MBytes of external (L3) cache on the motherboard. In short, the Athlon has 128 KBytes of L1 cache and 512 KBytes of L2 cache built in.
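For anyone who wants to check the arithmetic, the cache sizes above add up as follows. This is just a worked sum using the figures in this paragraph; the variable names are made up for illustration.

```python
KB = 1024
MB = 1024 * KB

l1_instruction = 64 * KB
l1_data = 64 * KB
l1_total = l1_instruction + l1_data   # split L1: 128 KBytes

l2_total = 512 * KB
built_in_total = l1_total + l2_total  # L1 + L2: 640 KBytes

l3_max = 8 * MB                       # upper limit of external cache supported
```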


Copyright, Disclaimer, and Trademark Information Copyright © 1996-2006 Larry F. Byard.  All rights reserved. This material or parts thereof may not be copied, published, put on the Internet, rewritten, or redistributed without explicit, written permission from the author.