Friday, May 9, 2008

General architecture


Internally, the Athlon is a fully seventh generation x86 processor, the first of its kind. Like the AMD K5 and K6, the Athlon is a RISC microprocessor which decodes x86 instructions into its own internal instructions at runtime. The CPU is an out-of-order design, again like previous post-5x86 AMD CPUs. The Athlon utilizes the DEC Alpha EV6 bus architecture with double data rate (DDR) technology. This means that at 100 MHz the Athlon front side bus actually transfers at a rate similar to a 200 MHz single data rate bus (referred to as 200 MT/s), which was superior to the method used on Intel's Pentium III (with SDR bus speeds of 100 and 133 MHz).

AMD designed the CPU with more robust x86 instruction decoding capabilities than that of K6, to enhance its ability to keep more data in-flight at once. Athlon's CISC to RISC decoder triplet could potentially decode 6 x86 operations per clock, although this was somewhat unlikely in real-world use.[3] The critical branch predictor unit, essential to keeping the pipeline busy, was enhanced compared to what was onboard the K6. Deeper pipelining with more stages allowed higher clock speeds to be attained.[4] Whereas the AMD K6-III+ topped out at 570 MHz due to its short pipeline, even when built on the 180 nm process, the Athlon was capable of going much higher.

AMD ended its long-time handicap with floating point x87 performance by designing an impressive super-pipelined, out-of-order, triple-issue floating point unit.[3] Each of its 3 units were tailored to be able to calculate an optimal type of instructions with some redundancy. By having separate units, it was possible to operate on more than one floating point instruction at once.[3] This FPU was a huge step forward for AMD. While the K6 FPU had looked anemic compared to the Intel P6 FPU, with Athlon this was no longer the case.[5]

The 3DNow! floating point SIMD technology, again present, received some revisions and a name change to "Enhanced 3DNow!". Additions included DSP instructions and an implementation of the extended-MMX subset of Intel SSE.[6]

CPU Caching onboard Athlon consisted of the typical two levels. Athlon was the first x86 processor with a 128 KiB split level 1 cache; a 2-way associative, later 16-way, cache separated into 2×64 KiB for data and instructions (Harvard architecture).[3] This cache was double the size of K6's already large 2×32 KiB cache, and quadruple the size of Pentium II and III's 2×16 KiB L1 cache. The initial Athlon (Slot A, later renamed Athlon Classic) used 512 KiB of level 2 cache separate from the CPU, on the processor cartridge board, running at 50% to 33% of core speed. This was done because the 250 nm manufacturing processes was too large to allow for on-die cache while maintaining cost-effective die size. Later Athlon CPUs, afforded greater transistor budgets by smaller 180 nm and 130 nm process nodes, moved to on-die L2 cache at full CPU clock speed.

No comments: