Friday, May 9, 2008

Future Turion Ultra


The Turion Ultra (codenamed Griffin) is the first processor family from AMD solely for the mobile platform, based on the Athlon 64 (K8) architecture with some specific architectural enhancements similar to upcoming Opteron processors aimed at lower power consumption and longer battery life.

Features:

The Turion Ultra is a dual-core processor to be fabricated on 65 nm technology using 300 mm SOI wafers. It will support DDR2-800 SO-DIMMs and features a DRAM prefetcher to improve performance and a mobile-enhanced northbridge (memory controller, HyperTransport controller, and crossbar switch). Each processor core comes with 1 MiB of L2 cache, for a total of 2 MiB of L2 cache for the entire processor. This is double the L2 cache found on the current Turion 64 X2 processor. Clock rates will range from 2.0 GHz to 2.4 GHz, and thermal design power (TDP) will range from 32 watts to 35 watts.[1]

An outstanding feature of the Turion Ultra processor is that it implements three voltage planes: one for the northbridge and one for each core.[2] This, along with multiple phase-locked loops (PLL), allows one core to alter its voltage and operating frequency independently of the other core, and independently of the northbridge. Indeed, in a matter of microseconds, the processor can switch to one of 8 frequency levels and one of 5 voltage levels. By adjusting frequency and voltage during use, the processor can adapt to different workloads and help reduce power consumption. It can operate as low as 250 MHz to conserve power during light use.
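As a rough illustration of why scaling voltage and frequency together saves power, the sketch below uses the common first-order approximation that dynamic CMOS power is proportional to V² × f. The approximation, the function name, and both voltages are assumptions made for illustration and are not AMD figures; only the 2.4 GHz and 250 MHz clock rates come from the text above.

```python
def relative_dynamic_power(voltage, freq_mhz, ref_voltage, ref_freq_mhz):
    """Dynamic power relative to a reference operating point (P ~ V^2 * f)."""
    return (voltage / ref_voltage) ** 2 * (freq_mhz / ref_freq_mhz)


# Hypothetical operating points: 1.10 V at 2400 MHz and 0.80 V at 250 MHz.
# The voltages are invented for this example.
full = relative_dynamic_power(1.10, 2400, 1.10, 2400)
idle = relative_dynamic_power(0.80, 250, 1.10, 2400)

print(f"full speed: {full:.2f}x, light load: {idle:.3f}x")
```

Under these assumed numbers, a core idling at 250 MHz draws only a few percent of the dynamic power of a core at full speed. This is why letting each core pick its own operating point can matter for battery life.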

Additionally, the processor features deep sleep state C3, deeper sleep state C4 (AltVID), and HyperTransport 3.0 at up to 2.6 GHz, or up to 20.8 GB/s of aggregate bandwidth per link at 16-bit link width (41.6 GB/s is the figure for a 32-bit link), with dynamic scaling of the HT link width down to 0-bit ("disconnected") in both directions, to and from the chipset, for four different usage scenarios.[3] It also implements multiple on-die thermal sensors through an integrated SMBus (SB-TSI) interface, which replaces and eliminates the separate SMBus thermal monitor chip used with its predecessors, and adds a MEMHOT signal, sent from the embedded controller to the processor, which is used to reduce memory temperature.
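The HyperTransport bandwidth figure can be checked with simple arithmetic. HyperTransport transfers data on both clock edges, so a 2.6 GHz link performs 5.2 billion transfers per second. The function name and parameters below are illustrative only.

```python
def ht_bandwidth_gb_s(clock_ghz, link_width_bits, directions=2):
    """Aggregate HyperTransport link bandwidth in GB/s.

    Data moves on both clock edges (double data rate), and a link has
    one unidirectional path in each direction.
    """
    transfers_g_per_s = clock_ghz * 2          # GT/s
    bytes_per_transfer = link_width_bits / 8   # bytes moved per transfer
    return transfers_g_per_s * bytes_per_transfer * directions


for width in (16, 32):
    print(f"{width}-bit link at 2.6 GHz: {ht_bandwidth_gb_s(2.6, width):.1f} GB/s")
```

By this calculation a 16-bit link at 2.6 GHz carries 10.4 GB/s in each direction, or 20.8 GB/s in aggregate. The 41.6 GB/s figure corresponds to a 32-bit link.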

The Turion Ultra processor will share the same socket S1 as its predecessor (Turion 64 X2) but will not have the same pinout.[4] It is designed to work with the RS780M chipset.

It is worth noting that despite the above architectural enhancements, the cores were minimally modified and are based on the K8 rather than the K10 microarchitecture.[4] AMD Fellow Maurice Steinman has said the cores are almost transistor-for-transistor identical to those found in the 65 nm Turion 64 X2 processors[citation needed]. This makes it more likely that Turion Ultra will avoid the clock rate scaling difficulties present in AMD's K10 products.

Availability:

The Turion Ultra processor is still under development and is expected to be released as part of the "Puma" mobile platform in the second quarter of 2008. AMD executives are happy to divulge that the Puma platform has raked in over 100 design wins already, more than any other new product in AMD history.

Future products

The Athlon 64 line is expected to continue to evolve. In particular, new models scheduled to be launched in the third quarter of 2007 are to be based on the "K10" microarchitecture. The initial offerings are expected to be based on the Agena (quad-core, 2 MiB L3 cache), Agena FX and Kuma (dual-core, 2 MiB L3 cache) cores. These processors will be packaged in the Socket F+ (Agena FX) and Socket AM2+ form factors, but are expected to function in Socket AM2 motherboards as well, with the loss of HyperTransport 3.0 enhancements, which will only be available with Socket AM2+ motherboards. Processor model information has been reported as follows:[60]

According to the most recent report, the range of K10 desktop microprocessors will no longer use the trademark "Athlon", but will don the new name "Phenom". Subsequent models will use the name "Phenom X2" for dual core variants, "Phenom X4" for quad core variants and "Phenom FX" to replace the current Athlon 64 FX.



Athlon 64 models

Clawhammer (130 nm SOI)

Newcastle (130 nm SOI)

Also possible: ClawHammer-512 (Clawhammer with partially disabled L2-Cache)

Winchester (90 nm SOI)

Venice (90 nm SOI)

San Diego (90 nm SOI)

Orleans (90 nm SOI)

Lima (65 nm SOI)

Athlon 64 FX models

Sledgehammer (130 nm SOI)

  • CPU-Stepping: C0, CG
  • L1-Cache: 64 + 64 KiB (Data + Instructions)
  • L2-Cache: 1024 KiB, fullspeed
  • MMX, Extended 3DNow!, SSE, SSE2, AMD64
  • Socket 940, 800 MHz HyperTransport (HT800)
  • Registered DDR-SDRAM required
  • VCore: 1.50/1.55 V
  • Power Consumption (TDP): 89 Watt max
  • First Release: September 23, 2003
  • Clockrate: 2200 MHz (FX-51, C0), 2400 MHz (FX-53, C0 and CG)

Clawhammer (130 nm SOI)

  • CPU-Stepping: CG
  • L1-Cache: 64 + 64 KiB (Data + Instructions)
  • L2-Cache: 1024 KiB, fullspeed
  • MMX, Extended 3DNow!, SSE, SSE2, AMD64
  • Socket 939, 1000 MHz HyperTransport (HT1000)
  • VCore: 1.50 V
  • Power Consumption (TDP): 89 Watt (FX-55:104 Watt)
  • First Release: June 1, 2004
  • Clockrate: 2400 MHz (FX-53), 2600 MHz (FX-55)

San Diego (90 nm SOI)

Toledo (90 nm SOI)

Dual-core CPU

Windsor (90 nm SOI)

Dual-core CPU

Windsor (90 nm SOI) - Quad FX platform


Dual-core, dual CPUs (four cores total)

Mobile Athlon 64

A line for mobile computing.

Sockets


At the introduction of Athlon 64 in September 2003, only Socket 754 and Socket 940 (Opteron) were ready and available. The onboard memory controller was not capable of running unbuffered (non-registered) memory in dual-channel mode at the time of release; as a stopgap measure, AMD introduced the Athlon 64 on Socket 754 and brought out a non-multiprocessor version of the Opteron called the Athlon 64 FX, a multiplier-unlocked enthusiast part for Socket 940, comparable to Intel's Pentium 4 Extreme Edition for the high-end market.

In June 2004, AMD released Socket 939 as the mainstream Athlon 64 with dual-channel memory interface, leaving Socket 940 solely for the server market (Opterons), and relegating Socket 754 as a value/budget line, for Semprons and slower versions of the Athlon 64. Eventually Socket 754 replaced Socket A for Semprons.

In May 2006, AMD released Socket AM2, which provided support for the DDR2 memory interface. Also, this marked the release of AMD's Virtualization technology.

In August 2006, AMD released Socket F for Opteron server CPUs, which uses the LGA chip form factor.

In November 2006, AMD released a specialized version of Socket F, called 1207 FX, for dual-socket, dual-core Athlon FX processors on the Quad FX platform. While Socket F Opterons already allowed for four processor cores, Quad FX allowed unbuffered RAM and expanded CPU/chipset configuration in the BIOS. Consequently, Socket F and F 1207 FX are incompatible and require different processors, chipsets, and motherboards.

Athlon 64 X2

The Athlon 64 X2 is the first dual-core desktop CPU manufactured by AMD. It is essentially a processor consisting of two Athlon 64 cores joined together on one die with additional control logic. The cores share one dual-channel memory controller, are based on the E-stepping model of Athlon 64 and, depending on the model, have either 512 or 1024 KiB of L2 Cache per core. The Athlon 64 X2 is capable of decoding SSE3 instructions (except those few specific to Intel's architecture), so it can run and benefit from software optimizations that were previously only supported by Intel chips. This enhancement is not unique to the X2, and is also available in the Venice and San Diego single core Athlon 64s.

In June 2007, AMD released low-voltage variants of their low-end 65 nm Athlon 64 X2, named "Athlon X2".[1] The Athlon X2 processors feature reduced TDP of 45 W.[2]


The main benefit of dual-core processors like the X2 is their ability to process more software threads at the same time. The ability of processors to execute multiple threads simultaneously is called thread-level parallelism (TLP). By placing two cores on the same die, the X2 effectively doubles the TLP over a single-core Athlon 64 of the same speed. The need for TLP processing capability is dependent on situation to a great degree, and certain situations benefit from it far more than others. Certain programs are currently only written with one thread, and are therefore unable to utilize the processing power of the second core.

Programs that are often written with multiple threads and can utilize both cores include many music and video encoding applications, and especially professional rendering programs. High TLP applications currently correspond to server/workstation situations more than the typical desktop. These applications can realize almost twice the performance of a single-core Athlon 64 of the same specifications. Multi-tasking also runs a sizable number of threads; intense multi-tasking scenarios have actually shown improvements of considerably more than two times [2]. This is primarily due to the excessive overhead a single core incurs by constantly switching between threads, and could potentially be improved by adjustments to operating system scheduling code.
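How much a second core helps a single program depends on how much of its work can run in parallel. A common way to estimate this is Amdahl's law, which the text above does not name but which matches its reasoning. The sketch below is an idealized model that ignores scheduling and memory effects, and the parallel fractions are made-up examples.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only part of a program can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workloads on a dual-core processor.
for label, fraction in [("single-threaded", 0.0),
                        ("half parallel", 0.5),
                        ("video encoder", 0.95)]:
    print(f"{label}: {amdahl_speedup(fraction, 2):.2f}x")
```

A purely single-threaded program gains nothing from the second core, while a program that is 95% parallel approaches the "almost twice" figure quoted for encoding and rendering workloads.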

In the consumer segment of the market as well, the X2 improves upon the performance of the original Athlon 64, especially for multi-threaded software applications. The overall increase in performance of the entry level Athlon 64 X2 chip (the Athlon 64 X2 3800+) over the single-core Athlon 64 3800+ chip is almost 10%. The spread between the latter and the Athlon 64 X2 5000+ is almost 40% [3]. One can interpret from these numbers that the majority of applications (at least in the benchmark test) are still largely single thread-dominated, hence the absence of a larger gap between the two 3800+ processors. As software programmers begin to take advantage of multi-core processing, the spread between single- and multi-core processors will increase.


Manufacturing costs:


Having two cores, the Athlon 64 X2 has an increased number of transistors. The 1-MiB-L2-cache 90 nm Athlon 64 X2 processor is 219 mm² in size with 243 million transistors [3] whereas its 1-MiB-L2-cache 90 nm Athlon 64 counterpart is 103.1 mm² and has 164 million transistors [4]. The 65 nm Athlon 64 X2 with only 512 KiB L2 per core reduced this to 118 mm² with 221 million transistors, compared to the 65 nm Athlon 64 with 77.2 mm² and 122 million transistors. As a result, a larger area of silicon must be defect free. These size requirements also necessitate a more complex fabrication process, which further reduces the number of functional processors obtained per silicon wafer. This lower yield makes the X2 more expensive to produce than the single-core processor.
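The effect of die area on yield can be illustrated with the simple Poisson yield model, in which the probability that a die has no defects is exp(-area × defect density). The model choice and the defect density below are assumptions for illustration and are not AMD data; only the two 90 nm die areas come from the text.

```python
import math


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of dies expected to be defect free under a Poisson model."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


ASSUMED_DEFECT_DENSITY = 0.5   # defects per cm^2, hypothetical value
X2_AREA_MM2 = 219.0            # 90 nm Athlon 64 X2, 1 MiB L2 per core
SINGLE_AREA_MM2 = 103.1        # 90 nm Athlon 64, 1 MiB L2

yield_x2 = poisson_yield(X2_AREA_MM2, ASSUMED_DEFECT_DENSITY)
yield_single = poisson_yield(SINGLE_AREA_MM2, ASSUMED_DEFECT_DENSITY)

# Silicon cost per good die scales with area divided by yield.
cost_ratio = (X2_AREA_MM2 / yield_x2) / (SINGLE_AREA_MM2 / yield_single)

print(f"X2 yield: {yield_x2:.0%}, single-core yield: {yield_single:.0%}")
print(f"relative silicon cost per good die: {cost_ratio:.1f}x")
```

Whatever defect density is assumed, the cost ratio comes out higher than the roughly 2.1x area ratio alone, because the larger die both takes more wafer area and is more likely to contain a defect.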

In the middle of June 2006 AMD stated that they would no longer make any non-FX Athlon 64 or Athlon 64 X2 models with 1-MiB L2 caches [4]. This led to only a small production number of the Socket-AM2 Athlon 64 X2 with 1 MiB L2 cache per core, known as 4000+, 4400+, 4800+, and 5200+. The Athlon 64 X2 with 512 KiB per core, known as 3800+, 4200+, 4600+, and 5000+, were produced in far greater numbers. The introduction of the F3 stepping then saw several models with 1 MiB L2 cache per core as production refinements resulted in an increased yield.

Athlon 64 FX

The Athlon 64 FX is positioned as a hardware enthusiast product, marketed by AMD especially toward gamers.[55] Unlike the standard Athlon 64, all of the Athlon 64 FX processors have their multipliers completely unlocked.[56] The FX line is now dual-core, starting with the FX-60.[57] The FX always has the highest clock speed of all Athlons at its release.[58] From FX-70 onwards, the line of processors will also support dual-processor setup with NUMA, named AMD Quad FX platform.

Athlon 64 features

There are four variants: Athlon 64, Athlon 64 FX, Mobile Athlon 64 (later renamed "Turion 64") and the dual-core Athlon 64 X2.[39] Common among the Athlon 64 line are a variety of instruction sets including MMX, 3DNow!, SSE, SSE2, and SSE3.[40] All Athlon 64s also support the NX bit, a security feature named "Enhanced Virus Protection" by AMD.[41] As implementations of the AMD64 architecture, all Athlon 64 variants are able to run 16-bit, 32-bit x86, and AMD64 code, through two different modes the processor can run in: "legacy mode" and "long mode". Legacy mode runs 16-bit and 32-bit programs natively, and long mode runs 64-bit programs natively, but also allows for 32-bit programs running inside a 64-bit operating system.[42] All Athlon 64 processors feature 128 kibibytes of level 1 cache, and at least 512 kibibytes of level 2 cache.[40]

The Athlon 64 features an on-die memory controller,[5] a feature not previously seen on x86 CPUs. Not only does this mean the controller runs at the same clock rate as the CPU itself, it also means the electrical signals have a shorter physical distance to travel compared to the old northbridge interfaces.[43] The result is a significant reduction in latency (response time) for access requests to main memory.[44] The lower latency is often cited as one of the advantages of the Athlon 64's architecture over those of its competitors.[45]

Translation Lookaside Buffers (TLBs) have also been enlarged (40 4k/2M/4M entries in L1 cache, 512 4k entries),[46] with reduced latencies and improved branch prediction, with four times the number of bimodal counters in the global history counter.[42] This and other architectural enhancements, especially as regards SSE implementation, improve instructions per cycle (IPC) performance over the previous Athlon XP generation.[42] To make this easier for consumers to understand, AMD has chosen to market the Athlon 64 using a PR (Performance Rating) system, where the numbers roughly map to Pentium 4 performance equivalents, rather than actual clock speed.[47]

Athlon 64 also features CPU speed throttling technology branded Cool'n'Quiet, a feature similar to Intel's SpeedStep that can throttle the processor's clock speed back to facilitate lower power consumption and heat production.[48] When the user is running undemanding applications and the load on the processor is light, the processor's clock speed and voltage are reduced. This in turn reduces its peak power consumption (max TDP set at 89 W by AMD) to as low as 32 W (stepping C0, clock speed reduced to 800 MHz) or 22 W (stepping CG, clock speed reduced to 1 GHz). The Athlon 64 also has an Integrated Heat Spreader (IHS) which prevents the CPU core from accidentally being damaged when mounting and unmounting cooling solutions. With prior AMD CPUs, a CPU shim could be used by people worried about damaging the core.

The No Execute bit (NX bit) supported by Windows Vista, Windows XP Service Pack 2,[49] Windows XP Professional x64 Edition, Windows Server 2003 x64 Edition, and Linux 2.6.8 and higher is also included, for improved protection from malicious buffer overflow security threats. Hardware-set permission levels make it much more difficult for malicious code to take control of the system. It is intended to make 64-bit computing a more secure environment.

The Athlon 64 CPUs have been produced with 130 nm and 90 nm SOI process technologies.[50] All of the latest chips (Winchester, Venice and San Diego models) are on 90 nm. The Venice and San Diego models also incorporate dual stress liner technology[51] (an amalgam of strained silicon and 'squeezed silicon', the latter of which is not actually a technology) co-developed with IBM.[52]

As the memory controller is integrated onto the CPU die, there is no FSB for the system memory to base its speed upon.[53] Instead, system memory speed is obtained by using the following formula (using the ceiling function):[54]

\frac{\mathrm{CPU~speed}}{\left\lceil\frac{\mathrm{CPU~multiplier}}{\mathrm{DRAM~divider}}\right\rceil}=\mathrm{DRAM~speed}

In simpler terms, the memory is always running at a set fraction of the CPU speed, with the divisor being a whole number. A 'FSB' figure is still used to determine the CPU speed, but the RAM speed is no longer directly related to this 'FSB' figure (known otherwise as the LDT).
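The formula can be worked through in a few lines of code. In this sketch the "DRAM divider" is taken to be the ratio of the requested memory clock to the 200 MHz reference clock (for example 5/6 for a 166 MHz setting). That reading of the term, the function name, and the example CPUs are assumptions for illustration.

```python
from fractions import Fraction
from math import ceil


def dram_speed_mhz(cpu_speed_mhz, cpu_multiplier, dram_divider):
    """DRAM clock = CPU clock / ceil(CPU multiplier / DRAM divider)."""
    whole_divisor = ceil(Fraction(cpu_multiplier) / Fraction(dram_divider))
    return Fraction(cpu_speed_mhz) / whole_divisor


# Hypothetical CPUs on a 200 MHz reference clock with a 166 MHz memory setting.
setting = Fraction(5, 6)
for multiplier in (10, 11, 12):
    cpu_mhz = 200 * multiplier
    actual = dram_speed_mhz(cpu_mhz, multiplier, setting)
    print(f"{cpu_mhz} MHz CPU: memory runs at {float(actual):.1f} MHz")
```

Because the divisor is rounded up to a whole number, the memory can end up running slower than the nominal setting. In this example the 2200 MHz part divides by 14 rather than 13.2, giving about 157 MHz instead of 166 MHz.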

To summarize, the Athlon 64 architecture features two buses from the CPU. One is the HT bus to the northbridge connecting the CPU to the chipset and device attachment bus (PCIe, AGP, PCI) and the other is the memory bus which connects the on-board memory controller to the bank of either DDR or DDR2 DRAM.

DDR2

The Athlon 64 had been maligned by some critics for some time because of its lack of support for DDR2 SDRAM, an emerging technology that had been adopted much earlier by Intel.[32] AMD's official position was that the CAS latency on DDR2 had not progressed to a point where it would be advantageous for the consumer to adopt it.[33] AMD finally remedied this gap with the "Orleans" core revision, the first Athlon 64 to fit Socket AM2, released on May 23, 2006.[34] "Windsor", an Athlon 64 X2 revision for Socket AM2, was released concurrently. Both Orleans and Windsor have either 512KiB or 1MiB of L2 cache per core.[35] The Athlon 64 FX-62 was also released concurrently on the Socket AM2 platform.[36] Socket AM2 also consumes less power than previous platforms, and supports AMD's virtualization technology.[37]

The memory controller used in all DDR2 SDRAM capable processors (Socket AM2) has an extended column address range of 11 columns instead of the conventional 10 columns, and supports a 16 KB page size, with at most 2048 individual entries supported. An OCZ unbuffered DDR2 kit, optimized for 64-bit operating systems, was released to exploit the functionality provided by the memory controller in Socket AM2 processors, allowing the memory controller to stay longer on the same page, thus benefiting graphics-intensive applications.[38]

Dual-core Athlon 64

On April 21, 2005, less than a week after the release of Venice and San Diego, AMD announced its next addition to the Athlon 64 line, the Athlon 64 X2.[23] Released on May 31, 2005,[24] it also initially had two different core revisions available to the public, Manchester and Toledo, the only appreciable difference between them being the amount of L2 cache.[25] Both were released only for Socket 939.[26] A response to Intel's dual core Pentium D, the Athlon 64 X2 was received very well by reviewers and the general public, with a general consensus emerging that AMD's implementation of multi-core was superior to that of the Pentium D.[27][28] Some felt initially that the X2 would cause market confusion with regard to price points since the new processor was targeted at the same "enthusiast," US$350 and above market[29] already occupied by AMD's existing socket 939 Athlon 64s.[30] AMD's official breakdown of the chips placed the Athlon X2 aimed at a segment they called the "prosumer", along with digital media fans.[24] The Athlon 64 was targeted at the mainstream consumer, and the Athlon FX at gamers. The Sempron budget processor was targeted at value-conscious consumers.[31]

History of Athlon 64


All of the 64-bit processors sold by AMD so far have their genesis in the K8 or Hammer project.

The Athlon 64 was originally codenamed ClawHammer by AMD,[3] and was referred to as such internally and in press releases. The first Athlon 64 FX was based on the first Opteron core, SledgeHammer. Both cores, produced on a 130 nanometer process, were first introduced on September 23, 2003. The models first available were the FX-51, fitting Socket 940, and the 3200+, fitting Socket 754.[6] Like the Opteron it was based on, the Athlon FX-51 required buffered RAM, increasing the final cost of an upgrade.[7] The week of the Athlon 64's launch, Intel released the Pentium 4 Extreme Edition, a CPU designed to compete with the Athlon 64 FX.[8] The Extreme Edition was widely considered a marketing ploy to draw publicity away from AMD, and was quickly nicknamed among some circles the "Emergency Edition".[9] Despite a very strong demand for the chip, AMD was plagued by early manufacturing difficulties that made it difficult to deliver Athlon 64s in quantity. In the early months of the Athlon 64 lifespan, AMD could only produce one hundred thousand chips per month.[10] However, it was very competitive in terms of performance to the Pentium 4, with magazine PC World calling it the "fastest yet".[11] "Newcastle" was released soon after ClawHammer, with half the Level 2 cache.[12]

On June 1, 2004, AMD released new versions of both the ClawHammer and Newcastle core revisions for the newly-introduced Socket 939, an altered Socket 940 without the need for buffered memory.[13] Socket 939 offered two main improvements over Socket 754: the memory controller was altered with dual-channel architecture,[14] doubling peak memory bandwidth, and the HyperTransport bus was increased in speed from 800 MHz to 1000 MHz.[15] Socket 939 also was introduced in the FX series in the form of the FX-55.[16] At the same time, AMD also began to ship the "Winchester" core, based on a 90 nanometer process.

Core revisions "Venice" and "San Diego" succeeded all previous revisions on April 15, 2005. Venice, the lower-end part, was produced for both Sockets 754 and 939, and included 512 KiB of L2 cache.[17] San Diego, the higher-end chip, was produced only for Socket 939 and doubled Venice's L2 cache to one MiB.[18] Both were produced on the 90 nm fabrication process.[19] Both also included support for the SSE3 instruction set,[20] a new feature that had been included in the rival Pentium 4 since the release of the Prescott core in February 2004.[21] In addition, AMD overhauled the memory controller for this revision, resulting in performance improvements as well as support for newer DDR RAM.[22]

Athlon 64

The Athlon 64 is an eighth-generation, AMD64 architecture microprocessor produced by AMD, released on September 23, 2003.[1] It is the third processor to bear the name Athlon, and the immediate successor to the Athlon XP.[2] The second processor (after the Opteron) to implement AMD64 architecture and the first 64-bit processor targeted at the average consumer,[3] it is AMD's primary consumer microprocessor, and competes primarily with Intel's Pentium 4, especially the "Prescott" and "Cedar Mill" core revisions. It is AMD's first K8, eighth-generation processor core for desktop and mobile computers.[4] Despite being natively 64-bit, the AMD64 architecture is backward-compatible with 32-bit x86 instructions.[5] Athlon 64s have been produced for Socket 754, Socket 939, Socket 940, and Socket AM2.

Athlon competitors

Mobile Athlon XP

Mobile Athlon XPs (Athlon XP-M) are identical to normal Athlon XPs, apart from running at lower voltages, often lower bus speeds, and not being multiplier-locked. The lower Vcore rating caused the CPU to have lower power consumption (ideal for battery-powered laptops) and lower heat production. Athlon XP-M CPUs also have a higher-rated heat tolerance, a requirement of the tight conditions within a notebook PC.

The Athlon XP-M replaced the older Mobile Athlon 4. The Mobile Athlon 4 used the older Palomino core, while the Athlon XP-M used the newer Thoroughbred and Barton cores. Some specialized low-power Athlon XP-Ms utilize the microPGA socket 563 rather than the standard Socket A.

The CPUs, like their mobile K6+ predecessors, were also capable of dynamic clock adjustment for power optimization. When the system is idle, the CPU clocks itself down through a lower bus multiplier and also reduces its voltage. Then, when a program demands more computational resources, the CPU very quickly (there is some latency) returns to intermediate or maximum speed to meet the demand. This technology was marketed as "PowerNow!". It was similar to Intel's SpeedStep power saving technique. The feature was controlled by the CPU, motherboard BIOS, and operating system. AMD later renamed the technology to Cool'n'Quiet, on their K8-based CPUs (Athlon 64, etc), and re-imagined it for use on desktop PCs as well.

Athlon XP-Ms were popular with desktop overclockers, as well as underclockers. The lower voltage requirement and higher heat rating resulted in CPUs that were basically "cherry picked" from the manufacturing line. Being the best of the cores off the line, the CPUs typically were more reliably overclocked than their desktop-headed counterparts. Also, the fact that they weren't locked to a single multiplier was a significant simplification for the overclocking process. Some Barton core Athlon XP-Ms have been successfully overclocked to as high as 3.1 GHz.

As stated, the chips were also liked for their underclocking ability. Underclocking is a process of determining the lowest Vcore at which a CPU can remain stable for a given clock speed. The Athlon XP-M CPUs were capable of running at lower voltages per clock rate than their desktop siblings. As such, the chips were used in home theater PC systems due to their high performance and low heat output at low Vcore settings.

Barton and Thorton

Fifth-generation Athlon Barton-core processors released in early 2003 featured PR ratings of 2500+, 2600+, 2800+, 3000+, and 3200+. While not operating at higher clock rates than Thoroughbred-core processors, they earned their higher PR-rating by featuring a total of 512 KiB L2 cache and, in some models, a faster 400 MT/s front side bus.[18] The Thorton core was a variant of the Barton with half of the L2 cache disabled and thus functionally identical to the Thoroughbred core.

By the time of Barton's release, the "Northwood" Pentium 4 had become more than competitive with AMD's processors.[19] Unfortunately, due to the architecture of AMD's processor caches, an L2 cache increase to 512 KiB did not have nearly the same impact as it did to Intel's line. Only an increase of several percent was gained in per-clock performance.[18] The PR rating became somewhat inaccurate because some Barton models with lower clock rate weren't consistently outperforming their higher-clocked Thoroughbred predecessors with lower ratings.[19]

The other improvement, a higher 400 MT/s bus clock, helped Barton gain some more efficiency. However, it was clear by this time that Intel's quad-pumped bus was scaling well above AMD's double-pumped EV6 bus. The 800 MT/s Pentium 4 bus was well out of Athlon's reach. In order to reach the same bandwidth levels, the Athlon bus would have to be clocked at levels simply unreachable.[18]
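The gap between the two buses can be put into numbers. The sketch below assumes both front side buses are 64 bits wide, which the text does not state, and the function names are illustrative.

```python
def bus_bandwidth_gb_s(transfers_mt_s, bus_width_bits=64):
    """Peak bus bandwidth in GB/s from transfer rate and bus width."""
    return transfers_mt_s * bus_width_bits / 8 / 1000


def base_clock_mhz(transfers_mt_s, pumps):
    """Base clock needed to reach a transfer rate at N transfers per cycle."""
    return transfers_mt_s / pumps


athlon = bus_bandwidth_gb_s(400)     # double-pumped EV6 at 200 MHz
pentium4 = bus_bandwidth_gb_s(800)   # quad-pumped bus at 200 MHz

print(f"Athlon XP bus: {athlon:.1f} GB/s, Pentium 4 bus: {pentium4:.1f} GB/s")
print(f"EV6 base clock needed to match: {base_clock_mhz(800, 2):.0f} MHz")
```

To match 800 MT/s, the double-pumped EV6 bus would need a 400 MHz base clock, twice the 200 MHz of the fastest Barton models.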

The K7 architecture had scaled to its limit. Maintaining performance equivalence with Intel's improving processors would require a significant redesign.[18] AMD would soon launch Athlon 64.

Specifications:
Barton (130 nm)

  • L1-Cache: 64 + 64 KiB (Data + Instructions)
  • L2-Cache: 512 KiB, fullspeed
  • MMX, 3DNow!, SSE
  • Socket A (EV6)
  • Front side bus: 166/200 MHz (333/400 MT/s)
  • VCore: 1.65 V
  • First release: February 10, 2003
  • Clockrate: 1833-2333 MHz (2500+ to 3200+)
    • 166 MHz FSB: 1833-2333 MHz (2500+ to 3200+)
    • 200 MHz FSB: 2100, 2200 MHz (3000+, 3200+)

Thorton (130 nm)

  • L1-Cache: 64 + 64 KiB (Data + Instructions)
  • L2-Cache: 256 KiB, fullspeed
  • MMX, 3DNow!, SSE
  • Socket A (EV6)
  • Front side bus: 133/166/200 MHz (266/333/400 MT/s)
  • VCore: 1.5 V - 1.65 V
  • First release: September 2003
  • Clockrate: 1667-2200 MHz (2000+ to 3100+)
    • 133 MHz FSB: 1600 - 2133 MHz (2000+ to 2600+)
    • 166 MHz FSB: 2083 MHz (2600+)
    • 200 MHz FSB: 2200 MHz (3100+)

Thoroughbred (T-Bred)



The fourth-generation Athlon, the Thoroughbred, was released 10 June 2002 at 1.8 GHz, or 2200+ on the PR rating system. The "Thoroughbred" core marked AMD's first production 130 nm silicon, resulting in a significant reduction in die size compared to its 180 nm predecessor.

There are two versions of this core, commonly called A and B. The A version was introduced at 1800 MHz, and had some heat and design issues that held its clock scalability back. In fact, AMD wasn't able to increase its clock above Palomino's top grades. Because of this, it was only sold in versions from 1333 to 1800 MHz, replacing the larger Palomino core. The B version of Thoroughbred has an additional metal layer to improve its ability to reach higher clock speeds. It launched at higher clock speeds.

Other than the new manufacturing process, the Thoroughbred design was largely the same as the "Palomino". The Thoroughbred line received an increased front side bus clock during its lifetime, up to 333 MT/s from 266 MT/s. This improved the processor's memory and I/O access efficiency, and improved per-clock performance as a result. AMD shifted their PR rating scheme accordingly, making lower clock speeds equate to higher PR ratings.

Specifications

  • L1-Cache: 64 + 64 KiB (Data + Instructions)
  • L2-Cache: 256 KiB, fullspeed
  • MMX, 3DNow!, SSE
  • Socket A (EV6)
  • Front side bus: 133/166 MHz (266/333 MT/s)
  • VCore: 1.5 V - 1.65 V
  • First release: June 10, 2002 (A), August 21, 2002 (B)
  • Clockrate:
    • T-Bred "A": 1400-1800 MHz (1600+ to 2200+)
    • T-Bred "B": 1400-2250 MHz (1600+ to 2800+)
    • 133 MHz FSB: 1400-2133 MHz (1600+ to 2600+)
    • 166 MHz FSB: 2083-2250 MHz (2600+ to 2800+)

Palomino

AMD released the third major Athlon version on October 9, 2001, code-named "Palomino", and named it Athlon XP. The Athlon XP was marketed using a PR system, which compared its performance to an Athlon with the "Thunderbird" core. Athlon XP was introduced at speeds between 1333 and 1533 MHz, with ratings from 1500+ to 1800+. At launch, the new core allowed AMD to take the x86 performance lead with the 1800+ model, and enhance that lead with the release of the 1600 MHz 1900+ less than a month later.[13] The "XP" suffix is interpreted to mean eXtreme Performance and also as an unofficial reference to Windows XP.[14]

Palomino was the first K7 core to include the full SSE instruction set from the Intel Pentium III as well as AMD's 3DNow! Professional. It is roughly 10% faster than Thunderbird at the same clock speed, thanks in part to the new SIMD functionality and to several additional improvements. The core has enhancements to the K7's TLB architecture and the addition of a hardware data prefetch mechanism to better take advantage of available memory bandwidth.[15]

Changes in core layout result in Palomino being more frugal with its electrical demands, consuming approximately 20% less power than its predecessor, and thus reducing heat output comparatively as well.[16] While Athlon "Thunderbird" was near its clock ceiling at 1400 MHz, changes to Palomino's transistor layout and the reduction in power demands allowed it to continue increasing clock speed even at the same 180 nm manufacturing process node and core voltage.

The "Palomino" was actually first released as a mobile version, called the Mobile Athlon 4 (codenamed "Corvette").[15] Palomino was also available in a form that officially supports dual processing, known as Athlon MP.[17]


Thunderbird (T-Bird)

The second-generation Athlon, the Thunderbird, debuted on June 5, 2000. This version of the Athlon shipped in a more traditional pin-grid array (PGA) format that plugged into a socket ("Socket A") on the motherboard (it also shipped in the Slot A package). It was sold at speeds ranging from 600 to 1400 MHz. The major difference, however, was cache design. Just as Intel had done when they replaced the old Katmai Pentium III with the much faster Coppermine P-III, AMD replaced the 512 KiB external reduced-speed cache of the Athlon Classic with 256 KiB of on-chip, full-speed exclusive cache. As a general rule, more cache improves performance, but faster cache improves it further still.[11]

AMD changed cache design significantly with Thunderbird. With the older Athlon CPUs, the CPU caching was of an inclusive design where data from the L1 is duplicated in the L2 cache. Thunderbird moved to an exclusive design where the L1 cache's contents are not duplicated in the L2. This increases total cache size of the processor and effectively makes caching behave as if there is a very large L1 cache with a slower region (the L2) and a very fast region (the L1).[12] Because of Athlon's very large L1 cache and the exclusive design which turns the L2 cache into basically a "victim cache", the need for high L2 performance and size was lessened. AMD kept the 64-bit L2 cache data bus from the older Athlons, as a result, and allowed it to have a relatively high latency. A simpler L2 cache reduced the possibility of the L2 cache causing clock scaling and yield issues. Still, instead of the 2-way associative scheme used in older Athlons, Thunderbird did move to a more efficient 16-way associative layout.[11]
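The capacity effect of the exclusive design is simple arithmetic. The sketch below counts only the unique data the two cache levels can hold and ignores latency and associativity. The 128 KiB L1 size is the 64 + 64 KiB given in the specification lists, and the function name is illustrative.

```python
def effective_cache_kib(l1_kib, l2_kib, exclusive):
    """Unique data held across L1 and L2.

    Inclusive: L1 contents are duplicated in L2, so L2 bounds the total.
    Exclusive: L1 and L2 hold different lines, so the sizes add up.
    """
    if exclusive:
        return l1_kib + l2_kib
    return max(l1_kib, l2_kib)


L1_KIB = 64 + 64  # data + instruction caches

print("Thunderbird as built (exclusive):", effective_cache_kib(L1_KIB, 256, True), "KiB")
print("Same sizes, if inclusive:", effective_cache_kib(L1_KIB, 256, False), "KiB")
print("Athlon Classic (inclusive):", effective_cache_kib(L1_KIB, 512, False), "KiB")
```

With the same 256 KiB of L2, the exclusive arrangement holds 384 KiB of unique data rather than 256 KiB. That recovers part of the capacity lost in moving from the 512 KiB external cache to the smaller on-chip one.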

The Thunderbird was AMD's most successful product since the Am386DX-40 ten years earlier. Mainboard designs had improved considerably by this time, and the initial trickle of Athlon mainboard makers had swollen to include every major manufacturer. Their new fab in Dresden came on-line, allowing further production increases, and the process technology was improved by a switch to copper interconnects. In October 2000 the Athlon "C" was introduced, raising the mainboard front side bus speed to 133 MHz (266 MT/s) and providing roughly 10% extra performance per clock over the "B" model Thunderbird.
