Politics : Formerly About Advanced Micro Devices


To: Scumbria who wrote (107320) 4/22/2000 1:00:00 AM
From: Steve Porter
 
Scumbria,

Athlon has a 64-byte cache line.

I'm unsure whether the PIII has extended its cache line length, but I BELIEVE it is 32 bytes.

Steve



To: Scumbria who wrote (107320) 4/22/2000 1:20:00 AM
From: Ali Chen
 
<There isn't any formula, and it is not consistent from application to application. I base my numbers on trace simulations.>

The hit rate depends on code and data locality, so it
varies hugely from one application to another. Again,
hit rates differ between code and data. What year are
your traces? SPECint92?

Athlon and Alpha have 64-byte cache lines; Pentiums
have 32-byte lines.
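Scumbria's numbers come from trace simulation, and the point above is that the result depends on the trace. A minimal trace-driven sketch makes both points concrete. It assumes a 64KB, two-way cache with LRU replacement (the replacement policy is an assumption, not something stated in this thread) and uses two synthetic traces, not real SPEC traces: a sequential scan, where a longer line spreads each miss over more accesses, and a 64-byte stride, where every access touches a new line regardless of line size.

```python
from collections import OrderedDict

def simulate(trace, size_bytes, line_bytes, ways):
    """Return the hit rate of a set-associative LRU cache over an address trace."""
    num_sets = size_bytes // (line_bytes * ways)
    # One LRU-ordered dict of resident tags per set (least recently used first).
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in trace:
        line = addr // line_bytes        # cache line the byte address falls in
        index = line % num_sets          # set chosen by the low line-number bits
        tag = line // num_sets           # remaining high bits identify the line
        s = sets[index]
        if tag in s:
            hits += 1
            s.move_to_end(tag)           # mark as most recently used
        else:
            if len(s) >= ways:
                s.popitem(last=False)    # evict the least recently used line
            s[tag] = True
    return hits / len(trace)

# Synthetic traces over 256KB, four times the cache size, so lines are never reused.
seq = list(range(0, 256 * 1024, 4))       # sequential 4-byte reads
strided = list(range(0, 256 * 1024, 64))  # one read every 64 bytes

for line_bytes in (32, 64):
    print(line_bytes,
          simulate(seq, 64 * 1024, line_bytes, 2),
          simulate(strided, 64 * 1024, line_bytes, 2))
```

For the sequential scan the hit rate is 1 - (access size / line size): 7/8 with 32-byte lines and 15/16 with 64-byte lines. The strided trace misses on every access at both line sizes, so the longer line buys nothing there.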

<The critical word is always forwarded directly to the execution unit,>
Surprisingly, Rambus does not do this :)
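The benefit of critical-word-first is that the waiting load completes as soon as its own chunk arrives instead of after the whole line. A sketch of a simple wrap-around fill order, assuming a 64-byte line delivered over a 64-bit path in 8-byte chunks; the actual burst order used by the Athlon bus or by Rambus is not given in this thread, and some buses use an interleaved order instead.

```python
def critical_word_first_order(addr, line_bytes=64, chunk_bytes=8):
    """Chunk indices in arrival order for a wrap-around, critical-word-first
    line fill (a generic scheme assumed for illustration)."""
    chunks = line_bytes // chunk_bytes
    first = (addr % line_bytes) // chunk_bytes   # chunk holding the requested word
    return [(first + i) % chunks for i in range(chunks)]

# A load at byte offset 0x2C of a 64-byte line: chunk 5 arrives first.
print(critical_word_first_order(0x1000 + 0x2C))  # [5, 6, 7, 0, 1, 2, 3, 4]
```

Without this ordering, a load to the last chunk of the line would wait for all eight transfers instead of one.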

Again, people always forget about FSB overhead,
TLB handling, and snooping/probing.
The whole article is unprofessional.

- Ali



To: Scumbria who wrote (107320) 4/22/2000 1:41:00 AM
From: Charles R
 
<I believe that the L1 caches of both Athlon and PIII have 32 byte line sizes. If not 32, then 64 bytes.>

I think the PIII's line size is 32 bytes and the Athlon's is 64 bytes, but I could be wrong.



To: Scumbria who wrote (107320) 4/22/2000 3:29:00 AM
From: pgerassi
 
Dear Scumbria:

I believe it was Dirk Meyer who stated that the Athlon's 64KB L1 instruction and 64KB L1 data caches have a 64-byte line, whereas Coppermines still have a 32-byte line.
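These figures determine how an address is carved up. A sketch, assuming power-of-two sizes and taking the two-way associativity from the AMD white paper quoted later in this thread, computes the geometry of a 64KB, two-way cache with 64-byte lines and splits an arbitrarily chosen example address (0x12345) into tag, set index, and byte offset.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Line count, set count, and address-bit split for a set-associative cache."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1   # log2, valid for powers of two
    index_bits = sets.bit_length() - 1
    return {"lines": lines, "sets": sets,
            "offset_bits": offset_bits, "index_bits": index_bits}

def split_address(addr, sets, line_bytes):
    """Break a byte address into (tag, set index, byte offset)."""
    offset = addr % line_bytes
    index = (addr // line_bytes) % sets
    tag = addr // (line_bytes * sets)
    return tag, index, offset

athlon_l1d = cache_geometry(64 * 1024, 2, 64)
print(athlon_l1d)  # {'lines': 1024, 'sets': 512, 'offset_bits': 6, 'index_bits': 9}
print(split_address(0x12345, athlon_l1d["sets"], 64))  # (2, 141, 5)
```

Halving the line to 32 bytes at the same capacity and associativity doubles the set count to 1024 and moves one address bit from the offset field to the index field.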

Pete



To: Scumbria who wrote (107320) 4/22/2000 12:23:00 PM
From: milo_morai
 
AMD Athlon™ Processor Architecture
The World's First Seventh-Generation x86 Processor: Delivering the Ultimate Performance for Cutting-Edge Software Applications

The AMD Athlon™ processor is the first member of a new family of seventh-generation AMD processors designed to meet the computation-intensive requirements of cutting-edge software applications running on high-performance desktop systems, workstations, and servers.
The AMD Athlon processor is the world's most powerful x86 processor, outperforming Intel's Pentium® III processor and delivering the highest integer, floating point and 3D multimedia performance for applications running on x86 system platforms. The AMD Athlon provides industry-leading processing power for cutting-edge software applications, including digital content creation, digital photo editing, digital video, image compression, video encoding for streaming over the Internet, soft DVD, commercial 3D modeling, workstation-class computer-aided design (CAD), commercial desktop publishing, and speech recognition. It also offers the scalability and "peace-of-mind" reliability that IT managers and business users require for networked enterprise computing.

The AMD Athlon processor features the industry's first seventh-generation x86 microarchitecture, which is designed to support the growing processor and system bandwidth requirements of emerging software, graphics, I/O, and memory technologies. The AMD Athlon processor's high-speed execution core includes multiple x86 instruction decoders, a dual-ported 128KB split-L1 cache, three independent integer pipelines, three address calculation pipelines, and the x86 industry's first superscalar, fully pipelined, out-of-order, three-way floating point engine. The floating point engine is capable of delivering 4.0 Gflops of single-precision and more than 2.0 Gflops of double-precision floating point results at 1000 MHz for superior performance on numerically complex applications.
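The quoted peaks are results per clock multiplied by clock rate. Dividing AMD's figures by 1000 MHz gives 4 single-precision and 2 double-precision results per cycle; those per-cycle counts are back-derived from the quote rather than stated by AMD here, and the "more than 2.0 Gflops" wording suggests the double-precision figure is a floor.

```python
def peak_gflops(flops_per_cycle, clock_mhz):
    """Peak throughput in Gflops = results per cycle x clock rate."""
    return flops_per_cycle * clock_mhz / 1000.0

# Per-cycle counts inferred from AMD's 1000 MHz figures (assumption).
print(peak_gflops(4, 1000))  # single precision: 4.0
print(peak_gflops(2, 1000))  # double precision: 2.0
print(peak_gflops(4, 700))   # same design at a hypothetical 700 MHz
```

Because the per-cycle count is fixed by the pipeline design, the peak scales linearly with clock speed.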

amd.com

Taken from PDF file

WHITE PAPER, page 4: Architecture, 52594A, March 9, 2000
Pentium III and Pentium III Xeon processors. (See Table 1, Competitive Comparison, on previous page.) The AMD Athlon processor features a superpipelined, nine-issue superscalar microarchitecture optimized for high clock frequency. The AMD Athlon processor has a large dual-ported 128KB split-L1 cache (64KB instruction cache + 64KB data cache); a two-way, 2048-entry branch prediction table; multiple parallel x86 instruction decoders; and multiple integer and floating point schedulers for independent superscalar, out-of-order, speculative execution of instructions. These elements are packed into an aggressive processing pipeline that includes 10-stage integer and 15-stage floating point pipelines.
The innovative AMD Athlon processor architecture implements the x86 instruction set by internally decoding x86 instructions into fixed-length "Macro-Ops" for higher instruction throughput and increased processing power. The AMD Athlon processor contains nine execution pipelines: three for address calculations, three for integer calculations, and three for execution of MMX™, 3DNow!, and x87 floating point instructions.
[Figure 1: AMD Athlon™ Processor Architecture Block Diagram. Labeled blocks: 2-way, 64KB instruction cache with 24-entry L1 TLB/256-entry L2 TLB, predecode cache, and branch prediction table; fetch/decode control; 3-way x86 instruction decoders; Instruction Control Unit (72-entry); Integer Scheduler (18-entry); three IEU and three AGU units; FPU Stack Map/Rename; FPU Scheduler (36-entry); FPU Register File (88-entry); FADD/MMX/3DNow!, FMUL/MMX/3DNow!, and FStore units; Load/Store Queue Unit; 2-way, 64KB data cache with 32-entry L1 TLB/256-entry L2 TLB; L2 cache controller and L2 SRAMs; bus interface unit and system interface.]

The AMD Athlon processor is binary-compatible with existing x86 software and backwards compatible with applications optimized for MMX and 3DNow! instructions. Using a data format and single-instruction multiple-data (SIMD) operations based on the



To: Scumbria who wrote (107320) 4/22/2000 12:48:00 PM
From: milo_morai
 
More L1 info from PDF

(White paper, page 6)
High-Performance Cache Design
The AMD Athlon processor's high-performance cache architecture includes an integrated, 64-bit, dual-ported 128KB split-L1 cache with separate snoop port; multi-level translation look-aside buffers (TLBs); a scalable L2 cache controller with a 72-bit (64-bit data + 8-bit ECC) interface to as much as 8MB of industry-standard SDR or DDR SRAMs; and an integrated tag for cost-effective 512KB L2 configurations.
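The 72-bit width matches standard SEC-DED (single-error-correct, double-error-detect) arithmetic for a 64-bit word. The white paper says only "8-bit ECC", so treating it as a Hamming-style SEC-DED code is an assumption; the sketch derives the check-bit count from the Hamming bound.

```python
def secded_check_bits(data_bits):
    """Check bits needed for a Hamming SEC-DED code over `data_bits` data bits."""
    r = 0
    # Single-error correction needs 2**r >= data_bits + r + 1 (Hamming bound).
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1  # one extra overall-parity bit adds double-error detection

print(64 + secded_check_bits(64))  # 72: 64 data bits + 8 check bits
```

Seven bits suffice to locate a single flipped bit among 71 positions (64 data + 7 check), and the eighth parity bit distinguishes single from double errors.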
The AMD Athlon processor's integrated L1 cache comprises two separate 64KB, two-way set-associative data and instruction caches and is four times larger than the Pentium III processor's L1 cache (128KB vs. 32KB). The data cache has eight banks to support concurrent access by two 64-bit loads or stores. The instruction cache contains predecode data to assist multiple, high-performance instruction decoders. The robust bi-level TLB structure minimizes code and data delays when accessing physical memory.
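The white paper says the data cache has eight banks so two 64-bit accesses can proceed together, but it does not give the bank mapping. A hypothetical sketch, assuming simple low-order interleaving on 8-byte words (so eight banks span exactly one 64-byte line), shows when two accesses would collide. `NUM_BANKS`, `BANK_WIDTH`, `bank_of`, and `can_dual_issue` are illustrative names, not AMD's.

```python
NUM_BANKS = 8
BANK_WIDTH = 8  # bytes per bank access, one 64-bit load/store (assumed)

def bank_of(addr):
    """Hypothetical mapping: consecutive 8-byte words go to consecutive banks."""
    return (addr // BANK_WIDTH) % NUM_BANKS

def can_dual_issue(addr_a, addr_b):
    """Two 64-bit accesses proceed in the same cycle only if they use different
    banks (a real design might special-case two reads of the same word)."""
    return bank_of(addr_a) != bank_of(addr_b)

print(can_dual_issue(0x100, 0x108))  # adjacent words, banks 0 and 1: True
print(can_dual_issue(0x100, 0x140))  # 64 bytes apart, both bank 0: False
```

Under this mapping, accesses that are a multiple of 64 bytes apart always conflict, while any two different words within one line never do.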
The AMD Athlon processor's L2 cache controller operates at the maximum frequency compatible with the latest industry-standard SRAMs, including DDR. The integrated L2 cache tag provides a full tag for a 512KB L2 cache or a partial tag for larger L2 caches. The AMD Athlon processor's cache architecture thus provides higher scalability than the Pentium III or Pentium III Xeon processor's cache architecture.
The cache architecture of the AMD Athlon processor enables high instruction execution rates by minimizing effective memory latency and system snoop responses, and it provides large spatial locality of data for transaction-based applications and multiprocessing operating systems. The architecture also supports high-bandwidth data transfers to and from the execution resources, and it contributes to significant performance gains and extremely fast operation of data-intensive software programs.
The AMD Athlon processor's cache architecture is the first to incorporate a system-based MOESI (Modify, Owner, Exclusive, Shared, Invalid) cache control protocol for x86 multiprocessing platforms. Since the system logic manages memory coherency throughout the system by specifying all cache state transitions, either using a MESI or MOESI cache coherency protocol, and by filtering out unnecessary processor snoops, AMD Athlon processors are designed to deliver exceptional performance in both uniprocessor and multiprocessor system configurations. The AMD Athlon processor cache architecture also supports error correction code (ECC) protection, which is a required feature for high reliability of business desktop systems, workstations, and servers.
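The Owned state lets a cache holding a dirty line pass it to another processor while keeping responsibility for it, whereas textbook MESI must also write the dirty data back to memory. The sketch encodes those textbook snoop responses to a remote read and counts the resulting writebacks. It is a generic illustration, since the white paper says the Athlon's system logic, not the processor, specifies the actual transitions; the E-to-S row also assumes memory, rather than the exclusive holder, supplies clean data.

```python
# Textbook snoop responses when another processor reads a line (illustrative).
# Each entry: state -> (next_state, this_cache_supplies_data, writes_back_to_memory)
MOESI_REMOTE_READ = {
    "M": ("O", True, False),   # dirty line shared cache-to-cache; memory stays stale
    "O": ("O", True, False),   # owner keeps supplying the dirty data
    "E": ("S", False, False),
    "S": ("S", False, False),
    "I": ("I", False, False),
}
MESI_REMOTE_READ = {
    "M": ("S", True, True),    # no Owned state, so dirty data must also go to memory
    "E": ("S", False, False),
    "S": ("S", False, False),
    "I": ("I", False, False),
}

def writebacks(protocol, states):
    """Count memory writebacks triggered when other CPUs read lines in `states`."""
    return sum(protocol[s][2] for s in states)

lines = ["M", "M", "E", "S"]
print(writebacks(MOESI_REMOTE_READ, lines))  # 0
print(writebacks(MESI_REMOTE_READ, lines))   # 2
```

The saved writebacks are memory-bus traffic that MOESI avoids whenever processors share recently modified data.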