Choosing High-Performance DRAM for Tomorrow's Applications
Introduction

The purpose of this paper is to give engineers and system designers an overview of the options available to them for high-performance DRAM, examine the cost/performance tradeoffs of various DRAM solutions, and discuss the suitability of each type for specific applications. Additionally, a new DRAM architecture, called the Fast Cycle RAM (FCRAM), will be introduced.
Today's system designers know very well that they require high-performance, high-density memory solutions to satisfy increasing processor frequencies and the growing complexity of end-user applications. What is not a simple choice is which DRAM type to use: there are Synchronous DRAMs (SDRAMs) with 100MHz, 133MHz and faster clock rates, Double Data Rate (DDR) SDRAMs, and Rambus DRAMs (RDRAMs). System and memory controller designers face pressure to adopt mainstream solutions and avoid low-volume, niche products, while the DRAM supplier has to determine which one(s) to prioritize.
An even larger concern for the DRAM supplier is having the appropriate process technology necessary to offer these solutions, and being able to transition to these fine geometries with minimal investment and technical barriers.
So which of these device types should a system designer use and DRAM manufacturer produce? Looking at chipset and memory controller roadmaps across a wide range of applications, it is clear that 100MHz, 133MHz and faster SDRAMs, DDR and RDRAM will all co-exist. To solve this mystery, we really need to determine in what timeframe and in which applications these solutions will exist, which in turn determines their relative demand and production volume. Let's first look at the cost/performance tradeoffs of each of these DRAM solutions, and based on this, determine the suitability of each type for various applications.
Performance Comparisons

PC100 vs. PC133

Faster versions of today's 100MHz (PC100) SDRAMs are a logical, evolutionary progression, and chipsets and memory controllers already exist that support 133MHz (PC133) and faster memory buses. The key factor in determining their success is the cost/performance tradeoff. A PC133 SDRAM may or may not outperform a PC100 SDRAM depending on three critical parameters, commonly referred to as CAS latency (CL), RAS-to-CAS delay time (tRCD) and RAS pre-charge time (tRP). These parameters are measured in clock cycles; for example, a device with CL = 2 cycles, tRCD = 2 cycles and tRP = 2 cycles is commonly referred to as a 2-2-2 device.
TABLE 1: Comparison of a PC100 CL2 device to PC133 CL3 and CL2 devices.

Memory Bus Speed | CAS Latency (CL) | RAS Pre-charge Time (tRP) | RAS-to-CAS Delay Time (tRCD) | CL+tRP+tRCD (total time) | Performance (normalized)
100MHz (PC100) | 20ns (2 cycles) | 20ns (2 cycles) | 20ns (2 cycles) | 60ns | 1.00
133MHz (PC133) | 22.5ns (3 cycles) | 20ns (2.67 cycles) | 20ns (2.67 cycles) | 62.5ns | 0.96
133MHz (PC133) | 15ns (2 cycles) | 15ns (2 cycles) | 15ns (2 cycles) | 45ns | 1.25
The above values were taken from Toshiba's 128M SDRAM datasheet.
Compared with a PC100 CL2 device, which is considered today's baseline for memory performance, the PC133 CL3 device is about 4% slower, while the PC133 CL2 device is 25% faster (consistent with the normalized values in Table 1). Of course, these calculations are based solely on the above three critical parameters; actual system performance will depend on the application and other factors, many of which are discussed below.
It is also worth noting that two of the three parameters, tRP and tRCD, are actually specified as fixed values in nanoseconds and are not necessarily an integer number of clock cycles. If the memory controller only comprehends these parameters as an integer number of clock cycles, then they must be rounded up to the next whole cycle. For example, in the above table, tRP is 20ns for all three types. For PC100, 20ns is exactly two clock cycles; for the PC133 device, however, 20ns is 2.67 clock cycles, which must be rounded up to three. Therefore, in the above example, the PC100 CL2 device is referred to as 2-2-2, the PC133 CL3 device as 3-3-3, and the PC133 CL2 device as 2-2-2.
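To make the arithmetic above concrete, the following Python sketch computes the CL + tRP + tRCD total and reproduces the rounding behavior just described. The normalization used (one plus the fraction of baseline time saved) is our reading of how Table 1's performance column was computed, not a datasheet figure.

    import math

    def total_latency_ns(clock_mhz, cl_cycles, trp_ns, trcd_ns, round_up=False):
        """CL + tRP + tRCD in nanoseconds; optionally round tRP/tRCD up to
        whole clock cycles, as an integer-cycle memory controller must."""
        cycle = 1000.0 / clock_mhz              # clock period in ns
        trp, trcd = trp_ns, trcd_ns
        if round_up:
            trp = math.ceil(trp_ns / cycle) * cycle
            trcd = math.ceil(trcd_ns / cycle) * cycle
        return cl_cycles * cycle + trp + trcd

    base = total_latency_ns(100, 2, 20, 20)     # PC100 CL2 baseline: 60ns
    for name, t in [("PC133 CL3", total_latency_ns(133, 3, 20, 20)),
                    ("PC133 CL2", total_latency_ns(133, 2, 15, 15))]:
        print(name, round(1 + (base - t) / base, 2))   # 0.96 and 1.25, as in Table 1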
DDR vs. RDRAM

The performance benefits of DDR vs. RDRAM are commonly debated in the industry today, and a wide range of performance numbers is cited, especially in peak bandwidth comparisons. While peak bandwidth is important, it is really only the "top line" of actual system performance. The bottom line is sustained, or effective, bandwidth, which is also a function of other memory parameters and features, such as latency, the number of internal banks and the read-to-write/write-to-read bus turnaround time. Effective bandwidth is also a function of certain system- or application-dependent parameters, such as burst length.
TABLE 2: Peak bandwidth for PC100, DDR and RDRAM at various memory bus widths.

DRAM Type | Clock/Data Rate | Memory Bus Width | Peak Bandwidth
PC100 | 100MHz/100MHz | 64-bit | 800MB/sec
DDR | 100MHz/200MHz | 64-bit | 1.6GB/sec
DDR-II | 200MHz/400MHz | 64-bit | 3.2GB/sec
DDR-II | 200MHz/400MHz | 128-bit | 6.4GB/sec
RDRAM | 400MHz/800MHz | 16-bit (1 channel) | 1.6GB/sec
RDRAM | 400MHz/800MHz | 32-bit (2 channels) | 3.2GB/sec
RDRAM | 400MHz/800MHz | 64-bit (4 channels) | 6.4GB/sec
It should be noted that DDR in the above table is based on today's industry specification, which basically includes 100MHz and 133MHz clock rates. DDR-II is currently being defined by JEDEC, and is expected to offer much higher clock rates and features to improve effective bandwidth, some of which will be discussed below.
Based on the above analysis, DDR can match RDRAM in terms of peak bandwidth. However, the system designer must make the determination of which device to use based upon the advantages/disadvantages of widening the bus from 64 to 128 bits for DDR vs. adding multiple channels for RDRAM. Additionally, peak bandwidth is only one factor in determining effective bandwidth as was mentioned above and will be discussed further below.
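Peak bandwidth in Table 2 is simply the data rate multiplied by the bus width in bytes. A minimal sketch (the function name is ours):

    def peak_bandwidth_mb(clock_mhz, bus_bits, transfers_per_clock=1):
        """Peak bandwidth in MB/sec = data rate x bus width in bytes."""
        return clock_mhz * transfers_per_clock * (bus_bits // 8)

    print(peak_bandwidth_mb(100, 64))           # PC100:              800 MB/sec
    print(peak_bandwidth_mb(100, 64, 2))        # DDR:               1600 MB/sec
    print(peak_bandwidth_mb(200, 128, 2))       # DDR-II, 128-bit:   6400 MB/sec
    print(4 * peak_bandwidth_mb(400, 16, 2))    # RDRAM, 4 channels: 6400 MB/sec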
FCRAM - A Faster Memory Core

All of the DRAM types commonly discussed in the industry today, such as EDO, SDRAM, DDR and RDRAM, have one major thing in common: their memory cores are the same. What is different about each type is the peripheral logic circuitry, not the memory cell array. This increasingly complex peripheral logic attempts to hide the inherently slow memory core.
FCRAM is a novel architecture that finally addresses the slow memory core itself, segmenting it into smaller arrays so that data can be accessed much faster and latency is greatly improved. How this is done is beyond the scope of this paper; interested readers can obtain more detailed information on FCRAM functionality from both Toshiba and Fujitsu.
The key measure of how FCRAM improves latency and can improve system performance is the read/write cycle time (tRC), which measures how long the DRAM takes to complete a read or write cycle before it can start another one. In the case of conventional DRAM types, including SDRAM, DDR and RDRAM, tRC is typically on the order of 70ns. With FCRAM, tRC of 20 or 30ns is possible. For this reason, this new device is referred to as a fast cycle RAM.
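The impact of tRC is easiest to see in a purely random access pattern, where every access must wait out a full row cycle. As a rough sketch (the 25ns figure is our midpoint assumption for FCRAM's 20-30ns range):

    def random_row_cycles_per_usec(trc_ns):
        """Row cycles per microsecond when every access opens a new row,
        so back-to-back accesses are spaced a full tRC apart."""
        return 1000.0 / trc_ns

    print(random_row_cycles_per_usec(70))   # conventional DRAM: ~14 accesses/us
    print(random_row_cycles_per_usec(25))   # FCRAM-class core:   40 accesses/us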
Besides faster tRC, FCRAM also improves latency with several new features that will be discussed below.
DRAM Features/Parameters Which Affect Actual System Performance

Latency

Very simply, latency is how long it takes a DRAM to begin outputting data in response to a command from the memory controller. There are many different measures of DRAM latency. For example, the time it takes the DRAM to access data from when the row address is provided by the memory controller is called the row address access time (tRAC), typically on the order of 50 to 60ns. The RAS pre-charge time (tRP), typically 20ns, is another measure of DRAM latency. Most measures of DRAM latency are a function of the memory core design and the wafer process technology used. It is therefore reasonable to assume that internal DRAM latency is the same for SDRAM, DDR and RDRAM for a given design and process. The features/parameters discussed below vary more widely between device types and therefore have a wider-ranging impact on system latency and performance. The FCRAM concept mentioned above demonstrates how a new memory core architecture can truly improve inherent DRAM latency.
Number of Banks

The number of internal banks a DRAM has is perhaps the biggest factor in determining actual system latency, because a DRAM can access data much faster if the data is located in a bank that has been activated, i.e., pre-charged. The access can be either to the page (row) currently open, or to another bank that has been pre-charged. If the data is located in a pre-charged bank, we call this a page hit, meaning the data can be accessed very quickly, without the delay penalty of having to close the current page and pre-charge another bank. On the other hand, if the data is in a bank that has not been pre-charged, or in a different row within the bank currently being accessed, a page miss occurs and performance is degraded by the additional latency of having to pre-charge a bank.
The memory controller designer can minimize latency by keeping all unused banks pre-charged. Therefore, more internal DRAM banks increase the probability that the next data access will be to an active bank, minimizing latency.
TABLE 3: How the number of banks affects the hit and miss rates, assuming all unused banks are always pre-charged.

DRAM Type | # of Banks | Miss Rate | Hit Rate
SDRAM (16Mbit) | 2 | 50% (1/2) | 50% (1/2)
SDRAM/DDR (64Mbit and higher) | 4 | 25% (1/4) | 75% (3/4)
RDRAM | 16 | 6% (1/16) | 94% (15/16)
Clearly, adding more banks increases the hit rate and reduces latency; however, adding banks also increases the die size and cost of the DRAM. Therefore, a cost/performance comparison is necessary when determining how critical it is to reduce latency by increasing the number of banks.
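Under the idealized assumption of uniformly random accesses across banks, the hit rates in Table 3 follow directly from the bank count; a minimal sketch:

    from fractions import Fraction

    def page_hit_rate(num_banks):
        """Hit rate if all unused banks are kept pre-charged and accesses
        land on banks uniformly at random: only the one busy bank misses."""
        return 1 - Fraction(1, num_banks)

    for banks in (2, 4, 16):
        print(banks, page_hit_rate(banks))   # 1/2, 3/4, 15/16, as in Table 3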
Bus Turnaround Time

Because of the increasing functionality required of today's main memory subsystems, the time it takes a DRAM to switch between a read and a write cycle, or between a write and a read cycle, is becoming a critical factor. This time is commonly referred to as the bus turnaround time. Delays in turning the bus around can result in costly dead bus cycles and reduced performance. To minimize dead bus cycles, fast (preferably zero) read-to-write and write-to-read bus turnaround is required.
Traditional DRAM types, including EDO and SDRAM, use a scheme called command decoding to determine whether a cycle is a read or a write: a read or write command is provided to the DRAM at the same time as the address, and the DRAM must then decode the address and command simultaneously. This results in dead bus cycles. At a relatively slow clock frequency, such as 66MHz, these dead clock cycles do not cause a prohibitive performance loss. However, as clock frequencies increase to 100/133MHz and beyond, bus turnaround time becomes an increasingly critical factor in actual system performance.
The bus turnaround time is even more critical for DDR, as data is transferred on both the rising and falling edges of the clock. In other words, for every dead clock cycle there are two dead data cycles and twice as much bandwidth "opportunity loss." The emerging DDR-II standard is attempting to address the issue of bus turnaround time with new features, such as write cycle latency being a function of read cycle latency, and the posted CAS or late write feature.
In the case of RDRAM, bus turnaround time is less of an issue because the device has separate address and control buses, so simultaneous decoding is not required.
The above three parameters and features (latency, the number of banks and bus turnaround time) are really a function of how the DRAM operates. The factors discussed below, burst length and randomness, are application-dependent.
Application-Dependent Parameters - Burst Length/Randomness

Burst length is defined as the number of successive accesses (column addresses) within a row or pre-charged bank; in other words, the number of successive read/write cycles performed without having to provide a new address. DRAMs can access data very quickly if the next data is located in the same row as the current data or in a pre-charged bank. Therefore, as the burst length grows, initial latency is amortized and the effective bandwidth approaches the peak bandwidth. Graphics is a good example of an application with a relatively long burst length. On the other hand, applications such as network switches and routers tend to have very short burst lengths (sometimes a burst length of one, meaning no successive accesses within a row), and initial latency becomes more critical in determining effective bandwidth. Applications with very short burst lengths are often called "random access" applications, as it is not easy for the memory controller to predict where the next data is located.
TABLE 4: Short vs. long burst lengths and the typical applications for each.

Burst Length | Typical Application
1 or 2 (short) | Network switches/routers
4 to 8 (medium) | PC main memory
8 to 256 (long) | Graphics
Comparing today's typical DRAM timing specifications, the magic number for burst length appears to be around four. Burst lengths of less than four do not take much advantage of the DRAM's peak bandwidth capability and are better served by a low-latency solution, such as the FCRAM. For burst lengths of four and longer, the system can take advantage of the DRAM's peak bandwidth, making very high data rate devices, such as RDRAM, the ideal solution.
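The burst-length effect can be sketched with a simple model in which each burst pays a fixed latency overhead before streaming data. The 6-cycle overhead below is an assumption for illustration, not a datasheet value:

    def burst_efficiency(burst_beats, overhead_cycles, beats_per_cycle=1):
        """Fraction of peak bandwidth achieved when a fixed latency
        overhead precedes each burst of 'burst_beats' data transfers."""
        data_cycles = burst_beats / beats_per_cycle
        return data_cycles / (data_cycles + overhead_cycles)

    for bl in (1, 2, 4, 8, 64):                  # short to long bursts
        print(bl, round(burst_efficiency(bl, 6), 2))
    # 1 -> 0.14, 4 -> 0.40, 64 -> 0.91: long bursts approach peak bandwidth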
Summing It All Up - Bus Utilization

Now that we have defined the DRAM parameters/features and system factors that determine effective bandwidth, we need some way to measure their combined effect. In reality, what determines the effective bandwidth of the system is the bus utilization: the percentage of the time the memory bus is actively reading/writing data. Once this factor is known, effective bandwidth is easily determined by multiplying the bus utilization by the peak bandwidth. For example, if the bus utilization is 50% (meaning the DRAM bus is reading/writing data at most 50% of the time) and the peak bandwidth is 1GB per second, the maximum effective bandwidth is 500MB per second.
TABLE 5: Estimated maximum effective bandwidth for PC100/133 SDRAM, DDR, RDRAM and FCRAM.

DRAM Type | PC100 | PC133 | DDR | RDRAM | FCRAM
Clock speed (MHz) | 100 | 133 | 133 | 400 | 133
Data rate (MHz) | 100 | 133 | 266 | 800 | 266
System data bus width | 64-bit | 64-bit | 64-bit | 16-bit | 64-bit
Peak Bandwidth (MB/sec) | 800 | 1067 | 2133 | 1600 | 2133
Bus Utilization | 62% | 59% | 42% | 74% | 55%
Max. Effective Bandwidth (MB/sec) | 494 | 631 | 897 | 1190 | 1165
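The effective-bandwidth figures follow from multiplying the peak bandwidth by the bus utilization; the sketch below recomputes Table 5 (small differences are due to rounding in the table):

    # Peak bandwidth (MB/sec) and bus utilization, taken from Table 5:
    table5 = {"PC100": (800, 0.62), "PC133": (1067, 0.59), "DDR": (2133, 0.42),
              "RDRAM": (1600, 0.74), "FCRAM": (2133, 0.55)}
    for name, (peak_mb, util) in table5.items():
        print(f"{name}: {peak_mb * util:.0f} MB/sec effective")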
The detailed calculations used to compute the bus utilization are beyond the scope of this paper. All of the above mentioned factors which determine bus utilization and ultimately effective bandwidth were used in these calculations, and the values are based solely on data sheet parameters and timing diagrams (i.e., no marketing hype). Here are some of the methodologies and assumptions worth mentioning:
- A write-read-read (W-R-R) access with a burst length of 4 was chosen to represent a page hit. It represents a typical main memory access performing a cache fill, and also exercises the bus turnaround capability of the devices.
- After a page miss, a pre-charge cycle followed by a W-R-R is performed.
- The page hit/miss rates are determined solely by the number of banks.
- The DRAM refresh overhead is 5% (meaning a 5% performance loss to perform refresh) and is the same for each DRAM type.

Of course, every application has different access cycles and burst lengths; however, a fixed set of assumptions is necessary in order to perform this analysis. We believe it is fairly representative of typical computer main memory conditions.
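While the full calculations are beyond the scope of this paper, the structure of such a model can be sketched. Every number below (data cycles, dead cycles, pre-charge penalty) is an illustrative assumption, not one of our datasheet-derived inputs:

    def bus_utilization(data_cycles, dead_cycles, miss_penalty, hit_rate,
                        refresh_loss=0.05):
        """Toy model in clock cycles: a W-R-R sequence moves 'data_cycles'
        of data, loses 'dead_cycles' to bus turnaround, and a page miss adds
        a pre-charge penalty; 5% is then deducted for refresh, per the
        assumptions above."""
        expected_overhead = dead_cycles + (1 - hit_rate) * miss_penalty
        return (1 - refresh_loss) * data_cycles / (data_cycles + expected_overhead)

    # e.g. 12 data cycles (three bursts of 4), 4 dead cycles, a 3-cycle
    # pre-charge, and 4 banks -> 75% hit rate:
    print(round(bus_utilization(12, 4, 3, 0.75), 2))   # ~0.68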
The preceding analysis leads to some interesting observations:
- With SDRAM and DDR, bus utilization decreases as the clock frequency increases. This is because dead bus cycles (which are roughly constant for PC100/133/DDR in terms of clock cycles) have a greater impact on performance as data rates increase.
- RDRAM and FCRAM can perform nearly gap-less R-W and W-R bursts; in other words, there are almost never any dead bus cycles. In the case of RDRAM, this is because of the separate address and control decoding, as previously mentioned. FCRAM adopts many of the DDR-II features that improve bus efficiency, which also gives a fairly good indication of the bus utilization improvement we can expect from DDR-II; DDR-II will still not match FCRAM, however, due to FCRAM's lower initial latency.
- FCRAM can match RDRAM in terms of effective bandwidth. It is not surprising that RDRAM wins the effective bandwidth battle here, given its high peak bandwidth and an architecture specifically designed for PC main memory; it is somewhat surprising that FCRAM can keep up. For applications with more randomness and shorter burst lengths, FCRAM cannot be matched. Therefore, FCRAM should be recognized as the overall performance winner.
Granularity

Before we can pick the winners for each application, we must discuss the concept of granularity and how it ultimately determines system cost. Granularity is defined as the minimum system density (in megabytes) possible for a given DRAM configuration and system bus width.
TABLE 6: Granularity and peak bandwidth for a variety of DRAM types and system implementations.

DRAM Type | DRAM Density | DRAM Data Bus Width | System Bus Width | Granularity | Peak Bandwidth
SDRAM (100MHz clock) | 64Mbit | 16-bit | 64-bit | 32MB | 800MB/sec
SDRAM (100MHz clock) | 128Mbit | 16-bit | 64-bit | 64MB | 800MB/sec
SDRAM (100MHz clock) | 256Mbit | 16-bit | 64-bit | 128MB | 800MB/sec
SDRAM (100MHz clock) | 512Mbit | 16-bit | 64-bit | 256MB | 800MB/sec
DDR (133MHz clock) | 64Mbit | 16-bit | 64-bit | 32MB | 2.13GB/sec
DDR (133MHz clock) | 128Mbit | 16-bit | 64-bit | 64MB | 2.13GB/sec
DDR (133MHz clock) | 256Mbit | 16-bit | 64-bit | 128MB | 2.13GB/sec
DDR (133MHz clock) | 512Mbit | 16-bit | 64-bit | 256MB | 2.13GB/sec
RDRAM (400MHz clock) | 128Mbit | 16-bit | 16-bit (1 channel) | 16MB | 1.6GB/sec
RDRAM (400MHz clock) | 128Mbit | 16-bit | 32-bit (2 channels) | 32MB | 3.2GB/sec
RDRAM (400MHz clock) | 128Mbit | 16-bit | 64-bit (4 channels) | 64MB | 6.4GB/sec
RDRAM (400MHz clock) | 256Mbit | 16-bit | 16-bit (1 channel) | 32MB | 1.6GB/sec
RDRAM (400MHz clock) | 256Mbit | 16-bit | 32-bit (2 channels) | 64MB | 3.2GB/sec
RDRAM (400MHz clock) | 256Mbit | 16-bit | 64-bit (4 channels) | 128MB | 6.4GB/sec
RDRAM (400MHz clock) | 512Mbit | 16-bit | 16-bit (1 channel) | 64MB | 1.6GB/sec
RDRAM (400MHz clock) | 512Mbit | 16-bit | 32-bit (2 channels) | 128MB | 3.2GB/sec
RDRAM (400MHz clock) | 512Mbit | 16-bit | 64-bit (4 channels) | 256MB | 6.4GB/sec
One of the key points in the above table is the difference in system architectures for SDRAMs (including DDR) vs. RDRAMs. SDRAMs must be used in parallel, which increases granularity: in the above example, four 16-bit devices must be connected in parallel to match the 64-bit system bus width, so the system granularity is four times that of the device. For RDRAM, since the system bus (Rambus channel) width is the same as the device bus width, the granularity of the system is simply that of the RDRAM device multiplied by the number of channels. This has very compelling cost/performance implications.
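The granularity arithmetic is straightforward; a minimal sketch (names are ours):

    def granularity_mb(device_mbits, device_bus_bits, system_bus_bits):
        """Minimum system size in MB: enough devices in parallel to fill
        the system bus, times the density of each device."""
        devices_in_parallel = system_bus_bits // device_bus_bits
        return devices_in_parallel * device_mbits // 8      # Mbit -> MB

    print(granularity_mb(256, 16, 64))   # four x16 SDRAM/DDR on a 64-bit bus: 128MB
    print(granularity_mb(256, 16, 16))   # one RDRAM per 16-bit channel:        32MB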
In terms of cost, since a single RDRAM device can be used, smaller (lower-cost) memory systems are possible with RDRAM than with SDRAM. For example, using 256M (x16) SDRAMs, a 64MB system is not possible; with 256M RDRAMs, 32MB or 64MB systems are. This benefit has not been realized today, because the most cost-effective (lowest cost per bit) DRAM density is the 64M, which allows low-cost systems to be built with less than 64MB of memory. However, PC main memory system manufacturers utilize the lowest cost-per-bit DRAM solution, and in the not-too-distant future that will be the 256M density (regardless of SDRAM, DDR or RDRAM). Within the next few years, 512M and 1Gb DRAMs will be in volume production from Toshiba and possibly other suppliers, making low-density, low-cost, high-performance SDRAM/DDR solutions even less feasible.
One can argue that increasing the DRAM bus width from 16 to 32 bits helps resolve this problem; however, x32 DRAMs are more costly to produce and historically have not been used in main memory applications.
On the performance side of the equation, RDRAM is even more compelling. Not only can an RDRAM system be built with less memory than an SDRAM/DDR system, but that same RDRAM system can have significantly better performance. For example, using 256M RDRAMs, a 64MB, 2-channel RDRAM system can be built providing 3.2GB/sec of peak bandwidth (roughly 2.37GB/sec effective, using the bus utilization from Table 5). This equates to almost five times the effective bandwidth of the SDRAM system and over 2.5 times that of the DDR system, yet the RDRAM system will be lower cost, because 64MB is not possible with SDRAM/DDR at the 256Mbit density.
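A quick check of that claim, using the Table 5 utilization and effective-bandwidth figures:

    rdram_peak = 2 * 1600                  # two 16-bit channels, MB/sec
    rdram_eff = rdram_peak * 0.74          # Table 5 utilization -> ~2370 MB/sec
    print(rdram_eff / 494)                 # vs. PC100 SDRAM: ~4.8x ("almost five times")
    print(rdram_eff / 897)                 # vs. DDR:         ~2.6x ("over 2.5 times")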
Table 2, in the DDR vs. RDRAM section, shows peak bandwidth comparisons for a 128-bit system bus for DDR-II, yet that configuration is not included in the above analysis. While increasing the bus width from 64 to 128 bits is possible (though not trivial), it is excluded here because it actually makes the SDRAM/DDR granularity problem worse. However, this may not be an issue in some applications, a point discussed further in the following section.
The Winners in Each Application

Now that we have done detailed performance comparisons and a granularity analysis for each DRAM type, we can draw some fairly reasonable conclusions on which DRAM type is appropriate for which application, and in what timeframe.
TABLE 7: The ideal DRAM solution for each application and timeframe.

Application | Timeframe | Ideal DRAM Solution
Low-end Desktop PC | 2000 | PC100/PC133
Low-end Desktop PC | 2001 | RDRAM
High-end Desktop/Workstation | 2000 | RDRAM
PC Server | 2000 | PC100/PC133/DDR/RDRAM
PC Server | 2001 | DDR/RDRAM/FCRAM
High-end Server/Mainframe | 2000 | PC100/PC133/DDR
High-end Server/Mainframe | 2001 | DDR/FCRAM
Graphics | 2000 | SDRAM/DDR
Graphics | 2001 | DDR/RDRAM
Network Router/Switch | 2000 | FCRAM
Hand-held/PDA | 2000 | FCRAM
Digital TV/Set-top Box | 2000 | SDRAM
Digital TV/Set-top Box | 2001 | DDR/RDRAM
The following explains in more detail why the above DRAM types were chosen as the ideal solution for each application.
Low-end Desktop PC

This market is very cost-sensitive and will be best served by the lowest-cost DRAM solution in 2000, which will be PC100, and possibly PC133 if it is offered at no premium and yields for the 2-2-2 version improve. In 2001, the following three factors will drive RDRAM as the ideal solution in this segment:
- RDRAM will become lower cost as production volumes increase and DRAM suppliers come down the learning curve.
- The 256M DRAM will become the most cost-effective solution, making low-cost SDRAM/DDR implementations less feasible due to the granularity issue previously discussed.
- This market segment will also demand performance in 2001.

High-end Desktop/Workstation

The end-users of these systems demand performance for applications such as 3D graphics and office productivity enhancements, and they will pay for performance, making RDRAM the ideal solution. Considering microprocessor and chipset roadmaps, the year 2000 is clearly the year for RDRAM in this segment.
PC Server

We define PC servers as systems with one or more CPUs, generally of the CISC/x86 variety, which use third-party chipsets rather than custom-designed memory controllers. This market segment has many solutions in 2000, primarily because server main memory performance is derived more from system design techniques, such as interleaving and large L2 caches, than from DRAM performance. Additionally, there are many chipset options for 2000. Therefore, we expect the year 2000 to include systems using all of these solutions. In 2001, this market segment will settle somewhat and utilize primarily DDR or RDRAM. Since the companies manufacturing PC servers are typically the same companies in the desktop PC business, this segment will follow the desktop main memory trend, which is toward RDRAM. These systems also do not suffer from the granularity problem seen in desktops, making the evolution of SDRAM, i.e. DDR, a feasible and likely long-term solution.
High-end Server/Mainframe

The story for these large systems is basically the same as for the PC server segment, with one notable exception: companies that manufacture these systems also design their own memory controllers (ASICs) and memory subsystems. This has a significant impact on the device type chosen. Because these companies possess a relatively large staff of skilled memory controller designers, they do not need a "cookbook" solution such as RDRAM. Additionally, they can design systems with 128-bit and wider main memory buses, hence DDR can match RDRAM in terms of performance, especially with the emerging DDR-II standard and its much-improved feature set and performance capability.
In both of the above market segments, we also show FCRAM as a feasible solution in 2001. From a system perspective, we believe servers and other large systems will become more interested in reducing memory latency as the randomness of their data increases, which will happen as more multimedia (video, audio, text, etc.) traffic flows over the internet and within the corporation. From a DRAM perspective, FCRAM can be designed in as a superset of DDR, which means it can be easily adopted in these types of systems. Additionally, as 256M and higher density DRAMs become prevalent, the added FCRAM features become negligible in terms of additional die cost, almost guaranteeing FCRAM's adoption into this segment.
Graphics and Digital TV/Set-top Box

The mainstream graphics market tends to follow the main memory DRAM trends, with the exception of using lower-density, wider devices; for example, the 1Mx16 (16Mbit) and 2Mx32 (64Mbit) are the most common devices today. It should also be noted that because of the small number of DRAMs in graphics systems, the DRAM loading test specification can be reduced, resulting in faster speed versions with the same yield as the PC100/PC133 main memory test specification. For example, Toshiba is currently offering 167MHz 2Mx32 SDRAMs built on the same process as our PC100/PC133 64M/128Mbit SDRAMs. We expect both x32 DDR and RDRAM to emerge as the preferred graphics solutions as these applications become more performance-driven. Digital TV and set-top boxes are an emerging market segment with much the same system criteria and likely DRAM solutions as graphics.
Network Router/Switch

The networking market segment ranges from modems and interface cards up to very large routers and switches serving local and wide-area networks. It is the latter applications that are of interest for this discussion, as they are memory-performance driven. Since the data in these systems is very random in nature and the data packets are small (short burst length), memory latency is the most critical parameter. Because of this, FCRAM is the ideal solution.
Hand-held/PDA

This segment is differentiated from sub-notebooks by the fact that disposable (non-rechargeable) batteries are most commonly used, so battery life is critical. Because DRAM memory cells are composed of capacitors, which lose their charge over time and must be refreshed, DRAMs in general are not the ideal solution for these applications. However, with system memory densities increasing, DRAMs are already being adopted. As mentioned previously, one of the key design methodologies that allows FCRAM's latency to be greatly reduced is the segmented memory core. A by-product of this segmentation is a reduction in power consumption of up to 50% compared with SDRAMs of a given process and density. Therefore, FCRAM will emerge as the ideal low-power solution. In terms of system density and DRAM configuration, the hand-held/PDA market closely resembles the graphics market, although it is much less performance-driven.
DRAM Design/Process Technology

Hopefully by now, selecting the ideal DRAM solution has become a simpler process for the system designer. Actually producing these winning solutions may not be so simple for the DRAM supplier.
In the case of the PC133 SDRAM, we have already mentioned that the 2-2-2 specification is critical for PC133 to show a measurable performance increase over PC100 and hence for PC133's success. At present, the DRAM industry is utilizing primarily 0.20um and wider processes, and yields for PC133 2-2-2 are poor. Therefore, 0.18um and finer process geometries are mandatory.
For DDR, the timing margins are very tight, making fine process geometries also critical to this product's success. Another key point is that the systems that are most likely to use DDR are large systems, which means that 128M, 256M and higher density DRAMs will be required. It is not cost effective to volume produce 256M DRAMs using 0.20um and wider processes.
RDRAM also has very tight timing margins. Additionally, the yield for the 800MHz version is best achieved at 0.18um and finer processes. Reducing the cost through higher density products, such as the 256M, is also critical for RDRAM's success.
The conclusion is fairly clear. The only way for these next-generation DRAM solutions to succeed is to build them using very aggressive process geometries. It is not so clear that this is going to be easy for all DRAM suppliers to accomplish. The industry is filled with stories about suppliers having yield problems with these aggressive geometries or being unable to cost-effectively produce a higher density solution. While we cannot speak for other suppliers, Toshiba believes it has overcome these concerns.
Toshiba has implemented a program called Scalability by Design, which assures not only that we can smoothly migrate to finer processes and higher densities, but also that we can do so at a lower cost than the industry standard. This program started when Toshiba, together with our design partners IBM and Siemens, introduced a new DRAM trench memory cell. Due to the inherent advantages of the trench cell, it has proven to be much easier to scale (i.e. perform die shrinks) than the alternative stacked cell implementations.
Besides this new trench cell, Toshiba also decided to introduce new wafer process equipment and technologies, starting at 0.35um, which allow us to produce five generations of product, all the way down to 0.15um, in one clean room with minimal investment. For example, at 0.35um we introduced Krypton Fluoride (KrF) steppers, shallow trench isolation (STI) and chemical mechanical polishing (CMP). Our estimate is that the introduction and production ramp of each new generation requires a 10% incremental investment on our part vs. 50% as an industry benchmark.
This cost factor is the most likely reason that some of our competitors continue to keep a portion of their production lines running older processes, while we have already migrated to 0.20um at all of our production facilities. Toshiba is recognized as having the best yield for CL2 PC100/133 SDRAMs today, clearly indicating that implementing finer process geometries across all production lines is critical for success in the high-speed DRAM game.
We believe that our technical advantages will become even more apparent as we migrate all of our production facilities to 0.18um starting in Q4 this year, and then to 0.15um next year.