Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: eracer who wrote (215013) 10/26/2006 3:58:51 AM
From: pgerassi
 
Dear Eracer:

You continue to use bad estimates for the size of a GPU built on an AMD64-grade process. AMD's 65nm ssSOI process, used to make DC and soon QC K8Ls, yields DC CPUs that run at more than 2GHz on less than 35W, including twin 72-bit DDR2 controllers. So you don't need as many pipelines as you would for GPUs on bulk 90nm processes that run at only 500MHz or so: 1/4 the number of pipelines at 2GHz will have the same performance. A midrange GPU of today with 12 pipelines at 500MHz could be done with just 3 pipelines at 2GHz. They would use about 1/4 of the die area at 90nm and 1/8 at 65nm ssSOI, likely at less power as well.

A GeForce 7900GT with 24 pipelines at 450MHz could be done with only 4 pipelines at 2.7GHz. That's only 12.5mm^2, at a clock in the midrange of 65nm ssSOI CPUs. Two of them, one for each DC K8L core, would equal the best high-end DC 3D gaming system of today. 4x4 systems of course would have 4 GPUs, each with more than the capability of a GeForce 7900GT, tied to 4 3+GHz K8L cores with 9MB of total cache and 288 bits of high-speed, large-capacity DDR2 memory. That would be faster than any Intel box with quad GPUs at the time.
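As a rough check of the pipeline counts above, here is a minimal Python sketch. It assumes throughput scales linearly with pipelines times clock and ignores per-clock differences, memory bandwidth and anything else that would break that scaling. The function name is made up for illustration.

```python
def equivalent_pipelines(pipelines, clock_mhz, target_clock_mhz):
    """Pipelines needed at target_clock_mhz to match pipelines * clock_mhz.

    Assumption: performance is proportional to pipelines * clock.
    """
    return pipelines * clock_mhz / target_clock_mhz


# Midrange GPU: 12 pipelines at 500MHz, rebuilt at 2GHz
print(equivalent_pipelines(12, 500, 2000))  # 3.0

# GeForce 7900GT: 24 pipelines at 450MHz, rebuilt at 2.7GHz
print(equivalent_pipelines(24, 450, 2700))  # 4.0
```

Under that one assumption, both figures in the post (3 pipelines and 4 pipelines) come out exactly.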

Much of the graphics memory is currently used to duplicate textures, various lists and frame buffers. Quad discrete GPUs need four copies of this information. On-die GPUs only need to keep one copy, which reduces bandwidth requirements, and they can see what the others have done without needing a synchronization layer to keep everything straight. Much of the effort is thus duplicated by each of the discrete GPUs, and it loads the CPUs with keeping track of it all. Most of that goes away with DCA and MOESI.

And before you harp on the memory BW needs of the GPUs: without their L1 and L2 caches, most CPUs would have 1/10th of the performance. A GPU that uses a CPU-type multilevel cache could see its BW requirements drop to somewhere between 1/3 and 1/8th of what discrete GPUs need in aggregate.
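To put the 1/3 to 1/8th range in cache terms, here is a small Python sketch. It uses a deliberately simple model that is an assumption, not something from the post: every access that misses the cache goes to DRAM and every hit does not, so the fraction of traffic reaching DRAM is 1 minus the hit rate. Write-backs, prefetching and coherence traffic are ignored.

```python
def required_hit_rate(dram_traffic_fraction):
    """Cache hit rate needed so that only this fraction of accesses reach DRAM.

    Assumption: DRAM traffic fraction = 1 - hit rate (misses only).
    """
    return 1.0 - dram_traffic_fraction


for fraction in (1 / 3, 1 / 8):
    print(f"traffic cut to {fraction:.3f} needs a hit rate of {required_hit_rate(fraction):.1%}")
```

Under that model, cutting DRAM traffic to 1/3 takes a hit rate of about two thirds, and cutting it to 1/8th takes 87.5%.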

By the time 2008 rolls around, 45nm will be here and all of these will shift in the same direction on performance. An OC (8 core) would have 4 CPU cores, 4 GPU cores and 9MB of total cache, all running well above 3GHz, with 2 channels of DDR3 (or DDR4) and the option of an HT-connected second pair of DDR3 (or DDR4) channels (socket F MBs). If ZRAM is available, look for total cache to grow to 25MB, halving DRAM memory bandwidth needs again. 4x4 enthusiast systems would become 8x8, forcing Intel to a DCA-like model to keep up.

To make it easier for you to visualize the die, simply replace a K8L core sans L1 or L2 on any die shot with a 4 pipeline GPU. In fact, working with the ATI division, they could well replace a K8L core with 8 GPU pipelines. QC dies will be mainstream at 45nm, so any combination of K8L cores and 4 pipe GPU cores is possible, as long as at least one is a K8L core. Traditional servers may only need one or two GPU cores for the entire server, so many dies would consist of only K8L cores. One or two dies would have one GPU core in them (the second is a backup in case the primary GPU fails). High-end gamers probably would opt for 1 K8L core and 3 GPU cores per die when used in a dual socket 4x4 MB. Most of us would be happy with a single 2x2 Fusion CPU.

Pete



To: eracer who wrote (215013) 10/27/2006 5:40:17 AM
From: Joe NYC
 
eracer,

I do too. That is why I believe the first generation of integrated GPU may be of little value for AMD investors.

I don't know what the breakdown is between integrated (in the northbridge) and discrete graphics today. Let's say it is 70% integrated and 30% discrete. Having a better solution for 70% of the market (including the very important mobile market) must be worth something.

One thing worth mentioning is that currently, with ODMC, AMD has an advantage in some markets (high-end server, desktop) but has no advantage, or may even have a slight disadvantage, in the low-end market with integrated chipsets. The reason is that Intel's integrated solution has the memory controller and graphics processor on the same die, so memory accesses do not have to go over the northbridge-CPU connection.

In the current AMD (northbridge) integrated solution, video requests need to go one hop to the CPU, or the graphics core needs some local memory. So while this issue will be solved for AMD with an integrated CPU-GPU, it will become an issue for Intel, which is moving to an integrated ODMC in a similar timeframe but is unlikely to have an integrated CPU-GPU in that same timeframe.

I think the integration of ODMC and GPU onto the CPU die is a natural progression, but it was a 2 step process due to transistor budgets. Full efficiencies are achieved only when both steps are completed.

AMD will need to transition to XDR or a similar high-bandwidth solution if they want a fast integrated GPU to use system memory. A high-end GPU of today like the 7900GT would probably use 30W or less (minus the memory power requirements) and use 60 mm^2 of die area if manufactured on 45-nm SOI. The ATI equivalent would be somewhat larger and warmer, but still doable. Unfortunately, high-end today will probably be low-end mainstream by late 2008 or early 2009.

The Xbox showed a solution for high-end graphics, at the 2005 level, in 2005. See the system diagram here:
en.wikipedia.org

It involved total bandwidth to memory of 22.4 GB/s with additional massive bandwidth provided by embedded memory.

Looking at the K8L die, 2 MB of SRAM L3 seems to be roughly 30 mm^2. 10MB at 45nm would be roughly 75 mm^2 using SRAM. That would be a bit on the high side even for a high-end system, but a cakewalk with Z-RAM. If Z-RAM is 4 to 5x as dense as SRAM, we are talking 15 to 20 mm^2 per 10MB, which could make 20 to 40 MB feasible for a high-end solution.
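The cache-area estimate above can be reproduced with a short Python sketch. Two things here are assumptions rather than statements from the post: that the 2 MB / 30 mm^2 reading is taken from a 65nm die, and that area shrinks with the square of the feature size when moving to 45nm (real SRAM rarely scales that cleanly). The function names are hypothetical.

```python
def sram_area_mm2(size_mb, ref_mb=2.0, ref_area_mm2=30.0,
                  ref_node_nm=65.0, node_nm=45.0):
    """Estimate SRAM area at node_nm from a reference measurement.

    Assumptions: the reference (2 MB in 30 mm^2) is at 65nm, and area
    scales ideally with (node / ref_node) ** 2.
    """
    area_per_mb = ref_area_mm2 / ref_mb
    shrink = (node_nm / ref_node_nm) ** 2
    return size_mb * area_per_mb * shrink


def zram_area_mm2(size_mb, density_ratio):
    """Area if Z-RAM is density_ratio times as dense as SRAM at the same node."""
    return sram_area_mm2(size_mb) / density_ratio


print(f"10MB SRAM at 45nm: {sram_area_mm2(10):.1f} mm^2")
print(f"10MB Z-RAM at 4x:  {zram_area_mm2(10, 4):.1f} mm^2")
print(f"10MB Z-RAM at 5x:  {zram_area_mm2(10, 5):.1f} mm^2")
```

With ideal scaling the SRAM figure comes out a little over 70 mm^2, in line with the "roughly 75 mm^2" above, and the Z-RAM figures land in or just under the 15 to 20 mm^2 range.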

BTW, don't forget the gaming market. I am pretty sure that the combined ATI-AMD is going to be actively working to retain and expand its share of this market. It seems to me that after tackling the mainstream market in 2008-2009 with integrated CPU-GPU, the goal will be to tackle the high-end market with a solution that at the same time could be the basis of the next-gen game consoles.

Joe