Technology Stocks : Advanced Micro Devices - Moderated (AMD)

To: eracer who wrote (215013)
From: pgerassi
Date: 10/26/2006 3:58:51 AM
 
Dear Eracer:

You continue to use bad estimates for the size of a GPU built on AMD64-grade processes. The AMD 65nm ssSOI process used to make DC, and soon QC, K8Ls yields DC CPUs that run at more than 2GHz on less than 35W, including twin 72-bit DDR2 controllers. At those clocks you don't need as many pipelines as GPUs on bulk 90nm processes that run at only 500MHz or so: 1/4 the number of pipelines at 2GHz gives the same performance, assuming throughput scales with pipelines times clock. A midrange GPU of today with 12 pipelines at 500MHz could be done with just 3 pipelines at 2GHz. They would use about 1/4 of the die area at 90nm and about 1/8 at 65nm ssSOI, likely at less power as well.

A GeForce 7900GT with 24 pipelines at 450MHz could be done with only 4 pipelines at 2.7GHz. That's only 12.5mm², at a clock in the midrange of 65nm ssSOI CPUs. Two of them, one for each core of a DC K8L, would equal the best high-end DC 3D gaming system of today. 4x4 systems, of course, would have 4 GPUs, each with more than the capability of a GeForce 7900GT, tied to 4 3+GHz K8L cores with 9MB of total cache and 288 bits of high-speed, large-capacity DDR2 memory. That would be faster than any Intel box with quad GPUs at the time.
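The pipeline counts above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes throughput scales linearly with pipelines times clock, and that die area shrinks with the square of the feature size. The function names are made up for illustration.

```python
# Sketch of the arithmetic behind the pipeline estimates.
# Assumption: throughput scales linearly with pipelines x clock,
# ignoring memory limits and per-pipeline design differences.
# Assumption: die area shrinks with the square of the feature size.

def equivalent_pipelines(pipelines, clock_mhz, target_clock_mhz):
    """Pipelines needed at target_clock_mhz to match pipelines x clock_mhz."""
    return pipelines * clock_mhz / target_clock_mhz


def area_scale(old_nm, new_nm):
    """Ideal area shrink factor when moving from old_nm to new_nm."""
    return (new_nm / old_nm) ** 2


if __name__ == "__main__":
    # Midrange GPU: 12 pipelines at 500MHz, rebuilt at 2GHz.
    print(equivalent_pipelines(12, 500, 2000))   # 3.0
    # GeForce 7900GT: 24 pipelines at 450MHz, rebuilt at 2.7GHz.
    print(equivalent_pipelines(24, 450, 2700))   # 4.0
    # 1/4 of the area at 90nm, then an ideal shrink to 65nm.
    print(round(0.25 * area_scale(90, 65), 2))   # 0.13
```

The last figure comes out at roughly 0.13 of the original area, which is close to the 1/8 quoted above. A real shrink rarely scales ideally, so treat it as a best case.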

Much of the graphics memory is currently used to duplicate textures, various lists and frame buffers. Quad discrete GPUs need four copies of this information. The on-die GPUs only need to keep one copy, which reduces bandwidth requirements, and they can see what the others have done without needing a synchronization layer to keep everything straight. With discrete GPUs, much of the effort is duplicated by each GPU, and the CPUs are loaded with keeping track of it all. Most of that goes away with DCA and MOESI.

And before you harp on the memory BW needs of the GPUs: without their L1 and L2 caches, most CPUs would have 1/10th of their performance. A GPU that uses a CPU-type multilevel cache could see its BW requirements drop to somewhere between 1/3 and 1/8 of what discrete GPUs need in aggregate.
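To put rough numbers on that, here is a minimal sketch of what those fractions imply for cache hit rate. It assumes the simplest possible model, where every cache hit removes one access from DRAM traffic and nothing else changes. The function names are made up for illustration.

```python
# Simplest-case model: DRAM traffic = total demand x (1 - hit rate).
# Assumption: a single combined hit rate across all cache levels, and no
# extra traffic from write-backs, prefetching or coherence.

def dram_traffic_fraction(hit_rate):
    """Fraction of memory demand that still reaches DRAM."""
    return 1.0 - hit_rate


def hit_rate_for_fraction(fraction):
    """Combined hit rate needed to cut DRAM traffic to the given fraction."""
    return 1.0 - fraction


if __name__ == "__main__":
    # Cutting bandwidth needs to 1/3 and to 1/8 of the uncached figure.
    print(round(hit_rate_for_fraction(1 / 3), 3))
    print(hit_rate_for_fraction(1 / 8))
```

Under this model the 1/3 to 1/8 range corresponds to combined hit rates of about 67% to 87.5%.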

By the time 2008 rolls around, 45nm will be here, and all of these trends shift performance in the same direction. An OC (8 core) die would have 4 CPU cores, 4 GPU cores and 9MB of total cache, all running well above 3GHz, with 2 channels of DDR3 (or DDR4) and the option of an HT-connected second pair of DDR3 (or DDR4) channels (socket F MBs). If ZRAM is available, look for total cache to grow to 25MB, halving DRAM memory bandwidth needs again. 4x4 enthusiast systems would become 8x8, forcing Intel to a DCA-like model to keep up.

To make it easier for you to visualize the die, simply replace a K8L core, sans L1 or L2, on any die shot with a 4-pipeline GPU. In fact, working with the ATI division, they could well replace a K8L core with 8 GPU pipelines. QC dies will be mainstream at 45nm, so any combination of K8L cores and 4-pipe GPU cores is possible, as long as at least one is a K8L core. Traditional servers may only need one or two GPU cores for the entire server, so many dies would consist of only K8L cores. One or two dies would have one GPU core each (the second is a backup in case the primary GPU fails). High-end gamers would probably opt for 1 K8L core and 3 GPU cores per die when used in a dual-socket 4x4 MB. Most of us would be happy with a single 2x2 Fusion CPU.

Pete