Technology Stocks : Advanced Micro Devices - Moderated (AMD)
To: eracer who wrote (255269)
Date: 8/5/2008 11:21:27 PM
From: pgerassi
 
Eracer:

Did you notice that the 1600x1200 test is not using AA but just 4xAF (4 samples)? Comparing it to a test where 1600x1200 uses 16xAF and 4xAA is like turning the settings down to the bare minimum. Thus it's like dropping to low quality 60 FPS at 1600x1200 in a game like FEAR, where most modern entry level GPU cards are CPU limited. Besides, the current top IGP is the 790GX, which runs like a Radeon 3470 with the 128MB sideport memory and dual channel PC2-8500. The 40:4:4 790GX at 500MHz/533MHz with the sideport (SP) enabled, a Phenom 9550 and unganged DDR2-1066 gets 30 FPS at 1600x1200 in FEAR high quality. Pushing it to 1000MHz/533MHz with SP enabled boosts that by 10% to 33 FPS. Disabling the SP at 500/533MHz reduces the FPS to 26. Thus a 20% memory BW reduction yields a 13.3% FPS reduction. That shows it to be mostly memory BW limited.
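
To make the "mostly memory BW limited" point concrete, here is a rough sketch of the arithmetic in Python. The FPS figures and the 20% bandwidth number are the ones quoted above. Reading "500MHz to 1000MHz" as a 100% core clock increase is my assumption about what the first number in each clock pair means, and the helper name is made up for illustration.

```python
def pct_change(before, after):
    """Percentage change from before to after."""
    return (after - before) / before * 100.0

# FPS figures quoted in the post (FEAR, 1600x1200, high quality).
baseline_fps = 30.0      # 500MHz core, sideport enabled
overclocked_fps = 33.0   # 1000MHz core, sideport enabled
no_sideport_fps = 26.0   # 500MHz core, sideport disabled

core_clock_change = pct_change(500.0, 1000.0)  # assumed: core clock doubled
bandwidth_change = -20.0                       # figure quoted in the post

fps_gain_from_clock = pct_change(baseline_fps, overclocked_fps)
fps_loss_from_bw = pct_change(baseline_fps, no_sideport_fps)

# Sensitivity: percent FPS change per percent change in the resource.
clock_sensitivity = fps_gain_from_clock / core_clock_change
bw_sensitivity = fps_loss_from_bw / bandwidth_change

print(f"FPS change from core clock: {fps_gain_from_clock:+.1f}%")
print(f"FPS change from bandwidth:  {fps_loss_from_bw:+.1f}%")
print(f"clock sensitivity: {clock_sensitivity:.2f}")
print(f"BW sensitivity:    {bw_sensitivity:.2f}")
```

Under those assumptions, FPS moves far more per percent of bandwidth than per percent of core clock, which is what you would expect from a bandwidth-bound part.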

Imagine a 2008 IGP getting a real 50% of a 13 core Larrabee estimate for 2010. Given the memory BW considerations, look for a 2010 8 core Larrabee to be the equal of a 2008 790GX IGP. Of course, given that IGPs have doubled or tripled in performance every year, the 2010 40nm SOI IGP (R870 based) would be about 4 to 9 times the performance of the current 790GX, or between 26 and 60 1GHz Larrabee cores.
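
The 26 to 60 core range falls out of simple compounding. A quick sketch, assuming the "50% of a 13 core Larrabee" figure is the 2008 starting point and two yearly steps from 2008 to 2010 (the variable names are mine):

```python
# 2008 starting point: the IGP is worth half of a 13 core Larrabee.
igp_2008_in_larrabee_cores = 0.5 * 13

years = 2010 - 2008

# IGP performance doubling or tripling each year, compounded.
low_multiplier = 2 ** years
high_multiplier = 3 ** years

low_cores = igp_2008_in_larrabee_cores * low_multiplier
high_cores = igp_2008_in_larrabee_cores * high_multiplier

print(f"2010 IGP = {low_multiplier}x to {high_multiplier}x the 790GX")
print(f"         = {low_cores} to {high_cores} Larrabee cores")
```

The high end comes out a bit under 60, so "between 26 and 60" is a rounded figure.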

And that assumes the 80:4:4 Stream:TU:ROP RV710 configuration holds for the RV810. Rumor has the TSMC 40nm SOI R870 at 12 cores (16 SP subcores plus 4 TUs per core) plus 24 ROPs at 1GHz. That yields a configuration of 960:48:24 for the 58xx Radeons. That is about 160 1GHz Larrabee cores. I think it will be more like 16 cores, or 1280:64:32, given that perfect scaling from 55nm to 40nm is 1.89x while I figure on only about 1.6x real scaling. I also think that a 33% rise to 1GHz is conservative, given that frequency scaling from a 55nm bulk process to a 40nm dual strain SOI process should be higher at the same power.
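
For anyone checking the configuration numbers, a small sketch follows. It assumes each SP subcore holds 5 stream processors and that ROPs come out at 2 per core, which is what the 960:48:24 and 1280:64:32 figures imply. The 10 core starting point for the 55nm part is my assumption (it matches an 800 SP chip), and the function name is invented for illustration.

```python
def radeon_config(cores, subcores_per_core=16, sps_per_subcore=5,
                  tus_per_core=4, rops_per_core=2):
    """Return (stream processors, texture units, ROPs) for a core count."""
    stream = cores * subcores_per_core * sps_per_subcore
    return stream, cores * tus_per_core, cores * rops_per_core

# Ideal area scaling from 55nm to 40nm is the square of the linear shrink.
ideal_scaling = (55 / 40) ** 2

# Assumed values: 1.6x real scaling (from the post), 10 cores at 55nm (mine).
real_scaling = 1.6
cores_at_55nm = 10
projected_cores = round(cores_at_55nm * real_scaling)

print("rumored 12 cores:", radeon_config(12))
print("projected cores: ", projected_cores, radeon_config(projected_cores))
print(f"ideal 55nm -> 40nm scaling: {ideal_scaling:.2f}x")
```

With those inputs the rumored part comes out at 960:48:24 and the 1.6x projection lands on 16 cores, or 1280:64:32.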

And Larrabee supporters seem to forget that the 8 SP MAC wide vector unit in each core will vastly increase power consumption compared to the ALU, DP FPADD and DP FPMUL units in the Pentium core. Other features, like the 512 bit bidirectional ring bus and the 512 bit DDR3 memory interface, will boost that further. So look for Larrabee clocks to be quite a bit slower than some expect.

As for CrossFire scaling, do recall that, unlike Larrabee, the ODMC is duplicated as well, giving the CF 48xx twice the BW of a single 48xx. A 64 core Larrabee would have the same bandwidth as a 32 core one. A 1GHz 32 core Larrabee gets 1.0 TFlops of GPGPU performance, which is the same as a 625MHz R770 (4850). What isn't compared are the ROPs and scheduling units that the R770 has and Larrabee lacks. The latter has to use the less efficient scalar CPU cores for that. Also, the 4850 gets 200 GFlops of DP power compared to 64 GFlops for a 32 core 1GHz Larrabee. The 4870 gets 1.2 TFlops / 240 GFlops respectively.
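
The Radeon peak figures can be reproduced with a back-of-envelope calculation. This sketch assumes 800 stream processors on the 4850 and 4870, one multiply-add (2 flops) per SP per clock, DP at one fifth of the SP rate, and a 750MHz clock for the 4870. For Larrabee it assumes 2 DP flops per core per clock, which is what the 64 GFlops figure implies. None of those inputs are stated in the post itself, and the function name is made up.

```python
def peak_gflops(units, clock_mhz, flops_per_unit_per_clock=2):
    """Peak GFlops = units * flops per unit per clock * clock in GHz."""
    return units * flops_per_unit_per_clock * clock_mhz / 1000

RADEON_SPS = 800   # assumed stream processor count for the 48xx parts
DP_RATIO = 5       # assumed: DP runs at 1/5 of the SP rate

hd4850_sp = peak_gflops(RADEON_SPS, 625)
hd4870_sp = peak_gflops(RADEON_SPS, 750)   # 750MHz clock is an assumption
hd4850_dp = hd4850_sp / DP_RATIO
hd4870_dp = hd4870_sp / DP_RATIO

# Larrabee DP: 32 cores at 1GHz, assumed 2 DP flops per core per clock.
larrabee_dp = peak_gflops(32, 1000)

print(f"4850: {hd4850_sp:.0f} SP / {hd4850_dp:.0f} DP GFlops")
print(f"4870: {hd4870_sp:.0f} SP / {hd4870_dp:.0f} DP GFlops")
print(f"32 core 1GHz Larrabee: {larrabee_dp:.0f} DP GFlops")
```

These are theoretical peaks only. They say nothing about how much of that the ROP and scheduling work would eat on a part that does those jobs in software.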

The stuff about the R600 is the same. If Larrabee has so little memory usage, then the infrastructure of a 32 core Larrabee is at least as overbuilt as the R600's. The R600's performance was bad, and so will Larrabee's be, given that scaling chart. Thus, just as the R600 was hot, slow and failed against its competition, so will Larrabee against its competition, especially since it tries to do more in software. Another telling example of this do-it-all-in-software performance hit is the original Macintosh Lisa. It did everything in software, including reading/writing to the floppy and communicating with serial devices. It took a reasonably fast 8MHz 68K and made it run far slower than a lowly 4.77MHz 8088 in the IBM PC. All to save about $10 to $20 in parts.

Pete