Strategies & Market Trends : VOLTAIRE'S PORCH-MODERATED


To: Voltaire who wrote (19267)11/24/2000 9:13:11 PM
From: Voltaire  Respond to of 65232
 
This article speaks volumes about what I mean: one has to read through the FUD and understand that as technology moves forward, the cutting edge will be RMBS.

VP4: total dog or really cooking?
By: Andrew Thomas
Posted: 24/11/2000 at 14:08 GMT

Is Pentium 4 any good?

Some say no because its FPU doesn't have enough grunt. Others say yes because that FPU is optimised for 144 new SSE2 instructions and performs extremely well - when code has been optimised to use them.

What's been missing up to now has been a before and after example of a real world application showing what difference SSE2 optimised code makes.

Reader John Welter of North West Group, a Canadian Geomatics firm specialising in orthophotography - stretching accurate photographs of the Earth's surface over elevation models of the same area - volunteered us some interesting information on his company's experiences with an early P4 system.

When using the original code, a P4 system took a glacial 19 hours compared with just under 13 hours for a 933MHz PIII. But with code recompiled to use SSE2, the P4 galloped through the test in a shade over seven and a half hours.
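
For a sense of scale, the ratios behind those timings can be worked out directly. The following is a minimal sketch, using the exact figures from Welter's results further down (12.8, 19.4 and 7.6 hours); the function names are made up for illustration and are not from the article.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup of a run relative to a baseline: baseline_hours / run_hours.
   A value above 1.0 means the run finished faster than the baseline. */
double speedup(double baseline_hours, double run_hours)
{
    assert(run_hours > 0.0);
    return baseline_hours / run_hours;
}

/* Print the three ratios implied by the "Calgary" timings. */
void print_calgary_ratios(void)
{
    const double p3_original = 12.8; /* PIII-933, original binary      */
    const double p4_original = 19.4; /* P4-1.5GHz, original binary     */
    const double p4_sse2     = 7.6;  /* P4-1.5GHz, recompiled for SSE2 */

    printf("P4 original vs PIII:    %.2fx\n", speedup(p3_original, p4_original));
    printf("P4 SSE2 vs PIII:        %.2fx\n", speedup(p3_original, p4_sse2));
    printf("P4 SSE2 vs P4 original: %.2fx\n", speedup(p4_original, p4_sse2));
}
```

On those figures the P4 runs the old binary at roughly two thirds of the PIII's speed, while the recompiled binary is roughly 1.7 times faster than the PIII and about 2.5 times faster than the same P4 on the old code.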

"It all comes down to the fact that running today's code the P4 is a dog," Welter told The Reg. "But once the code is optimised for it then it really can wake up and perform quite nicely.

Outperforming Alpha
"A P4 at 1.5GHz is now faster when running optimised code than our Alpha production boxes by a sizable margin, where those same Alpha boxes outperformed all our P3 based systems.

"Intel did not take the x87 FPU performance as a prime design goal in the P4. They focused on the SSE/SSE2 unit much more and made sacrifices to the X87 FPU side of things to gain more SSE2 performance. Some may argue this was a bad trade-off but the improvements they have managed on the SSE2 are very impressive.

"Geomatics is extremely CPU intensive and pretty much 100 per cent bound by CPU performance. For this reason we obtained an early 1.5GHz P4 despite the inflated costs in an attempt to determine how much added performance it would give us in reducing our production times.

Staggering
"The results are a bit staggering and maybe of interest to you:

"Baseline: Intel OR840, PIII-933, 1GB RDRAM (4 x 256MB, 800MHz), 144GB of RAID0 storage (4 x 36GB 10,000rpm U160 SCSI drives off an Adaptec 29160 controller)

"Process the "Calgary" test data set on this machine using original binary: 12.8 Hrs.

"Intel 850 motherboard, P4-1.5GHz, rest of system exactly the same as above. Process the "Calgary" test data set on this machine using original binary: 19.4 Hrs.

"Process the "Calgary" test data set on this machine using a recompiled P4 optimised binary (Intel's V5 compiler plug in for Visual Studio): 7.6 Hrs. (All testing was done under Windows 2000 with SP1.)

"As you can see once SSE2 optimisation is enabled on the P4 it can really cook performance-wise. But, when using the old X87 FPU instructions it is a total dog that even a Celeron could possibly outperform.

"It's too bad Intel did not keep X87 FPU performance as a prime goal and improve it as well as SSE2 as it would have really helped out with legacy code that can't easily be optimised. By not doing this the P4 is a processor for 'new' applications and not a good solution for legacy applications."

Screaming Sindy's second set of extensions
SSE2 extends the SIMD capabilities that MMX technology and SSE provided by adding 144 new instructions including 128-bit SIMD integer arithmetic and 128-bit SIMD double-precision floating-point operations.
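
To make the double-precision part concrete, here is a rough sketch of what SSE2 code can look like when written by hand with compiler intrinsics rather than generated by a compiler. It is an illustration only, not code from the article or from North West Group; the function name `add_f64` is hypothetical, and the scalar fallback is there so the sketch still builds on machines without SSE2.

```c
#include <stddef.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> /* SSE2 intrinsics */
#define HAVE_SSE2 1
#endif

/* out[i] = a[i] + b[i] for n doubles. */
void add_f64(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE2
    /* A 128-bit XMM register holds two doubles, so each
       _mm_add_pd performs two additions in one instruction. */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    /* Scalar tail, or the whole loop when SSE2 is unavailable. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The unaligned load and store variants are used so the sketch makes no assumption about how the caller allocated the arrays.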

The aim of the new instructions is to reduce the overall number of instructions required to execute a particular program task, which can contribute to an overall performance increase. They can accelerate a broad range of applications, including video, speech, image and photo processing, encryption, and financial, engineering and scientific applications.

The Single Instruction Multiple Data (SIMD) integer instructions introduced with MMX have been extended from 64-bit to 128-bit registers, which doubles the effective execution rate of SIMD integer operations.
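
As an illustration of the wider integer registers, the sketch below adds two byte arrays with saturation, the sort of operation used when brightening an image. With 128-bit registers each instruction handles 16 bytes where a 64-bit MMX register would hold 8. The function name `add_sat_u8` is hypothetical and the example is an assumption about typical usage, not something taken from the article.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> /* SSE2 intrinsics */
#define HAVE_SSE2 1
#endif

/* out[i] = min(255, a[i] + b[i]) for n unsigned bytes. */
void add_sat_u8(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE2
    /* 16 unsigned bytes per 128-bit register; _mm_adds_epu8
       clamps each byte sum at 255 instead of wrapping around. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* Scalar tail, or the whole loop when SSE2 is unavailable. */
    for (; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        out[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```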

In addition to the new SSE2 instructions, the original (Katmai) SSE instructions have been enhanced to support arithmetic operations on multiple data types including double and quad words. SSE2 instructions are principally aimed at providing better performance when running MPEG-2, MP3 and 3D graphics software.

Intel released new compilers a few weeks ago, adding support for P4 and SSE2 (see "New compilers for P4, Itanic").