Technology Stocks : Silicon Graphics, Inc. (SGI)


To: Edward Smyth who wrote (3632) | 11/21/1997 10:50:00 AM
From: John M. Zulauf
 
> The IBM SP2 on the other hand can be used

A screwdriver can be used as a hammer in a pinch ;-) Seriously though, my understanding of the SP2 is that the programming model requires a lot of hand localization and hand parallelization to get anything close to linear subdivision of a problem.

Ease of programming really matters because an application or porting team usually has a fixed time-to-market and a justifiable not-to-exceed budget (NRE) handed to them by the product marketing team. This bucket of money can go into "pure porting" (just making it run), "stability" (making it NOT crash; **every** new platform exposes extant bugs in a code base), "tuning" (making it run acceptably), and "platform-specific optimization" (digging deep into specific architecture features for the 2x-10x performance possible from typical supercomputers).

The more time that's spent on porting and stability, the less is left for the latter two; it's a zero-sum game. This means that ease of development has a strong bottom-line impact for commercial (and non-commercial) applications, and thus for the viability of the platform in specific markets and its attractiveness to ISVs.

The ccNUMA support on the SGI O2000 allows a lot to be done for you automatically (or far more easily). For example, if you use coarse-grain parallelism, processes will migrate (or you can assign them) to empty processors, and the needed data will migrate toward the memory on the node board with that processor. For fine-grain parallelism, the flat memory space means that simple user-space queue/dispatch tables can be built with simple user-space function calls:

thingToDo = getNextThingToDo(); // Parallelism for Dummies (tm) ;-)

and of course the occasional lock to assure the queue pointers don't get updated twice. I've done a bit of parallel processing with real-time data sources running asynchronously with user applications. The difference in effort between "flat memory + locks" and a "proprietary message passing scheme" (like the HP and IBM machines use) is huge, with the flat memory model winning 10-1 or 5-1 in terms of level of effort.

This is especially true in C or C++, where a single object pointer is often at the top of a fairly complex memory chain. Copying a C++ object across distinct memory spaces means following that memory chain -- thus requiring building multiprocessor support into EVERY object in your application, or converting all your pointer types to memory-address-independent "handles" like you had to on the Mac. With a common address space, it's as simple as passing a pointer (and probably updating an "owner" table) or invoking a copy constructor. A memory-independent copy operation would be similar to a copy constructor, except you have to (a) build both a copy constructor AND (b) a copy operator, and (c) either (1) develop a "wire protocol" for copied objects (or use CORBA et al.) or (2) call the multiproc API (remember, these are proprietary) for every contiguous memory block of the object... (yuck).

Unofficially (and probably overly technically),

john