Silicon Investor (SI) -- The First Internet Community


Technology Stocks : Advanced Micro Devices - Moderated (AMD)

To: kapkan4u who wrote (19235)	11/15/2000 3:43:18 PM
From: jcholewa

> FXCH is not free on P4.

I think we're on different tracks here. I'm comparing the P4's SSE2 capabilities against the x87 stack in other processors. This stems largely from the original discussion of how the P4 attained scores much greater than those of other processors.

> I don't want to argue the advantages of registers over stack. Seems pretty basic to me.

As far as I know, a two-operand flat register file can, in one instruction, take any two registers, perform an operation on their values, and (in the case of SSE2, I think?) place the result into one of the two given registers.

The x87 stack, or at least one that isn't crippled by a costly FXCH, can perform an operation on the register at the top of the stack and any other register, then place the result at the top of the stack. An FXCH issued just before this can exchange any register with the top of the stack, so effectively you are performing an operation on any two registers and placing the result into one of the two.

I know you do not wish to delve into this, but I would still be happy if you could point out anything erroneous in what I said above. Aside from the decode bottleneck, these two implementations do not seem so amazingly different to me, which is why I am interested in hearing an alternative explanation.

> <PS: Also, you can only do one SSE2 instruction per cycle, so the peak goes down by 50% there.>
> 50% comparing to what? P4 only has one FPU so the peak is one scalar (or half packed) SSE/SSE2
> instruction per clock. The advantage for packed comes with 128bit interface to the d-cache.

Sorry, I was comparing to packed. Also, I was comparing to the competing x87 stack, which is in the end what we're putting the P4's capabilities up against.

    -JC
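For anyone who wants to poke at the FXCH-plus-op argument above, here is a toy model of it in Python. It is only a sketch under stated assumptions: the names X87Stack, FlatRegs and stack_add are made up for illustration, the stack is modeled as eight slots with slot 0 as the top, and the model counts instructions only. It says nothing about latency or whether FXCH is free on any real chip.

```python
class X87Stack:
    """Hypothetical model: eight slots, st[0] is the top of the stack."""

    def __init__(self, values):
        assert len(values) == 8
        self.st = list(values)
        self.ops = 0  # instructions issued (a count, not a cycle estimate)

    def fxch(self, i):
        # Exchange ST(0) with ST(i).
        self.st[0], self.st[i] = self.st[i], self.st[0]
        self.ops += 1

    def fadd(self, i):
        # ST(0) = ST(0) + ST(i); the result overwrites the top of the stack.
        self.st[0] = self.st[0] + self.st[i]
        self.ops += 1


class FlatRegs:
    """Hypothetical two-operand flat register file: dst = dst + src."""

    def __init__(self, values):
        self.r = list(values)
        self.ops = 0

    def add(self, dst, src):
        self.r[dst] = self.r[dst] + self.r[src]
        self.ops += 1


def stack_add(stack, a, b):
    """Compute (value in slot a) + (value in slot b) using FXCH then FADD.

    The result ends up in ST(0). Note that the FXCH also moves the old top
    of the stack into slot a, so slot numbering is permuted afterwards.
    """
    if a != 0:
        stack.fxch(a)
        if b == 0:
            b = a  # the old top now sits in slot a
        elif b == a:
            b = 0  # the operand itself moved to the top
    stack.fadd(b)
    return stack.st[0]


values = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]

flat = FlatRegs(values)
flat.add(3, 5)                   # one instruction

stack = X87Stack(values)
result = stack_add(stack, 3, 5)  # FXCH + FADD, two instructions

print(flat.r[3], result, flat.ops, stack.ops)
```

Both paths compute the same value. The stack path issues one extra instruction, which is the decode cost mentioned above, and it leaves the slots shuffled, which a compiler or programmer has to keep track of.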