
Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: dougSF30 who wrote (225472)2/6/2007 6:02:33 PM
From: mas
 
Intel has a 3GHz Clovertown they can deploy whenever they choose.

And where do you get this wonderful piece of information from? You do know that all current 3 GHz Woodcrests are 80 W parts?



To: dougSF30 who wrote (225472)2/7/2007 4:21:03 AM
From: DDB_WO
 
Doug - I made a short list of IPC improvements, which we'll see coming with K10 ("Barcelona"). But first let me answer your points:

1. AMD won't have any 120W parts this summer, if DT is correct. They show them entering production in Q3, so I wouldn't expect them before October.

That's something where we have to rely on sites like DT.

2. That aside, Intel has a 3GHz Clovertown they can deploy whenever they choose.

This is also a question of production volume. If AMD adopts the power-virus-based TDP definition, then they'll yield some more parts fitting into this new envelope.

3. Finally, I expect that Core2 will maintain an integer advantage clock/clock. Combined with a ~20% clock advantage, one can understand why Otellini is comfortable stating that Intel will maintain performance leadership. (And late this year, Penryn 45nm upgrades arrive, followed by the Nehalem death-blow in H208.)

We don't know what Otellini knows. But different motivations might lead to the same behaviour (emotional intelligence). One could be that Otellini liked the market-share gains in the server market, so he doesn't want to let his customers dive into uncertainty.

However, regarding IPC: have a look at this list and think about the design effort involved. Would it be worth it if most of the more general changes didn't bring IPC improvements of ~1% or more per modification?

* Comprehensive Upgrades for SSE
- Dual 128-bit SSE dataflow
- Up to 4 dual precision FP OPS/cycle
- Dual 128-bit loads per cycle
- Can perform SSE MOVs in the FP “store” pipe
- Execute two generic SSE ops + SSE MOV each cycle (+ two 128-bit SSE loads)
- FP Scheduler can hold 36 Dedicated x 128-bit ops
- SSE Unaligned Load-Execute mode
Remove alignment requirements for SSE ld-op instructions
Eliminate awkward pairs of separate load and compute instructions
To improve instruction packing and decoding efficiency
* Advanced branch prediction
- Dedicated 512-entry Indirect Predictor
- Doubled return stack size
- More branch history bits and improved branch hashing
* 32B instruction fetch
- Benefits integer code too
- Reduced split-fetch instruction cases
* Sideband Stack Optimizer
- Perform stack adjustments for PUSH/POP operations “on the side”
- Stack adjustments don’t occupy functional unit bandwidth
- Breaks serial dependence chains for consecutive PUSH/POPs
* Out-of-order load execution
- New technology allows load instructions to bypass:
Other loads
Other stores which are known not to alias with the load
- Significantly mitigates L2 cache latency
* TLB Optimizations
- Support for 1G pages
- 48bit physical address
- Larger TLBs key for:
Virtualized workloads
Large-footprint databases and transaction processing
- DTLB:
Fully-associative 48-way TLB (4K, 2M, 1G)
Backed by L2 TLBs: 512 x 4K, 128 x 2M
- ITLB:
16 x 2M entries
* Data-dependent divide latency
* More Fastpath instructions
- CALL and RET-Imm instructions
- Data movement between FP & INT
* Bit Manipulation extensions
- LZCNT/POPCNT
* SSE extensions
- EXTRQ/INSERTQ
- MOVNTSD/MOVNTSS
* Independent DRAM controllers
- Concurrency
- More DRAM banks reduce page conflicts
- Longer burst length improves command efficiency
* Optimized DRAM paging
- Increase page hits
- Decrease page conflicts
* History-based pattern predictor
* Re-architect NB for higher BW
- Increase buffer sizes
- Optimize schedulers
- Ready to support future DRAM technologies
* Write bursting
- Minimize Rd/Wr Turnaround
* DRAM prefetcher
- Track positive and negative, unit and non-unit strides
- Dedicated buffer for prefetched data
- Aggressively fill idle DRAM cycles
* Core prefetchers
- DC Prefetcher fills directly to L1 Cache
- IC Prefetcher more flexible
2 outstanding requests to any address
* Shared L3
- Victim-cache architecture maximizes efficiency of cache hierarchy
- Fills from L3 leave likely shared lines in the L3
- Sharing-aware replacement policy