Intel Vs. Samsung Vs. TSMC
 
 semiengineering.com
 
 Ed Sperling
 
 
 
 
 
 
 The three leading-edge foundries —  Intel,  Samsung, and  TSMC  — have started filling in some key pieces in their roadmaps, adding  aggressive delivery dates for future generations of chip technology and  setting the stage for significant improvements in performance with  faster delivery time for custom designs.
 
 Unlike in the past, when a single industry roadmap dictated how to  get to the next process node, the three largest foundries increasingly  are forging their own paths. They all are heading in the same general  direction with 3D transistors and packages, a slew of enabling and  expansive technologies, and much larger and more diverse ecosystems. But  some key differences are emerging in their methodologies,  architectures, and third-party enablement.
 
 Roadmaps for all three show that transistor scaling will continue at  least into the 18/16/14 angstrom range, with a possible move from  nanosheets and  forksheet FETs, followed by  complementary FETs  (CFETs) at some point in the future. The key drivers are AI/ML and the  explosion of data that needs to be processed, and in most cases these  will involve arrays of processing elements, usually with high levels of  redundancy and homogeneity, in order to achieve higher yields.
 
 In other cases, these designs may contain dozens or hundreds of  chiplets,  some engineered for specific data types and others for more general  processing. Those chiplets can be mounted on a substrate in a  2.5D  configuration, an approach that has gained traction in data centers  because it simplifies the integration of high-bandwidth memory ( HBM),  as well as in mobile devices, which also include other features such as  image sensors, power supplies, and additional digital logic used for  non-critical functions. All three foundries are working on full  3D-ICs,  as well. And there will be hybrid options available, where logic is  stacked on logic and mounted on a substrate, but separated from other  features in order to minimize physical effects such as heat — a  heterogeneous configuration that has been called both 3.5D and 5.5D.
 
 Rapid and mass customization
 One of the biggest changes involves bringing domain-specific designs to  market much more quickly than in the past. Mundane as this may sound,  it’s a competitive necessity for many leading-edge chips, and it  requires fundamental changes in the way chips are designed,  manufactured, and packaged. Making this scheme work demands a  combination of standards, innovative connectivity schemes, and a mix of  engineering disciplines that in the past had limited interactions, if  any.
 
 Sometimes referred to as “mass customization,” it includes the usual  power, performance, and area/cost (PPA/C) tradeoffs, as well as rapid  assembly options. That is the promise of heterogeneous chiplet  assemblies, and from a scaling perspective it marks the next phase of  Moore’s Law. The entire semiconductor ecosystem has been laying the groundwork for this shift incrementally for more than a decade.
 
 But getting heterogeneous chiplets — essentially hardened IP from  multiple vendors and foundries — to work together is both a necessary  and daunting engineering challenge. The first step is connecting the  chiplets together in a consistent way to achieve predictable results,  and this is where the foundries have spent much of their effort,  particularly with the  Universal Chiplet Interconnect Express (UCIe) and  Bunch of Wires  (BoW) standards. While that connectivity is a critical requirement for  all three, it’s also one of the main areas of divergence.
 
 Intel Foundry’s current solution, prior to fully integrated 3D-ICs,  is to develop what industry sources describe as “sockets” for chiplets.  Instead of characterizing each chiplet for a commercial marketplace, the  company defines the specification and the interface so that chiplet  vendors can develop these limited-function mini-chips to meet those  specs. That addresses one of the big stumbling blocks for a commercial  chiplet marketplace. All the pieces need to work together, from data  speed to thermal and noise management.
 
 Intel’s scheme relies heavily on its Embedded Multi-Die Interconnect  Bridge (EMIB), first introduced in 2014. “The really cool thing about an  EMIB base is you can add any amount of chiplets,” said Lalitha  Immaneni, vice president of technology development at Intel. “We don’t  have a limitation on the number of IPs that we can use in design, and it  won’t increase the interposer size, so it’s cost-effective and it’s  agnostic of the process. We have given out a package assembly design  kit, which is like your traditional PDK for the assembly. We give them  the design rules, the reference flows, and we tell them the allowable  constructions. It will also give them any collaterals that we need to  take it into our assembly.”
 
 Depending upon the design, there can be multiple EMIBs in a package,  complemented by thermal interface materials (TIMs), in order to  dissipate heat that can become trapped inside a package. TIMs typically  are pads that are engineered to conduct heat away from the source, and  they are becoming more common as the amount of compute inside a package  increases and as the substrates are thinned to shorten the distance  signals need to travel.
 
 But the thinner the substrate, the less effective it is at heat  dissipation, which can result in thermal gradients that are  workload-dependent and therefore difficult to anticipate. Eliminating  that heat may require TIMs, additional heat sinks, and potentially even  more exotic cooling approaches such as microfluidics.
 
Both TSMC and Samsung offer bridges, as well. Samsung has embedded bridges inside the RDL — an approach it calls 2.3D or I-Cube E — and it’s using them to connect sub-systems to those bridges in order to speed time to working silicon. Instead of relying on a socket approach, some of the integration work will be pre-done in known-good modules.
 
 “Putting together two, four, or eight CPUs into a system is something  that very sophisticated customers know how to go out and do,” said  Arm  CEO Rene Haas, in a keynote speech at a recent Samsung Foundry event.  “But if you want to build an SoC that has 128 CPUs attached to a neural  network, memory structures, interrupt controllers that interface to an  NPU, an off-chip bus to go to another chiplet, that is a lot of work. In  the last year and a half, we’ve seen a rush of people building these  complex SoCs wanting more from us.”
 
 Samsung also has been building mini-consortia [1] of chiplet  providers, targeted at specific markets. The initial concept is that one  company builds an I/O die, another builds the interconnect, and a third  builds the logic, and when that is proven to work, then others are  added into the mix to provide more choices for customers.
 
 TSMC has experimented with a number of different options, including  both RDL and non-RDL bridges, fan-outs, 2.5D chip-on-wafer-on-substrate  (CoWoS), and System On Integrated Chips (SoIC), a 3D-IC concept in which  chiplets are packed and stacked inside a substrate using very short  interconnects. In fact, TSMC has a process design kit for just about  every application, and it has been active in creating assembly design  kits for advanced packaging, including reference designs to go with  them.
 
The challenge is that foundry customers willing to invest in these complex packages increasingly want very customized solutions. To facilitate that, TSMC rolled out a new language called 3Dblox, a top-down design scheme that fuses physical and connectivity constructs, allowing assertions to be applied across both. This sandbox approach allows customers to leverage any of its packaging approaches — InFO, CoWoS, and SoIC. It’s also essential to TSMC’s business model, because the company is the only pure-play foundry of the three [2] — although both Intel and Samsung have moved to separate their foundry operations in recent months.
 
 “We started from a concept of modularization,” said Jim Chang, vice  president of advanced technology and mask engineering at TSMC, in a  presentation when 3Dblox was first introduced in 2023. “We can build a  full 3D-IC stacking with this kind of language syntax plus assertions.”
 
 Chang said the genesis of this was a lack of consistency between the  physical and connectivity design tools. But he added that once this  approach was developed, it also enabled reuse of chiplets in different  designs because much of the characterization was already well-defined  and the designs are modular.
 
 
  Fig. 1: TSMC’s 3Dblox approach. Source: TSMC
 
Samsung followed with its own system description language, 3DCODE, in December 2023. Both Samsung and TSMC claim their languages are standards, but they are more like new foundry rule decks, because it’s unlikely these languages will be used outside their own ecosystems. Intel’s 2.5D approach doesn’t require a new language because the rules are dictated by the socket specification, trading off some customization for a shortened time to market and a simpler approach for chiplet developers.
 
 The chiplet challenge
Chiplets have obvious benefits. They can be designed independently at whatever process node makes sense, which is particularly important for analog features. But figuring out how to put the pieces together with predictable results has been a major challenge. The initial LEGO-like architecture scheme floated by DARPA has proven much more complicated than first envisioned, and it has required a massive and ongoing effort by broad ecosystems to make it work.
 
 Chiplets need to be precisely synchronized so that critical data is  processed, stored, and retrieved without delay. Otherwise, there can be  timing issues, in which one computation is either delayed or out-of-sync  with other computations, leading to delays and potential deadlocks. In  the context of mission- or safety-critical applications, the loss of a  fraction of a second can have serious consequences.
 
 Simplifying the design process, particularly with domain-specific  designs where one size does not fit all, is an incredibly complex  endeavor. The goal for all three foundries is to provide more options  for companies that will be developing high-performance, low-power chips.  With an estimated 30% to 35% of all leading-edge design starts now in  the hands of large systems companies such as Google, Meta, Microsoft,  and Tesla, the economics of leading-edge chip and package design have  changed significantly, and so have the PPA/C formulas and tradeoffs.
 
 Chips developed for these systems companies probably will not be sold  commercially. So if they can achieve higher performance per watt, then  the design and manufacturing costs can be offset by lower cooling power  and higher utilization rates — and potentially fewer servers. The  reverse is true for chips sold into mobile devices and commodity  servers, where high development costs can be amortized across huge  volumes. The economics for customized designs in advanced packages work  for both, but for very different reasons.
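The amortization tradeoff described above can be sketched in a few lines. All of the numbers here are hypothetical, chosen only to illustrate why the economics work for both kinds of buyers:

```python
# Illustrative sketch (hypothetical numbers): how NRE amortization differs
# between a captive systems-company chip and a high-volume commercial part.

def cost_per_chip(nre_dollars: float, unit_cost: float, volume: int) -> float:
    """Total cost per chip = amortized NRE plus per-unit manufacturing cost."""
    return nre_dollars / volume + unit_cost

# A hyperscaler's internal accelerator: high NRE at modest volume, offset
# elsewhere by lower cooling power and higher utilization rates.
internal = cost_per_chip(nre_dollars=500e6, unit_cost=2_000, volume=200_000)

# A mobile SoC: similar NRE amortized across enormous consumer volume.
mobile = cost_per_chip(nre_dollars=500e6, unit_cost=50, volume=100_000_000)

print(f"internal accelerator: ${internal:,.0f} per chip")
print(f"mobile SoC: ${mobile:,.0f} per chip")
```

The same formula yields wildly different per-chip costs, which is why one business case rests on performance per watt and the other on volume.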
 
 Scaling down, up, and out
It’s assumed that within these complex systems of chiplets there will be multiple types of processors, some highly specialized and others more general-purpose. At least some of these will likely be developed at the most advanced process nodes due to limited power budgets. Advanced nodes still provide higher transistor density and energy efficiency, allowing more transistors to be packed into the same area in order to improve performance. This is critical for AI/ML applications, where processing more data faster requires more multiply/accumulate operations in highly parallel configurations. But as transistors shrink, the gate structure needs to change to prevent leakage, which is why forksheet FETs and CFETs are on the horizon.
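The multiply/accumulate (MAC) operation mentioned above is simple enough to sketch: a dot product is just a chain of MACs, and accelerators replicate this pattern across thousands of parallel units:

```python
# A dot product as a chain of multiply/accumulate (MAC) operations --
# the core primitive that AI/ML accelerators replicate in parallel.

def mac_dot(a: list[float], b: list[float]) -> float:
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y   # one multiply/accumulate per element pair
    return acc

print(mac_dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32.0
```

A hardware MAC array performs many of these element-wise steps simultaneously, which is why transistor density translates so directly into AI throughput.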
 
 Put simply, process leadership still has value. Being first to market  with a leading-edge process is good for business, but it’s only one  piece of a much larger puzzle. All three foundries have announced plans  to push well into the angstrom range. Intel plans to introduce its 18A  this year, followed by 14A a couple years later.
 
 
  Fig. 2: Intel’s process roadmap. Source: Intel Foundry
 
TSMC, meanwhile, will add A16 in 2027 (see Fig. 3, below).
 
 
  Fig. 3: TSMC’s scaling roadmap into the angstrom era. Source: TSMC
 
And Samsung will push to 14 angstroms sometime in 2027 with its SF1.4, apparently skipping 18/16 angstroms. (See Fig. 4.)
 
 
  Fig. 4: Samsung’s process scaling roadmap. Source: Samsung Foundry
 
 From a process node standpoint, all three foundries are on the same  track. But advances are no longer tied to the process node alone. The  focus increasingly is about latency and performance per watt in a  specific domain, and this is where stacking logic-on-logic in a true  3D-IC configuration will excel, using hybrid bonds to connect chiplets  to a substrate and each other. Moving electrons through a wire on a  planar die is still the fastest (assuming a signal doesn’t have to  travel from one end of the die to another), but stacking transistors on  top of other transistors is the next best thing, and in some cases even  better than a planar SoC because some vertical signal paths may be  shorter.
 
 In a recent presentation, Taejoong Song, Samsung Foundry’s vice  president of foundry business development, showed a roadmap featuring  logic-on-logic mounted on a substrate, combining a 2nm (SF2) die on top  of a 4nm (SF4X) die, both mounted on top of another substrate. This is  basically a 3D-IC on a 2.5D package, which is the 3.5D or 5.5D concept  mentioned earlier. Song said the foundry will begin stacking an SF1.4 on  top of SF2P, starting in 2027. What’s particularly attractive about  this approach are the thermal dissipation possibilities. With the logic  separated from other functions, heat can be channeled away from the  stacked dies through the substrate or any of the five exposed sides.
 
 
  Fig. 5: Samsung’s 3D-IC architecture for AI. Source: Samsung
 
Intel, meanwhile, will leverage its Foveros Direct 3D to stack logic on logic, either face-to-face or face-to-back. The approach allows stacking of chips or wafers from different foundries, with the connection bandwidth determined by the copper via pitch, according to a new Intel white paper. The paper noted that the first version would use a copper pitch of 9µm, while the second generation would use a 3µm pitch.
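Because the vias form a two-dimensional grid, connection density scales with the inverse square of the pitch, so the move from 9µm to 3µm is roughly a 9x increase in connections per unit area. A quick sketch of that relationship:

```python
# Vertical connection density for a square grid of hybrid-bond vias:
# density scales with the inverse square of the pitch.

def connections_per_mm2(pitch_um: float) -> float:
    """Approximate vertical connections per mm^2 at a given via pitch."""
    vias_per_mm = 1000.0 / pitch_um   # vias along one millimeter
    return vias_per_mm ** 2

gen1 = connections_per_mm2(9.0)   # first-generation 9 um pitch
gen2 = connections_per_mm2(3.0)   # second-generation 3 um pitch
print(f"9 um pitch: ~{gen1:,.0f} connections/mm^2")
print(f"3 um pitch: ~{gen2:,.0f} connections/mm^2 ({gen2 / gen1:.0f}x denser)")
```

This inverse-square relationship is why pitch scaling, not just die stacking, drives the bandwidth gains of true 3D-ICs.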
 
 
  Fig. 6: Intel’s Foveros Direct 3D. Source: Intel
 
“The true 3D-IC comes with Foveros, and then also with hybrid bonds,” said Intel’s Immaneni. “You cannot go the traditional route of design where you put it together and run validation, and then find, ‘Oops, I have an issue.’ You cannot afford to do this anymore because you’re impacting your time to market. So you really want to provide a sandbox to make it predictable. But even before I step into this detailed design environment, I want to run my mechanical/electrical/thermal analysis. I want to look at the connectivity so I don’t have opens and shorts. The burden for 3D-IC resides more in the co-design than the execution.”
 
 Foveros allows an active logic die to be stacked on either another  active or passive die, with the base die used to connect all the die in a  package at a 36 micron pitch. By leveraging advanced sort, Intel claims  it can guarantee 99% known good die, and 97% yield at post-assembly  test.
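Those two figures compound: every chiplet in a package must be good, and the assembly itself must succeed. A minimal sketch of that math, with the chiplet count chosen arbitrarily for illustration:

```python
# Package-level yield for a multi-chiplet assembly: every die must be
# good AND the assembly step must succeed.

def package_yield(kgd: float, n_chiplets: int, assembly_yield: float) -> float:
    """Probability that a package with n_chiplets works end to end."""
    return (kgd ** n_chiplets) * assembly_yield

# Using the article's 99% known-good-die and 97% post-assembly-test
# figures, with a hypothetical 8-chiplet package:
y = package_yield(kgd=0.99, n_chiplets=8, assembly_yield=0.97)
print(f"8-chiplet package yield: {y:.1%}")
```

The exponential term is why known-good-die screening matters so much: even a small per-die shortfall multiplies across every chiplet in the package.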
 
 TSMC’s CoWoS, meanwhile, already is in use by NVIDIA and AMD for  their advanced packaging for AI chips. CoWoS is essentially a 2.5D  approach, using an interposer to connect SoCs and HBM memory using  through-silicon vias. The company’s plans for SoIC are more ambitious,  packaging both memory on logic along with other elements, such as  sensors, in a 3D-IC at the front end of the line. This can significantly  reduce assembly time of multiple layers, sizes, and functions. TSMC  contends that its bonding scheme enables faster and shorter connections  than other 3D-IC approaches. One report said Apple will begin using  TSMC’s SoIC technology starting next year, while AMD will expand its use  of this approach.
 
 Other innovations
 Putting the process and packaging technology in place opens the door to a  much broader set of competitive options. Unlike in the past, when big  chipmakers, equipment vendors, and EDA companies defined the roadmap for  chips, the chiplet world provides the tools for end customers to make  those decisions. This is due, in no small part, to the number of  features that can be put into a package versus those that can fit inside  the reticle limits of an SoC. Packages can be expanded horizontally or  vertically, as needed, and in some cases they can improve performance  just through vertical floor-planning.
 
But given the vast opportunity in the cloud and the edge — particularly with the rollout of AI everywhere — the three big foundries, as well as their ecosystems, are racing to develop new capabilities and features. In some cases, this involves leveraging what they already have. In other cases, it requires brand new technologies.
 
 For example, Samsung has started detailing plans about custom HBM,  which includes 3D DRAM stacks with a configurable logic layer  underneath. This is the second time around for this approach. Back in  2011, Samsung and Micron co-developed the Hybrid Memory Cube, packaging a  DRAM stack on a layer of logic. HBM won the war after JEDEC turned it  into a standard, and HMC largely disappeared. But there was nothing  wrong with the HMC approach, other than perhaps bad timing.
 
 In its new form, Samsung plans to offer customized HBM as an option.  Memory is one of the key elements that determine performance, and the  ability to read/write and move data back and forth more quickly between  memory and processors can have a big impact on performance and power.  And those numbers can be significantly better if the memory is  right-sized to a specific workload or data type, and if some of the  processing can be done inside the memory module so there is less data to  move.
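The savings from processing inside the memory module can be made concrete with a toy comparison. All sizes here are hypothetical; the point is only that reducing data near memory shrinks the traffic that must cross the interface:

```python
# Toy comparison (hypothetical sizes): moving raw vectors to the processor
# vs. reducing them inside the memory module and moving only the results.

VECTOR_LEN = 1024        # elements per vector
N_VECTORS = 1_000_000    # vectors to reduce
BYTES_PER_ELEM = 2       # e.g. 16-bit values

# Conventional path: ship every element across the memory interface.
raw_bytes = N_VECTORS * VECTOR_LEN * BYTES_PER_ELEM

# Near-memory reduction: each vector is summed in the memory module,
# so only one result per vector crosses the interface.
reduced_bytes = N_VECTORS * BYTES_PER_ELEM

print(f"raw transfer:     {raw_bytes / 1e9:.3f} GB")
print(f"reduced transfer: {reduced_bytes / 1e9:.3f} GB "
      f"({raw_bytes // reduced_bytes}x less data moved)")
```

Less data moved means less interface power and lower latency, which is the core argument for right-sizing and customizing HBM to the workload.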
 
 
  Fig. 7: Samsung roadmap and innovations. Source: Semiconductor Engineering/MemCon 2024
 
 Intel, meanwhile, has been working on a better way to deliver power  to densely packed transistors, a persistent problem as the transistor  density and number of metal layers increases. In the past, power was  delivered from the top of the chip down, but two problems have emerged  at the most advanced nodes. One is the challenge of actually delivering  enough power to every transistor. The second is noise, which can come  from power, substrates, or electromagnetic interference. Without proper  shielding — something that is becoming more difficult at each new node  due to thinner dielectrics and wires — that noise can impact signal  integrity.
 
 Delivering power through the backside of a chip minimizes those kinds  of issues and reduces wiring congestion. But it also adds other  challenges, such as how to drill holes through a thinner substrate  without structural damage. Intel apparently has solved these issues,  with plans to offer its PowerVia backside power scheme this year.
 
TSMC said it plans to introduce backside power delivery at A16 in 2026/2027. Samsung is roughly on the same schedule, delivering it in its SF2Z 2nm process.
 
Intel also has announced plans for glass substrates, which can provide better planarity and lower defectivity than organic substrates. This is especially important at advanced nodes, where even nano-sized pits can cause issues. As with backside power delivery, handling issues abound. The upside is that glass has nearly the same coefficient of thermal expansion as silicon, so it is compatible with the expansion and contraction of silicon components, such as chiplets. After years of sitting on the sidelines, glass is suddenly very attractive. In fact, both TSMC and Samsung are working on glass substrates, as well, and the whole industry is learning how to design with glass, handle it without cracking it, and inspect it.
 
TSMC, meanwhile, has focused heavily on building an ecosystem and expanding its process offerings. Numerous industry sources say TSMC’s real strength is its ability to deliver process design kits for just about any process or package. The foundry produces about 90% of the most advanced chips globally, according to Nikkei. It also has the most experience with advanced packaging of any foundry, and the largest and broadest ecosystem, which is important.
 
 That ecosystem is critical. The chip industry is so complex and  varied that no single company can do everything. The question going  forward will be how complete those ecosystems truly are, particularly if  the number of processes continues to grow. For example, EDA vendors are  essential enablers, and for any process or packaging approach to be  successful, design teams need automation. But the more processes and  packaging options, the more difficult it will be for EDA vendors to  support every incremental change or improvement, and potentially the  greater the lag time between announcement and delivery.
 
 Conclusion
 The recent supply chain glitches and geopolitics have convinced the  United States and Europe that they need to re-shore and “friend-shore”  manufacturing. The investments in semiconductor fabs, equipment, tools,  and research are unprecedented. How that affects the three largest  foundries remains to be seen, but it certainly is providing some of the  impetus behind new technologies such as co-packaged optics, a raft of  new materials, and cryogenic computing.
 
 The impact of all of these changes on market share is becoming harder  to track. It’s no longer about which foundry is producing chips at the  smallest process node, or even the number of chips being shipped. A  single advanced package may have dozens of chiplets. The real key is the  ability to deliver solutions that matter to customers, quickly and  efficiently. In some cases the driver will be performance per watt,  while in others it may be time to results with power as a secondary  consideration. And in still others, it may be a combination of features  that only one of the leading-edge foundries can provide in sufficient  quantity. But what is clear is that the foundry race is significantly  more complex than ever before, and becoming more so. In this highly  complex world, simple metrics for comparison no longer apply.
 
 References
 1.  Mini-Consortia Forming Around Chiplets, March 20, 2023; E. Sperling/Semiconductor Engineering
 2. TSMC also is the largest shareholder (35%) in Global Unichip Corp., a design services company.
 
 
 
 
 
 