Technology Stocks : Qualcomm Moderated Thread - please read rules before posting

Recommended by:
JeffreyHF
Lance Bredvold
sbfm
techlvr
To: Jim Mullens who wrote (191844)   1/5/2025 7:58:24 PM
From: Jim Mullens   4 Recommendations   of 196589
 
ADD>>>>> QCOM SD X Architecture Deep Dive:............................................

Actually, there were several parts to the Deep Dive article I posted earlier.

The rest of the story, again, many graphics at the link

The Qualcomm Snapdragon X Architecture Deep Dive: Getting To Know Oryon and Adreno X1

Part 1- The Qualcomm Snapdragon X Architecture Deep Dive- Getting To Know Oryon and Adreno X1

Part 2- Oryon CPU Architecture: One Well Engineered Core For ALL <<< Previously Posted.

Part 3- Adreno X1 GPU Architecture: A More Familiar Face

Part 4- Performance Promises and First Thoughts
>>>>>>>>>>>>>>>>>>>

Part 1- Getting To Know Oryon and Adreno X1





The curtains are drawn and it’s almost showtime for Qualcomm and its Snapdragon X SoC team. After Qualcomm first detailed the SoC nearly 8 months ago at its most recent Snapdragon Summit, and made numerous performance disclosures in the intervening months, the Snapdragon X Elite and Snapdragon X Plus launch is nearly upon us. The chips have already shipped to Qualcomm’s laptop partners, and the first laptops are set to ship next week.

In the last 8 months Qualcomm has made a lot of interesting claims for their high-performance Windows-on-Arm SoC – many of which will be put to the test in the coming weeks. But beyond all the performance claims and bluster amidst what is shaping up to be a highly competitive environment for PC CPUs, there’s an even more fundamental question about the Snapdragon X that we’ve been dying to get to: how does it work?

Ahead of next week’s launch, then, we’re finally getting the answer to that, as today Qualcomm is releasing their long-awaited architectural disclosure on the Snapdragon X SoC. This includes not only their new, custom Arm v8 “Oryon” CPU core, but also technical disclosures on their Adreno GPU, and the Hexagon NPU that backs their heavily-promoted AI capabilities. The company has made it clear in the past that the Snapdragon X is a serious, top-priority effort for the company – that they’re not just slapping together a Windows SoC from their existing IP blocks and calling it a day – so there’s a great deal of novel technology within the SoC.



And while we’re excited to look at it all, we’ll also be the first to admit that we’re the most excited to finally get to take a deep dive on Oryon, Qualcomm’s custom-built Arm CPU cores. As the first new high-performance CPU design created from scratch in the last several years, Oryon’s significance cannot be overstated. Besides providing the basis of a new generation of Windows-on-Arm SoCs that Qualcomm hopes will vault them into contention in the Windows PC marketplace, Oryon will also be the basis of Qualcomm’s traditional Snapdragon mobile handset and tablet SoCs going forward.

So a great deal of the company’s hardware over the next few years is riding on this CPU architecture – and if all goes according to plan, there will be many more generations of Oryon to follow. One way or another, it’s going to set Qualcomm apart from its competitors in both the PC and mobile spaces, as it means Qualcomm is moving on from Arm’s reference designs, which by their very nature are accessible to Qualcomm’s competition as well.

So without further ado, let’s dive into Qualcomm’s Snapdragon X SoC architecture.

Setting The Stage: Elite, Plus, & Currently Announced SKUs

As a quick refresher, Qualcomm has announced 4 Snapdragon X SKUs thus far, all of which have been made available to device manufacturers for next week’s launch.

Qualcomm Snapdragon X (Gen 1) Processors

SKU        | CPU Cores | All-Core Max Turbo | Two-Core Max Turbo | GPU TFLOPS | NPU TOPS | Total Cache (MB) | Memory
-----------|-----------|--------------------|--------------------|------------|----------|------------------|--------------
Snapdragon X Elite
X1E-84-100 | 12        | 3.8 GHz            | 4.2 GHz            | 4.6        | 45       | 42               | LPDDR5X-8448
X1E-80-100 | 12        | 3.4 GHz            | 4.0 GHz            | 3.8        | 45       | 42               | LPDDR5X-8448
X1E-78-100 | 12        | 3.4 GHz            | 3.4 GHz            | 3.8        | 45       | 42               | LPDDR5X-8448
Snapdragon X Plus
X1P-64-100 | 10        | 3.4 GHz            | 3.4 GHz            | 3.8        | 45       | 42               | LPDDR5X-8448

Three of these are “Elite” SKUs, which are defined by their inclusion of 12 CPU cores. Meanwhile Qualcomm is offering a single “Plus” SKU (thus far), which cuts that down to 10 CPU cores.

Officially, Qualcomm isn’t assigning any TDP ratings to these chip SKUs, as, in principle, any given SKU can be used across the entire spectrum of power levels. Need to fit a top-tier chip in a fanless laptop? Just turn down the TDP to match your power/cooling capabilities. That said, to hit the highest clockspeed and performance targets of Qualcomm’s chips, a good bit of cooling and power delivery are required. And to that end we aren’t likely to see the X1E-84-100 show up in fanless devices, for example, as its higher clockspeeds would largely be wasted by a lack of thermal headroom. This won’t stop lower-performance chips from being used in bigger devices as budget options, but the SKU table can also be considered as being roughly sorted by TDP.

And while not the subject of today’s disclosure, don’t be surprised to see additional Snapdragon X chip SKUs further down the line. It’s become a poorly kept secret that Qualcomm has at least one more Snapdragon X die in development – a smaller die with presumably fewer CPU and GPU cores – which we expect would make up a more budget-focused set of SKUs. But for now, Qualcomm is starting with their big silicon, and consequently their highest-performing options.



Even though the first Snapdragon X devices won’t reach consumers until next week, it’s already clear that, judging by OEM adoption, this is going to be Qualcomm’s most successful Windows-on-Arm SoC to date. The difference in adoption compared to the Snapdragon 8cx Gen 3 is practically night and day; Qualcomm’s PC partners have already developed over a dozen laptop models using the new chips, whereas the last 8cx could be found in all of two designs. So with Microsoft, Dell, HP, Lenovo, and others all producing Snapdragon X laptops, the Snapdragon X ecosystem is starting off much stronger than any Windows-on-Arm offering before it.

Snapdragon Compute (Windows-on-Arm) Silicon

                 | Snapdragon X Elite                 | Snapdragon 8cx Gen 3                | Snapdragon 8cx Gen 2                | Snapdragon 8cx Gen 1
Prime Cores      | 12x Oryon @ 3.80 GHz               | 4x Cortex-X1 @ 3.00 GHz             | 4x Cortex-A76 @ 3.15 GHz            | 4x Cortex-A76 @ 2.84 GHz
                 | (2C Turbo: 4.2 GHz)                |                                     |                                     |
Efficiency Cores | N/A                                | 4x Cortex-A78 @ 2.40 GHz            | 4x Cortex-A55 @ 1.80 GHz            | 4x Cortex-A55 @ 1.80 GHz
GPU              | Adreno X1                          | Adreno 8cx Gen 3                    | Adreno 690                          | Adreno 680
NPU              | Hexagon, 45 TOPS (INT8)            | Hexagon 8cx Gen 3, 15 TOPS          | Hexagon 690, 9 TOPS                 | Hexagon 690, 9 TOPS
Memory           | 8x 16-bit LPDDR5X-8448, 135 GB/sec | 8x 16-bit LPDDR4X-4266, 68.3 GB/sec | 8x 16-bit LPDDR4X-4266, 68.3 GB/sec | 8x 16-bit LPDDR4X-4266, 68.3 GB/sec
Wi-Fi            | Wi-Fi 7 + BT 5.4 (Discrete)        | Wi-Fi 6E + BT 5.1                   | Wi-Fi 6 + BT 5.1                    | Wi-Fi 5 + BT 5.0
Modem            | Snapdragon X65 (Discrete)          | Snapdragon X55/X62/X65 (Discrete)   | Snapdragon X55/X24 (Discrete)       | Snapdragon X24 (Discrete)
Process          | TSMC N4                            | Samsung 5LPE                        | TSMC N7                             | TSMC N7
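As an aside, the 135 GB/sec figure in the table follows directly from the memory configuration listed there. A quick back-of-the-envelope sketch, using only the channel count, channel width, and transfer rate from the table:

```python
# Peak theoretical DRAM bandwidth for the Snapdragon X memory subsystem,
# derived from the configuration in the table above.
channels = 8              # memory channels
channel_width_bits = 16   # bits per channel
transfer_rate = 8448e6    # LPDDR5X-8448: 8448 mega-transfers per second

bus_width_bytes = channels * channel_width_bits / 8     # 16 bytes per transfer
bandwidth_gb_s = bus_width_bytes * transfer_rate / 1e9

print(f"{bandwidth_gb_s:.1f} GB/s")   # ~135.2 GB/s, matching the table
```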

A big part of that, no doubt, comes down to the strength of Qualcomm’s architecture. The Snapdragon X packs what Qualcomm promotes as a vastly more powerful CPU than the Cortex-X1 core found on the most recent (circa 2022) 8cx chip, and it’s being built on a highly competitive process with TSMC’s N4 node. So if all of the stars are properly aligned, the Snapdragon X chips should be a massive step up for Qualcomm.

Meanwhile, there are two other pillars that are helping to hold up this launch. The first, of course, is AI, with the Snapdragon X being the first Copilot+ capable SoC for use with Windows. Copilot+ requires a 40+ TOPS NPU, and the 45 TOPS Hexagon NPU makes the Snapdragon X the first such chip to offer that much performance for neural network and other model inference. The second pillar, in turn, is power. Qualcomm is promising nothing short of amazing battery runtimes with their SoC, leveraging their years of experience producing mobile SoCs. And if they can deliver on it while also hitting their performance goals – allowing users to have their cake and eat it, too – then it will set up the Snapdragon X chips and the resulting laptops nicely.



Ultimately, Qualcomm is looking for their Apple Silicon moment – a repeat of the performance and battery life gains that Apple reaped when switching from Intel’s x86 chips to their own custom Arm-based Apple Silicon. And partner Microsoft, for their part, really, really wants a MacBook Air competitor in the PC ecosystem. It’s a tall order, not least because neither Intel nor AMD has been sitting still over the past few years, but it’s not out of reach.



With that said, Qualcomm and the Windows-on-Arm ecosystem do face some obstacles that mean Snapdragon X’s launch trajectory can never quite match Apple’s. Besides the obvious lack of a single unified party developing the hardware and software ecosystem (and all but shoving developers forward to produce software for it), Windows comes with the expectations of backwards compatibility and the legacy baggage that entails. Microsoft, for its part, has continued to work on their x86/x64 emulation layer, which now goes by the name Prism, and the Snapdragon X launch will be the first time it really gets put to the test. But even with years of Arm support within Windows, the software ecosystem is still slowly taking shape, so Snapdragon X will be more reliant on x86 emulation than Apple ever was. Windows and macOS are very distinct operating systems, both in terms of their histories and their owners’ development philosophies, and this is going to be especially apparent in the first years of the Snapdragon X’s lifetime.

Part 3- Adreno X1 GPU Architecture: A More Familiar Face



Shifting gears, let’s talk about the Snapdragon X SoC’s GPU architecture: Adreno X1.

Unlike the Oryon CPU cores, Adreno X1 is not a wholly new hardware architecture. In fact, with 3 generations of 8cx SoCs before it, it’s not even new to Windows. Still, Qualcomm has been notoriously tight-lipped about their GPU architectures over the years, so the GPU architecture may as well be new to AnandTech readers. Suffice it to say, I’ve been trying to get a detailed disclosure from Qualcomm for over a decade at this point, and with Snapdragon X, they’re finally delivering.

At a high level, the Adreno X1 GPU architecture is the latest revision of Qualcomm’s ongoing series of Adreno architectures, with the X1 representing the 7th generation. Adreno itself traces back to Qualcomm’s acquisition of AMD’s handheld graphics assets over 15 years ago (Adreno is an anagram of Radeon), and over the years Qualcomm’s Adreno architecture has more often than not been the GPU to beat in the Android space.

Things are a bit different in the Windows space, of course, as discrete GPUs push integrated GPUs well off to the side for workloads that absolutely need high GPU performance. And because game development has never become fully divorced from GPU architectures/drivers, Qualcomm’s minuscule presence in the Windows market over the years has meant they are often overlooked by game developers. Still, Qualcomm isn’t new to the Windows game, which gives them a leg up as they try to take a larger share of the Windows market.

From a feature standpoint, the Adreno X1 GPU architecture is unfortunately a bit dated compared to contemporary x86 SoCs. While the architecture does support ray tracing, the chip isn’t able to support the full DirectX 12 Ultimate (feature level 12_2) feature set. And that means it must report itself to DirectX applications as a feature level 12_1 GPU, which means most games will restrict themselves to those features.

That said, Adreno X1 does support some advanced features, which are already being actively used on Android where DirectX’s feature levels do not exist. As previously noted, there is ray tracing support, and this is exposed on Windows applications via the Vulkan API and its ray query calls. Given how limited Vulkan use is on Windows, Qualcomm understandably doesn’t go into the subject in too much depth; but it sounds like Qualcomm’s implementation is a level 2 design with hardware ray testing but no hardware BVH processing, which would make it similar in scope to AMD’s RDNA2 architecture.


DirectX 12 Feature Levels

Feature                        | 12_2 (DX12 Ult.) | 12_1 | 12_0
Ray Tracing (DXR 1.1)          | Yes              | No   | No
Variable Rate Shading (Tier 2) | Yes              | No   | No
Mesh Shaders                   | Yes              | No   | No
Sampler Feedback               | Yes              | No   | No
Conservative Rasterization     | Yes              | Yes  | No
Raster Order Views             | Yes              | Yes  | No
Tiled Resources (Tier 2)       | Yes              | Yes  | Yes
Bindless Resources (Tier 2)    | Yes              | Yes  | Yes
Typed UAV Load                 | Yes              | Yes  | Yes
Otherwise, variable rate shading (VRS) tier 2 is supported as well, which is critical for optimizing shader workloads on mobile GPUs. So it appears the missing features holding the X1 back from DirectX 12 Ultimate support are mesh shaders and sampler feedback, which are admittedly some pretty big hardware changes.

In terms of API support, as previously noted, the Adreno X1 GPU supports DirectX and Vulkan. Qualcomm offers native drivers/paths for DirectX 12 and DirectX 11, Vulkan 1.3, and OpenCL 3.0. The only notable exception here is DirectX 9 support, which, like fellow SoC vendor Intel, Qualcomm implements using D3D9on12, Microsoft's mapping layer for translating DX9 commands to DX12. DX9 games are few and far between these days (the API was supplanted by DX10/11 over 15 years ago), but since this is Windows, backwards compatibility is an ongoing expectation.

Meanwhile, on the compute front, Microsoft’s new DirectML API for low-level GPU access for machine learning is supported. Qualcomm even has optimized metacommands written for the GPU, so that software tapping into DirectML can run more efficiently without knowing anything else about the architecture.

Adreno X1 GPU Architecture In Depth

High-level functionality aside, let’s take a look at the low-level architecture.

The Adreno X1 GPU is split up into 6 shader processor (SP) blocks, each offering 256 FP32 ALUs for a total of 1536 ALUs. With a peak clockspeed of 1.5GHz, this gives the integrated GPU on Snapdragon X a maximum throughput of 4.6 TFLOPS (with lesser amounts for lower-end SKUs).

Split up into the traditional front-end/SP/back-end setup we see with other GPUs, the front-end of the GPU handles triangle setup and rasterization, as well as binning for the GPU’s tile-based rendering mode. Of note, the GPU front-end can set up and rasterize 2 triangles per clock, which is not going to turn any heads in the PC space in 2024, but is respectable for an integrated GPU. Boosting its performance, the front-end can also do early depth testing to reject polygons that will never be visible before they are even rasterized.

Meanwhile the back-end is made up of 6 render output units (ROPs), which can each process 8 pixels per cycle, for a total of 48 pixels/clock rendered. The render back-ends are plugged into a local cache, as well as an important scratchpad memory that Qualcomm calls GMEM (more on this in a bit).
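For reference, the headline 4.6 TFLOPS figure and the resulting pixel fillrate fall straight out of those unit counts and the 1.5GHz peak clock. A quick sketch of the arithmetic, counting a fused multiply-add as two FLOPs (the usual convention):

```python
# Peak theoretical shader throughput and pixel fillrate for the full
# Adreno X1 configuration, using the unit counts and 1.5 GHz clock above.
shader_processors = 6
fp32_alus_per_sp = 256
rops = 6
pixels_per_rop_per_clock = 8
clock_hz = 1.5e9
flops_per_alu_per_clock = 2      # one fused multiply-add = 2 FLOPs

fp32_tflops = shader_processors * fp32_alus_per_sp * flops_per_alu_per_clock * clock_hz / 1e12
fillrate_gpixels = rops * pixels_per_rop_per_clock * clock_hz / 1e9

print(f"FP32: {fp32_tflops:.1f} TFLOPS")              # ~4.6 TFLOPS
print(f"Fillrate: {fillrate_gpixels:.0f} Gpixels/s")  # ~72 Gpixels/s
```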



The individual shader processor blocks themselves are relatively customary, especially if you’ve seen an NVIDIA GPU architecture diagram. Each SP is further subdivided into two micro-pipes (micro shader pipe texture pipe, or uSPTP), each helmed by its own dedicated scheduler and other resources such as local memory, load/store units, and texture units.

Each uSPTP offers 128 FP32 ALUs. And, in a bit of a surprise, there is also a separate set of 256 FP16 ALUs, meaning that Adreno X1 doesn’t have to share resources when processing FP16 and FP32 data, unlike architectures which execute FP16 operations on FP32 ALUs. Though for good measure, the FP32 units can be used for FP16 operations as well, if the GPU scheduler determines it’s needed.
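Qualcomm doesn’t publish an official FP16 rate, but as a rough, hedged upper bound – assuming the dedicated FP16 ALUs run at the same 1.5GHz peak clock and the 256-ALU figure is per uSPTP – the math works out as follows:

```python
# Illustrative upper bound on FP16 throughput. Assumes the dedicated FP16
# ALUs run at the same 1.5 GHz peak clock; Qualcomm has not published an
# official FP16 rate, so treat this strictly as an estimate.
shader_processors = 6
usptp_per_sp = 2
fp16_alus_per_usptp = 256
clock_hz = 1.5e9

fp16_tflops = shader_processors * usptp_per_sp * fp16_alus_per_usptp * 2 * clock_hz / 1e12
print(f"Dedicated FP16 ALUs alone: ~{fp16_tflops:.1f} TFLOPS")   # ~9.2 TFLOPS
# The FP32 ALUs can also run FP16 work, so the practical ceiling is higher still.
```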

Finally, there are 16 elementary functional units (EFUs), which handle transcendental functions such as LOG, SQRT, and other rare (but important) mathematical functions.

Surprisingly here, the Adreno X1 uses a rather large wavefront size. Depending on the mode, Qualcomm uses either 64 or 128 lane wide waves, with Qualcomm telling us that they typically use 128-wide wavefronts for 16bit operations such as fragment shaders, while 64-wide wavefronts are used for 32bit operations (e.g. pixel shaders).

Comparatively, AMD’s RDNA architectures use 32/64 wide wavefronts, and NVIDIA’s wavefronts/warps are always 32 wide. Wide designs have fallen out of favor in the PC space due to the difficulty in keeping them fed (too much divergence), so this is interesting to see. Ultimately, despite the usual wavefront size concerns, it seems to be working well for Qualcomm given the high GPU performance of their smartphone SoCs – no small feat given the high resolution of phone screens.
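To illustrate why wide wavefronts are usually considered risky, here’s a toy model of branch divergence – this is the generic SIMT argument, not anything Qualcomm has disclosed about its scheduler. A wave has to execute both sides of a branch unless all of its lanes agree, and the wider the wave, the more likely at least one lane disagrees:

```python
# Toy model of branch divergence cost versus wavefront width (generic SIMT
# reasoning, not Qualcomm's scheduler). If each lane independently takes the
# 'if' side of a branch with probability p, the wave executes both sides
# unless every lane agrees.
def expected_branch_passes(width: int, p: float) -> float:
    all_taken = p ** width
    none_taken = (1 - p) ** width
    return 1 * (all_taken + none_taken) + 2 * (1 - all_taken - none_taken)

for width in (32, 64, 128):
    print(width, round(expected_branch_passes(width, p=0.05), 3))
# The expected pass count climbs toward 2.0 as the wave gets wider. Coherent
# work like fragment shading rarely diverges, which is why the gamble tends
# to pay off for Qualcomm's typical graphics workloads.
```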

ALUs aside, each uSPTP includes their own texture units, which are capable of spitting out 8 texels per clock per uSPTP. There’s also limited image processing functionality here, including texture filtering, and even SAD/SAS instructions for generating motion vectors.

Finally, there’s quite a bit of register space within each uSPTP. Along with the L1 texture cache, there’s a total of 192KB of general purpose registers to keep the various blocks fed and to try to hide latency bubbles in the wavefronts.

As noted earlier, the Adreno X1 supports multiple rendering modes in order to get the best performance possible, which the company calls their FlexRender technology. This is a subject that doesn’t come up too often with PC GPU designs, but is of greater importance in the mobile space for historical and efficiency reasons.

Besides the traditional direct/immediate mode rendering method (the typical mode for most PC GPUs), Qualcomm also supports tile-based rendering, which they call binned mode. As with other tile-based renderers, binned mode splits a screen up into multiple tiles, and then renders each one separately. This allows the GPU to only work on a subset of data at once, keeping most of that data in its local caches and minimizing the amount of traffic that goes to DRAM, which is both power-expensive and performance-constricting.

And finally, Adreno X1 has a third mode that combines the best of binned and direct rendering, which they call binned direct mode. This mode runs a binned visibility pass before switching to direct rendering, as a means to further cull back-facing (non-visible) triangles so that they don’t get rastered. Only after that data is culled does the GPU then switch over to direct rendering mode, now with a reduced workload.
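In rough pseudocode terms, the binned path looks something like the sketch below – a simplified illustration of the flow described above, not Qualcomm’s driver logic. Binned direct mode would use the same binning structure as a visibility pass, cull triangles that end up entirely occluded, and then fall back to direct rendering for the survivors.

```python
# Simplified sketch of binned (tile-based) rendering. Triangles are reduced
# to screen-space bounding boxes and sorted into fixed-size tiles; each tile
# is then rendered out of GMEM and resolved to DRAM exactly once. Placeholder
# logic only -- the tile size and data structures here are made up.
TILE = 32   # hypothetical tile edge length in pixels

def bin_triangles(tris):
    """Map each tile to the triangles whose bounding boxes touch it."""
    bins = {}
    for i, (x0, y0, x1, y1) in enumerate(tris):
        for ty in range(y0 // TILE, y1 // TILE + 1):
            for tx in range(x0 // TILE, x1 // TILE + 1):
                bins.setdefault((tx, ty), []).append(i)
    return bins

tris = [(0, 0, 40, 40), (100, 100, 120, 130)]   # two toy triangles (as bboxes)
bins = bin_triangles(tris)
print(len(bins), "tiles touched -> one GMEM resolve (DRAM write) per tile")
```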



Key to making the binned rendering modes work is the GPU’s GMEM, a 3MB SRAM block that serves as a very high bandwidth scratch pad for the GPU. Architecturally, GMEM is more than a cache, as it’s decoupled from the system memory hierarchy, and the GPU can do virtually anything it wants with the memory (including using it as a cache, if need be).

At 3MB in size, the GMEM block is not very large overall. But that’s big enough to store a tile – and thus prevent a whole lot of traffic from hitting the system memory. And it’s fast, too, with 2.3TB/second of bandwidth, which is enough bandwidth to allow the ROPs to run at full-tilt without being constrained by memory bandwidth.

With the GMEM block in place, in an ideal scenario, the GPU only needs to write out to DRAM once per tile, when it finishes rendering said tile. In practice, of course, there ends up being more DRAM traffic than that, but this is one of Qualcomm’s key features for avoiding chewing up memory bandwidth and power with GPU writes to DRAM.
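To put the 3MB figure in context, here’s a rough illustration of how a 1080p render target maps onto GMEM – the per-pixel byte counts are illustrative assumptions, and real tile sizing depends on the formats, MSAA mode, and attachments actually bound:

```python
# Rough illustration of how a 1080p render target maps onto the 3 MB GMEM.
# Per-pixel byte counts are illustrative (RGBA8 color + D24S8 depth/stencil);
# real tile sizing depends on the bound formats and MSAA mode.
gmem_bytes = 3 * 1024 * 1024
width, height = 1920, 1080
bytes_per_pixel = 4 + 4            # 4B color + 4B depth/stencil

frame_bytes = width * height * bytes_per_pixel
passes = -(-frame_bytes // gmem_bytes)   # ceiling division
print(f"{frame_bytes / 1e6:.1f} MB working set -> at least {passes} bins")
# Each bin is resolved to DRAM once, so framebuffer traffic per frame stays
# close to the framebuffer's own size rather than scaling with overdraw.
```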



And when the Adreno X1 does need to go to system memory, it will go through its own remaining caches, before finally reaching the Snapdragon X’s shared memory controller.

Above the GMEM, there is a 128KB cluster cache for each pair of SPs (for 384KB in total for a full Snapdragon X). And above that still is a 1MB unified L2 cache for the GPU.

Finally, this leaves the system level cache (L3/SLC), which serves all of the processing blocks on the SoC. And when all else fails, there is the DRAM.
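Pulling the hierarchy together, nearest to farthest – a simple summary using the sizes stated above; the SLC’s capacity isn’t disclosed here, so it’s left blank:

```python
# The GPU-side memory hierarchy described above, nearest to farthest.
# The system level cache capacity is not disclosed in this material.
hierarchy = [
    ("GMEM scratchpad",             "3 MB",   "~2.3 TB/s, decoupled from the cache hierarchy"),
    ("Cluster cache (per SP pair)", "128 KB", "x3 pairs = 384 KB total"),
    ("GPU L2 cache",                "1 MB",   "unified across the GPU"),
    ("System level cache (L3/SLC)", "n/a",    "shared with the rest of the SoC"),
    ("LPDDR5X DRAM",                "n/a",    "~135 GB/s, shared with CPU and NPU"),
]
for name, size, note in hierarchy:
    print(f"{name:31s} {size:>7s}  {note}")
```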

On a concluding note, while Qualcomm doesn’t have a dedicated slide for this, it’s interesting to note that the Adreno X1 GPU also includes a dedicated RISC controller within the GPU which serves as a GPU Management Unit (GMU). The GMU provides several features, the most important of which is power management within the GPU. The GMU works in concert with power management requests elsewhere in the SoC, allowing the chip to reallocate power between the different blocks depending on what the SoC decides is the most performant allocation method.

Part 4- Performance Promises and First Thoughts

Wrapping things up, let’s touch upon a couple of Qualcomm’s performance slides before closing out this architectural deep dive. While the whole world will get to see what the Snapdragon X can do first-hand next week when retail devices launch, until then these slides give us a bit more insight into what to expect. Just be sure to take them with the requisite grain of salt.



On the CPU side of matters, Qualcomm is claiming that the Snapdragon X Elite can beat the entire field of contemporary PC competitors in GeekBench 6.2 single-threaded performance. And by a significant degree, too, when power efficiency is taken into account.

In short, Qualcomm claims that the Oryon CPU core in the Snapdragon X Elite can beat both Redwood Cove (Meteor Lake) and Zen 4 (Phoenix) in absolute performance, even if the x86 cores are allowed unrestricted TDPs. With mobile x86 chips turboing as high as 5GHz it’s a bold claim, but not out of the realm of possibility.



Meanwhile on the GPU front, Qualcomm is claiming similar energy efficiency gains. Though the workload in question – 3DMark Wild Life Extreme – is not likely to translate into most games, as this is a mobile-focused benchmark that has long been optimized to heck and back within every mobile SoC vendor’s drivers.



Performance benchmarks using actual games are arguably more useful here. And even though Qualcomm is probably doing some cherry-picking, the top Snapdragon X SKU is often trading blows with Intel’s Core Ultra 7 155H. It admittedly makes for a less impressive showing overall, but it’s good to see where Qualcomm is currently landing on real games. And in this case, even just a mix of ties/beats of one of Intel’s better mobile chips is not a bad showing.

First Thoughts

And there you have it, our first deep dive into a Qualcomm Snapdragon X SoC architecture. With Qualcomm investing into the Windows-on-Arm ecosystem for the long haul, this will hopefully be the first of many, as the company seeks to become the third major Windows CPU/SoC vendor.

But the ultimate significance of the Snapdragon X SoC and its Oryon CPU cores goes beyond just mere SoCs for PC laptops. Even if Qualcomm is wildly successful here, the number of PC chips they’ll ship will be a drop in the bucket compared to their true power base: the Android SoC space. And this is where Oryon is going to be lighting the way to some significant changes for Qualcomm’s mobile SoCs.



As noted by Qualcomm since the start of their Oryon journey, this is ultimately the CPU core that will be at the heart of all of Qualcomm’s products. What starts this month with PC SoCs will eventually grow to include mobile SoCs like the Snapdragon 8 series, and farther along still will be Qualcomm’s automotive products, and high-end offshoots like their XR headset SoCs. And while I doubt we’ll really see Oryon and its successors in Qualcomm’s products in a true top-to-bottom fashion (the company needs small and cheap CPU cores for their budget lines like Snapdragon 6 and Snapdragon 4), there is no doubt that it’s going to become a cornerstone of most of their products over the long run. That’s the value of the differentiation that comes from making your own CPU core – and of getting the most value out of that core by using it in as many places as possible.

Ultimately, Qualcomm has spent the last 8 months hyping up their next-generation PC SoC and its bespoke CPU core, and now it’s time for all of the pieces to fall into place. The prospect of having a third competitor in the PC CPU space – and an Arm-based one at that – is exciting, but slideware and advertising aren’t hardware and benchmarks. So we’re eagerly awaiting what next week will bring, and seeing whether Qualcomm’s engineering prowess can live up to the company’s grand ambitions.