Technology Stocks : ASML Holding NV

From: BeenRetired11/18/2025 9:39:44 AM
   of 42746
 
UCM:
Smart Storage Full Speed Ahead For AI

Copilot:
Huawei's Unified Cache Manager (UCM) is a software solution that dynamically distributes AI key-value (KV) cache across HBM, DRAM, and SSD tiers—like packing essentials into pockets, backpacks, and suitcases—to optimize speed, cost, and memory efficiency under hardware constraints.

Here’s how it works and why it matters:

What is Huawei UCM?

Huawei’s Unified Cache Manager (UCM) is a hierarchical memory management system designed to accelerate AI inference—especially for large language models (LLMs)—by intelligently allocating KV cache data across:
  • HBM (High-Bandwidth Memory) = “pockets”: ultra-fast but limited capacity
  • DRAM = “backpacks”: slower than HBM but more spacious
  • SSD = “suitcases”: large and cheap, but much slower
This metaphor captures the essence: frequently accessed, latency-sensitive data stays close (HBM), while less critical data is offloaded to slower, cheaper storage (DRAM/SSD).
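The pockets/backpacks/suitcases idea can be made concrete with a toy tiered cache. This is only an illustrative sketch, not Huawei's implementation: tier names, capacities, and the LRU demotion/promotion policy are all assumptions for demonstration.

```python
from collections import OrderedDict

# Hypothetical tier parameters (name, capacity in entries) -- not Huawei's numbers.
TIERS = [
    ("HBM", 2),    # "pockets": tiny but fastest
    ("DRAM", 8),   # "backpacks": larger, slower
    ("SSD", 64),   # "suitcases": huge, slowest (grows unbounded in this toy)
]

class TieredKVCache:
    """Toy tiered KV cache: new entries land in HBM; when a tier fills,
    its least-recently-used entry is demoted to the next tier down."""

    def __init__(self, tiers=TIERS):
        self.tiers = [(name, cap, OrderedDict()) for name, cap in tiers]

    def put(self, key, value):
        self._insert(0, key, value)

    def _insert(self, level, key, value):
        name, cap, store = self.tiers[level]
        store[key] = value
        store.move_to_end(key)  # mark as most recently used
        if len(store) > cap and level + 1 < len(self.tiers):
            old_key, old_val = store.popitem(last=False)  # evict LRU entry
            self._insert(level + 1, old_key, old_val)     # demote, don't discard

    def get(self, key):
        """Return (value, tier_found_in); a hit is promoted back to HBM."""
        for name, cap, store in self.tiers:
            if key in store:
                value = store.pop(key)
                self._insert(0, key, value)  # hot data moves back to the fast tier
                return value, name
        return None, None
```

Frequently accessed keys keep bubbling back to the HBM tier, while cold keys sink toward SSD, which is the behavior the metaphor describes.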

Why UCM Matters

Huawei developed UCM in response to U.S. export restrictions on HBM and advanced GPUs, which have constrained China’s access to cutting-edge AI hardware. UCM helps mitigate this by:

  • Reducing reliance on scarce HBM: It offloads KV cache to DRAM and SSD when needed.
  • Avoiding recomputation: Evicted KV data from HBM is stored in SSDs and retrieved when needed, rather than recalculated.
  • Boosting performance: Huawei claims up to 22× throughput gains and 90% latency reduction in real-world AI inference workloads.
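The "retrieve rather than recompute" tradeoff behind the second bullet can be sketched with a toy cost model. All quantities here are hypothetical inputs, not measured UCM figures: fetching an evicted KV block from SSD pays off whenever the transfer time is below the time to recompute it on the accelerator.

```python
def should_retrieve(block_bytes, ssd_read_gbps, recompute_flops, gpu_flops_per_s):
    """Toy decision rule (hypothetical cost model): return True if reloading
    an evicted KV block from SSD is cheaper than recomputing it."""
    transfer_s = block_bytes / (ssd_read_gbps * 1e9)       # time to read from SSD
    recompute_s = recompute_flops / gpu_flops_per_s        # time to recompute
    return transfer_s < recompute_s
```

For long contexts the recompute cost grows with sequence length while the transfer cost stays proportional to block size, which is why caching evicted KV data to SSD tends to win.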

Real-World Deployment

UCM has already been tested in production environments like China UnionPay, supporting applications such as:

  • Customer voice analysis
  • Marketing planning
  • Office assistance
These trials demonstrated lower operational costs and faster AI response times, even without top-tier hardware.

Technical Highlights

  • Tiered KV cache: Data is moved between HBM, DRAM, and SSD based on access frequency and latency needs.
  • AI SSD integration: Huawei is pairing UCM with custom SSDs to further ease GPU memory bottlenecks.
  • Open-source roadmap: Huawei plans to open-source UCM, encouraging broader adoption and ecosystem growth.
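The first highlight, placement driven by access frequency, can be sketched as a simple policy function. The thresholds and the access-rate heuristic are illustrative assumptions, not UCM's actual policy.

```python
def assign_tier(access_count, age_s, hot_threshold=100.0, warm_threshold=10.0):
    """Toy placement policy (thresholds hypothetical): hot KV blocks stay in
    HBM, warm ones go to DRAM, cold ones are offloaded to SSD."""
    rate = access_count / max(age_s, 1e-9)  # accesses per second
    if rate >= hot_threshold:
        return "HBM"
    if rate >= warm_threshold:
        return "DRAM"
    return "SSD"
```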

If you're benchmarking wafer-scale disruptors like Multibeam, DensityAI, and Cerebras, UCM is Huawei’s software counterpunch: a memory-layer optimization that sidesteps hardware chokepoints. Want to explore how this compares to VAST Data’s or WEKA’s cache-offload strategies?