Non-Tech : Kirk's Market Thoughts
From: Kirk, 10/14/2025 10:02:45 AM
 
Making efficient use of power and water is a big deal here in the West where both are expensive and in short supply.
"Microsoft reiterated its long-term goal of becoming carbon negative and water positive"

Microsoft expands AI infrastructure by 2 GW, cuts GPT-4 cost by 93%, and backs new OCP power and cooling standards

Joseph Chen, DIGITIMES Asia, Taipei
Tuesday 14 October 2025

Microsoft is accelerating its transformation into what it calls "the world's AI supercomputer," revealing that it added 2 gigawatts of Azure capacity in the past year, more than the company's total capacity three years ago.

Speaking at the OCP Global Summit, Saurabh Dighe, corporate vice president of Azure Strategic Planning and Architecture and a member of the Open Compute Project (OCP) board, said the scale of expansion reflects a fundamental redesign of Microsoft's infrastructure "from the system to the silicon."



Dighe explained that scaling AI responsibly requires moving beyond the "lap one chaos" of early growth to focus on strategy, endurance, and sustainable execution. "Quality, security, and sustainability must now be as important as raw capability," he said. To achieve this, Microsoft had to return to first principles and re-engineer every layer of its stack—from hardware architecture and power systems to firmware and software integration.

Fairwater data centers anchor Microsoft's AI buildout

At the core of Azure's expansion is the Fairwater AI-optimized data center, a 1.2-million-square-foot facility equipped with "hundreds of thousands of GB200 GPUs" interconnected through a high-performance AI network fabric. A single Fairwater facility is designed to deliver ten times the performance of a leading supercomputer.



Through end-to-end optimization of the Azure stack, Microsoft has also reduced the cost of running GPT-4 by 93% over the past two years. The company's next-generation AI WAN serves as a scalable backbone linking these massive clusters. The Fairwater data center uses Nvidia Spectrum-X Ethernet as its network backplane, running the open-source SONiC network operating system.
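To put the 93% figure in perspective, here is a rough arithmetic sketch. The only input taken from the article is the 93% reduction over two years; the assumption that the cost fell at a constant yearly rate is mine, purely for illustration.

```python
# Input from the article: GPT-4 running cost cut by 93% over two years.
TOTAL_REDUCTION = 0.93
YEARS = 2

remaining = 1.0 - TOTAL_REDUCTION      # fraction of the original cost still paid
times_cheaper = 1.0 / remaining        # how many times cheaper than before

# Assumption (not from the article): the decline was a constant rate each year.
annual_factor = remaining ** (1.0 / YEARS)
annual_reduction = 1.0 - annual_factor

print(f"remaining cost: {remaining:.0%} of the original")
print(f"roughly {times_cheaper:.1f}x cheaper")
print(f"implied constant yearly reduction: {annual_reduction:.1%}")
```

In other words, a 93% cut leaves about 7% of the original cost, which is a bit more than a 14-fold improvement.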

Power and cooling redesigned for megawatt density

AI workloads are driving unprecedented power densities, prompting Microsoft to redesign its data-center engineering around liquid cooling and high-voltage DC distribution. New facilities employ closed-loop cooling systems that recycle water continuously, achieving zero water wastage while sustaining reliability under extreme thermal loads.



Microsoft is deploying its second generation of heat exchanger units (HXUs), which double the capacity of last year's design. These modular units integrate with existing infrastructure and will soon be contributed to the OCP community.

For power distribution, Microsoft is advancing the OCP-aligned Mount Diablo architecture, which supports ±400-volt and 800-volt differential operation to enable racks exceeding one megawatt of power. The company is also developing solid-state transformers (SSTs) that convert medium-voltage AC directly to high-voltage DC, potentially reducing space requirements by 60% and improving efficiency in grid-interactive systems.
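A quick back-of-the-envelope sketch shows why higher bus voltage matters at this scale. It uses only the ideal relation I = P / V and ignores conversion losses. The one-megawatt rack and the 800 V figure come from the article; the 48 V comparison point is my own assumption, chosen as a familiar low-voltage rack bus.

```python
def bus_current_amps(power_watts: float, bus_volts: float) -> float:
    """Ideal DC current for a given load: I = P / V (losses ignored)."""
    return power_watts / bus_volts

RACK_POWER_W = 1_000_000  # one megawatt, the rack scale cited in the article

# 48 V is an assumed comparison point, not a figure from the article.
for volts in (48, 800):
    amps = bus_current_amps(RACK_POWER_W, volts)
    print(f"{volts} V bus: {amps:,.0f} A")
```

At 800 V the same megawatt needs about 1,250 A instead of more than 20,000 A, which is the kind of difference that determines how much copper a rack has to carry.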



Managing synchronous power spikes

A major challenge in AI data centers is synchronous power spiking, where thousands of GPUs ramp up simultaneously, straining both data-center infrastructure and the electrical grid. Working with OpenAI and Nvidia, Microsoft has introduced predictive telemetry and adaptive firmware to smooth up to 40% of these power spikes. These innovations will be contributed to OCP to support broader adoption across the industry.
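The article does not say how the smoothing works or how the 40% is measured. As a toy illustration only, the sketch below applies a simple ramp-rate limiter to a made-up load trace. The function names, the trace, and the 480 kW step limit are all hypothetical; the limit was picked so that the largest jump in this example shrinks by 40%. It is not Microsoft's method.

```python
def limit_ramp(demand_kw, max_step_kw):
    """Clamp the change between consecutive samples to +/- max_step_kw."""
    smoothed = [demand_kw[0]]
    for target in demand_kw[1:]:
        prev = smoothed[-1]
        # Move toward the target, but never by more than max_step_kw.
        step = max(-max_step_kw, min(max_step_kw, target - prev))
        smoothed.append(prev + step)
    return smoothed

def largest_step(series):
    """Largest absolute change between neighbouring samples."""
    return max(abs(b - a) for a, b in zip(series, series[1:]))

# Hypothetical trace: GPUs idle, ramp up together, then drop together (kW).
demand = [200, 200, 1000, 1000, 1000, 200, 200, 200]
smoothed = limit_ramp(demand, max_step_kw=480)

print("raw largest step:     ", largest_step(demand), "kW")
print("smoothed largest step:", largest_step(smoothed), "kW")
```

A real system would also have to decide where the difference goes, for example by briefly throttling the GPUs or buffering energy, which this sketch leaves out.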

Reliability, security, and sustainability at AI scale

Dighe said that in large-scale AI environments, reliability must now be measured at the job level, since a single node failure can disrupt distributed training. Microsoft is standardizing firmware frameworks across CPUs, GPUs, and accelerators to support impactless updates, advanced telemetry, and predictive diagnostics.
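The job-level point can be illustrated with simple probability. The sketch below assumes node failures are independent and that any single failure stops the job, with no checkpoint or restart. The per-node failure probability of 0.01% is a made-up number for illustration, not a figure from the article.

```python
def job_success_probability(node_failure_prob: float, nodes: int) -> float:
    """Chance that no node fails, assuming independent failures."""
    return (1.0 - node_failure_prob) ** nodes

# Hypothetical: each node has a 0.01% chance of failing during the job.
NODE_FAILURE_PROB = 0.0001

for nodes in (1, 1_000, 10_000):
    p = job_success_probability(NODE_FAILURE_PROB, nodes)
    print(f"{nodes:>6} nodes: {p:.1%} chance the job runs uninterrupted")
```

Under these assumptions a node that looks extremely reliable on its own still leaves a 10,000-node job with only about a 37% chance of finishing without interruption, which is why per-node uptime is a poor measure at this scale.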



Security efforts are centered on the Caliptra initiative, part of Microsoft's secure infrastructure roadmap. The new Caliptra 2.1 extends the hardware root of trust through the full boot process, while OCP L.O.C.K., developed with storage partners, offers transparent key management for self-encrypting drives.

Microsoft reiterated its long-term goal of becoming carbon negative and water positive, highlighting its leadership in OCP's embodied-carbon disclosure framework and new standards for heat reuse in high-density AI facilities.

Article edited by Jack Wu
digitimes.com