Strategies & Market Trends : 2026 TeoTwawKi ... 2032 Darkest Interregnum


To: Julius Wong who wrote (214330) 5/14/2025 1:22:59 AM
From: TobagoJack
re <<However, the program's high price tag remains a contentious issue for those concerned about budget efficiency and alternative defense solutions.

As the stealth fighter continues evolving and addressing its challenges, its true worth will likely become clearer.
>>

... suspect (i) the all-in costing is fatal, as (ii) that factor limits all-out quantity, (iii) especially given the <50% ready-to-fly feature, and (iv) the <30% ready-to-fight bug

if any of suspicions (i)-(iv) prove valid, and enough of the many suspicions prove valid, then the entire program is just waiting to be exposed as a fractal-scale-up of a colossally bad idea and to be kashmir-ed (here used as a verb)

worse, (1) the continuing-spend on the program likely poisons the airforces of the entire alliance of spendthrifts, and (2) much worse, precludes better-spends on more effective programs across all military / defence domains, especially (3) given programs up and coming from elsewhere

basically altogether a bummer, and one that is near-impossible to fix, just like the Rafale in the instance of the J-10C, even by DOGE-ing

in the meantime all of the citing below is gold-bullish, for massive inflation this way comes, and if not, even more massive inflation must- and shall- be

bulgarianmilitary.com

Pakistan’s J-10C stuns India’s Rafale with electronic jamming?

On May 4, 2025, Pakistan’s Defense Minister Khawaja Asif made a startling claim: four Indian Air Force Rafale fighter jets were electronically jammed on the night of April 29-30 by Pakistani forces near the Line of Control [LoC] in the disputed Kashmir region, forcing them to retreat and make emergency landings in Srinagar.


Photo credit: PAF
According to posts on X citing Asif’s statement, the Pakistani Air Force deployed its Chinese-made Chengdu J-10C fighters, backed by advanced electronic warfare systems, to disrupt the Rafales’ radar and communication systems.

The alleged incident, which India has not confirmed, has sparked intense debate about the capabilities of China’s rapidly advancing military technology and its potential to challenge Western-designed systems like the French-built Rafale.

While the claim remains unverified and carries the risk of being propaganda, it raises critical questions about the evolving landscape of aerial warfare and the growing sophistication of electronic countermeasures.

The reported encounter occurred against a backdrop of heightened tensions between the two nuclear-armed neighbors, following a deadly terrorist attack on April 22 in Pahalgam, Kashmir, that killed 26 tourists, mostly Indian nationals.

India accused Pakistan of sponsoring the attack, a charge Islamabad vehemently denied, while both sides engaged in diplomatic and military posturing, including cross-border skirmishes along the LoC. Pakistan’s state-run media, including PTV News, reported that its air force detected and chased Indian Rafale jets conducting reconnaissance near the LoC, forcing them to “retreat in panic.”

Asif’s assertion, echoed by outlets like Clash Report, goes further, claiming the Rafales’ advanced systems were rendered inoperable by Pakistan’s electronic warfare capabilities, a feat that, if true, would mark a significant technological achievement for Pakistan and its Chinese-supplied arsenal.

The Chengdu J-10C, at the heart of Pakistan’s claim, is a single-engine, multirole fighter developed by China’s Chengdu Aircraft Corporation. Introduced to the Pakistani Air Force in March 2022, the J-10C represents a cornerstone of Pakistan’s efforts to modernize its fleet in response to India’s acquisition of 36 Rafale jets from France.

Powered by a Chinese WS-10B turbofan engine, the J-10C can reach speeds of Mach 1.8 and has an operational range of approximately 1,250 miles with external fuel tanks. Its active electronically scanned array [AESA] radar, believed to be a variant of the KLJ-10, provides enhanced target detection and tracking capabilities.

The aircraft is armed with a mix of air-to-air and air-to-ground munitions, including the long-range PL-15 missile, which boasts a range exceeding 120 miles, and the PL-10, a short-range missile with advanced infrared homing.

Pakistan’s acquisition of at least 25 J-10C fighters, announced in December 2021, was explicitly framed as a counter to India’s Rafale program, underscoring the strategic rivalry driving technological advancements in the region.

What sets the J-10C apart in this context is its reported integration of advanced electronic warfare systems, which Asif claims were used to disrupt the Rafale jets. While specific details about the J-10C’s electronic warfare suite are scarce due to China’s secretive approach to military technology, defense analysts suggest it may incorporate systems similar to the KG300G or KG600, Chinese-developed jammers capable of interfering with enemy radar and communication systems.

These systems could employ digital radio frequency memory [DRFM] techniques, which allow them to record and manipulate incoming radar signals, creating false targets or overwhelming an opponent’s sensors. Such capabilities would be critical in countering the Rafale’s sophisticated defenses, particularly its SPECTRA electronic warfare system, which is designed to protect the aircraft from a wide range of threats.

The Dassault Rafale, operated by the Indian Air Force since 2020, is a twin-engine, multirole fighter renowned for its versatility and cutting-edge technology. Powered by two Snecma M88-2 engines, the Rafale achieves a top speed of Mach 1.8 and an operational range of about 2,300 miles.

Its Thales RBE2 AESA radar offers superior situational awareness, enabling it to track multiple targets simultaneously across long distances. The Rafale’s arsenal includes the Meteor beyond-visual-range air-to-air missile, with a range exceeding 90 miles, and the MICA missile for close-range engagements.

India’s Rafale jets, based at Ambala and Hasimara air bases, are equipped with India-specific enhancements, including helmet-mounted displays and the ability to carry the BrahMos supersonic cruise missile, a joint Indo-Russian weapon designed for precision strikes. The aircraft’s integration into India’s network-centric warfare systems further enhances its combat effectiveness, allowing it to share data with other platforms in real-time.

Central to the Rafale’s survivability is its SPECTRA system, developed by Thales and MBDA. SPECTRA, which stands for Système de Protection et d’Évitement des Conduites de Tir du Rafale, is a comprehensive electronic warfare suite that combines active and passive sensors, jammers, and decoy dispensers to detect, analyze, and neutralize threats.

The system uses advanced algorithms to identify incoming radar and missile signals, deploying countermeasures such as chaff, flares, or directed jamming to confuse enemy sensors. SPECTRA’s active cancellation technology, a closely guarded feature, is believed to generate tailored electromagnetic signals that mask the Rafale’s radar signature, making it harder to detect.

This system has been praised for its ability to counter sophisticated threats, including Russian and Chinese radar systems, and is considered one of the most advanced electronic warfare suites on any fighter jet today.

Pakistan’s claim that its J-10C fighters disrupted SPECTRA is extraordinary, as it suggests a level of technological prowess that challenges Rafale’s reputation as a near-invulnerable platform. Electronic warfare involves the use of electromagnetic signals to degrade an opponent’s sensors, communications, or navigation systems, often through jamming or deception.

For Pakistan to have successfully jammed the Rafale, its forces would have needed to overpower or outsmart SPECTRA’s countermeasures, a task requiring precise coordination and advanced equipment. While the J-10C’s onboard jammers could play a role, experts speculate that Pakistan may have employed ground-based electronic warfare systems, such as Chinese-supplied units, to augment its airborne capabilities.

These systems could emit high-powered signals to saturate the Rafale’s sensors, potentially causing temporary disruptions in radar and communication functions.

The plausibility of Pakistan’s claim hinges on several factors. The J-10C, while advanced, is a relatively new platform with limited combat experience compared to the Rafale, which has seen action in conflicts in Libya, Mali, and Syria.

China’s investment in electronic warfare has grown significantly in recent years, driven by its ambition to close the technological gap with Western powers. Systems like the KG600, used on other Chinese aircraft, are designed to jam radar frequencies across a wide spectrum, potentially affecting AESA radars like the Rafale’s RBE2.

However, overcoming SPECTRA’s adaptive jamming and active cancellation features would require a highly sophisticated and targeted approach, possibly involving real-time signal analysis and overwhelming power output.

Without independent verification, such as satellite imagery or intercepted communications, the claim remains speculative, and India’s silence on the matter suggests either a strategic decision to avoid escalation or an acknowledgment of a minor operational setback.

The incident, if true, would not be the first time electronic warfare has played a decisive role in modern aerial engagements. During the 2019 Balakot airstrike, India’s Mirage 2000 jets reportedly used electronic countermeasures to evade Pakistani radar, while Pakistan’s F-16s and JF-17s engaged in a dogfight that resulted in the downing of an Indian MiG-21.

Electronic warfare has become a critical domain in military strategy, with nations like the United States, Russia, and China investing heavily in systems to dominate the electromagnetic spectrum. The U.S. Navy’s EA-18G Growler, for example, is a dedicated electronic warfare aircraft equipped with the ALQ-249 Next Generation Jammer, capable of disrupting enemy radar and communications across vast distances.

Russia’s Su-35 fighters, meanwhile, use the Khibiny electronic warfare pod to create protective jamming bubbles around aircraft. China’s advancements, as potentially demonstrated by the J-10C, suggest it is catching up, leveraging its growing industrial base to produce cost-effective systems that challenge more expensive Western platforms.

Historically, India and Pakistan have used air forces as key instruments of power projection along the LoC. The 1999 Kargil War saw limited but intense aerial engagements, with India’s MiG-29s and Mirage 2000s conducting strikes against Pakistani positions.

Pakistan’s F-16s, acquired from the United States in the 1980s, have long been a mainstay of its air force, but the introduction of the J-10C in 2022 marked a shift toward reliance on Chinese technology. This shift reflects broader geopolitical trends, with Pakistan deepening its military ties with China through initiatives like the Belt and Road Initiative.

The J-10C’s deployment in Pakistan serves as a testing ground for Chinese hardware, providing valuable data on its performance against advanced adversaries like the Rafale. For India, the Rafale acquisition was a strategic move to counter both Pakistan and China, whose J-20 stealth fighters pose a long-term threat along the Line of Actual Control in Ladakh.

The broader context of the April 29-30 incident underscores the volatile dynamics in Kashmir. The Pahalgam attack, claimed by The Resistance Front, a group India links to Pakistan-based Lashkar-e-Taiba, prompted a series of retaliatory measures, including India’s suspension of the Indus Waters Treaty and closure of its airspace to Pakistani flights.

Pakistan responded with reciprocal measures, escalating the diplomatic crisis. Cross-border firing along the LoC, reported on multiple occasions in late April, further heightened tensions, with both sides accusing each other of ceasefire violations.

The alleged Rafale incident, whether factual or exaggerated, fits into a pattern of psychological and informational warfare, where claims of technological superiority serve to bolster domestic morale and deter adversaries.

Skepticism about Asif’s claim is warranted, given the history of unverified assertions in India-Pakistan conflicts. In 2019, Pakistan claimed to have shot down an Indian Su-30MKI during the Balakot skirmish, a claim debunked by India, which confirmed the loss of only a MiG-21.

Similarly, social media posts in late April 2025 falsely claimed a Rafale was shot down near the LoC, using footage from an unrelated Su-30MKI crash in Maharashtra in June 2024. The Press Information Bureau’s Fact Check unit swiftly dismissed these reports, highlighting the prevalence of misinformation in the region.

Asif’s statement, amplified by platforms like X, may serve a similar purpose, projecting strength to domestic and international audiences while pressuring India to de-escalate.

For the United States, the incident holds implications beyond South Asia. China’s growing ability to produce competitive military systems, as evidenced by the J-10C, challenges the dominance of Western defense industries, which supply platforms like the F-35 and Rafale to allies worldwide.

The U.S., which operates over 450 F-35 stealth fighters, relies on advanced electronic warfare capabilities to maintain air superiority. The EA-18G Growler and the F-35’s AN/ASQ-239 Barracuda system are designed to counter threats like those posed by Chinese jammers, but the rapid proliferation of such technology to nations like Pakistan raises concerns about the global balance of power.

Moreover, Pakistan’s use of Chinese systems could influence other nations, particularly in the Middle East and Africa, to consider Chinese arms over Western alternatives, impacting U.S. defense exports.

The lack of independent evidence makes it difficult to assess the veracity of Pakistan’s claim, but its implications are undeniable. If the J-10C did disrupt Rafale’s systems, it would signal a leap forward for Chinese aerospace technology, potentially reshaping perceptions of its reliability and effectiveness.

Even if the claim is exaggerated, it serves as a reminder of the growing importance of electronic warfare in modern conflicts, where control of the electromagnetic spectrum can be as decisive as traditional firepower.

The incident also highlights the challenges of verifying claims in an era of rapid information dissemination, where social media platforms like X can amplify unconfirmed reports, shaping narratives before facts are established.

As tensions in Kashmir persist, the alleged Rafale incident underscores the need for robust electronic warfare defenses across military platforms. For India, it may prompt a reassessment of the Rafale’s operational protocols and maintenance standards, ensuring its systems are resilient against emerging threats.

For Pakistan, the J-10C’s reported success, whether real or fabricated, bolsters its strategic partnership with China, reinforcing its role as a key ally in Beijing’s geopolitical ambitions. For the global defense community, the incident serves as a wake-up call: the gap between Western and Chinese military technology is narrowing, and future conflicts may hinge on the ability to dominate the invisible battlefield of electronic warfare.

Could this be a turning point in the race for aerial supremacy, or is it merely another chapter in the long history of unverified claims between India and Pakistan? Only time, and perhaps declassified records, will tell.



To: Julius Wong who wrote (214330) 5/14/2025 5:48:10 AM
From: TobagoJack
re <<Cost So Much>> ...

... why does everything cost so much when theirs does not ... after DeepSeeking, or post-J-10C-ing / solid-PL15-ing

bloomberg.com

DeepSeek’s ‘Tech Madman’ Founder Is Threatening US Dominance in AI Race

The company’s sudden emergence illustrates how China’s industry is thriving despite Washington’s efforts to slow it down.


DeepSeek founder Liang Wenfeng meeting with President Xi Jinping in Beijing in February.

Photographer: Florence Lo/Reuters

By Bloomberg Businessweek

May 14, 2025 at 5:00 AM GMT+8

With his wispy frame and reserved style, Liang Wenfeng can come off as shy, nervous even, in meetings. The founder of DeepSeek—the Chinese startup that recently upended the world of artificial intelligence—is prone to faltering speech and prolonged silences. But new hires learn quickly not to mistake his quiet rumination for timidity. Once Liang processes the finer points of a discussion, he fires off precise, hard-to-answer questions about model architecture, computing costs and the other intricacies of DeepSeek’s AI systems.

Employees refer to Liang as lao ban, or “boss,” a common sign of respect for business superiors in China. What’s uncommon is just how much their lǎobǎn empowers young researchers and even interns to take on big experimental projects, habitually stopping by their desks for updates and pushing them to consider unusual engineering paths. The more technical the conversation the better, especially if it leads to real performance gains, milestones Liang has personally shared on their internal Lark messaging channel. “He’s a true nerd,” says one former DeepSeek staffer who, like many people interviewed for this article, requested anonymity because they weren’t authorized to speak publicly about the company. “Sometimes, I felt he understood the research better than his researchers.”

Liang and his young company catapulted to international prominence in January when it released R1, an AI model that had the feel of an explosive breakthrough. R1 beat the dominant Western players on several standardized tests commonly used to assess AI performance, yet DeepSeek claimed to have built its base model for about 5% of the estimated cost of GPT-4, the model undergirding OpenAI’s ChatGPT.

The test results triggered a $1 trillion selloff in US markets and sparked thorny questions about the US strategy to use export controls to slow China’s progress on AI. Amazon and Microsoft raced to add DeepSeek’s models to their cloud offerings, alongside rivals from Meta and Mistral AI. “Basically over a weekend, the interest in DeepSeek just grew so much that we got into action,” says Atul Deo, who oversees Amazon.com Inc.’s language model marketplace.

DeepSeek cleared the fogged window through which Americans have viewed much of China’s AI scene: shrouded in mystery, easier to dismiss as an exaggerated specter but very likely more daunting than they’re willing to admit. Before the startup’s emergence, many US companies and policymakers held the comforting view that China still lagged significantly behind Silicon Valley, giving them time to prepare for eventual parity or to prevent China from ever getting there.

[Chart: The US dominates AI investment; private investment in AI. Source: Quid, compiled by Stanford University AI Index]

The reality is that Hangzhou, where DeepSeek is based, and other Chinese high-tech centers have been roaring with little AI dragons, as AI startups are often called. Sophisticated chatbots from homegrown startups such as MiniMax and Moonshot AI have rocketed in popularity, including in the US. Alibaba Group Holding Ltd.’s Qwen family of large language models consistently ranks near the top of prominent leaderboards among LLMs from Google and Anthropic; Baidu Inc.’s chief executive officer, Robin Li, boasted in April that the search giant could develop models that were as good as DeepSeek’s but even cheaper, thanks to its new supercomputer, assembled with in-house chips. Huawei Technologies Co. is likewise winning plaudits for the products it’s designed to compete against equipment from Nvidia Corp., whose graphics processing units (GPUs) power the most advanced AI models in the US and Europe.

[Chart: But China’s tech is catching up; performance of top AI models on LMSYS Chatbot Arena. Source: LMSYS, compiled by Stanford University AI Index. Note: Chatbot Arena is an open-source platform for evaluating AI through human preference, developed by researchers at LMArena]

It wasn’t that long ago that the Chinese Communist Party was clipping the wings of what it saw as an out-of-control tech sector. Antitrust probes and data-compliance reviews were initiated, luminaries such as Alibaba co-founder Jack Ma faded from public view, and new regulations were imposed on social media, gig economy and gaming apps. Now the CCP is lifting up its domestic tech industry in the face of foreign interference. President Xi Jinping is marshaling resources to AI and semiconductors, emboldening China’s high-skilled workforce and calling for “an independent, controllable and collaborative” software and hardware ecosystem.

Also propelling China’s recent strides, ironically, are geopolitical constraints aimed at slowing its AI momentum. Wei Sun, an analyst at Counterpoint Technology Market Research, says the AI gap between the US and China is now measured in months, not years. “In China there’s a collective ethic and a willingness to work with intensity that results in executional superiority,” says Sun, noting that the forced scarcity of Nvidia chips unearthed novel AI innovations. “This dynamic creates a kind of Darwinian pressure: Survival goes to those who can do more with less.”

Where China sees innovation, many in the US continue to suspect malfeasance. An April report from a bipartisan House of Representatives committee alleged “significant” ties between DeepSeek and the Chinese government, concluding that the company unlawfully stole data from OpenAI and represented a “profound threat” to US national security. Dario Amodei, CEO of Anthropic, has called for more US export controls, contending in a 3,400-word blog post that DeepSeek must have smuggled significant quantities of Nvidia GPUs, including its state-of-the-art H100s. (Bloomberg News recently reported that US officials are probing whether DeepSeek circumvented export restrictions by purchasing prohibited chips through third parties in Singapore.)


Anthropic CEO Dario Amodei has cited DeepSeek as a reason for tougher restrictions on chip exports to China. Photographer: Chesnot/Getty Images

The Chinese Embassy has rejected the House committee’s claims as “groundless.” Nvidia has said that DeepSeek’s chips were export-compliant and that more restrictions could benefit Chinese semiconductors. A spokesperson for the chipmaker says forcing DeepSeek to use more chips and services from China would “boost Huawei and foreign AI infrastructure providers.”

The company at the center of this debate continues to be something of an enigma. DeepSeek prides itself on open-sourcing its AI technology while not being open whatsoever about its inner workings or intentions. It reveals hyperspecific details of its research in public papers but won’t provide basic information about the general costs of building its AI, the current makeup of its GPUs or the origins of its data.
Liang himself has long been known to be so inherently unsociable that some leaders of China’s AI scene privately call him “Tech Madman,” a variation on a nickname reserved for eccentric entrepreneurs with outsize ambitions. He hasn’t granted a single press interview in the past 10 months, and few knew what he looked like until a photograph surfaced of his boyish, bespectacled face during a recent hearing with Chinese Premier Li Qiang. Liang and his colleagues didn’t respond to repeated requests for comment for this article, except for an autoreply from one employee that said the inquiry was being processed: “Thank you for your attention and support for DeepSeek!” her email added.


Liang in January. Source: Zuma Press

To further understand how the company works and how it fits into the country’s broader AI ambitions, Bloomberg Businessweek spoke with 11 former employees of Liang’s, along with more than three dozen analysts, venture capitalists and executives close to China’s AI industry.

The lack of a public presence has allowed critics such as Amodei and OpenAI head Sam Altman to fill the void with aspersions, which resonate with US audiences who are primed to see Chinese technology as a shadowy threat. But even those who remain wary of DeepSeek are being forced to grapple with the undeniable prowess of its AI. Dmitry Shevelenko, the chief business officer of Perplexity AI Inc., says not a single person at his company, which makes an AI-powered search product, has managed to communicate with any counterparts at DeepSeek. Nevertheless, Perplexity has embraced DeepSeek’s tech, hosting it only on servers in the US and Europe and post-training it to remove any datasets indicative of CCP censorship. Perplexity branded it R1 1776 (a reference to the year of the US’s founding), which Shevelenko describes as an homage to freedom. “We don’t know what DeepSeek’s true motivations are,” he says. “It’s a bit of a black box.”

DeepSeek had anticipated its AI might cause concerns abroad. In an overlooked virtual presentation at an Nvidia developer conference in March 2024, Deli Chen, a deep-learning researcher at DeepSeek, spoke of how values ought to be “decoupled” from LLMs and adapted to different societies. On one coldly logical slide, Chen showed a DeepSeek prototype for customizing the ethical standards built into chatbots being used by people of various backgrounds. With a quick tap of a button, developers could set the legality of issues including gambling, euthanasia, sex work, gun ownership, cannabis and surrogacy. “All they need to do is to select options that fit their needs, and then they will be able to enjoy a model service that is tailored specifically to their values,” Chen explained.
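A minimal sketch of what such a per-deployment "values" profile might look like; the class and field names here are illustrative assumptions, not DeepSeek's actual configuration schema, which the presentation did not disclose:

```python
from dataclasses import dataclass, field

@dataclass
class ValuesProfile:
    """Hypothetical per-deployment toggle set like the one Chen demoed.

    Maps a sensitive topic (gambling, euthanasia, cannabis, etc.) to
    whether the deploying society treats it as legal. Field names are
    invented for illustration only.
    """
    legality: dict = field(default_factory=dict)  # topic -> allowed?

    def is_permitted(self, topic: str) -> bool:
        # Unknown topics default to the most restrictive setting
        return self.legality.get(topic, False)

# A deployment where gambling is legal but cannabis is not
profile = ValuesProfile(legality={"gambling": True, "cannabis": False})
print(profile.is_permitted("gambling"))    # True
print(profile.is_permitted("euthanasia"))  # False (not configured)
```

The design point the slide was making is the defaulting behavior: a developer only selects the options relevant to their jurisdiction, and everything else falls through to a conservative default.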

Finding such efficient workarounds was always the cultural norm at DeepSeek. Liang and his friends studied various technical fields at Zhejiang University in the mid-2000s—machine learning, signal processing, electronic engineering, etc.—and, apparently for kicks (and, you know, for cash), developed computer programs to trade stocks during the global financial crisis.

After graduating, Liang continued building quant-trading systems on his own, earning a small fortune before joining forces with several of his university friends in Hangzhou, where they launched what became known as High-Flyer Quant in 2015.

Early job postings boasted of luring top talent from Google and Facebook and sought math and coding “geeks” with the “quirky brilliance” of Sheldon, the awkward main character of the sitcom The Big Bang Theory. They promised free snacks, Herman Miller chairs, poker nights, an office culture that smiled upon T-shirts and slippers and, with a dollop of fintech bro culture, the opportunity to work with “adorable, soft-spoken girls born in the 1990s” and “a sharp goddess who returned from Wall Street.”


DeepSeek’s Beijing office. Photographer: Peter Catterall/AFP/Getty Images

As would be the case with DeepSeek, High-Flyer cultivated a sense of mystery—its first social media post referred to Liang only as “Mr. L”—while committing itself to a kind of lemme-prove-it transparency. Every Friday, High-Flyer would post charts of the performance of its 10 original funds on the Chinese super-app WeChat. Before it restricted the weekly data to registered investors in the summer of 2016, the portfolio was seeing average annualized returns of 35%.

Billions of dollars eventually flowed into High-Flyer’s holdings, and its investment and research group increased to more than 100 employees. Liang started recruiting in earnest for an AI division in 2019, aiming to mine gargantuan datasets to spot undervalued stocks, tiny price fluctuations for high-frequency trading and macro trends that industry-specific investors were missing. By the beginning of the Covid-19 pandemic, he and his team had constructed a high-performance computing system of interconnected processors running in tandem, a setup known as a cluster. For this cluster, High-Flyer said it had acquired 1,000 Nvidia 2080Ti chips—commonly used by gamers and 3D artists—and an additional 100 Volta-series GPUs. (The Volta GPU, aka the V100, was Nvidia’s first AI-optimized processor.) Whereas High-Flyer’s previous, smaller computing architecture required two months to train a new economic analysis model, its new equipment needed less than four days to process the same workload.

These finance models were impressive but much smaller than the generalist models US operations like OpenAI were building. Liang pushed for the construction of a substantially bigger supercomputer consisting of Nvidia’s then-new A100 GPUs, its upgraded successor to the V100. A former High-Flyer engineer involved with the project says Liang was “the single biggest user” of the growing cluster, estimating 80% of the computer processing used to develop models was assigned to his username. This ex-engineer says Liang seemed obsessed with deep learning, calling it “his expensive hobby.” Plowing hundreds of millions of dollars into such AI infrastructure was probably overkill for a quant firm, but Liang had generated more than enough profits to afford it. “Small money for Liang at the time,” the engineer recalls. “More computing power, better models, more gains in trading.”

At least that was the hope. High-Flyer, which was then managing roughly $14.1 billion in assets, apologized in a December 2021 letter to stakeholders for a streak of disappointing returns. The firm blamed the downturn on its AI systems, which it said had made smart stock picks but failed to proficiently time exits from those trades amid the volatility of the pandemic. Even so, it decided to literally double down on AI: In January 2022, High-Flyer posted on social media that it had amassed 5,000 Nvidia A100s, each of which usually costs tens of thousands of dollars. In March it announced this cluster had expanded to 10,000, a mere six months before Nvidia warned new US restrictions could affect exports of such chips to China.

It’s unclear how much of this infrastructure was ultimately intended for quant trading versus Liang’s expensive hobby. The next spring, about five months after OpenAI introduced ChatGPT, he spun out DeepSeek as an independent research lab. At separate offices in Hangzhou and Beijing, finance was no longer the focus. In an unsigned manifesto rife with platitudes, High-Flyer vowed to shun mediocrity and tackle the hardest challenges of the AI revolution. Its ultimate goal: artificial general intelligence.


Featured in the June 2025 issue of Bloomberg Businessweek. Illustration: 731

Throughout 2023 the DeepSeek lab raced to build an AI code assistant, a general-knowledge chatbot and a text-to-3D-art generator. Liang brought over engineers from High-Flyer and recruited more from Microsoft Corp.’s Beijing office and leading Chinese tech companies and universities. Bo “Benjamin” Liu, who joined as a student researcher that September prior to starting a Ph.D., says Liang frequently gave interns crucial jobs that elsewhere would be assigned to senior employees. “Take me as an example: When I got to the company, no one was working on the RLHF infra”—the infrastructure needed to support an important technique known as reinforcement learning from human feedback—“so he just let me do it,” Liu says. “He will trust you to do the things no one has done before.” (That trust came with a secondary benefit to DeepSeek: It paid interns the equivalent of $140 per day with a $420 monthly housing subsidy, generous compensation in China but about a third of what interns make at AI companies in the US, and a tiny fraction of what full-time Silicon Valley engineers earn.)

Liang placed a huge and early bet on sparsity, a technique for training and running LLMs more efficiently by breaking them down into specialties, according to two ex-DeepSeek researchers. When you asked the original ChatGPT a question, its entire LLM brain would activate to determine the ideal answer, whether you asked for the sum of 2 + 2 or a pie recipe. A sparse model, by contrast, would make better use of resources by being partitioned into “experts,” with only the relevant ones being activated in response to any particular prompt.

A sparse approach can lead to enormous savings on computing costs, but it gets extremely complex. If a question isn’t processed by enough circuits of the brain or is sent to the wrong lobes, answer quality will degrade. (The math brain would know how to use pi in a formula, but not what goes into that pie recipe, for instance.) Liang saw progress in this area from Google and French unicorn Mistral, which had released a sparse model in December 2023 that was divided into eight experts, with each query activating two of the most relevant ones based on context. He rallied his team to design models with ever-more experts, a technique that comes with the potential of increasing hallucinations and fragmenting the AI’s knowledge. “This sparked significant internal debate,” says the former DeepSeek staffer.
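The routing idea can be sketched in a few lines of Python. This is a toy illustration of top-k expert selection, not DeepSeek's or Mistral's actual implementation; the "experts" here are simple scaling functions standing in for neural sub-networks, and all names are hypothetical:

```python
import math

def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and softmax-renormalize their weights."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i])[-k:]
    exps = [math.exp(router_logits[i]) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

def sparse_forward(x, experts, router_logits, k=2):
    """Activate only the k selected experts and mix their outputs."""
    top, weights = route_top_k(router_logits, k)
    return sum(w * experts[i](x) for i, w in zip(top, weights))

# Eight toy "experts" (each just scales its input), mirroring the
# 8-expert / top-2 routing of the Mistral model described above.
experts = [lambda x, s=i: (s + 1) * x for i in range(8)]
logits = [0.1, 2.0, 0.3, 1.5, 0.2, 0.1, 0.0, 0.4]

y = sparse_forward(1.0, experts, logits, k=2)  # only experts 1 and 3 run
```

In a real mixture-of-experts layer the router is itself a learned layer and routing happens per token, which is what makes balancing load across experts, one source of the internal debate described above, so difficult.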

More breakthroughs followed, each shared publicly and increasingly catching the attention of Chinese competitors. Then, in late 2024, DeepSeek released V3, a general-purpose AI model that was about 65% larger than Meta Platforms Inc.’s equivalent, which was then the biggest open-source LLM available. But it was a lengthy V3 research paper that really grabbed the attention of executives at Google, OpenAI and Microsoft, about a month before DeepSeek broke into the wider consciousness with its R1 reasoning model. One shocking statistic that leapt off the PDF: DeepSeek implied that V3’s overall development had cost a mere $5.6 million. It’s likely this sum referred only to the final training run—a data-refinement process that transforms a model’s previous prototypes into a complete product—but many people perceived it as an insanely low budget for the entire project. By comparison, cumulative training for the most advanced frontier models can run $100 million or more. Anthropic’s Amodei even predicted (before the rise of DeepSeek) that next-generation models will each cost anywhere from $10 billion to $100 billion to train.

Leandro von Werra, head of research for popular AI platform Hugging Face Inc., which hosts rankings of LLMs, says DeepSeek’s “architectural innovation” wasn’t the most striking thing about its model. The biggest revelation he took from its research paper was that the company must have developed high-quality data—either cleverly cleaned up from the web or extracted through other means—to bring V3 to life. “Without very strong datasets, the models will lack performance,” says von Werra. “From the report it becomes very clear that DeepSeek has one of the best training datasets for LLMs out there. Unfortunately the report covers the dataset in half a page out of 50 pages.”

DeepSeek exhibited its rapid progress because Liang saw the open-source ethos as integral to his philosophy. He believed that hiding proprietary techniques and charging for powerful models—the approach taken by top US labs including OpenAI and Google—prioritized short-term advantage over more durable success. Making his models entirely accessible to the public, and largely free, was the most efficient way for DeepSeek to accelerate adoption and get startups and researchers building on its tech. The hope was that this would create a flywheel of product consumption and feedback. As DeepSeek wrote in the announcement of its first publicized LLM almost two years ago, quoting the inventor of open-source operating system Linux: “Talk is cheap, show me the code.”
“Basically they don’t need the money. With all the hype around the Six Little Dragons, people are throwing money at them”
On a cloudy Sunday in April at Hangzhou’s bustling Xiaoshan International Airport, digital billboards touting AI services from Alibaba, ByteDance and Huawei greet arrivals. A humanoid robot with blue hair welcomes passengers with a wave inside the modern terminal. Outside, an autonomous-vehicle startup has been testing small self-driving trucks for transporting cargo around the tarmac. For all the noise around DeepSeek, Westerners seem to forget it’s just one of many AI dragons rising across China’s numerous Silicon Valley equivalents. In Hangzhou alone, a megacity with a population of 12.5 million, DeepSeek belongs to an elite group of tech startups known as the Six Little Dragons.

In the scenic West Lake district there’s Game Science, the red-hot studio behind Black Myth: Wukong, a bestselling action game heralded for using machine learning techniques to make its computer characters more lifelike. Not far away are two robotics powerhouses and a unicorn focused on 3D-spatial software. Also nearby is Zhejiang Qiangnao Technology Co., which is known as BrainCo and best understood as a China-backed version of Neuralink Corp. It can be traced back to a startup incubated at Harvard University by a Chinese-born Ph.D. student, Bicheng Han, and is now developing bionic limbs and technologies that let brain activity control computers, at its affiliate lab in Hangzhou. One of BrainCo’s AI-powered prosthetic hands is currently on display at an exhibition center in China Artificial Intelligence Town, another emerging tech hub in Hangzhou.

In recent weeks, BrainCo leaders have given tours at the exhibit, according to someone who attended a session. The attendees often want to invest, but apparently these brainiacs haven’t sounded too desperate for outside capital. “Basically they don’t need the money,” says a fund manager who took the tour. “With all the hype around the Six Little Dragons, people are throwing money at them.”

Standing quietly behind all these startups is the government of President Xi. Generative AI, robotics and other high-tech ambitions are driving a state agenda that above all else seeks domestic “self-reliance and self-strengthening,” as Xi phrased it during a recent Politburo meeting, according to China’s official Xinhua News Agency. “We must recognize the gaps and redouble our efforts to comprehensively advance technological innovation, industrial development and AI-empowered applications.”

The dragons are listening, and not all of them are so little. The main campus of $300 billion conglomerate Alibaba, a sprawling property with its own lake, is in an area of Hangzhou about 40 minutes west of West Lake by car. The company recently pledged $53 billion toward constructing more AI data centers in the next three years, and it’s said its latest Qwen3 flagship models rival DeepSeek’s performance and cost efficiencies. Outside China, Alibaba is usually thought of as an e-commerce business, but its faster-expanding AI and cloud unit was spun off in 2022 to a separate hub on the outskirts of Hangzhou. Inside its conference rooms, big screens glow with an “industry insights flash,” updated every 72 hours, detailing the latest achievements of rivals such as DeepSeek and OpenAI. There’s even a weekly updated version in the restrooms, a reminder that AI races on even when nature calls for human technologists.

This April, Ma, the elusive Alibaba co-founder who practically disappeared during the CCP’s crackdown on China’s tech sector almost five years ago, reappeared on the company’s campus to celebrate the 15th anniversary of its cloud division. In a rare speech, Ma said he wants AI to serve humans, not lord over them, according to several people who saw it. Attendees, who also tuned in to the livestream from offices in Hong Kong and Tokyo, say they were pumped about Ma’s triumphant comeback.

It was a reminder that tech rock stars such as Ma are apparently back in the good graces of the CCP—and being joined by up-and-comers like Liang—even as the shine wears off tech leaders in the US. There’s a swelling national pride in China, which is eager to show it can overcome Western obstacles. George Chen, the Hong Kong-based managing director of policy consultant Asia Group LLC, says top Chinese engineers have begun returning home after stints in the US at Apple, Google, Microsoft and other leading companies. While hostility from the Trump administration is part of that, they’re also being pulled by the feeling that the real action may be shifting east. “Silicon Valley is no longer an attractive place for work for Chinese talent,” Chen says.

Kai-Fu Lee, the founder of another Chinese unicorn, 01.AI, goes a step further. A veteran of Apple, Google and Microsoft himself, Lee says the next generation of talent isn’t following his path through US companies before building their own in China. “These young AI engineers are largely homegrown,” he says. “DeepSeek’s success, along with the success of other new AI startups, is motivating more young talent to be a part of China’s AI renaissance.”


Liang (center) at a symposium in Beijing in February.Photographer: Florence Lo/Reuters

No tech company in China today conjures as much pride as DeepSeek. While visiting Hangzhou with his family in April, Kirby Fung, a 27-year-old computer scientist from Canada, took his family for a tour of Liang’s alma mater, Zhejiang University. Fung had done an exchange program there and wanted to show his grandparents and younger brother that he studied at the same place as Liang. “It’s really cool to explain to my friends back in Canada that the guy who made DeepSeek went to my school,” Fung says.

Tourists and social media influencers also regularly descend on DeepSeek’s headquarters, based in a four-tower complex overlooking China’s famous Grand Canal. The tourists look for signs of Liang at the local shops, including an upscale hot-pot spot in the DeepSeek building where staffers sometimes eat. (The hostess has to break the news that he never stops by.)

People who know Liang say he splits his time between Hangzhou and DeepSeek’s Beijing office, on the fifth floor of a glass tower in a local tech hub. There, twentysomething coders grind at height-adjustable desks, and the pantry is stocked with energy drinks, Kang Shi Fu instant noodles and latiao sticks. There’s a whiteboard where employees can scribble requests for additional food. “I got a bit fat after having lunch and dinner there for months,” says one recently departed researcher.

Liang rarely agrees to meetings with outsiders, sometimes even appearing as a hologram projection for the few he accepts. He spurned an invitation to this year’s influential Paris AI Action Summit, an event that attracted OpenAI’s Altman, Alphabet Inc. and Google CEO Sundar Pichai and a slew of prime ministers and presidents.

While China celebrates DeepSeek, the US treats it like an unfamiliar organism that’s mysteriously shown up in the water supply, examining it for signs of whether it’s benign or malignant. Critics have accused DeepSeek of being controlled by the CCP, ripping off training data from US rivals and contributing to some larger espionage campaign or psyop to undermine Silicon Valley’s AI hegemony. “DeepSeek is a direct pipeline from America’s tech sector into the Chinese Communist Party’s surveillance state, threatening not only the privacy of American citizens but also our national security,” says a spokesperson for the US House committee investigating DeepSeek.

DeepSeek, however, has presented itself as no different than any hot startup—the product of “pure garage-energy,” it said in a February post on X. After all, it operates on the same Beijing campus as Google, not far from a Burger King and two Tim Hortons. Just because the broader AI industry didn’t pay much attention to DeepSeek until now doesn’t mean something shady is happening behind the scenes. “The AI world didn’t expect DeepSeek,” says Arnaud Barthelemy, a partner at VC firm Alpha Intelligence Capital, which has invested in OpenAI and SenseTime. “They should have.”

Barthelemy says the real lesson to take from DeepSeek is how effectively Chinese tech companies are turning the constraints they operate under into a strength. “There are plenty of smart minds in China who did a lot of smart innovation with much lower compute requirements,” he says.

Indeed, in May 2023, coincidentally the same month DeepSeek was established, Nvidia CEO Jensen Huang told Businessweek that the US’s overregulating China will only incentivize it to out-innovate those getting in its way. Describing economic influence as an effective tool of national security, he stressed that the unintended consequences of government interventions would be severe. “To be deprived of one-third of the technology industry’s market has got to be catastrophic,” he said, referring to the risks of limiting US tech exports to China. “They are going to flourish without competition. They will flourish, and they will export it to Europe, to Southeast Asia.”

“You have to be mindful of how far you push competition,” Huang continued. “All of a sudden the response is very unpredictable. People who have nothing to lose respond in ways that are quite surprising.”


Nvidia’s Jensen Huang has argued that export controls could end up strengthening China.Photographer: David Paul Morris/Bloomberg

There’s still controversy about one important part of DeepSeek’s story: how much it actually spent to build its models. In a widely cited report, US research firm SemiAnalysis estimated that High-Flyer and DeepSeek likely had access to clusters of around 50,000 of Nvidia’s top-of-the-line H-series GPUs, worth $1.4 billion, which they’ve mostly kept hidden from the public. The bulk of this infrastructure, SemiAnalysis said, included GPUs that were conceivably export-compliant. (The US allowed Nvidia to sell some chips—the H20 and H800—to China that it modified to limit performance so they adhered to White House restrictions.) But the consulting firm also claimed DeepSeek had access to an additional 10,000 of Nvidia’s bleeding-edge H100 chips, which the US government had banned for sale to China.

Three ex-employees vehemently deny these claims, saying DeepSeek had fewer than 20,000 GPUs consisting of older Nvidia chips and export-controlled ones. “They are spreading lies,” Bo Liu, the Ph.D. candidate, says of SemiAnalysis. The research firm says it stands by its report.

What’s not in question is whether DeepSeek would welcome access to the scale of computing power that US tech companies have. The company seems confident it could do much more with it than Silicon Valley. “The reality is that LLM researchers have an enormous appetite for computational resources—if I were working with tens of thousands of H-series GPUs, I’d probably become wasteful too, running many experiments that might not be strictly necessary,” says one of the former DeepSeek employees. But access to more resources is a problem that China’s technologists would be willing to deal with. “I wish we Chinese companies could have 50,000 GPUs one day,” says the departed researcher, who’s since joined another open-source AI lab in Beijing. “Want to see what we could achieve?” — Austin Carr, Saritha Rai and Zheping Huang, with Luz Ding, Claire Che, Matt Day and Jackie Davalos



To: Julius Wong who wrote (214330)5/14/2025 5:57:28 AM
From: TobagoJack  Respond to of 217542
 
re <<Cost So Much>> ... until something else happens

... but in any case love the reach by reaching ...

bloomberg.com

US Warns That Using Huawei AI Chip ‘Anywhere’ Breaks Its Rules


A Huawei Technologies Co. chip.

Photographer: Qilai Shen/Bloomberg

By Lynn Doan

May 14, 2025 at 3:17 AM GMT+8


  • The Commerce Department has issued guidance stating that the use of Huawei's Ascend AI chips anywhere in the world violates US export controls, escalating efforts to curb China's technological advances.

    Summary by Bloomberg AI

  • The guidance aims to strengthen export controls on AI chip hardware and warns about the consequences of allowing US AI chips to be used for training and inference of Chinese AI models.

    Summary by Bloomberg AI

  • The Commerce Department is also rescinding Biden administration-era regulations on the export of semiconductors used in developing AI, and is drafting a new approach that may involve negotiating individual deals with countries.

    Summary by Bloomberg AI

  • Watch

The Commerce Department issued guidance stating that the use of Huawei Technologies Co.’s Ascend artificial intelligence chips “anywhere in the world” violates the government’s export controls, escalating US efforts to curb technological advances in China.

The agency’s Bureau of Industry and Security said in a statement Tuesday that it’s also planning to warn the public about “the potential consequences of allowing US AI chips to be used for training and inference of Chinese AI models.” The training of AI models involves bombarding them with data to teach them to recognize patterns. Inference, meanwhile, is the stage where models use that training to carry out tasks.

Commerce’s guidance stands to make it all the more difficult for Shenzhen-based Huawei to fulfill its ambitions of developing more powerful chips for AI and smartphones, efforts that have already hit major snags because of US sanctions.

Huawei was designing its next two Ascend processors — China’s answer to Nvidia Corp.’s dominant accelerators — around the same 7-nanometer architecture that’s been mainstream for years, Bloomberg reported in November. US-led restrictions had already kept Huawei’s chipmaking partners from procuring state-of-the-art systems.

The bureau laid out the new instructions while more broadly announcing the rescission of Biden administration-era regulations on the export of semiconductors used in developing AI. Those rules had drawn strenuous objections from US allies and companies, including Nvidia and Oracle Corp.

Biden’s regulations “would have undermined US diplomatic relations with dozens of countries by downgrading them to second-tier status,” the Commerce Department said in the statement Tuesday, adding that it will publish a notice that formalizes the rescission of the rule and issue a replacement “in the future.”

The Trump administration is drafting its own approach and could shift toward negotiating individual deals with countries, according to people familiar with the matter. The Commerce Department said in its statement that whatever comes of it will be “a bold, inclusive strategy to American AI technology with trusted foreign countries around the world, while keeping the technology out of the hands of our adversaries.”



To: Julius Wong who wrote (214330)5/14/2025 6:03:17 AM
From: TobagoJack  Respond to of 217542
 
re <<Cost So Much>> ... and then J-10C followed DeepSeek, even as the former preceded the latter by decade+
bloomberg.com

DeepSeek Punctured the Myth That Silicon Valley Could Control AI

In just two years, China’s open-source artificial intelligence movement allowed it to virtually close the gap with its US peers.


Illustration: Vivek Thakker for Bloomberg

By Parmy Olson

May 9, 2025 at 6:00 PM GMT+8

In Supremacy: AI, ChatGPT and the Race That Will Change the World (Macmillan, 2024) Bloomberg Opinion columnist Parmy Olson described a tale of ambition, innovation and competition in Silicon Valley that fueled the growth of AI. In this Next Chapter, Olson explains how a new entrant from China turned the sector on its head.

It seemed to come out of nowhere. On Jan. 20 this year, DeepSeek, a Chinese artificial intelligence company spun out of a hedge fund, released R1, an AI tool that was as good as the latest version of ChatGPT and built for a fraction of the cost. It was free to use, its blueprints posted on the internet for anyone to copy. In the cloistered world of Silicon Valley, DeepSeek landed like an asteroid. Marc Andreessen called it “AI’s Sputnik moment” while TechCrunch heralded “the first Silicon Valley freak-out of 2025.”

US tech companies had raised billions to build expensive AI models. Suddenly DeepSeek made that look like wasted money. According to some estimates, R1 was built for the price of a single Google executive’s salary. A week after its release, the Chinese model had topped the app charts for iPhones and Android phones.

DeepSeek dropped a few months after my book Supremacy was published. Back then — way back in September 2024 — AI was largely being shaped by two men, OpenAI co-founder Sam Altman and Google DeepMind co-founder Demis Hassabis. They had set out to build artificial general intelligence (AGI) — or AI that’s smarter than humans — to boost prosperity, cure cancer and solve climate change. Instead, they were helping Microsoft Corp. and Alphabet Inc. expand their wealth. Two years after the release of ChatGPT, the benefit of generative AI for humanity was still unclear, but the collective market capitalizations of the world’s six biggest tech companies had risen by $8 trillion.

Hassabis and Altman had each grappled with how to police AI. OpenAI initially operated as a nonprofit, and DeepMind’s owner sought to run it the same way. Their aim was to keep large tech companies from controlling their groundbreaking technology. But the efforts to remain independent ultimately failed — and kicked off the AI arms race I chronicled in the book, with Google and Microsoft battling alongside Amazon.com Inc., Meta Platforms Inc. and Elon Musk’s xAI. What was so stunning about DeepSeek’s arrival was how AI was suddenly out of the hands of this tiny group of billionaires, because China was giving it away for free.

Chinese Threat

For those paying attention, it was clear months before DeepSeek hit the scene that Chinese models could pose a threat to the US tech industry. China’s technologists were rapidly shifting toward open-source AI, with startups and large firms alike embracing free and collaborative approaches. In the US, only Mark Zuckerberg had sought to make Meta’s Llama model “open source,” meaning its code and weights were freely available for anyone to use or modify. (Such models are referred to as “open-weight” because they don’t share all necessary details, like the training code, to be recreated from scratch like true “open-source” software. But many in tech and business still use “open source” for AI, to distinguish them from closed models like OpenAI’s, and I am using the term in this story.)

In China, the goal was to accelerate innovation and reduce dependence on Western technology and restricted chips. It largely worked. Supported by government policy — with agencies like the Ministry of Industry and Information Technology encouraging tech companies to make their AI shareable — and a vibrant developer community, China’s open-source AI movement allowed it to virtually close the gap with its American peers over two years.

In March 2025, DeepSeek was the third-most popular open-source AI model on the internet. According to HuggingFace, a website for AI developers to share and find open-source models, it was downloaded 24.3 million times that month. The second-most popular open-source model was from Alibaba Group’s Qwen, which had 60 million downloads in March, putting it just behind Meta’s Llama model, which had 66 million downloads. Known for its so-called multimodal abilities, meaning it can generate text, images and more, Qwen’s open framework has made it a popular foundation for other AI companies to build on in China.


In March, DeepSeek was the third-most popular open-source AI model on the internet.Photographer: Andrey Rudakov/Bloomberg

Market dynamics have also played a role. Baidu Inc.’s Ernie started in 2019 as a proprietary large language model, and Chief Executive Officer Robin Li initially argued to keep it and other AI models closed source. But seeing the success of peers like DeepSeek, Li changed his tune. “If you open things up, a lot of people will be curious enough to try it,” he told a Dubai conference in February. “This will help spread the technology much faster.” Now Baidu is planning to open-source the Ernie 4.5 model series starting in June.

Many of those using the Chinese models have switched over from their Western rivals. “We were using Llama, and we switched to a different model from China,” says Amr Awadallah, the CEO of Palo Alto, California-based startup Vectara, who also co-founded an open-source firm called Cloudera Inc. “The best model from China right now is Qwen. And the second-best one is Ernie. And they’re both ahead of Llama in terms of capability and speed.”

Awadallah initially used OpenAI’s models for his business but “it hurt our margins,” he says. OpenAI’s models might have worked right out of the box, but they were expensive. The open models China offers are not only cheaper to use, they’re transparent, allowing his engineers to inspect their inner workings.

Cheaper and More Transparent

How did DeepSeek go unnoticed? Partly because the global spotlight has been on Silicon Valley, which for years underestimated China’s ambitions in areas like cloud computing and social media. And partly because tech billionaires themselves underestimated the power of efficient technology, like the kind being worked on in China. While battling one another to launch the biggest AI model or the biggest cluster of AI chips, they forgot that cheaper tech can be a critical catalyst. When British tech firm Arm Holdings Plc figured out how to design ultra-efficient processors in the mid-1990s, its chip blueprints found their way into nearly all the world’s smartphones. They remain there today.

It’s now possible that several Chinese companies could become the engines powering the personal assistants, digital companions and customer service platforms for a growing number of Western consumers and businesses. TikTok, the social media sensation from Beijing-based ByteDance Ltd., may have just been a prelude to another form of Chinese technological influence on the West.

Proprietary, American-made AI models like Google’s Gemini 2.5 Pro are widely considered among the best on the market, but DeepSeek has shown their domination is not guaranteed. Today’s backtrack from globalization makes it look increasingly unlikely that a single American platform like Llama will dominate the world, as Google’s Android did with mobile operating systems. And antitrust enforcement in the US also makes it harder for the likes of Google and Meta to act as consolidators.

Healthier Market

The battle for AI supremacy now looks less like a foregone conclusion for Silicon Valley incumbents, and more like the start of a broader, healthier market. The next chapter in AI’s evolution is being shaped in Shenzhen and Beijing as much as it is in San Francisco and Seattle. For consumers and businesses around the world, more international competition promises affordable AI, greater transparency and innovation that doesn’t only address the priorities of the Silicon Valley elite.

There will likely be profound geopolitical consequences, too. If Chinese open-source models power more of the world’s services, that creates a potentially disturbing path of influence by the Chinese Communist Party, and could lead to more protectionist policies from Western governments. It could also herald a more vibrant international market that belies the trade tensions now hampering most other industries. Because China’s AI systems are free, they are, oddly enough, insulated from the current trade war. There are no cross-border prices that high tariffs can affect. This means that even as China’s physical exports get gummed up in trade tensions, its AI can spread far and wide to become one of the country’s most successful (and influential) exports.

With any luck DeepSeek won’t simply herald a shift in concentrated power from the US towards China; it will spark the beginning of a global AI market, where innovation flows freely across borders and its benefits reach beyond the hubs of any single nation.



To: Julius Wong who wrote (214330)5/14/2025 8:28:11 AM
From: TobagoJack  2 Recommendations

Recommended By
Julius Wong
marcher

  Read Replies (1) | Respond to of 217542
 
re <<So Much>> just going down the list of tasks, and am here, just downloaded Grok, and earlier downloaded Qwen and DeepSeek, and no, no idea as yet of what I am doing


hatchworks.com
HOW TO DEPLOY AN LLM: MORE CONTROL, BETTER OUTPUTS

  • Updated: August 9, 2024


This article breaks down the exact process I’ve used to deploy an LLM. It’s given me tighter control over company data and hallucination-free outputs tailored to my needs.

Feel free to follow my process and adapt it to yours. It’s simple to follow but if you do get a little lost in the weeds, don’t hesitate to reach out.



Hi, I’m David Berrio, Senior ML/AI engineer here at HatchWorks AI. I’m obsessed with all things AI and language models, spending much of my time learning how to leverage them to improve client software projects, our own processes at HatchWorks AI, and my personal hobbies.

Short on time? Get the highlights with these four key takeaways:

  1. Local LLM deployment offers better data control, regulatory compliance, and customization options.
  2. The process involves 7 steps, including setting up the environment, building a knowledge base, and using a vector database.
  3. Your vector database will be set up to take the user query and source reliable information from your knowledge base. This is how you get responses free from hallucination while also getting the best from the LLM itself.
  4. HatchWorks AI can teach your team how to deploy LLMs locally for any use case or can do it for you from start to finish. All you need to do is get in touch to learn more.

Watch the full webinar:


This article is based on my 1 hour long webinar on the same topic where I show step-by-step how I’ve built a SageMaker Wizard that can answer any question a developer may have about using Amazon SageMaker.

Watch it here.

Why Bother Deploying an LLM?

At a glance:

Better data control

Easier regulatory compliance

Faster responses

Optimization of resources

Customized to your needs

Easier experimentation

Cost effective

If the idea of deploying a Large Language Model feels daunting compared to using a ready-made cloud-based system like ChatGPT-4, then this section is for you.

First, let me say that it’s not as daunting of a process as you think. The 7 steps I outline a little later will make that clear.

Second, any effort you do put toward localizing an LLM will be well spent.

And that’s because deploying an LLM locally:

  • Gives you complete control over your data (something your Legal or InfoSec team will thank you for).
  • Makes it easier for you to be compliant with stringent regulatory requirements for data protection such as GDPR or HIPAA.
  • Optimizes the usage of your local hardware resources, ensuring the model runs as efficiently as possible.
  • Creates a custom solution where you can fine-tune the model with your data, adjust inference settings, and integrate with bespoke systems.
  • Provides a sandbox environment for experimentation without limitations or restrictions that might be imposed by third-party cloud providers.
  • Saves you money for high-usage scenarios where continuous API calls to a cloud service would be expensive.
I really want to stress that last point. If you plan on using an LLM for a high frequency task, it can quickly add up to a substantial monthly expense, especially if you’re dealing with tens or hundreds of thousands of interactions.

For example, ChatGPT-4 is about $10 per million token input and $30 per million token output.

By deploying your own LLM, you make a one-time investment in hardware and software, which can be far more economical in the long run compared to the ongoing costs of cloud services.

This means more predictable budgeting and significant savings over time.
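To make that concrete, here is a rough back-of-envelope estimate. The per-token prices match the figures cited above; the interaction volume and token counts are hypothetical numbers chosen purely for illustration:

```python
# Back-of-envelope cloud API cost estimate (usage numbers are hypothetical).
INPUT_COST_PER_M = 10.0    # USD per million input tokens (figure cited above)
OUTPUT_COST_PER_M = 30.0   # USD per million output tokens

def monthly_api_cost(interactions: int, in_tokens: int, out_tokens: int) -> float:
    """Estimated monthly API spend for a given interaction volume."""
    total_in = interactions * in_tokens
    total_out = interactions * out_tokens
    return (total_in / 1e6) * INPUT_COST_PER_M + (total_out / 1e6) * OUTPUT_COST_PER_M

# e.g. 100,000 interactions/month, ~1,500 input and ~500 output tokens each
print(f"${monthly_api_cost(100_000, 1_500, 500):,.2f}/month")  # → $3,000.00/month
```

At that volume the API bill alone would run to thousands of dollars a month, which is the scale at which a one-time hardware investment starts to pay for itself.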

How to Deploy an LLM Locally Using RAG – 7 Simple Steps

In the HatchWorks AI Lab I ran on this same topic, I explain how to deploy an LLM on your machine using Retrieval Augmented Generation (RAG).

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that improves the responses of AI language models by combining them with a search system.

This search system looks through a large collection of documents to find relevant information. By using this extra information, the AI can give more accurate and up-to-date answers, instead of relying only on what it already knows.

I’m going to do the same thing here using a use case where we need a chatbot that answers queries about all things Amazon SageMaker.

Amazon SageMaker is a cloud-based machine-learning platform that lets developers create, train, and deploy machine-learning models in the cloud.

Your use case will surely be different but the RAG process you follow will be the same. And that process will need:

  • A user interface
  • A framework
  • An inference engine
  • An LLM
  • A knowledge base
  • An embedding model
  • A vector database

You can see the relationship between those components below:



The steps you follow will support the implementation of this process.

Step 1: Install All Necessary Software and Set Up Your Environment

Before you begin, you need to create an environment where you can deploy an LLM.

This environment must support various components that are essential for the model to function effectively.

Here’s what you’ll need:

  • GPU: A powerful consumer-grade GPU (e.g., RTX 4090) is recommended to handle the intensive computations involved in running large models. Ensure you have the necessary drivers and CUDA installed. You don’t strictly need a GPU, but response generation will be slower without one. An alternative is an Apple Silicon M-series processor, which works well with the llama.cpp inference engine and delivers response speeds similar to some consumer-grade GPUs.
  • Memory and Storage: Adequate RAM (or VRAM if using a GPU; 32GB recommended) and fast storage (preferably SSDs) are necessary to handle large datasets and models.
  • Integrated Development Environment (IDE): Tools like Visual Studio or PyCharm are recommended for writing and managing your code efficiently. These IDEs provide robust support for Python, debugging tools, and integration with other essential tools.
  • Version Control: Using Git for version control helps manage changes to your code and collaborate with others.
  • Python: The primary language for deploying LLMs, due to its extensive library support for machine learning and AI. Ensure you have Python 3.8 or higher.
  • Framework: We use the LangChain framework to streamline the integration and management of our LLM.
  • Essential Libraries: Install libraries such as torch for PyTorch (deep learning framework), transformers from Hugging Face (for handling LLMs), and faiss-cpu (for efficient similarity search and clustering of dense vectors).
  • Vector Database: A vector database like Epsilla is crucial for managing and querying embeddings, which are numerical representations of your data that the LLM can understand and use efficiently.
  • Pre-trained Models: Select a pre-trained model that suits your needs. In our example, we use Mistral.
  • Inference Engine: Tools like Llama CPP facilitate efficient model inference, especially if you’re leveraging GPU acceleration.
  • Chatbot Interface: If you’re deploying a chatbot, you’ll need an interface framework like Streamlit, Rasa, or Botpress that can handle user interactions and integrate with your LLM.

For our environment, we used Visual Studio, Python, Epsilla, Gemma, Mistral, and Llama CPP.

Step 2: Build Your Knowledge Base

Your knowledge base is a set of documentation that your LLM will draw from when coming up with answers to queries and finding solutions.

For our example, we’re using documents related to Amazon SageMaker.

Key Points for Building Your Knowledge Base:
  • Ensure that the documentation you gather is directly related to the use case you’ll be addressing with the LLM. This relevance is crucial for the model to provide accurate and useful responses.
  • Your knowledge base can include a variety of document types such as manuals, research papers, user guides, FAQs, and even images with embedded text.

We then upload these documents into Visual Studio, our development environment where we can easily write code and give instructions that enable the deployment of our LLM.

You may also like: LLM Use Cases: One Large Language Model vs Multiple Models

Step 3: Embed Documents into Your Vector Database

This process involves breaking up your documents into manageable chunks, converting them into embeddings, and storing these embeddings in a vector database for efficient retrieval.

Embedding transforms your textual data into numerical vectors that capture the semantic meaning of the text.

These vectors allow the LLM to efficiently search and retrieve relevant information based on the input query.

In our example, we broke up the documents into chunks of 1000 characters which you can see in the code below:



The highlighted lines store the embeddings in the Epsilla vector database and tell the pipeline what to do with those broken-up chunks of text.
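The original code screenshot isn’t reproduced here, but the chunking idea can be sketched in plain Python with no framework dependencies. The chunk size matches the 1,000 characters used above; the overlap value is an illustrative assumption:

```python
def chunk_document(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split a document into fixed-size character chunks with a small
    overlap, so text cut at a chunk boundary still appears whole in one chunk."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_document("x" * 2500)
print(len(chunks))  # → 3 chunks, of lengths 1000, 1000, and 700
```

In the real pipeline, each chunk is then run through the embedding model and the resulting vectors are inserted into the vector database (Epsilla in our case).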

Step 4: Embed the User-Interface Question into the Vector Database

For your LLM to accurately respond to questions, you need to process the user’s query against the vector database.

This step ensures that the LLM can pull together the correct contextual information from the embedded documents to generate an accurate and relevant response.

Steps to Embed User Queries:

1. Process the Query:

Convert the user’s question into an embedding using the same model and method used for your document chunks. This ensures that the query and documents are in the same vector space, allowing for accurate similarity searches.

2. Retrieve Relevant Information:

Use the query embedding to search the vector database and retrieve the most relevant document chunks. This step ensures that the LLM has access to contextually relevant information.

3. Formulate a Templated Prompt:

Create a template that combines the user’s query with the retrieved document chunks. This prompt guides the LLM in generating a response that is informed by the relevant context.

This is the templated prompt we provided in our SageMaker Wizard example:



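As an illustration of steps 1–3 (not the wizard’s actual code), here is a minimal self-contained sketch. A toy bag-of-words vector stands in for a real embedding model, and the document chunks and prompt wording are invented for the example:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector.
    In practice, use the same embedding model you used for the documents."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query embedding."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

chunks = [
    "SageMaker notebooks run managed Jupyter environments.",
    "Billing is per second of instance usage.",
    "Endpoints serve trained models for real-time inference.",
]
question = "How does SageMaker billing work?"
context = "\n".join(retrieve(question, chunks))
print(PROMPT_TEMPLATE.format(context=context, question=question))
```

The filled-in template, rather than the raw question, is what gets sent to the LLM, which is how the model ends up grounded in your documents.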
Step 5: Use a Pre-Trained LLM

I already included this as part of Step 1, where you would have selected all the software you needed to deploy an LLM. But I want to take the time to highlight the importance of this step and to provide guidance on choosing a model.

Your pre-trained LLM is a critical part of how well your use case works. And that’s because pre-trained LLMs have been built for different purposes.

You need to find one that performs well for your purpose.

A good place to start is by looking through the Hugging Face repository or GPT4All.

When selecting your model you should consider:

  • Task Requirements: Ensure the model is suitable for the type of tasks you need to perform.
  • Model Size: Larger models generally provide better performance but require more computational resources.
  • Performance Metrics: Evaluate models based on benchmarks and performance metrics relevant to your use case.
We used Mistral and Gemma in our example with SageMaker queries because they are well suited to text-based queries and offer nuanced language understanding.



Step 6: Import the Layers of the Model into Your Local GPU

Once you’ve selected and prepared your pre-trained LLM, the next step is to import the model into your local GPU.

This step involves configuring the model to work efficiently with your hardware, ensuring optimal performance for your use case.



Steps to Import the Model

1. Configure GPU Settings:

Ensure your system is equipped with the necessary GPU drivers and CUDA toolkit compatible with your GPU model.

Adjust parameters such as window length and batch size to optimize performance.

2. Set Parameters:

Window Length: This defines the context size the model uses for processing. A typical window length might be around 1024 tokens, but this can vary based on your specific needs and the capabilities of your GPU.

Batch Size: This determines the number of tokens processed in each batch. Adjusting the batch size helps balance memory usage and processing speed. A common starting point is 8-16 tokens per batch, but you may need to tweak this based on your hardware capabilities.

3. Import the Model Using LangChain:

LangChain provides a streamlined way to manage and run your LLM. It simplifies the process of loading and configuring your model for optimal performance.
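A minimal sketch of this configuration with the llama-cpp-python bindings might look like the following. The model path and quantization level are hypothetical, and the parameter values are the starting points discussed above, not tuned settings:

```python
# Hypothetical GGUF model path -- substitute your own downloaded model.
MODEL_PATH = "models/mistral-7b-instruct.Q4_K_M.gguf"

LLM_CONFIG = {
    "n_ctx": 1024,       # window length: context size in tokens
    "n_batch": 16,       # tokens processed per batch
    "n_gpu_layers": -1,  # -1 offloads all layers to the GPU; lower it if VRAM runs out
}

def load_model(path: str = MODEL_PATH, **overrides):
    """Load a local model with llama-cpp-python (pip install llama-cpp-python).
    The import is deferred so this file can run without the package installed."""
    from llama_cpp import Llama
    return Llama(model_path=path, **{**LLM_CONFIG, **overrides})
```

With a real model file in place, `llm = load_model()` followed by a call like `llm("Q: What is SageMaker?\nA:", max_tokens=128)` would run inference entirely on your machine.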

Step 7: Append the Response and Get Your Results

Up until now, we’ve just been setting up the LLM. Now we need to actually run it. This step puts everything you’ve configured into action and runs through the process we showed you earlier, in the sequence it needs to be completed in:



This is what happens:

  • User Query: The user inputs a query that the system needs to address.
  • Query Embedding: The query is transformed into an embedding and used to search the vector database.
  • Context Retrieval: The most relevant document chunks are retrieved and combined with the query.
  • Prompt Formulation: A templated prompt provides context for the LLM alongside the user query.
  • Response Generation: The LLM generates a response based on the prompt.
  • Display Result: The final response is formatted and displayed or returned to the user.
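Put together, the sequence above can be sketched end to end with stub components. The retrieval and generation functions here are deliberately naive stand-ins for the real vector search and local LLM call, so the flow itself is what's on display:

```python
def answer(question: str, chunks: list[str], retrieve, generate) -> str:
    """End-to-end RAG flow: retrieve context for the query, build a
    templated prompt, generate a response, and return it."""
    context = "\n".join(retrieve(question, chunks))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

def naive_retrieve(question: str, chunks: list[str]) -> list[str]:
    """Keyword-overlap stand-in for a real vector-similarity search."""
    words = question.lower().split()
    return [c for c in chunks if any(w in c.lower() for w in words)]

def stub_llm(prompt: str) -> str:
    """Stand-in for a local LLM call; just echoes the prompt it was given."""
    return "(stub answer grounded in)\n" + prompt

chunks = [
    "SageMaker billing is per second of instance usage.",
    "Notebooks are managed Jupyter environments.",
]
print(answer("How does billing work?", chunks, naive_retrieve, stub_llm))
```

Swapping `naive_retrieve` for the vector-database lookup and `stub_llm` for the loaded model gives you the working pipeline.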

In our example, the output looks like this:



If you’re feeling overwhelmed by the steps and choices before you, I want to stress that you don’t have to navigate them on your own.

At HatchWorks AI we help build, deploy, and fine-tune AI solutions that put your business ahead. You have two options: let us build it for you, or let us teach you ways to be more efficient as you build.



Custom software solutions built for you with Gen-DD™

Let us handle the development so you can focus on your core business.

Using our Gen-DD™ methodology, we create bespoke software solutions in a faster, more efficient, and cost-effective way.





Accelerate Your Development Process with Gen-DD™

Let us teach you how to build more efficiently with the power of AI.

We empower you to use our Gen-DD methodology, enabling your team to build software more efficiently.




Interested? Get ahead with HatchWorks AI by your side.

Key Considerations When Deploying an LLM

As with any business decision, there are a few considerations to make which will determine if, when, why, and how you will deploy an LLM.

Here’s what I would—and do—consider every time:

Hardware Requirements – Do You Need a GPU?

For most LLMs, especially large ones, a powerful GPU is recommended due to its parallel processing capabilities, which significantly speed up training and inference tasks.

You can use a CPU but the performance will be considerably slower. For smaller models or less intensive tasks, a high-performance multi-core CPU might suffice.

Model Selection – Do You Go Open or Closed?

Honestly, it’s up to you, but I find pre-trained models faster and easier to tailor to my needs and implement.

A pre-trained model is trained on vast amounts of data and can be fine-tuned to specific tasks with relatively less effort. They are available from repositories like Hugging Face.

Need more help making the choice between open or closed? Check out our article: Open-Source LLMs vs Closed: Unbiased 2024 Guide for Innovative Companies

Keeping Things Private – Data Security Concerns

Security is one of the major reasons a business would deploy locally.

It keeps data in-house, effectively reducing risks associated with data breaches and ensuring compliance with data protection regulations. And it means complete control over the model.

This is in contrast to using a cloud-based model which provides ease of scalability and reduces the burden of hardware maintenance but raises concerns about data security and potential compliance issues.

Costs – How Much for How Long?

I think of the costs of deploying an LLM in two phases: initial setup and ongoing operations.

Initial Setup Costs:

  • Significant upfront investment in purchasing powerful GPUs, ample RAM, and high-speed storage.
  • Costs for necessary software licenses and tools required for setting up and running the LLM.
Ongoing Operational Costs:

  • High-performance hardware, especially GPUs, can consume considerable power, leading to increased electricity bills.
  • Regular maintenance and potential upgrades for hardware to ensure optimal performance.
  • Depending on the software and models used, there may be ongoing licensing costs.
Even with these costs, local deployment offers long-term savings by eliminating the need for recurring subscription fees associated with cloud services and allows you to manage and predict your costs more accurately over time.

FAQs for Deploying an LLM

What are the benefits of deploying your own LLM?

Deploying an LLM:
  • Keeps sensitive data in-house, reducing the risk of data breaches.
  • Gives greater control over the model and its environment, allowing for extensive customization and fine-tuning.
  • Offers potentially faster response times as there is no dependency on external servers.
  • Reduces long-term costs associated with recurring cloud service fees, especially for high-frequency usage.


What are the hardware requirements for deploying an LLM?

You’ll need at least 16GB RAM (32GB+ recommended), an SSD with 20GB+ free space, and preferably a GPU for better performance.

What software do I need to deploy an LLM?

  • Python: The primary language for deploying LLMs, with version 3.8 or higher.
  • Libraries: Key libraries such as torch for PyTorch, transformers for handling LLMs, and faiss-cpu for similarity search.
  • CUDA and cuDNN: Necessary for leveraging GPU acceleration.
  • Development Environment: An IDE like Visual Studio or PyCharm, and version control with Git.
  • Vector Database: Tools like Epsilla for managing document embeddings.
  • Inference Engine: Llama CPP for running the model, compatible with both GPU and CPU.

How do I choose the right pre-trained model?

Choosing the right pre-trained model depends on your specific task requirements. Models from the Hugging Face repository offer a wide range of options tailored for different tasks such as text generation or question answering.

Larger models generally provide better performance but require more computational resources.

You may also want to favor models with active communities and good documentation, which make troubleshooting and support easier.

What are the security implications of deploying an LLM?

Deploying an LLM locally ensures that sensitive data does not leave your premises, which significantly reduces the risk of data breaches.

This makes it easier to comply with regulatory requirements such as GDPR or HIPAA, as data is kept in-house.

However, it’s important to implement robust security protocols, including encryption, secure access controls, and regular security audits, to protect both the model and the data it processes.

Can I deploy an LLM on a CPU, or do I need a GPU?

While it is possible to deploy an LLM on a CPU, the performance will be significantly slower compared to using a GPU. This might be sufficient for smaller models or less intensive tasks.

However, for most LLMs, especially those used for high-frequency or resource-intensive applications, a powerful GPU is recommended due to its parallel processing capabilities, which significantly speed up training and inference tasks.