Update: Sketchy narratives around DeepSeek mount as more Chinese firms release advanced open source models
While the firm's motives and claims have been called into question, reception of DeepSeek's models across the open source community and industry has been overwhelmingly positive
Image borrowed from The Information
This is the second in a series of in-depth takes on DeepSeek and also touches on the issue of US China competition in AI. For a broader look at AI issues to watch in 2025, see AI Decrypted 2025. Also see a podcast on DeepSeek I participated in with the Asia Society Policy Institute (ASPI) this week. Note in particular the excellent comments on this podcast by Jenny Xiao, a partner at Leonis Capital and an experienced investor in China's high tech sector. Also, for a handy glossary briefly explaining terms like distillation, MoE, and test-time compute, see this glossary from The Information.
The fallout continues from the sudden recognition that Chinese startup DeepSeek has produced an advanced AI model for reasoning using less compute than leading AI companies and released it under an MIT license—meaning it is “open source” or, more precisely, open weight. Several different narratives are prevalent around DeepSeek, a company spun off from Chinese quant firm High-Flyer Capital Management that is focused on developing artificial general intelligence (AGI). This week, new White House AI and crypto czar David Sacks claimed that OpenAI is looking into whether DeepSeek used distillation techniques, made possible by licensed access to the OpenAI GPT API, to help train the models it has released in short order since December 2024: V3 and several flavors of R1, a reasoning model comparable to OpenAI’s o1. The Trump White House has indicated an investigation of DeepSeek is underway, though President Trump seemed to endorse the innovation at DeepSeek and the open source/weight movement in comments he has made on the subject. Significantly, despite these concerns, Microsoft, which apparently alerted OpenAI to potential anomalous activity related to the GPT-4 API, is hosting R1 on its Azure AI Foundry cloud services platform.
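Since “distillation” figures centrally in the OpenAI allegations, a minimal, purely illustrative sketch of what API-based distillation means in practice may help. Every name below (`query_teacher`, the sample prompts) is a hypothetical placeholder, not DeepSeek’s or OpenAI’s actual pipeline:

```python
# Illustrative sketch only: "distillation" via an API means harvesting a
# strong teacher model's outputs and using them as supervised fine-tuning
# targets for a smaller student model. query_teacher is a hypothetical
# stand-in for a real API call (e.g., a hosted chat/completions endpoint).

def query_teacher(prompt: str) -> str:
    """Placeholder for a call to a hosted teacher model's API."""
    return f"teacher answer to: {prompt}"

def build_distillation_dataset(prompts: list) -> list:
    """Pair each prompt with the teacher's response; the student model is
    later fine-tuned to reproduce these (prompt, completion) pairs."""
    return [{"prompt": p, "completion": query_teacher(p)} for p in prompts]

dataset = build_distillation_dataset(["What is 2+2?", "Name a prime number."])
print(len(dataset))  # one training example per prompt
```

In the alleged scenario, the teacher would be an OpenAI model queried through the GPT API at scale—the kind of high-volume activity that anomaly monitoring could flag.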
The commentariat has broken into two broad camps around the significance of DeepSeek:
Those welcoming DeepSeek’s contributions to AI development. Open source advocates praised the release of DeepSeek’s models as valuable contributions to the open source/weight community. This group includes Yann LeCun, one of the three “godfathers” of deep learning, former Intel CEO Pat Gelsinger, and Perplexity CEO Aravind Srinivas, among others. This group tends to be skeptical of the effectiveness and wisdom of US export controls on AI hardware and model weights.
DeepSeek has profited from open research and open source (e.g. PyTorch and Llama from Meta)…They came up with new ideas and built them on top of other people’s work…Because their work is published and open source, everyone can profit from it…That is the power of open research and open source.—Yann LeCun
We’ve shipped many things in Perplexity, but integrating DeepSeek R1 with search is truly a phenomenal experience, seeing the model think out loud like an intelligent person, reading hundreds of sources, with sleek UX.—Aravind Srinivas
Those questioning DeepSeek’s claims and peddling conspiracy theories about the firm. This camp, which includes a host of China critics, VCs benefiting from the “China threat” narrative, and other think tank supporters of US export controls, is alleging that DeepSeek actually has access to many more GPUs than it claimed to have used for training V3, including large numbers of GPUs now restricted for export to China under US export control regulations. There is no credible evidence of this yet, though it could emerge. This group also includes those claiming the timing of the release of DeepSeek’s R1 reasoning model was directed by the Chinese government and timed for political effect, similar to Huawei’s release of its Mate 60 smartphone during the August 2023 visit to China of former Commerce Secretary Gina Raimondo. This group also claims that US export controls are effective and should be strengthened. The allegations of Chinese government involvement in the timing of the release thus far appear purely speculative, and represent some of the worst of the many bad takes on DeepSeek.
Before a deeper dive into these groups, let’s first take a quick look at some of the flawed comparisons and conclusions being drawn around this issue:
DeepSeek and Meta/OpenAI. DeepSeek is a small company—just over 100 people, primarily software and hardware engineers. While it validates Meta’s approach to AI, the firm is not competing with Meta, a much bigger firm that can provide support for applications built on its open source model family Llama. DeepSeek is not trying to do this at scale and this is not part of its business model; nor is it competing in any way with OpenAI, which is focused on enterprise deployments and support.
DeepSeek and the Chinese government. There is zero evidence that the firm has ties to the government or has benefited in any way from Chinese industrial policy around the AI stack. The firm is self-funded by hedge fund founder Liang Wenfeng, and its primary focus is on pushing the bounds of innovation around generative AI as part of a broader focus on getting to AGI.
Training costs. Despite many media outlets citing the figure of around $6 million for training its V3 model, the paper on V3 is very careful to point out that this figure did not represent the total investment in developing the model. The blithe comparisons of training costs flying around all week have been less than helpful in clarifying the issue, as there is no generally accepted standard method for calculating the costs of training advanced models.
Conspiracy theories around DeepSeek abound
The conspiracy theories swirling around over the past week range from plausible to outright ludicrous.
Even before the market furor around DeepSeek broke this week, as noted in my previous post, some industry commentators were claiming that DeepSeek had access to a much larger cluster of GPUs than the firm listed in its paper on the V3 model. See here for more detail on this.
It remains unclear what these claims are based on, and no credible evidence supporting them has surfaced. This week, an Nvidia spokesperson issued a statement that appeared to dismiss the idea that DeepSeek was using a large secret cluster of Nvidia GPUs acquired by circumventing US export controls or smuggling, as some have alleged.
Attributed to Nvidia spokesperson:
“DeepSeek is an excellent AI advancement and a perfect example of Test Time Scaling. DeepSeek’s work illustrates how new models can be created using that technique, leveraging widely-available models and compute that is fully export control compliant. Inference requires significant numbers of NVIDIA GPUs and high-performance networking. We now have three scaling laws: pre-training and post-training, which continue, and new test-time scaling.”
In addition, during the week, there were some shadowy X accounts, one of which quickly disappeared, that claimed the US State Department was working with Nvidia to “track all the dies going from TSMC to China from 2023-2025.” As I noted in my previous post on DeepSeek, it is likely that Huawei was able to acquire a large number of Ascend 910B dies from TSMC via a third party, but there is no evidence that export-controlled Nvidia GPUs or dies have been diverted or smuggled into China in large numbers over the past two years. This is a complex issue to run to ground because of the changing thresholds for controls on Nvidia GPUs and the availability of downgraded versions of Nvidia GPUs designed specifically for the China market. The H800s used by DeepSeek to train V3 fall into this category. A video from a Chinese middleman this week claimed that acquiring some number of H100 and even new B200 GPUs was not particularly difficult. Still, DeepSeek management appears to be concerned about not directly skirting US export control restrictions and has been open about its use of A100 and H800 GPUs that were acquired “legally”, i.e., before they were covered by controls.
Proponents of this particular version of events are also eager to assert that US export controls have been effective in slowing the ability of Chinese companies to develop advanced AI models. Hence, they are eager to demonstrate that in order to achieve an advanced open source model, a company like DeepSeek would necessarily have had to cheat somehow. This narrative holds that an effective regime of export controls has forced companies like DeepSeek to find ways to circumvent them, and asserts that the full effect of the updated controls released in October 2023 (which brought the H800s used by DeepSeek under new computing thresholds restricting their sale) has yet to be felt. DeepSeek CEO Liang has been quoted to the effect that the firm faces challenges in gaining access to more advanced GPUs to train its models.
Yet another conspiracy theory is that DeepSeek is actually a “Trojan Horse”, comparable to but worse than TikTok, designed to “suck up data and send it back to Beijing.” This line is being pushed by people such as investor Bill Ackman. Perhaps the most risible assertion comes from Anduril CEO Palmer Luckey, whose company’s business model is predicated on hyping the “China threat.” Luckey asserted on X that DeepSeek is in fact a “Chinese psyop” intended to strike fear in the US about technology leadership.
Other commentators are speculating that the timing of the release of the R1 model reflected some type of political calculus. What exactly the political message might be remains unclear; nor is there evidence that the release was intentionally timed to create a particular impact—let alone at the behest of the Chinese government, as some have asserted. Greg Allen from CSIS has asserted that the timing was similar to Huawei’s release of the Mate 60: “The story is the same in both cases, which is that the technology is legitimate and impressive, but the timing is politically motivated,” said Allen. “[This] is an attempt to try and change the political narrative in a new presidential administration — to try and say export controls have already failed, we should give up on them. That is the plan here.”
This assertion, in addition to being complete conjecture, fails even the most basic test of probable causality. DeepSeek first released its V2 model on May 6, 2024. Few outside the open source AI community noticed, even though the technical report contained much of the detail of the approaches also used for V3, released on December 27. DeepSeek has been embedded in the very competitive domestic Chinese AI sector for some time, so the release of an updated model like V3 after seven months was not unusual. What surprised people was the release on January 20 of the firm’s R1 reasoning model, less than a month after V3. But any serious examination of the firm, its management, and its operating style suggests that it is focused on the technology and on gaining acclaim within the open source community. The politics of US China tech competition played no role in the timing of the release. My discussions with people close to the company confirm this.
In addition, those asserting that the efficacy of US export controls will increase over time are pushing against a strawman. Of course the controls are effective, as Liang has noted and as evidenced by President Xi’s having raised them numerous times during meetings with former President Biden, including in November at a meeting on the sidelines of APEC. The central issue is really whether the controls make sense in the first place, and whether the justifications for controlling advanced AI development in China account for the already significant and well-documented collateral damage to US technology leadership, or for other unintended consequences, such as spurring innovation at DeepSeek and across the entire Chinese AI ecosystem and semiconductor industry, as I have written about extensively. Among analysts attuned to these impacts, there is strong recognition of the clear limitations of export controls. Understanding their long-term impact on critical issues such as US China relations, the status of Taiwan, and the ability to gain global agreement on regulating the clear risks of advanced AI model availability is essential, particularly as the DeepSeek drama glaringly highlights the need to include China in these governance efforts.
Instead, we have seen some US technology leaders double down on the need for export controls. This creates considerable dissonance with the message leading US AI firms have put out around support for smart regulation of frontier models. Excluding Chinese AI companies—including DeepSeek and the host of other Chinese AI firms from large players Alibaba, Bytedance, Tencent, and Huawei to startups Minimax and Moonshot—from global efforts to gain support for responsible scaling policies appears highly counterproductive. Some US critics in Congress are already suggesting even more drastic measures to address the challenge of DeepSeek. The most extreme response was a bill proposed this week by China critic Senator Josh Hawley (R-MO), who has introduced the Decoupling America’s Artificial Intelligence Capabilities from China Act that would ban exports or imports of AI technology from China, restrict US companies from conducting R&D in China, and prohibit all investments in Chinese AI companies.
In addition, Commerce Secretary-designate Howard Lutnick touched on export controls and DeepSeek during confirmation hearings this week. On the impact of the release of DeepSeek’s advanced models, Lutnick did not elaborate on specific new export controls under consideration but expressed frustration with GPU leader Nvidia and open source model leader Meta, claiming the firms were “aiding China’s development of new artificial intelligence technologies”. He also stated that “we need to stop helping them” and claimed, without evidence, that DeepSeek bought “tons of Nvidia chips” and “found their ways around [export controls]”. As the Nvidia spokesperson noted, the GPU leader currently holds the view that DeepSeek’s access to Nvidia GPUs has been fully compliant with US export control laws.
Open source community welcomes contributions from DeepSeek
On the other hand, the reaction from the open source community to DeepSeek has been overwhelmingly positive. Already there is a growing ecosystem developing around DeepSeek as a result of the open sourcing of the firm’s models V3, R1, and multimodal image model Janus. Even US open source leader Meta this week said it is considering testing DeepSeek in its generative AI tools for advertisers, according to a Meta employee. “We regularly work to understand all available models,” a spokesperson for Meta said in a statement. Meta chief AI scientist LeCun this week touted a version of Llama-3.3 fine-tuned (distilled) with outputs from DeepSeek R1 available on GroqCloud. The Information also reported that investment bank Goldman Sachs is considering using R1, but needs to put the model through security testing.
Elsewhere around the robust open source ecosystem, opinions range from the highly positive to the euphoric. Thomas Wolf, the CSO of leading open source hosting firm Hugging Face, highlighted the importance of the open sourcing of DeepSeek’s models this week, noting that there are already 500 derivative models based on DeepSeek. Because DeepSeek shared a lot of details about how the team trained the model, it will be easier for researchers to continue to build on top of DeepSeek. The open source AI ecosystem is fully welcoming the arrival of DeepSeek and the open source community does not know any borders, according to Wolf. Hugging Face has also partnered with Dell to host DeepSeek R1 on premises, a potentially huge development in the spread of R1 on US systems.
Other key players in the US AI service ecosystem such as Perplexity are also hosting the R1 model in the US and incorporating DeepSeek’s offering into their platforms. Perplexity CEO Aravind Srinivas has been extremely positive about DeepSeek R1 and stressed that because the model is hosted on Perplexity’s US-based platform, there are no issues about data going back to servers in China, or about censorship of DeepSeek’s responses to questions about sensitive issues. Media outlets have focused on DeepSeek’s responses to questions about Xi and Tiananmen Square, but Perplexity’s deployment shows that, certainly within the open source community, these issues appear insignificant compared to the utility of the model for a range of end uses. In addition, given the low cost of running R1, the model is apparently “spreading like wildfire” in developing markets, including Africa.
This highlights one of the most important regulatory issues around the DeepSeek release of V3, R1, and Janus: how governments will control the release of advanced models that will be widely downloaded and operated at low cost around the globe, including potentially by malicious actors. This has long been an issue with open source/weight models, but the US AI Diffusion Rule issued late in the Biden administration explicitly stated there would not be controls on open source/weight models. The DeepSeek experience may change the calculus on this in Washington. However, there will be major industry pushback on any attempt to impose controls on open source/weight models—let alone for the US government to order GitHub, Hugging Face, Azure, Perplexity, and hundreds of other hosting sites to take down DeepSeek’s models, which have also already been downloaded and installed on thousands of home and office computers.
The issue of responsible AI model scaling and how to handle national security risks around advanced models will be the subject of official meetings and side events in Paris at the AI Action Summit on the week of February 10, alongside discussions of all the benefits AI will bring. The DeepSeek case and the broader issue of open source/weight models—French AI leader Mistral has open sourced its leading models—will be the talk of the town.
As the Chinese New Year break neared, other Chinese AI firms besides DeepSeek released new advanced AI models, underscoring the competitiveness of the sector in China. Large technology platforms including Alibaba and Tencent, along with startups such as Moonshot and Minimax, released reasoning models similar to DeepSeek’s R1. Alibaba released a new version of its Qwen 2.5 model, which the firm has claimed exceeds the performance of advanced models from OpenAI, Meta, and DeepSeek. Social media and payments giant Tencent also released its Hunyuan3d-2.0 “advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets” last week—also open source.
Startup Minimax released several open source models in early January, which the firm claimed outperform Google’s Gemini models on some key benchmarks. Moonshot AI this week launched an updated version of the model underpinning its hugely popular Kimi chatbot, touting the performance of its new Kimi k1.5 multimodal model as better than OpenAI and Anthropic models. China’s AI sector is highly competitive, and firms are increasingly embracing open source/weight releases of advanced models and benchmarking them against the industry leaders. In this context, the focus on DeepSeek may seem misplaced: US regulators likely did not anticipate the rapid release of additional, very capable Chinese open source models, which are likely to drive even further uptake in the US across a wide range of platforms.
The next installment of this series will be an update on the allegations around DeepSeek use of OpenAI’s API, the question of the firm’s access to large clusters of Nvidia GPUs, and these events’ ongoing impact on the open source model/weight movement and potential US government responses.