Comments on DeepSeek's "clever" but "overblown" capabilities need context
Why are private US companies assessing "national security risks" of foreign open source models?
Recent comments from Anthropic’s Jack Clark at the Hill and Valley Forum that DeepSeek’s technology is “clever” but its capabilities “overblown” require some context, as does his comment that Anthropic tested DeepSeek’s models and determined that they were “not a national security threat.” Clark also implied that if DeepSeek had access to “arbitrarily large amounts of compute,” Anthropic’s conclusions would be different, presumably meaning that DeepSeek’s capabilities would go from “overblown” to “serious” and raise national security concerns. Of course, any “hype” around DeepSeek came not from the firm itself, but from Western commentators and AI firms, particularly in the wake of the massive January 27 US stock market “adjustment.”
First, a comparison of the two companies is revealing. While reaching AGI is a stated goal of both firms, they have very different business models, headcounts, investment levels, valuations, and access to compute. A glaring difference is that Anthropic develops proprietary models, while DeepSeek has not only open-sourced its models and weights, but has also published numerous detailed research papers about its “clever” technical innovations.
Second, the fact that Anthropic has tested DeepSeek’s models against national security-related concerns—presumably benchmarks around the models’ ability to help a malicious actor design better chemical, biological, radiological, or nuclear (CBRN) weapons, or to engage in more sophisticated cyber operations—is significant. We do not know what these tests consisted of, nor how Anthropic reached its conclusions. Anthropic has been a leader in implementing so-called responsible scaling policies, which mandate increased levels of testing against CBRN and cyber risks as models become more capable, and presumably it has robust internal methods for testing any model against these thresholds.
That Anthropic is doing CBRN/cyber testing on DeepSeek is a good thing, and it highlights the need for collaboration between industry and the US and China AI Safety Institutes, ideally under the Bletchley Park Process or the Singapore Consensus process. The Singapore Conference on AI (SCAI) in April, convened by safety thought leader Max Tegmark to rebuild momentum lost by the AI safety community at the Paris AI Action Summit, included attendees from both Anthropic and OpenAI. The excellent report put out by SCAI proposes areas of mutual interest that could form the basis for collaboration by governments and corporations globally. Drawing an analogy to the aviation industry, where competitors like Boeing and Airbus cooperate on safety standards, the report identifies specific technical areas where information sharing among industry and governments would benefit all parties: standardized risk thresholds and audits, third-party evaluation infrastructure, benchmark development and metrology, risk capability assessments, downstream impact forecasting, and loss-of-control risk research.
Unfortunately, because of complicated Chinese government and industry dynamics, which I will cover in a later post, no leading Chinese AI developers, including DeepSeek, were able to participate in this conference. Going forward, it will be crucial to encourage leading Chinese AI developers to participate in whichever global AI safety process emerges as the most effective. Presumably, now that Anthropic has determined that DeepSeek’s models are not a national security threat, Clark and CEO Dario Amodei will support the inclusion of leading Chinese AI developers in these fora. The issue will be discussed at the World AI Conference in Shanghai in July, which I will be attending.
AGI and compute governance
Leading US proprietary model developers are also monitoring leading Chinese model developers’ progress towards AGI, while supporting export controls designed to limit the ability of companies like DeepSeek to develop so-called frontier models. Proponents of export controls on GPUs and semiconductor manufacturing equipment are thus eager to point to, or manufacture, evidence that the controls are “working” to slow Chinese AI developers, particularly given the mounting costs of this effort, which I have covered here.
Compute governance is the fundamental issue at play here. The disparity in access to compute between DeepSeek and leading US model developers is currently large, and DeepSeek’s CEO Liang Wenfeng has spoken about this as a major constraint on the firm’s ability to develop more advanced models. As I have noted previously, access to advanced GPUs in China currently involves a complex mix of options.
At one level, the comparison (see below) between the two companies on overall personnel, access to investment, access to compute, and other key inputs is strikingly one-sided: Anthropic’s access to all of these inputs vastly exceeds that of DeepSeek, a less-than-two-year-old company whose very limited business model cannot support enterprise deployments of its models.
While Anthropic has access to something like 400,000 Trainium2 AI-optimized ASICs via AWS, and likely tens of thousands of H100 equivalents in addition, DeepSeek’s access to advanced compute remains unclear. One industry observer claims that DeepSeek has 50,000 H100 equivalents, consisting of “at least” 10,000 H100s, 10,000 H800s, 30,000 H20s, and 10,000 A100s. DeepSeek and High-Flyer Capital have acknowledged purchasing 10,000 A100s before US export controls made such purchases “illegal” in late 2022, and DeepSeek has acknowledged using several thousand H800s to train its V3 and R1 models. But, again, there is no evidence DeepSeek has access to any H100s, and as unbiased industry observers such as Ben Thompson and I have noted, most of DeepSeek’s innovations were predicated on access to less capable GPUs. If DeepSeek has purchased H20s, which were not restricted at the time, they are almost certainly being used for inference rather than training. DeepSeek is also relying on other cloud partners, such as SiliconFlow, for inference as part of the growing DeepSeek Ecosystem.
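Note that those component figures sum to 60,000 physical chips, so the “50,000 H100 equivalents” total must embed some discounting of the less capable parts. As a purely illustrative reconstruction (the weights below are my own placeholder assumptions, not the observer’s), one weighting consistent with that total would be:

10,000 × 1.0 (H100) + 10,000 × 1.0 (H800) + 30,000 × 0.75 (H20) + 10,000 × 0.75 (A100) = 50,000 H100 equivalents

The H800 can plausibly be weighted near parity, since it matches the H100’s raw compute and differs mainly in interconnect bandwidth; the H20 and A100 factors here are placeholders, and any real conversion would vary substantially with the workload, training versus inference in particular.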
DeepSeek actually programmed 20 of the 132 processing units on each H800 specifically to manage cross-chip communications. This is actually impossible to do in CUDA. DeepSeek engineers had to drop down to PTX, a low-level instruction set for Nvidia GPUs that is basically like assembly language. This is an insane level of optimization that only makes sense if you are using H800s.—Ben Thompson, Stratechery
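To make concrete what “dedicating processing units” to communication looks like, below is a minimal CUDA-level sketch of the general pattern: a persistent kernel launched on a fixed number of blocks handles cross-GPU data movement, while ordinary compute kernels run concurrently on the remaining SMs. This is an illustration of the technique, not DeepSeek’s code; the kernel names and the 20-block figure are placeholders mirroring the H800 numbers Thompson cites, and, as his quote notes, DeepSeek’s actual implementation required dropping below CUDA into PTX for finer-grained control.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Persistent "communication" kernel: each block spins on a flag and would
// move data between GPUs (stubbed here). Launching 20 small blocks, which
// typically land on 20 distinct SMs, roughly reserves 20 of the H800's
// 132 SMs' worth of execution resources for this job.
__global__ void commKernel(volatile int* shutdown) {
    while (*shutdown == 0) {
        // ... poll a work queue and issue cross-GPU transfers here ...
    }
}

// Ordinary compute kernel, co-scheduled on the remaining ~112 SMs.
__global__ void computeKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;  // placeholder work
}

int main() {
    const int n = 1 << 20;
    float* data = nullptr;
    int* shutdown = nullptr;
    cudaMalloc(&data, n * sizeof(float));
    cudaMalloc(&shutdown, sizeof(int));
    cudaMemset(shutdown, 0, sizeof(int));

    // Non-blocking streams so the persistent kernel, the compute kernel,
    // and the shutdown signal never serialize against one another.
    cudaStream_t commStream, computeStream, ctrlStream;
    cudaStreamCreateWithFlags(&commStream, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&computeStream, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&ctrlStream, cudaStreamNonBlocking);

    commKernel<<<20, 128, 0, commStream>>>(shutdown);  // ~20 SMs reserved
    computeKernel<<<(n + 255) / 256, 256, 0, computeStream>>>(data, n);
    cudaStreamSynchronize(computeStream);

    // Signal the persistent kernel to exit, then wait for it to drain.
    const int one = 1;
    cudaMemcpyAsync(shutdown, &one, sizeof(int),
                    cudaMemcpyHostToDevice, ctrlStream);
    cudaStreamSynchronize(ctrlStream);
    cudaStreamSynchronize(commStream);

    cudaFree(data);
    cudaFree(shutdown);
    cudaStreamDestroy(commStream);
    cudaStreamDestroy(computeStream);
    cudaStreamDestroy(ctrlStream);
    printf("compute ran alongside a reserved communication kernel\n");
    return 0;
}
```

The design point is that CUDA only offers this coarse, launch-level control over how SMs are shared; pinning specific warps inside a kernel to communication duties, as DeepSeek reportedly did, is what forced its engineers down to PTX.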
However, at another level, focusing only on current access to company-specific AI infrastructure, such as the clusters of tens of thousands of GPUs in the AI data centers run by the leading US model developers, may not provide a meaningful comparison when it comes to China. First, leading AI developers in China, including Alibaba, ByteDance, Tencent, and Baidu, have access to considerably larger amounts of advanced compute than DeepSeek. Hopefully Anthropic has also tested Alibaba’s Qwen models for “national security”-related risks. Second, going forward, DeepSeek is likely to have access to much more significant amounts of advanced compute provided by what I would call the “DeepSeek Ecosystem.” DeepSeek will also benefit from innovations across the Chinese AI ecosystem as more Chinese firms adopt open source/weight model approaches and share innovations. For example, Tencent recently contributed an optimization that boosted the performance of DeepSeek’s DeepEP communications library.
DeepSeek is already working with domestic AI hardware leader Huawei, and is likely to gain access to substantially greater domestic compute capacity in the coming months. Huawei’s Ascend family of AI accelerators, integrated into clusters such as the Ascend 910C-based CloudMatrix 384 and follow-on systems built around the Ascend 910D, along with silicon photonics and other interconnect optimizations, coupled with the ability of DeepSeek and other Chinese AI developers to transition to a domestic AI software stack and optimize it (as DeepSeek has done with Nvidia hardware), will determine the degree to which DeepSeek and other Chinese AI leaders can move beyond “clever” to constituting a real “national security risk” in terms of AGI.
Finally, DeepSeek and its capabilities will shortly return to the spotlight of the international open source/open weight community with the release of DeepSeek’s V4 and R2 models. Despite much speculation about the timing of this release relative to updates from other major players, DeepSeek will release the models when they are ready, likely with substantial additional documentation, as it has done previously, going well beyond merely open-sourcing model weights.
In addition, the firm could become the target of US export or other controls, but any moves in this direction by the US Department of Commerce would have minimal impact on DeepSeek’s roadmap towards AGI. I will explore this further in subsequent posts.
What concerns me is the mindset undergirding something like the chip ban: instead of competing through innovation in the future the U.S. is competing through the denial of innovation in the past. Yes, this may help in the short term — again, DeepSeek would be even more effective with more computing — but in the long run it simply sows the seeds for competition in an industry — chips and semiconductor equipment — over which the U.S. has a dominant position.—Ben Thompson, Stratechery