The AI community is excited about DeepSeek R1, a new open source reasoning model.
The model was developed by Chinese AI startup DeepSeek, which claims that R1 matches or even surpasses OpenAI’s o1 on several key benchmarks while operating at a fraction of the cost.
“This could be a truly equalizing breakthrough, great for researchers and developers with limited resources, especially those in the Global South,” says Hancheng Cao, assistant professor in information systems at Emory University.
DeepSeek’s success is all the more remarkable given the restrictions faced by Chinese AI companies, whose imports of cutting-edge chips are controlled by the US. But early evidence shows that these measures are not working as intended. Rather than weakening China’s AI capabilities, sanctions appear to be pushing startups like DeepSeek to innovate in ways that prioritize efficiency, resource pooling, and collaboration.
According to Zihan Wang, a former DeepSeek employee and current computer science doctoral student at Northwestern University, creating R1 required DeepSeek to rework its training process to reduce the load on its GPUs: a variant Nvidia released for the Chinese market whose performance is capped at roughly half the speed of the company’s flagship products.
DeepSeek R1 has been praised by researchers for its ability to handle complex reasoning tasks, particularly in mathematics and coding. The model employs a “chain of thought” approach similar to the one used by OpenAI’s o1, which lets it solve problems by working through queries step by step.
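To make the idea concrete, here is a minimal sketch of how chain-of-thought prompting differs from direct prompting. The helper functions `build_prompt` and `extract_answer` are hypothetical illustrations, not DeepSeek’s or OpenAI’s actual API; the completion shown is simulated.

```python
def build_prompt(question: str, chain_of_thought: bool = True) -> str:
    """Wrap a question in either a direct or a step-by-step prompt."""
    if chain_of_thought:
        return (
            f"Question: {question}\n"
            "Let's think step by step, then state the final answer "
            "on a line starting with 'Answer:'."
        )
    return f"Question: {question}\nAnswer:"

def extract_answer(completion: str) -> str:
    """Pull the final answer out of a step-by-step completion."""
    for line in reversed(completion.splitlines()):
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    return completion.strip()

# Simulated model output for "What is 17 * 3 + 4?":
completion = (
    "Step 1: 17 * 3 = 51\n"
    "Step 2: 51 + 4 = 55\n"
    "Answer: 55"
)
print(extract_answer(completion))  # → 55
```

The point of the technique is that the intermediate steps are generated explicitly, which tends to improve accuracy on math and coding tasks, at the cost of producing (and paying for) more tokens.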
Dimitris Papailiopoulos, principal investigator at Microsoft’s AI Frontiers research lab, says what surprised him most about R1 is its engineering simplicity. “DeepSeek sought precise answers rather than detailing each logical step, significantly reducing computation time while maintaining a high level of efficiency,” he says.
DeepSeek has also released six smaller versions of R1 that are compact enough to run locally on laptops. The company claims that one of them even outperforms OpenAI’s o1-mini on certain benchmarks. “DeepSeek has largely replicated o1-mini and made it open source,” Perplexity CEO Aravind Srinivas tweeted. DeepSeek did not respond to MIT Technology Review’s request for comment.
Despite the buzz surrounding R1, DeepSeek remains relatively unknown. Based in Hangzhou, China, it was founded in July 2023 by Liang Wenfeng, a Zhejiang University alumnus with a background in electronic engineering and information technology. It was incubated by High-Flyer, a hedge fund that Liang founded in 2015. Like OpenAI’s Sam Altman, Liang aims to build artificial general intelligence (AGI), a form of AI that can match or even beat humans at a range of tasks.
Training large language models (LLMs) requires a team of highly trained researchers and substantial computing power. In a recent interview with Chinese media outlet LatePost, Kai-Fu Lee, a veteran entrepreneur and former head of Google China, said that only “top-tier players” typically get involved in building foundation models like ChatGPT, because it is so resource-intensive. The situation is further complicated by US export controls on advanced semiconductors.

High-Flyer’s decision to venture into AI is directly related to these constraints, however. Long before the anticipated sanctions took effect, Liang acquired a substantial stockpile of Nvidia A100 chips, a type now banned from export to China. Chinese media outlet 36Kr estimates the company has more than 10,000 units in stock, while Dylan Patel, founder of AI research consultancy SemiAnalysis, puts the figure at 50,000 or more. Recognizing the potential of this stockpile for AI training is what led Liang to create DeepSeek, which was able to use the chips in combination with lower-powered ones to develop its models.
Tech giants like Alibaba and ByteDance, as well as a handful of startups with deep-pocketed investors, dominate the Chinese AI space, making it challenging for small or medium-sized companies to compete. A company like DeepSeek that has no plans to raise funds is rare.
Wang, the former DeepSeek employee, told MIT Technology Review that he had access to abundant computing resources and the freedom to experiment when he worked at the company, “a luxury that few recent graduates would have at any company.”
In an interview with 36Kr in July 2024, Liang said that an additional challenge Chinese companies face beyond chip sanctions is that their AI engineering techniques tend to be less efficient. “We [most Chinese companies] have to consume twice the computing power to achieve the same results. Combined with data efficiency gaps, this could mean needing up to four times more computing power. Our goal is to continually close these gaps,” he said.
But DeepSeek has found ways to reduce memory usage and speed up the calculation without significantly sacrificing accuracy. “The team loves turning a hardware challenge into an innovation opportunity,” says Wang.
Liang himself remains deeply involved in the DeepSeek research process, running experiments with his team. “The entire team shares a collaborative culture and dedication to hardcore research,” says Wang.
In addition to prioritizing efficiency, Chinese companies are increasingly adopting open source principles. Alibaba Cloud has launched more than 100 new open-source AI models, supporting 29 languages and serving multiple applications including coding and mathematics. Similarly, startups like Minimax and 01.AI have made their models open source.
According to a white paper released last year by the Chinese Academy of Information and Communications Technology, a state-affiliated research institute, the number of large AI language models in the world has reached 1,328, with 36% originating in China. This positions China as the second largest contributor to AI, behind the United States.
“This generation of young Chinese researchers strongly identifies with open source culture because they benefit greatly from it,” says Thomas Qitong Cao, assistant professor of technology policy at Tufts University.
“US export controls have essentially backed Chinese companies into a corner where they have to be much more efficient with their limited computing resources,” says Matt Sheehan, an AI researcher at the Carnegie Endowment for International Peace. “We will probably see a lot of consolidation in the future related to the lack of computing.”
This may have already started to happen. Two weeks ago, Alibaba Cloud announced that it had partnered with Beijing-based startup 01.AI, founded by Kai-Fu Lee, to unite research teams and establish a “large-scale industrial modeling laboratory.”
“It is energy efficient and natural for some kind of division of labor to emerge in the AI industry,” says Cao, the Tufts professor. “The rapid evolution of AI requires agility from Chinese companies to survive.”
(Source: MIT Technology Review)
