In today’s fast-paced world of technological breakthroughs, DeepSeek-R1 has emerged as a revolutionary force in AI model development. With its remarkable reasoning and mathematical capabilities, it has garnered significant attention for its innovative approach and cost efficiency. Chinese AI company DeepSeek’s newly unveiled R1 reasoning model has shaken up Big Tech, but what exactly is fueling the buzz? Let’s break it down.
Brief Overview of DeepSeek-R1
The research paper "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" (1) introduces two first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero is trained using large-scale reinforcement learning (RL) without prior supervised fine-tuning, resulting in notable reasoning abilities. However, it faces challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, the authors developed DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. This approach enables DeepSeek-R1 to achieve performance comparable to OpenAI's o1-1217 model on reasoning tasks.
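According to the paper, the RL stage for DeepSeek-R1-Zero relies on simple rule-based rewards: an accuracy reward for getting the answer right and a format reward for wrapping the chain of thought in designated tags. The sketch below illustrates that idea; the tag names, weights, and exact-match logic are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of a rule-based reward in the spirit of DeepSeek-R1-Zero's
# RL setup (accuracy + format rewards). Tags and weights are assumptions.
import re

def reasoning_reward(completion: str, reference_answer: str) -> float:
    """Score a completion with a format reward and an accuracy reward."""
    reward = 0.0

    # Format reward: reasoning goes in <think>...</think>, the final
    # answer in <answer>...</answer>.
    if re.search(r"<think>.*?</think>", completion, re.DOTALL) and \
       re.search(r"<answer>.*?</answer>", completion, re.DOTALL):
        reward += 0.5

    # Accuracy reward: compare the extracted answer to the reference
    # (for math problems, a deterministic exact match works).
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0

    return reward

# A well-formatted, correct completion earns the full reward of 1.5.
print(reasoning_reward("<think>2 + 2 = 4</think><answer>4</answer>", "4"))
```

Because both signals can be checked mechanically, no separate learned reward model is required, which is part of what makes this training recipe comparatively inexpensive.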
To support the research community, the authors have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (ranging from 1.5B to 70B parameters) distilled from DeepSeek-R1, based on Qwen and Llama architectures. These resources aim to facilitate further exploration and development in enhancing reasoning capabilities of large language models through reinforcement learning.
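Because the distilled checkpoints are openly published, they can be run locally with standard tooling. Below is a minimal sketch using the Hugging Face transformers library and the smallest distilled checkpoint; the prompt and generation settings are illustrative assumptions.

```python
# Minimal sketch: run a distilled DeepSeek-R1 model with Hugging Face
# Transformers. Prompt and generation settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "What is the sum of the first 100 positive integers?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The larger distilled variants (up to the 70B Llama-based model) follow the same interface but require correspondingly more memory.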
Key Performance Metrics Comparisons (Benchmarking)
The paper demonstrates that DeepSeek-R1 delivers exceptional performance in reasoning and mathematical tasks, showcasing its potential in these domains. However, in the critical area of code generation, it still lags behind OpenAI's o1-0912 model, leaving room for improvement in programming-related capabilities (Image 1).

This benchmark comparison shows broadly comparable outcomes, plotting the scores of Silicon Valley models (purple) against those of DeepSeek-R1 (red). While DeepSeek-R1 excels in language reasoning and mathematics, it has yet to surpass the American models in programming capabilities, underscoring a key area for future improvement (Image 2).

Summary of Capabilities of DeepSeek-R1
DeepSeek-R1 is an impressive achievement, particularly in the realm of distillation and the performance optimization of smaller models. While it surpasses Silicon Valley models in mathematical capabilities, it lags behind in generating computer code. This marks a significant advancement in AI research, though it is but one step in a broader journey. Silicon Valley remains well positioned to spearhead future innovations. Nevertheless, China has made it clear that it is fully committed to advancing AI development and will not be left behind in this transformative field.
The Effect on NVIDIA and other Chip Makers
The DeepSeek models themselves had been available for some time and initially garnered limited attention, since reinforcement learning is a well-established technique. However, the release of the research paper on January 20, coinciding with inauguration day, captured widespread interest, largely due to its claims about the minimal cost of creating the model. Despite the buzz, FSG believes the market overreacted to the news, and we expect NVIDIA’s stock price, along with those of other chipmakers, to rebound swiftly.
NVIDIA Corporation is set to report earnings on February 26, 2025, covering the fiscal quarter ending January 2025. We anticipate a recovery in NVDA stock prices leading up to or shortly after the earnings announcement—barring any disruptive headlines, such as claims of DeepSeek achieving AI advancements using mobile phone chips or pocket calculators. Regardless, NVIDIA remains well-positioned with the most advanced chips on the market, many of which are yet to fully penetrate global markets, ensuring sustained demand for its cutting-edge technology.
The Effect on AI Development
DeepSeek’s advances are a testament to the exponential progress in AI development. Its emergence underscores the rapid pace at which innovation is transforming the field, setting the stage for even greater breakthroughs in the year ahead. DeepSeek is poised to exert significant pressure on American pricing models, such as OpenAI's, by offering its chatbot free for most use cases. In contrast, ChatGPT users may need to subscribe to OpenAI's paid tier at $20 per month, a stark difference in accessibility and cost structure that could reshape market dynamics.
Programmer API access for DeepSeek-R1 starts at $0.14 per one million tokens, or roughly 750,000 words. DeepSeek's latest model is reportedly closest to OpenAI's o1, which is priced at $7.50 per one million tokens. That is a sizable disparity in pricing, and the quick calculation below puts it in dollar terms.
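As a rough illustration, here is a back-of-the-envelope comparison using the per-million-token rates quoted above. The 10-million-token workload is an arbitrary assumption chosen for the example; real bills depend on each provider's current rate card and the mix of input and output tokens.

```python
# Back-of-the-envelope API cost comparison at the rates quoted above.
# The workload size is an assumption chosen purely for illustration.
DEEPSEEK_R1_PER_M = 0.14  # USD per 1M tokens (as quoted above)
OPENAI_O1_PER_M = 7.50    # USD per 1M tokens (as quoted above)

tokens = 10_000_000  # hypothetical workload: 10M tokens (~7.5M words)
print(f"DeepSeek-R1: ${tokens / 1e6 * DEEPSEEK_R1_PER_M:,.2f}")  # $1.40
print(f"OpenAI o1:   ${tokens / 1e6 * OPENAI_O1_PER_M:,.2f}")    # $75.00
```

We leave you with the following voices.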
Ethan Mollick, a Wharton professor, wrote on X that he's "not sure why people assume this will make compute less valuable," adding: "More efficient models mean that those with compute will still be able to use it to serve more customers and products at lower prices & power impact." (2)
Former Intel CEO Pat Gelsinger posted Monday on X to suggest that the market's assumptions were wrong. He said that instead of reducing demand, making computing "dramatically cheaper" and more efficient to use — as DeepSeek appears to have done — "will expand the market for it." (3)
Microsoft CEO Satya Nadella weighed in as well. “Jevons paradox strikes again!” Nadella wrote on LinkedIn Monday, referring to the theory that increased efficiency in a product’s production drives increased demand. “As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can't get enough of.” And on Wednesday he added: “To see the DeepSeek new model, it’s super impressive in terms of both how they have really effectively done an open-source model that does this inference-time compute, and is super-compute efficient. We should take the developments out of China very, very seriously.” (4)
References
(1) DeepSeek-AI, "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning," arXiv:2501.12948, 2025.
(2) Ethan Mollick, post on X.
(3) Pat Gelsinger, post on X.
(4) Satya Nadella, post on LinkedIn and subsequent public remarks.