
DeepSeek: A Wake-Up Call for the AI World

DeepSeek, a Chinese AI startup, recently sent shockwaves through the tech industry. Its AI-powered app skyrocketed to the top of the US App Store’s free download charts, coinciding with a dramatic 17% plunge in Nvidia’s stock, erasing nearly $600 billion in market value. This upheaval has sparked a crucial question: Is DeepSeek a true game-changer, or is the excitement surrounding it overblown? The reality is nuanced, but one thing is clear: DeepSeek’s ascent signals a significant shift in the AI landscape, one that will ultimately benefit the field as a whole.

Beyond the Hype: Where DeepSeek Really Stands and Who’s Behind It

While DeepSeek’s rapid rise is impressive, it’s vital to distinguish between hype and reality. Much of the excitement surrounding the company conflates the success of its popular app with the capabilities of its most advanced model, DeepSeek R1. The app, however, uses the V3 model, released in December. Though a solid performer, V3 currently ranks 8th on the lmarena.ai leaderboard, a widely recognized system for ranking AI language models, trailing established industry leaders like OpenAI’s GPT-4o, Anthropic’s Claude, and Google’s Gemini. This underscores a key point: DeepSeek is a strong contender, but not yet a dominant force. Moreover, current evaluations of DeepSeek’s models focus primarily on English and Chinese; there is limited data on their performance in other languages, particularly European ones, which raises questions about their true global applicability.

It’s also important to understand that DeepSeek is not a small side project. It is backed and owned by High-Flyer, a Chinese hedge fund that managed over $7 billion in assets in 2020. Its team includes Olympiad medalists in mathematics, physics, and informatics, and it has substantial computational power at its disposal, with an estimated 50,000 GPUs.

The Open-Source Revolution: How DeepSeek is Changing the Game

A significant trend in AI is the embrace of open source by Chinese companies like DeepSeek, in contrast to the proprietary models of Western tech giants. By building on publicly available code and releasing their own work openly, they have gained real traction in the AI community despite attempts to marginalize them (e.g., Mistral excluding Qwen). This openness lowers the barriers to AI development, fosters competition, and demonstrates that open source drives both innovation and adoption. It is a trend in which knowledge sharing shapes the market and, with it, the future of AI.

The Illusion of Cheap AI: Jevons Paradox at Play

It’s crucial to understand the real costs associated with both developing and deploying cutting-edge AI models like DeepSeek R1. Early reports mistakenly suggested that DeepSeek’s development costs were under $6 million. The reality is far more complex: the compute for the base model alone (excluding any reinforcement learning) consumed GPU hours worth roughly $5.5 million, and this figure doesn’t account for the numerous ablations, smaller experimental runs, data generation, or any of the subsequent training required to create the advanced DeepSeek R1 model.
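As a back-of-envelope check on where that figure comes from: the DeepSeek-V3 technical report cites roughly 2.79 million H800 GPU-hours and assumes a rental price of $2 per GPU-hour, which is an accounting assumption rather than an actual invoice. The arithmetic is simple:

```python
# Back-of-envelope check of the "$5.5M" base-model training cost.
# ~2.79M H800 GPU-hours and an assumed $2/GPU-hour rental rate, as cited
# in the DeepSeek-V3 technical report (the rate is their assumption).
gpu_hours = 2_788_000        # total H800 GPU-hours reported for the V3 base model
price_per_gpu_hour = 2.0     # assumed rental price in USD

training_cost = gpu_hours * price_per_gpu_hour
print(f"Estimated pre-training compute cost: ${training_cost / 1e6:.2f}M")
# -> Estimated pre-training compute cost: $5.58M
```

None of the experimentation, data generation, or R1-specific training shows up in this number.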

Beyond development, a common misconception is that cheaper training automatically translates to inexpensive deployment. This overlooks the significant computational resources required to run these models, especially at scale. The real DeepSeek R1 model, for example, is a massive 671B Mixture of Experts (MoE) model. It demands substantial hardware, requiring 16x 80GB H100 GPUs, each costing around $30,000. This illustrates a fundamental truth: deploying AI models for millions of users is computationally intensive and expensive. Even well-funded companies like Anthropic, with billions in resources, have to limit access for large clients due to these inherent infrastructure costs. 
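To see why, here is a rough sketch of the memory arithmetic, assuming the weights are stored in FP8 at one byte per parameter; KV cache and activation memory for serving many concurrent users come on top of this and are what push a deployment to a full 16-GPU node:

```python
# Rough estimate of why a 671B-parameter model needs a multi-GPU node to serve.
# Assumption: weights stored in FP8 (1 byte per parameter); KV cache and
# activation memory for batched, long-context serving come on top of this.
params = 671e9
bytes_per_param = 1.0        # FP8
gpu_memory_gb = 80           # one H100

weights_gb = params * bytes_per_param / 1e9
min_gpus_for_weights = weights_gb / gpu_memory_gb

print(f"Weights alone: ~{weights_gb:.0f} GB "
      f"(~{min_gpus_for_weights:.1f} x 80GB GPUs just to hold the parameters)")
# ~671 GB of weights already exceeds eight H100s before any KV cache.
```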

This situation is a practical example of the Jevons Paradox: when a technology makes resource use more efficient, we tend to use more of it, not less, often negating the initial cost savings. The increasing demand for and complexity of AI models are likely to continue driving up deployment costs, regardless of potential efficiencies gained during the development phase.

DeepSeek’s Secret Sauce: Simplifying AI Training

DeepSeek’s technical edge comes from simplifying how AI models are trained. They’ve combined several complex methods into one streamlined process.

Here’s a quick look at some training basics:

  • Reinforcement Learning (RL): The model learns like a game, getting rewards for good actions and penalties for bad ones.
  • Supervised Fine-Tuning (SFT): Retraining a model with labeled examples to improve it on a specific task.
  • Multi-Stage Training: Training the model in phases, like leveling up.

DeepSeek-R1-Zero: The Experiment

DeepSeek-R1-Zero was trained using only RL, without any labeled data. This is like learning to ride a bike without training wheels: slower at first, but it skips the costly step of creating labeled datasets. The model performed surprisingly well, matching OpenAI’s o1 on some tests.

GRPO: No “Coach” Needed

Traditional RL uses a “coach” (critic) that relies on labeled data. DeepSeek used Group Relative Policy Optimization (GRPO), which doesn’t need a coach. Instead, several answers are sampled for each prompt and scored with simple rule-based checks (for example, whether the final answer is correct and properly formatted), and the model learns by comparing each answer’s score to the average score of its group.
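To make the “no coach” idea concrete, here is a minimal sketch of GRPO’s group-relative scoring (the reward values are invented for illustration; the full algorithm also adds a clipped policy-update objective and a KL penalty on top of these advantages):

```python
import numpy as np

def group_relative_advantages(rewards):
    """GRPO's core trick: instead of a learned critic, each sampled answer is
    judged against the other answers drawn for the same prompt. The advantage
    is simply the reward's z-score within its group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)   # epsilon avoids division by zero

# Toy example: four answers sampled for one prompt, scored by simple rules
# (these reward values are made up for illustration).
rewards = [1.0, 0.0, 0.5, 0.0]
print(group_relative_advantages(rewards))
# Answers scoring above the group average get positive advantages and are
# reinforced; below-average answers are pushed down.
```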

DeepSeek-R1: The Refined Model and Its Capabilities

To improve upon R1-Zero, DeepSeek used a multi-stage approach for R1 (sketched in code after the list):

  1. Foundation: Start with a base model and some basic “cold start” data.
  2. Reasoning Boost: Use pure-RL to improve reasoning skills.
  3. Self-Improvement: The model creates its own labeled data (synthetic data) from its best outputs.
  4. Knowledge Expansion: Combine this new data with other supervised data.
  5. Final Polish: One last round of RL for overall improvement.
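Put together, the recipe looks roughly like the sketch below. Every function is a placeholder stub for illustration; none of this is DeepSeek’s actual code or API.

```python
# Schematic, runnable sketch of the multi-stage R1 recipe described above.
# All functions are illustrative stubs, not DeepSeek's actual implementation.

def sft(model, data):                  # supervised fine-tuning stage (stub)
    return model + [f"SFT on {data}"]

def rl(model, reward):                 # GRPO reinforcement-learning stage (stub)
    return model + [f"RL with {reward} rewards"]

def generate_synthetic_data(model):    # keep only the model's best outputs
    return "filtered best-of-N reasoning traces"

base_model = ["V3 base"]
model = sft(base_model, "cold-start data")                    # 1. Foundation
model = rl(model, "rule-based reasoning")                     # 2. Reasoning boost
synthetic = generate_synthetic_data(model)                    # 3. Self-improvement
model = sft(model, f"{synthetic} + general supervised data")  # 4. Knowledge expansion
model = rl(model, "rule-based + preference")                  # 5. Final polish

print(*model, sep="\n")
```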

This step-by-step process led to the DeepSeek-R1 model achieving high scores on various benchmarks. And yes, the full 671B DeepSeek R1 model is really good. DeepSeek has also been contributing valuable work to open source and science for over two years. It’s worth noting that they have released six “distilled” versions of R1: fine-tuned Qwen and Llama models trained on 800,000 samples (without RL). The smallest of these, at 1.5B parameters, can be run locally but is nowhere near R1’s capabilities.
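For anyone who wants to try the smallest distilled model locally, a minimal sketch with the Hugging Face transformers library might look like this (the model ID below is the repository name published on the Hugging Face Hub; the 1.5B variant runs even on CPU, just slowly):

```python
# Minimal sketch: run the smallest distilled R1 model locally with transformers.
# Requires `pip install transformers torch`; the model ID is the Hub repository
# name at the time of writing.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "How many prime numbers are there between 1 and 20?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```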

The ChatGPT Shadow: Questions of Data Origins

Whispers within the AI community suggest that DeepSeek might have used data generated by ChatGPT to train their models. While unconfirmed, this raises questions about the true independence of their development and highlights the interconnected nature of the AI world, where tracing the origins of training data can be challenging. It’s also worth noting that the hosted version on chat.deepseek.com may use your data to train new models (as per their Terms of Service).

A Silver Lining: Why the DeepSeek Buzz Matters

Despite the need for a balanced perspective on DeepSeek’s current capabilities, and a correction of certain misconceptions, the attention surrounding the company is ultimately a positive force for AI development. The increased competition spurred by players like DeepSeek is pushing everyone in the field to innovate faster. The open-source approach adopted by Chinese companies fosters a more collaborative and inclusive AI ecosystem. Lowered entry barriers mean that more brilliant minds can contribute to the advancement of AI.

While DeepSeek may not be dethroning AI giants just yet, the buzz around them serves as a crucial wake-up call. The AI landscape is evolving at a breakneck pace, and the future of the field will be shaped by increased competition, open-source collaboration, and a global race for innovation. This dynamic environment promises to drive advancements that will fundamentally reshape the technological landscape, benefiting us all.


Author / Andrey K. / Senior Developer
