The AI sector has long been a high-stakes environment dominated by big tech companies that pour hundreds of millions of dollars into hardware and infrastructure. Now another potential paradigm shifter has emerged: DeepSeek. The company took a vastly different path to build a model that confronts Nvidia's dominance of AI computing and could shift the ground under the industry itself.
Until Now, Staggering Sums Were Spent on AI Training
At present, training a frontier model such as GPT-4 or Claude is a tremendous expenditure. OpenAI, Anthropic, and others spend $100 million or more on computation alone. These companies depend on huge data centers packed with thousands of high-end GPUs, each costing around $40,000, and the energy required for training rivals the output of a small power plant.
Then DeepSeek changed everything. They flipped the premise, asking: why not do this with $5 million? And they actually did it.
DeepSeek’s Radical Approach
How did they pull it off? They turned AI development on its head from the ground up. Previous models performed calculations inefficiently, storing and processing numbers at full 32-bit precision when 8 bits would often do. DeepSeek cut memory usage by 75 percent without compromising accuracy.
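The arithmetic behind that 75% figure is simple: 8-bit values take a quarter of the space of 32-bit ones. The sketch below illustrates the idea with plain int8 quantization; DeepSeek's actual scheme (low-precision mixed training) is more sophisticated, so treat this as a toy model of the precision-for-memory trade, not their implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_int8(x):
    """Map a float32 array onto int8 values plus one per-tensor scale."""
    scale = np.abs(x).max() / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original floats."""
    return q.astype(np.float32) * scale

weights = rng.standard_normal((1024, 1024)).astype(np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# 8-bit storage uses one quarter of the memory of 32-bit storage.
saving = 1 - q.nbytes / weights.nbytes
print(f"memory saved: {saving:.0%}")  # 75%
print(f"max reconstruction error: {np.abs(weights - restored).max():.4f}")
```

The reconstruction error stays small (bounded by half the scale factor), which is why lower precision is often "good enough" for neural network weights.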
Their multi-token system is a real game changer. While conventional AI models generate words one at a time (“The… cat… sat”), DeepSeek predicts whole phrases at once, roughly doubling speed while retaining about 90% of the accuracy. That matters enormously when working with billions of words.
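Why does emitting several tokens per step double the speed? Because the expensive operation is the model's forward pass, and multi-token prediction amortizes each pass over several output tokens. This toy sketch (a stand-in function, not DeepSeek's real decoder) just counts forward passes to make that point:

```python
def forward_pass(context, k=1):
    """Stand-in for an expensive transformer forward pass.

    Returns k placeholder tokens; a real model would predict them.
    """
    return [f"tok{len(context) + i}" for i in range(k)]

def generate(n_tokens, tokens_per_pass):
    """Decode n_tokens, counting how many forward passes were needed."""
    context, calls = [], 0
    while len(context) < n_tokens:
        context += forward_pass(context, k=tokens_per_pass)
        calls += 1
    return context[:n_tokens], calls

_, single_calls = generate(100, tokens_per_pass=1)  # classic one-at-a-time
_, multi_calls = generate(100, tokens_per_pass=4)   # 4 tokens per pass
print(single_calls, multi_calls)  # 100 25
```

The trade-off the article mentions (about 90% of the accuracy) comes from the fact that later tokens in each chunk are predicted with less context than strict one-at-a-time decoding would give them.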
Expert Systems, Not Monolithic Models
DeepSeek has also taken an “expert system” approach. Instead of one monster AI model trying to know everything (just as no single person can be an accountant, a doctor, and a lawyer all at once), it uses a set of smaller specialized AIs whose combined knowledge solves the problem.
Whereas traditional models engage all of their parameters for every token, DeepSeek's model is composed of specialized experts that are called on only when needed. Of its 671 billion total parameters, only 37 billion (roughly 5.5%) are active at any given moment, making it dramatically more efficient.
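The mechanism behind this is mixture-of-experts routing: a small "router" scores every expert for each token and only the top-scoring few actually run. The sketch below is a minimal illustration with made-up sizes (16 experts, 2 active), not DeepSeek-V3's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d, k = 16, 32, 2  # 16 experts, dimension 32, route each token to 2

# One weight matrix per expert, plus a router that scores experts per token.
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts))

def moe_layer(x):
    """Run only the top-k experts for this token and blend their outputs."""
    scores = x @ router                           # (n_experts,) routing logits
    top = np.argsort(scores)[-k:]                 # indices of the k best experts
    gates = np.exp(scores[top] - scores[top].max())
    gates /= gates.sum()                          # softmax over the chosen k
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(d)
out = moe_layer(token)
active_fraction = k / n_experts
print(f"experts active per token: {active_fraction:.1%}")  # 12.5%
```

Scale the same idea up and you get DeepSeek's ratio: 37B of 671B parameters active is about 5.5%, so most of the model sits idle for any given token.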
Results That Seem Almost Magical
The results DeepSeek reports are almost unbelievable:
- Training costs cut from roughly $100M to $5M
- API costs are reduced by 95%
- GPUs required dropped from 100,000 to 2,000
- Can run on gaming GPUs instead of expensive data center hardware

And the chief selling point? It is open source. Anyone can inspect the code and technical papers. This is not magic; this is good engineering.
The Consequence: A Serious Threat to Nvidia and Big Tech
DeepSeek threatens to upend Nvidia's business model, which relies on selling premium GPUs at huge margins. If AI companies can achieve state-of-the-art performance on ordinary gaming GPUs, Nvidia's grip on the AI space may loosen.
A Disruption for the Ages
DeepSeek is a classic disruptor: instead of optimizing established methods, it questions the core assumptions of AI development. Its methods prove that throwing more GPUs at the problem is not the only way forward.
What this means in the near term
- Greater accessibility for AI development
- Massive competition diminishes monopolistic power
- Hardware requirements (and costs) collapse

Artificial Intelligence Is Changing Fast
OpenAI and Anthropic shouldn't sleep on this; they are almost certainly working to integrate similar efficiency techniques. But the efficiency genie is out of the bottle: there is no going back to the “just buy more GPUs” strategy.