DeepSeek Unveils V4 AI Models with 1.6 Trillion Parameters, Rivaling Leading AI Systems at Lower Cost

Chinese artificial intelligence laboratory DeepSeek has introduced two preview versions of its latest large language model, DeepSeek V4, marking a significant advancement from its previous V3.2 model and the R1 reasoning model that garnered widespread attention in the AI community.

The newly unveiled models, DeepSeek V4 Flash and V4 Pro, employ a mixture-of-experts (MoE) architecture, and each supports a context window of up to 1 million tokens. This substantial context capacity enables the models to process extensive codebases and lengthy documents within a single prompt. The MoE approach activates only a subset of the model's parameters for each token, reducing inference cost and improving computational efficiency.
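The routing idea behind MoE can be sketched in a few lines. This is a minimal illustrative toy, not DeepSeek's actual implementation; the expert count, top-k value, and dimensions are arbitrary choices for demonstration:

```python
# Illustrative sketch of mixture-of-experts routing (NOT DeepSeek's actual
# implementation): a gate scores all experts for each token, but only the
# top-k experts run, so most parameters stay inactive on any forward pass.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # hypothetical expert count, for illustration only
TOP_K = 2         # experts activated per token
DIM = 16          # toy hidden dimension

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((DIM, DIM)) / np.sqrt(DIM)
           for _ in range(NUM_EXPERTS)]
gate = rng.standard_normal((DIM, NUM_EXPERTS)) / np.sqrt(DIM)

def moe_forward(x):
    """Route token vector x to its top-k experts and mix their outputs."""
    scores = x @ gate                     # one gating score per expert
    top = np.argsort(scores)[-TOP_K:]     # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()              # softmax over the chosen experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(DIM)
out = moe_forward(token)
print(out.shape)  # (16,)
```

Because only `TOP_K` of the `NUM_EXPERTS` weight matrices are touched per token, compute scales with the active parameters rather than the total, which is how a 1.6-trillion-parameter model can run far more cheaply than its headline size suggests.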

DeepSeek V4 Pro stands out with a total of 1.6 trillion parameters, of which 49 billion are active during operation. This configuration positions it as the largest open-weight model currently available, surpassing competitors such as Moonshot AI’s Kimi K2.6, which has 1.1 trillion parameters, and MiniMax’s M1 with 456 billion parameters. In comparison, the earlier DeepSeek V3.2 model comprised 671 billion parameters. The more compact V4 Flash model includes 284 billion parameters, with 13 billion active.
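The sparsity implied by those figures is easy to check with back-of-the-envelope arithmetic:

```python
# Activation ratios implied by the published parameter counts:
# (total parameters, active parameters) per model.
models = {
    "DeepSeek V4 Pro":   (1_600e9, 49e9),
    "DeepSeek V4 Flash": (284e9, 13e9),
}

for name, (total, active) in models.items():
    print(f"{name}: {active / total:.1%} of parameters active per token")
# DeepSeek V4 Pro:   3.1% of parameters active per token
# DeepSeek V4 Flash: 4.6% of parameters active per token
```

So despite its much larger total size, V4 Pro activates roughly 3% of its weights per token, which keeps its inference cost closer to that of a dense model in the tens of billions of parameters.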

According to DeepSeek, both V4 models exhibit enhanced efficiency and performance over the V3.2 model, attributed to architectural refinements. These improvements have nearly bridged the performance gap with current leading models, both open-source and proprietary, particularly in reasoning benchmarks.

The company asserts that its V4-Pro-Max model surpasses other open-source counterparts in reasoning benchmarks and even outperforms OpenAI’s GPT-5.2 and Gemini 3.0 Pro in certain tasks. In coding competition benchmarks, both V4 models demonstrate performance comparable to GPT-5.4.

Despite these advancements, the V4 models still trail frontier models in knowledge-based assessments, specifically OpenAI’s GPT-5.4 and Google’s latest Gemini 3.1 Pro. The gap suggests DeepSeek is running roughly three to six months behind the state of the art.

It is noteworthy that both V4 Flash and V4 Pro are designed exclusively for text processing, lacking support for multimodal inputs such as audio, video, and images—a feature present in many closed-source models.

A significant advantage of DeepSeek V4 lies in its cost-effectiveness. The V4 Flash model costs $0.14 per million input tokens and $0.28 per million output tokens, undercutting models such as GPT-5.4 Nano, Gemini 3.1 Flash, GPT-5.4 Mini, and Claude Haiku 4.5. The V4 Pro model costs $0.145 per million input tokens and $3.48 per million output tokens, less than Gemini 3.1 Pro, GPT-5.5, Claude Opus 4.7, and GPT-5.4.

The release of these models comes amid recent allegations from the United States that China has conducted industrial-scale intellectual property theft from American AI laboratories through numerous proxy accounts. Companies including Anthropic and OpenAI have accused DeepSeek of distillation, a practice in which one model is trained to replicate the outputs of another.

DeepSeek’s introduction of the V4 models signifies a substantial step forward in the AI landscape, offering high-performance capabilities at a reduced cost. While there remains a slight gap in knowledge-based tasks compared to the most advanced models, the V4 series demonstrates DeepSeek’s commitment to innovation and its potential to challenge existing AI paradigms.