DeepSeek V4 Just Landed — The Largest Open-Source AI Model Ever Built
A year after R1 shocked Silicon Valley, China's DeepSeek is back with something even bigger: 1.6 trillion parameters, a 1 million token context window, and pricing that undercuts every frontier model on Earth — by up to 7x.
DeepSeek Strikes Again — And This Time It's Different
On April 24, 2026, Chinese AI startup DeepSeek quietly published two new models to Hugging Face under the MIT License and announced them on social media with a single sentence: "Welcome to the era of cost-effective 1M context length." Within hours, every major tech publication in the world was covering the story.
The models — DeepSeek V4 Pro and DeepSeek V4 Flash — are the company's first major release in four months, following a series of delays that kept the AI community guessing. The wait, it turns out, was for good reason. V4 Pro is the largest open-weight model ever released to the public, surpassing even Moonshot AI's Kimi K2.6 (1.1 trillion parameters) and more than doubling DeepSeek's own V3.2 (685 billion parameters).
More importantly: it works. V4 Pro scores 80.6% on SWE-bench Verified — a coding benchmark where it comes within 0.2 percentage points of Anthropic's Claude Opus 4.6, which costs $25 per million output tokens compared to V4 Pro's $3.48. That is a 7x price gap at near-identical performance on one of the industry's most respected coding benchmarks.
Why This Matters: DeepSeek V4 was built and runs primarily on Huawei Ascend 950 chips — Chinese domestic silicon — not Nvidia hardware. This is the first frontier-class AI model to demonstrate that Washington's export control strategy may not be slowing China's AI ambitions as much as hoped. The implications go far beyond a benchmark table.
V4 Pro vs V4 Flash — Know the Difference
DeepSeek released two distinct models simultaneously, targeting different use cases and budgets. This is not a large/small variant split — it's a fundamental product segmentation between depth and speed.
Note: Both models are text-only at launch — no multimodal (image/audio/video) support yet. A "V4 Pro Max" variant with extended reasoning tokens is also available and shows superior performance on standard reasoning benchmarks, outperforming GPT-5.2 and Gemini 3.0 Pro.
Three Architecture Innovations That Make V4 Possible
DeepSeek didn't just scale up parameters — they rethought the architecture. Three key innovations published in peer-reviewed papers before today's launch explain how V4 achieves frontier-level performance without the compute costs of Western models.
Hybrid Attention Architecture (HAA)
Combines Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA) to replace standard full attention. This dramatically reduces the memory overhead of the Key-Value (KV) cache at scale, making the 1M token context window economically viable — not just a marketing claim. Previous models claiming million-token windows faced quadratic cost scaling that made them impractical in production.
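To see why KV-cache compression is the difference between a marketing claim and a usable million-token window, a back-of-envelope estimate helps. The layer count, head count, and compression ratio below are illustrative assumptions for a model of this class, not published V4 specifications:

```python
# Back-of-envelope KV-cache memory for a long-context transformer.
# Layer/head/dim figures are illustrative assumptions, NOT DeepSeek V4 specs.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    # Two cached tensors (K and V) per layer, per token, at fp16/bf16.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value

full = kv_cache_bytes(tokens=1_000_000, layers=80, kv_heads=64, head_dim=128)
compressed = full / 8  # a hypothetical scheme keeping ~1/8 of the KV state

print(f"full KV cache at 1M tokens: {full / 2**30:,.0f} GiB")
print(f"compressed (1/8 ratio):     {compressed / 2**30:,.0f} GiB")
```

Even at a generous 8x compression, the cache for a single 1M-token request runs to hundreds of GiB under these assumptions — which is why reducing KV overhead, not raw attention FLOPs, is the economic bottleneck for long context.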
Mixture-of-Experts (MoE) at Scale
Both V4 models use the MoE architecture — but at an unprecedented scale. V4 Pro has 1.6 trillion total parameters but activates only 49 billion per token. This means inference costs remain manageable even as the model's knowledge base grows to astronomical size. The Flash model applies the same principle: 284 billion total parameters, only 13 billion activated per query. This is the key to beating Western pricing while matching Western quality.
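The economics follow directly from the activation ratio. Using only the parameter counts quoted above:

```python
# Active-parameter ratios for the two V4 variants (figures from the article).
models = {
    "V4 Pro":   {"total_b": 1600, "active_b": 49},
    "V4 Flash": {"total_b": 284,  "active_b": 13},
}

for name, m in models.items():
    ratio = m["active_b"] / m["total_b"]
    print(f"{name}: {m['active_b']}B of {m['total_b']}B active "
          f"({ratio:.1%} of weights touched per token)")
```

Per-token compute tracks active parameters, not total parameters — so V4 Pro pays inference costs closer to a ~49B dense model while drawing on a 1.6T-parameter knowledge base.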
Engram Long-Context Memory
DeepSeek published research on "Engram" — a conditional memory system for long-context retrieval — in January 2026. Unlike traditional attention mechanisms that treat all tokens equally, Engram prioritizes relevant context and compresses distant information. This is why V4 can maintain coherence across a million-token context without the quality degradation seen in other models at extreme lengths. For developers, this means entire codebases become processable as a single prompt.
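Engram's internals are not public, so the following is purely a toy illustration of the general idea — keep recent tokens exact, summarize distant ones — and not DeepSeek's algorithm:

```python
import numpy as np

def compress_context(token_vecs, recent_window=4, block_size=4):
    """Toy long-context compression: keep the most recent tokens exact,
    mean-pool older tokens into one summary vector per block.
    Illustrative only -- NOT DeepSeek's Engram mechanism."""
    recent = token_vecs[-recent_window:]
    distant = token_vecs[:-recent_window]
    summaries = [distant[i:i + block_size].mean(axis=0)
                 for i in range(0, len(distant), block_size)]
    return np.array(summaries), recent

rng = np.random.default_rng(0)
ctx = rng.normal(size=(20, 8))        # 20 tokens, dim-8 embeddings
summaries, recent = compress_context(ctx)
print(summaries.shape, recent.shape)  # (4, 8) (4, 8)
```

Here 20 token vectors shrink to 8 (4 summaries + 4 exact), a 2.5x reduction; any real system would use learned, query-conditioned compression rather than fixed mean-pooling, but the memory trade-off is the same in spirit.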
DeepSeek V4 vs The World's Best Models
DeepSeek published extensive benchmark data on release. Here's how V4 Pro (and its Max variant) stacks up against the top closed-source frontier models.
| Benchmark | DeepSeek V4 Pro | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro | Result |
|---|---|---|---|---|---|
| SWE-bench Verified | 80.6% | 80.8% | ~81% | ~79% | 0.2 pts behind Claude |
| LiveCodeBench | 93.5% | 88.8% | ~90% | ~89% | V4 wins |
| Terminal-Bench 2.0 | 67.9% | 65.4% | ~66% | ~64% | V4 wins |
| HMMT 2026 (Math) | 95.2% | 96.2% | ~96% | ~97% | 1.0 pt behind Claude |
| HLE (Hard Reasoning) | 37.7% | 40.0% | ~41% | ~42% | ~3 pts behind |
| World Knowledge | Strong | Strong | Best | Best | 3–6 months behind |
| Output price (per 1M tokens) | $3.48 | $25.00 | ~$15.00 | ~$10.00 | 7x cheaper |
Bottom line on benchmarks: DeepSeek V4 Pro leads all open-source models across coding, math, and reasoning. Against closed-source frontier models, it wins on coding (LiveCodeBench, Terminal-Bench), ties on SWE-bench, and trails slightly on pure knowledge and hard reasoning — by a gap DeepSeek itself estimates as 3–6 months of development time. At 7x lower cost, the value proposition is undeniable for most real-world workloads.
DeepSeek V4 Pricing vs Every Major Model
DeepSeek V4 Flash is the cheapest small-frontier model available — undercutting even GPT-5.4 Nano. V4 Pro is the cheapest large-frontier model. The pricing gap is not marginal — it restructures the economics of building AI applications.
Real-world calculation: if you're running a coding agent that processes 100 million output tokens per month, you'd pay $2,500/month with Claude Opus versus $348/month with DeepSeek V4 Pro, at nearly identical SWE-bench performance. That $2,152/month saving recurs every month and scales linearly with volume. For high-volume API workloads, the math is impossible to ignore.
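That calculation is easy to generalize. The sketch below uses only the output-token prices quoted in this article; real bills would also include input-token charges on both sides:

```python
# Monthly output-token cost comparison, using prices quoted in the article.
def monthly_cost(output_tokens_millions, price_per_million):
    return output_tokens_millions * price_per_million

claude = monthly_cost(100, 25.00)   # Claude Opus 4.6 output price
v4_pro = monthly_cost(100, 3.48)    # DeepSeek V4 Pro output price

print(f"Claude Opus: ${claude:,.0f}/mo  |  V4 Pro: ${v4_pro:,.0f}/mo")
print(f"saving: ${claude - v4_pro:,.0f}/mo ({claude / v4_pro:.1f}x cheaper)")
```

Swap in your own monthly token volume to see where the crossover sits for your workload; at 100M output tokens the ratio lands at roughly 7x, matching the headline figure.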
Huawei Chips, US Export Controls & The AI Sovereignty Play
Buried inside the benchmark announcement is arguably the most consequential detail of the entire V4 release: this model was built and runs on domestic Chinese chips. DeepSeek partnered with Huawei, which confirmed Friday that its "Supernode" technology — combining large clusters of Ascend 950 AI processors — supports the full V4 model family.
Chinese chipmaker Cambricon also contributed hardware. Crucially, DeepSeek deliberately denied early optimization access to Nvidia and AMD while giving Chinese domestic chipmakers that window first. That's not a technical decision — it's a geopolitical statement.
⚠ Why This Changes the Export Control Calculus
The United States has spent two years restricting China's access to Nvidia's most advanced AI chips — the H100, H200, and A100 series. The theory: limit compute, limit AI capability. DeepSeek V4 challenges that theory directly.
If a 1.6 trillion parameter model can reach near-frontier performance on domestic Ascend chips, the export control strategy has a serious gap. Wei Sun, principal analyst at Counterpoint Research, put it plainly: "It allows AI systems to be built and deployed without relying solely on Nvidia — V4 could ultimately have an even bigger impact than R1, accelerating adoption domestically and contributing to faster global AI development overall."
Meanwhile, on the very day DeepSeek V4 launched, the White House published a memo from Science & Technology Director Michael Kratsios accusing foreign entities — primarily in China — of conducting "industrial-scale" campaigns to distill frontier AI models from US companies. While not naming DeepSeek directly, the timing was not coincidental. OpenAI and Anthropic have both formally accused DeepSeek of distillation — copying capabilities from their models into smaller, cheaper open-source versions.
"DeepSeek's V4 preview is a serious flex, offering lower inference costs than previous models."
— Neil Shah, VP of Research, Counterpoint Research (via CNBC)

"Domestic competition has intensified significantly since R1's release. This framing of Chinese open-source vs Chinese open-source didn't exist with R1 — and that alone tells you how much the landscape has changed."
— Ivan Su, Senior Equity Analyst, Morningstar (via CNBC)

How Did Markets Respond?
Unlike DeepSeek R1's January 2025 launch — which wiped hundreds of billions from US tech stocks overnight — V4's release was absorbed with relative calm. Analysts had already "priced in" the reality that Chinese AI is competitive and cheap. But the reaction in Chinese markets told a different story.
Chinese AI App Stocks Fall
AI application companies like Minimax (100 HK) fell 9.4% and Knowledge Atlas (2513 HK) dropped 9.1% — a sign that investors see V4 as a competitive threat within China's own market, potentially commoditizing the AI stack further.
Chinese Chipmakers Surge
HHS (1347 HK) spiked +15.2% and SMIC (981 HK) gained +10% on Friday — direct beneficiaries of DeepSeek's validation that domestic chips can power frontier AI. Huawei's Ascend ecosystem just received the biggest endorsement in its history.
Silicon Valley: Cautious Relief
Bloomberg's headline said it bluntly: "DeepSeek's long-awaited new model fails to narrow US lead in AI." V4 Pro trails GPT-5.4 and Gemini 3.1 Pro by 3–6 months in overall capability — enough breathing room that Washington and major US labs exhaled.
Developer Community: Ecstatic
On Hugging Face and X (Twitter), developers celebrated the release immediately. The Flash model (160GB) may run locally on high-end MacBooks with Apple Silicon. OpenRouter added both models within hours. The open-source AI community gained its most powerful foundation model ever.
Who Should Switch to DeepSeek V4 Right Now?
Not every workload needs to switch. But for specific use cases, the value proposition is overwhelming. Here's a practical breakdown:
- Coding agent developers processing high token volumes — the 7x cost savings with comparable benchmark scores make V4 Pro the immediate default choice for SWE-bench class tasks
- Startups building AI applications who can't afford $25/M Claude pricing — V4 Flash at $0.28/M output is essentially free at most startup scales
- Researchers and academics who need to run entire codebases or long scientific documents through a model — 1M token context eliminates chunking entirely
- Open-source project contributors — MIT license means V4 can be fine-tuned, modified, and deployed commercially with no restrictions
- Companies with on-premise AI requirements — V4 weights are publicly downloadable; the Flash model (160GB) may run on enterprise Mac hardware
- Developers in API-cost-sensitive workflows — analytics pipelines, batch processing, document summarization at scale
Who should NOT switch yet: If you need multimodal capabilities (images, audio, video), require world-class general knowledge and hard reasoning (HLE), need Anthropic/OpenAI's safety infrastructure, or are building in regions where DeepSeek is restricted, stick with existing frontier models for now. V4 is genuinely behind in these areas.
Industry Reaction — Unfiltered
"Based on the benchmark results, it does appear DeepSeek V4 is going to be very competitive against its US rivals."
— Lian Jye Su, Chief Analyst, Omdia

"This makes DeepSeek V4 the new largest open weights model — bigger than Kimi K2.6, GLM-5.1, and more than double DeepSeek V3.2. The pelicans are pretty good, but what's really notable is the cost."
— Simon Willison, prominent AI developer and blogger (simonwillison.net)

"V4's benchmark profile suggests it could offer excellent agent capability at significantly lower cost. The ability to run natively on local chips could have massive implications for AI sovereignty."
— Wei Sun, Principal AI Analyst, Counterpoint Research

"V4's debut is unlikely to have the same market impact as R1, because traders have already priced in the reality that Chinese AI is competitive and cheaper to use."
— Ivan Su, Senior Equity Analyst, Morningstar

Everything You Need to Know About DeepSeek V4
What is DeepSeek V4 and when was it released?
DeepSeek V4 is a family of open-source large language models released by Chinese AI startup DeepSeek on April 24, 2026. It includes two variants: V4 Pro (1.6 trillion parameters) and V4 Flash (284 billion parameters). Both are available on Hugging Face under the MIT License and through the DeepSeek API.
How does DeepSeek V4 compare to GPT-5 and Claude?
DeepSeek V4 Pro leads all competitors on coding benchmarks (LiveCodeBench: 93.5%, Terminal-Bench: 67.9%) and comes within 0.2 percentage points of Claude Opus 4.6 on SWE-bench Verified (80.6% vs 80.8%). It trails GPT-5.4 and Gemini 3.1 Pro by approximately 3–6 months in general knowledge and hard reasoning, but at 7x lower cost per output token.
What is the price of DeepSeek V4 Pro?
DeepSeek V4 Pro costs $1.74 per million input tokens and $3.48 per million output tokens via the API. V4 Flash costs $0.14 per million input tokens and $0.28 per million output tokens — the cheapest pricing for a small frontier model, beating even GPT-5.4 Nano. Both models are also available as free open weights on Hugging Face.
What chips did DeepSeek use to train V4?
DeepSeek V4 was primarily built and runs on Huawei Ascend 950 chips and Cambricon hardware — Chinese domestic silicon. This is notable because the model's predecessor R1 was trained on Nvidia hardware. By achieving frontier-class performance on domestic chips, DeepSeek demonstrates that US export controls on advanced Nvidia chips have not stopped China's AI progress.
Is DeepSeek V4 truly open source?
Yes. Both V4 Pro and V4 Flash are released under the MIT License — the most permissive open-source license. This means developers can download the weights, run the models locally, fine-tune them for specific use cases, and deploy them commercially without restrictions or royalties. The Pro model weighs 865GB and the Flash model 160GB on Hugging Face.
What is the context window of DeepSeek V4?
Both V4 Pro and V4 Flash support a 1 million token context window — enough to process entire large codebases, book-length documents, or months of chat history in a single prompt. This is enabled by DeepSeek's proprietary Hybrid Attention Architecture and the Engram long-context memory system, which prevent the quadratic cost scaling that makes million-token windows impractical in other models.
Does DeepSeek V4 support images and video?
No. Both V4 Pro and V4 Flash are text-only models at launch. They do not support multimodal inputs (images, audio, or video) unlike many closed-source competitors such as GPT-5 and Gemini 3.1. This is one of the key areas where DeepSeek V4 trails frontier Western models.
What was the impact of DeepSeek V4 on stock markets?
Unlike R1's dramatic trillion-dollar selloff in January 2025, V4's market impact was more measured. Chinese AI application stocks fell (Minimax -9.4%, Knowledge Atlas -9.1%) while Chinese chipmakers surged (HHS +15.2%, SMIC +10%). US tech markets were largely unmoved, as investors had already priced in Chinese AI competitiveness since R1's disruption.
The Bottom Line on DeepSeek V4
DeepSeek V4 is not the "Sputnik moment" that R1 was. It doesn't shock the market. It doesn't reveal an unexpected leap that rewrites the rules overnight. But in some ways, that makes it more important — not less.
R1 proved Chinese AI could compete. V4 proves that competition is sustainable, systematic, and accelerating. A year after R1, DeepSeek has produced the world's largest open-source model, built it on domestic hardware, priced it at a fraction of Western alternatives, and matched frontier models on the benchmarks that matter most to developers.
The Stanford AI Index 2026 recently concluded that Chinese companies have "effectively closed" the AI performance gap with their US rivals. DeepSeek V4 is exhibit A. For the global AI ecosystem — for developers, for businesses, for policymakers — today's release changes the calculus in ways that won't be fully understood for months.
