DeepSeek V4 Is Coming: 1 Trillion Parameters, Open Source, and Running on Huawei Chips

When DeepSeek V3 launched in December 2024, the shock that followed erased roughly $589 billion from Nvidia's market cap in a single trading session, the largest single-day market cap loss in U.S. stock market history. Now DeepSeek V4 is arriving with a larger model, a lower estimated cost, and a hardware stack that bypasses Nvidia entirely. The number that will define this launch isn't one trillion parameters. It's the gap between $5.2 million and $1 billion.

According to reports, DeepSeek V4 carries approximately one trillion total parameters, a one-million-token context window, native multimodal capabilities across text, image, and video, and an Apache 2.0 open-source license, all reportedly trained on Huawei's Ascend 950PR chip rather than any Nvidia GPU. Every element of that sentence is geopolitically charged in 2026. Two storylines run underneath the technical specs: American AI companies have formally accused DeepSeek of using 24,000 fraudulent accounts to harvest their models' outputs at scale, and if V4 delivers frontier-level performance on domestic Chinese hardware, the central premise of U.S. export controls (that limiting GPU exports can slow Chinese AI) faces a public stress test it may not pass.

This piece breaks down what's reportedly in V4, why the cost equation keeps investors nervous, what to make of the distillation accusations, the technical architecture behind a trillion-parameter model that doesn't cost a trillion times more to run, how it stacks up against GPT-5.4, Claude Opus 4.6, and Llama 4, and what questions remain unanswered until independent benchmarks arrive.

A 1-Trillion-Parameter Model, Two Delays, and One Very Deliberate Hardware Choice

DeepSeek V4 first surfaced in Chinese tech media in January 2026, when outlets including Qbitai reported expectations of a Lunar New Year release. The February 17th window came and went without an announcement. In early March, TechNode reported that sources expected a release "this week"; that date also passed. Then on March 9th, an unannounced "V4 Lite" version with approximately 200 billion parameters briefly appeared on DeepSeek's platform, confirming the core architecture was real even if the full model wasn't ready.

The repeated slippage had a single engineering cause: migrating a model of this scale from Nvidia's CUDA software ecosystem to Huawei's CANN (Compute Architecture for Neural Networks) framework, essentially from scratch. By April 3rd, Reuters, citing The Information, reported that DeepSeek V4 was expected "within weeks" and would run on Huawei Ascend chips. DeepSeek founder Liang Wenfeng reportedly told internal contacts that a late-April release was the target.

The hardware decision is deliberate, not merely pragmatic. According to multiple reports, DeepSeek gave Huawei (not Nvidia) exclusive early hardware access during V4 development. Given that Nvidia's H100 and Blackwell series are both banned from export to China, the most advanced Nvidia chip legally available in China is the H20. Choosing to build around Huawei's Ascend 950PR instead sends a clear signal. The months of delay to complete the CUDA-to-CANN migration were, in effect, the price of a strategic independence statement.

That signal has already moved supply chains. Alibaba, ByteDance, and Tencent have reportedly pre-ordered hundreds of thousands of Ascend 950PR units, positioning their cloud platforms to sell inference services on the new model at scale the moment it ships. Pre-orders at that scale don't happen on speculation; they reflect confidence that the release is imminent and commercially viable.

$5.2 Million vs. $1 Billion: The Cost Equation That Keeps Silicon Valley Up at Night

DeepSeek V3's official technical report disclosed a training cost of approximately $5.576 million on H800 GPUs at $2 per GPU hour, roughly 5 to 6 percent of the estimated $100 million it cost to train GPT-4. The estimated training cost for DeepSeek V4 is reportedly around $5.2 million. That number is unverified by any primary source and should be treated as an approximation, but it is directionally consistent with V3's confirmed figure.
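
As a quick sanity check on what those dollar figures mean in compute terms, the sketch below converts them back into GPU hours at the $2-per-GPU-hour rate cited above; applying the same rate to V4's reported figure is purely illustrative, since V4 reportedly trained on Ascend hardware rather than rented H800s.

```python
# Back-of-the-envelope conversion of training cost into GPU hours,
# using the $2/GPU-hour rate quoted in the V3 technical report.

GPU_HOUR_PRICE_USD = 2.00

v3_cost_usd = 5_576_000          # disclosed V3 training cost
v3_gpu_hours = v3_cost_usd / GPU_HOUR_PRICE_USD
print(f"V3: ~{v3_gpu_hours / 1e6:.2f}M H800 GPU hours")   # ~2.79M

v4_cost_usd = 5_200_000          # reported V4 estimate, unverified
v4_gpu_hours = v4_cost_usd / GPU_HOUR_PRICE_USD
print(f"V4 (reported): ~{v4_gpu_hours / 1e6:.2f}M GPU hours at the same nominal rate")
```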

[Stanford FSI](https://cyber.fsi.stanford.edu/publication/taking-stock-deepseek-shock) framed the implication precisely: "DeepSeek's shock doesn't just mean a Chinese company outperforming American rivals; it challenges the assumption that advanced AI necessarily requires massive investment, and that assumption is the valuation basis of the current AI ecosystem." If V4 sustains that cost efficiency at one trillion parameters, the assumption doesn't just crack, it breaks.

The implications ripple outward. For enterprise buyers, the reported API pricing of approximately $0.30 per million input tokens and $0.50 per million output tokens would price DeepSeek V4 at roughly one-twentieth of GPT-5.4's cost. Three specific deployment scenarios crystallize what that price delta means in practice.

First: large-scale code review. A one-million-token context window can ingest an entire mid-sized codebase, or a substantial slice of a very large one, in a single pass. No current open-source model operates at this context length at this parameter scale. Second: sovereign AI deployment. Government agencies and financial institutions in jurisdictions that cannot use U.S. cloud infrastructure can run V4 entirely on Huawei servers with no dependency on American providers, a capability with obvious appeal in regulated industries globally. Third: cost-driven scale. When inference pricing drops by a factor of twenty, applications that were economically marginal at GPT-5.4 pricing become viable at V4 pricing, which expands the addressable market for AI-native products substantially.
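
To put rough numbers on the price delta behind these scenarios, here is a minimal cost sketch using the reported V4 API prices; the output-token count and the 20x multiplier for a closed frontier model are illustrative assumptions drawn from the claims above, not published pricing.

```python
# Illustrative cost of the full-codebase-review scenario at the reported
# (unverified) DeepSeek V4 API prices.

V4_INPUT_PER_M_USD = 0.30    # reported price per million input tokens
V4_OUTPUT_PER_M_USD = 0.50   # reported price per million output tokens

input_tokens = 1_000_000     # an entire codebase read in a single pass
output_tokens = 50_000       # assumed length of the generated review

v4_cost = (input_tokens / 1e6) * V4_INPUT_PER_M_USD \
        + (output_tokens / 1e6) * V4_OUTPUT_PER_M_USD
print(f"V4 (reported pricing): ~${v4_cost:.2f} per full-codebase review")

# At the article's claimed ~20x gap, the same job on a closed frontier model
# would land somewhere around:
print(f"Closed frontier model at ~20x: ~${v4_cost * 20:.2f} per review")
```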

Jensen Huang has reportedly issued warnings about the combined competitive threat from DeepSeek and Huawei to Nvidia's market position. That warning echoes what Nvidia's stock price already said in January 2025, when the stock fell approximately 17% in a single day in the wake of V3's launch, a loss of roughly $589 billion in market value. V4's release will test whether that reaction was a one-time shock or a recurring pattern.

24,000 Fake Accounts, 16 Million Conversations, and a Question No One Can Fully Answer

In February 2026, Anthropic's congressional filing claimed that DeepSeek, Moonshot AI, and MiniMax had collectively used approximately 24,000 fraudulent accounts to conduct more than 16 million interactions with Claude, specifically to harvest model outputs at scale. OpenAI submitted parallel documents alleging that DeepSeek had "continued to attempt to distill OpenAI and other leading U.S. frontier lab models through new obfuscation methods." Anthropic characterized the activity as a national security threat, warning that authoritarian governments could use frontier AI capabilities for offensive cyber operations and large-scale surveillance.

To understand what the accusation claims, it helps to understand what distillation actually means. Distillation (knowledge distillation) in AI refers to the practice of using a stronger model's outputs to train a weaker model, so the weaker model learns to approximate the stronger model's reasoning style and knowledge at a fraction of the original training cost. Large-scale distillation attacks (running millions of carefully designed queries through a competitor's API and using the responses as training data) can close capability gaps cheaply and quickly. The accusation isn't about copying code. It's about using one model's cognition to shape another model's development.
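
For readers who want to see the mechanics, here is a minimal, generic sketch of knowledge distillation as it appears in open research (the standard teacher-student setup); it illustrates the general technique only, not anything specific to the models or APIs named in the filings.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Classic knowledge distillation: train the student to match the
    teacher's softened output distribution rather than hard labels."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the two distributions, scaled by T^2 so gradient
    # magnitudes stay comparable across temperature settings.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# The API-harvesting variant alleged in the filings has no access to teacher
# logits: the "teacher signal" is just the text a stronger model returns, and
# the weaker model is fine-tuned on (prompt, response) pairs with an ordinary
# next-token prediction loss.
```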

But the accusation deserves three counter-framings before it is taken as the explanation for V4's capabilities.

Counter-framing one: no one has demonstrated that distillation is the source of V4's performance. The allegations were leveled against activities associated with the V3 era. V4's actual training data composition is unknown. Even if the distillation attacks occurred exactly as described, the engineering challenge of training a one-trillion-parameter MoE model, migrating it entirely to a new hardware and software stack, and achieving stable inference at scale is not something distillation explains. Attributing DeepSeek's entire capability trajectory to stolen outputs requires accepting a conclusion that precedes the evidence.

Counter-framing two: the timing of the accusations deserves scrutiny. Both Anthropic and OpenAI submitted their filings to legislators simultaneously, at a moment when Congress was actively debating AI chip export legislation. American AI companies have a direct commercial interest in regulatory frameworks that disadvantage Chinese competitors. That doesn't make the underlying allegations false, but it does make the policy-advocacy dimension of the filings worth naming explicitly alongside the security-concern framing.

Counter-framing three: the open-source strategy is structurally inconsistent with pure distillation dependence. If DeepSeek's capabilities derived primarily from harvested model outputs, the rational commercial move would be to keep the weights proprietary and extract maximum competitive advantage. Instead, DeepSeek has consistently released full weights under permissive licenses and published detailed technical reports. The V3 technical report's architectural innovations (Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, multi-token prediction objectives) have been independently reviewed and validated by researchers who found genuine engineering novelty. An organization that was simply distilling competitors' outputs would not invest in writing and publishing verifiable technical reports.

The underlying legal question is real and unresolved: in an era when querying a commercial API at scale and using the responses as training data is technically straightforward, where does the line between competitive research and intellectual property infringement fall? That legal framework does not yet exist. Treating the accusations as a settled verdict obscures a genuine ambiguity that the entire AI industry will eventually have to address.

Why a 1-Trillion-Parameter Model Doesn't Cost 1 Trillion Times More to Run

The number one trillion sounds computationally prohibitive until you understand Mixture-of-Experts (MoE) architecture, a design where the model is partitioned into specialized subnetworks, and only a small fraction of them activate for any given input. According to reports, the model carries approximately one trillion total parameters, but each token processed activates only around 37 billion of them, roughly 3.7% of the full model. The actual compute cost per token is therefore closer to running a 37-billion-parameter dense model, not a trillion-parameter one.

Comparing V3 and V4 directly illustrates the design philosophy. V3 has 671 billion total parameters with 37 billion active per token. V4 reportedly grows the total knowledge capacity by roughly half, to approximately one trillion parameters, while keeping the active parameter count at the same 37 billion. V4 has more stored knowledge but essentially equivalent inference cost per token. The efficiency gain comes from scaling the knowledge base, not widening the computation path.
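
A minimal sketch of the routing mechanism makes the total-versus-active distinction concrete; this is a generic top-k MoE layer with made-up dimensions and expert counts, not DeepSeek's actual implementation.

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Generic top-k Mixture-of-Experts layer: all experts hold parameters,
    but each token is processed by only k of them."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=64, k=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.k = k

    def forward(self, x):                          # x: (n_tokens, d_model)
        scores = self.router(x)                    # (n_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1) # keep only the k best experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                 # only k experts run per token
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

layer = TinyMoELayer()
total = sum(p.numel() for p in layer.parameters())
active_per_token = (sum(p.numel() for p in layer.experts[0].parameters()) * layer.k
                    + sum(p.numel() for p in layer.router.parameters()))
print(f"total params: {total:,}  |  active per token: {active_per_token:,}")
```

Run on this toy configuration, the layer holds roughly 134 million parameters but touches only about 8.4 million per token, the same total-versus-active asymmetry the reported V4 figures describe at a vastly larger scale.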

The hardware that makes this possible (the Huawei Ascend 950PR) is more capable than its low profile outside China might suggest. According to TrendForce, its FP4 (4-bit floating point) computational throughput reaches 1.56 PFLOPS, approximately 2.8 times the H20's FP4 performance. It carries 112GB of HBM (High Bandwidth Memory), 1.16 times the H20's capacity, and a memory bandwidth of approximately 1.4 terabytes per second. Huawei claims multimodal inference throughput approximately 60% faster than the H20. The tradeoff is power draw: 600 watts versus the H20's roughly 400 watts.
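
A rough memory calculation shows why those HBM figures matter: even with aggressive quantization, a trillion-parameter model's weights do not fit on a single accelerator, so serving requires a sharded cluster. The bytes-per-parameter values below are generic assumptions about common quantization widths, not DeepSeek's actual serving configuration, and the sketch ignores KV cache and activation memory entirely.

```python
# Rough weight-memory footprint of a ~1T-parameter model at common
# quantization widths, against the Ascend 950PR's reported 112GB of HBM.

TOTAL_PARAMS = 1.0e12        # reported total parameter count
HBM_PER_CHIP_GB = 112        # reported Ascend 950PR capacity

for label, bytes_per_param in [("FP16", 2.0), ("FP8", 1.0), ("FP4", 0.5)]:
    weights_gb = TOTAL_PARAMS * bytes_per_param / 1e9
    min_chips = int(-(-weights_gb // HBM_PER_CHIP_GB))   # ceiling division
    print(f"{label}: ~{weights_gb:,.0f} GB of weights -> "
          f"at least {min_chips} chips for the weights alone")
```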

What makes V4 historically significant from a software standpoint is that it is the first DeepSeek model with no Nvidia CUDA dependency anywhere in its stack. CANN (Compute Architecture for Neural Networks) is Huawei's proprietary software framework, analogous to CUDA in its role as the layer between AI software and hardware. The months of engineering work DeepSeek, Huawei, and Cambricon put into this migration produced something that didn't exist before: a fully functional, frontier-class Chinese AI technology stack, from chip to model, with no American software components. Whether or not V4's benchmarks match the reported numbers, that stack is real.

How DeepSeek V4 Stacks Up Against GPT-5.4, Claude Opus 4.6, and Llama 4

The competitive landscape for frontier AI models in April 2026 is tightly bunched at the top on coding benchmarks. Claude Opus 4.6 scores 80.8% on SWE-bench Verified, the standard coding evaluation. GPT-5.4 scores approximately 80.0%. Gemini 3.1 Pro sits at 80.6%. DeepSeek V4 reportedly scores between 80 and 85% on the same benchmark, which would place it at or above the current frontier, but those numbers come from internal testing and have not been independently verified. Every V4 benchmark figure cited here should be read as reported, not confirmed, pending external evaluation.

  • GPT-5.4: ~80.0% SWE-bench, multimodal, closed-source, high API pricing

  • Claude Opus 4.6: 80.8% SWE-bench, text-primary, closed-source, high API pricing

  • Gemini 3.1 Pro: 80.6% SWE-bench, text + image + audio + video, closed-source, mid API pricing

  • DeepSeek V4 (reportedly): ~80–85% SWE-bench, native multimodal (text/image/video), Apache 2.0, ~$0.30/M tokens input

The open-source side of the competitive picture is where the comparison becomes most consequential. Among openly licensed models, the most prominent alternatives are Meta's Llama 4 (which offers a ten-million-token context window but carries commercial restrictions for deployments exceeding 700 million users) and Google's Gemma 4, which tops out at 31 billion parameters and targets efficiency over raw capability. DeepSeek's open-source model history has consistently pushed the parameter ceiling: V3 at 671 billion parameters was already the largest fully open-weight model at launch. V4 at one trillion parameters would extend that lead substantially.

The Apache 2.0 license distinction matters for enterprise adoption in ways that are often underappreciated. Apache 2.0 includes explicit patent licensing protections that MIT does not, requires no derivative works to be open-sourced (unlike GPL), and carries no user-scale restrictions. An enterprise integrating a DeepSeek open-source model under Apache 2.0 can deploy it to any number of users, incorporate it into commercial products, and modify it without obligations to share changes, a legal posture that Llama 4's license does not fully replicate beyond a 700-million-user threshold.

DeepSeek's trajectory from V2 through V4 follows a consistent pattern: each generation does more with comparable or lower training costs, while expanding the capability footprint. V2 introduced Multi-head Latent Attention. V3 validated the 37-billion active-parameter MoE efficiency model at scale. R1 demonstrated that reasoning-specialized fine-tuning on the same base architecture could top the App Store. V4 reportedly extends the total knowledge base by 50% over V3 while completing the Nvidia-to-Huawei migration. The consistency of the trajectory is arguably more significant than any single benchmark number.

Three Questions That Will Define What DeepSeek V4 Actually Means

The first and most immediate question: can independent benchmarks replicate the internal numbers? V3's technical report withstood peer review, its architectural claims were verified by external researchers, and its performance claims held up under third-party evaluation. V4 is operating at a larger scale on newer hardware with a new software stack. The Hugging Face community, MLPerf, and independent academic labs will begin running evaluations within days of the model's public release. Those results, not the internal figures, will determine whether V4 belongs at the frontier.

The second question concerns infrastructure: can Huawei's Ascend ecosystem handle the inference traffic that a model of this profile will attract? When V3 launched, DeepSeek's API experienced repeated instability under demand. V4 is larger, runs on newer hardware that has not been stress-tested at global scale, and will attract substantially more attention than V3 given the geopolitical context. The Chinese cloud providers (Alibaba, Tencent, ByteDance) who pre-ordered Ascend 950PR units will serve as the distributed inference backbone. Whether that architecture holds under peak load is an open question that will be answered in the first 72 hours after launch.

The third question is the most structurally significant: does V4's performance on domestic Chinese hardware force a revision of U.S. export control logic? The CSIS has analyzed this directly, noting that if DeepSeek V4 achieves frontier-level performance on the Ascend 950PR, it fundamentally challenges the premise that restricting GPU exports can effectively slow Chinese AI development. The EUISS has described DeepSeek's emergence as "the beginning of AI's multipolarization." If the chip restriction strategy has already been rendered ineffective, U.S. legislators face pressure to either escalate controls (targeting Huawei hardware and CANN software) or rethink the framework entirely. Neither path is straightforward.

For developers and enterprises watching this space, the practical guidance is clear: monitor DeepSeek's GitHub and Hugging Face pages for the official release. When the model drops, prioritize independent evaluation reports over official benchmarks, not out of distrust, but because independent validation is how frontier claims become confirmed facts. The Apache 2.0 license means you can begin planning integrations before the release without legal risk. The combination of a one-million-token context window, native multimodality, and no commercial usage restrictions will make certain workflow categories (long-document analysis, multimodal RAG, large codebase review) materially more accessible regardless of where the SWE-bench number ultimately lands.
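
When the weights land on Hugging Face, the first integration pass will likely look like the standard transformers loading pattern below; the repository ID is a placeholder, since the official model name won't be known until DeepSeek publishes it, and a trillion-parameter checkpoint will in practice need a multi-accelerator host or a quantized variant.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository ID: the real model name will only be known at release.
MODEL_ID = "deepseek-ai/DeepSeek-V4"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",       # accept whatever precision the checkpoint ships in
    device_map="auto",        # shard layers across all available accelerators
    trust_remote_code=True,   # prior DeepSeek releases ship custom modeling code
)

prompt = "Summarize the architectural differences between DeepSeek V3 and V4."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```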

If you're thinking about what models like DeepSeek V4 actually mean for how knowledge workers and developers organize AI-assisted workflows, the architecture question matters as much as the benchmark. A one-million-token context window and native multimodality change what's possible for anyone building tools that reason over large, heterogeneous information sets. For a practical look at how open-source model capabilities translate into personal and team knowledge infrastructure, the guide to AI-native second brain systems covers how these architectural shifts (longer context, cheaper inference, open weights) are reshaping what a knowledge management layer can actually do. V3 started a conversation about whether frontier AI required billion-dollar budgets. V4 is about to extend that conversation to hardware independence. The answers will arrive in the benchmarks, not the press releases.
