DeepSeek V4 Arrives With Huawei Ascend Support: Why It Matters Beyond the Model
DeepSeek V4 is an open-weight million-token model adapted for Huawei Ascend chips, turning a model launch into a story about AI sovereignty, export controls, and non-NVIDIA acceleration.
DeepSeek has released DeepSeek V4 with support for Huawei Ascend chips, making the launch about far more than model quality. The bigger story is China’s attempt to build a domestic AI compute stack that can reduce dependence on NVIDIA hardware, CUDA, and supply chains exposed to U.S. export controls.
The News in Brief
DeepSeek released the preview version of DeepSeek V4 on April 24, 2026, with two open-weight models: DeepSeek-V4-Pro and DeepSeek-V4-Flash. According to CGTN, the release expands context length from 128K tokens to 1 million tokens and adds support for domestically developed chips, including Huawei Ascend processors.
Huawei said its Ascend SuperNode lineup supports the V4 series, including Ascend A2, A3, and 950 products for both V4-Flash and V4-Pro. TechRepublic reported that Huawei engineers described the entire Ascend SuperNode product line as adapted for V4 inference workloads.
The core specifications are substantial. NVIDIA’s own technical blog lists V4-Pro at 1.6 trillion total parameters with 49 billion active parameters, and V4-Flash at 284 billion total parameters with 13 billion active parameters. Both support a 1 million-token context window and are MIT licensed.
What Was Actually Announced
There are two parts to the announcement: the model release and the hardware ecosystem alignment.
The model release is concrete. DeepSeek V4 ships as an open-weight family with a large Pro model and a smaller Flash model. V4-Pro is aimed at advanced reasoning, coding, long-context agents, and heavy enterprise workloads. V4-Flash is the efficiency option for faster, cheaper serving, summarisation, routing, chat, and lower-latency applications.
The headline feature is the 1 million-token context window. That puts V4 in the same long-context conversation as the most capable commercial models, but with open weights. For developers, the practical promise is agents that can keep large codebases, documents, tool traces, logs, and retrieval results in context across longer workflows.
The Huawei part is equally important but should be stated carefully. Huawei said its Ascend SuperNode products support DeepSeek V4, and reporting from CGTN and TechRepublic says the support covers Ascend A2, A3, and 950 products. That is strongest as an inference story: the model has been adapted so Chinese cloud providers and enterprises can serve V4 on domestic accelerators.
What is less clear is the full training story. Some commentary has described V4 as a model trained entirely on Huawei chips. The better-supported claim is narrower: V4 has first-class or day-zero support for Huawei Ascend inference, and Chinese chipmakers appear to have been given an optimisation path around the model. Training provenance, chip mix, cluster details, and exact hardware use have not been fully disclosed in a way that independent researchers can verify.
DeepSeek also appears to be keeping the API transition straightforward. The release has been framed as a model-family upgrade rather than a whole new product surface: developers can use V4 through DeepSeek’s API or download weights for self-hosting, subject to the practical demands of running very large MoE systems.
The Technical Angle
DeepSeek V4 is a mixture-of-experts model family. V4-Pro has 1.6 trillion total parameters but activates 49 billion per token. V4-Flash is smaller at 284 billion total parameters and 13 billion active parameters. That design keeps total model capacity high while reducing the compute used for each token compared with a dense model of similar total size.
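As a rough illustration of that sparsity, the short Python sketch below computes the share of parameters active per token from the figures quoted in this article. The dictionary and function names are illustrative and not part of any DeepSeek tooling, and the ratio is only a proxy for per-token compute, not a measured cost.

```python
# Parameter counts as quoted in this article, in billions.
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600, "active_b": 49},
    "DeepSeek-V4-Flash": {"total_b": 284, "active_b": 13},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the fraction of parameters activated for each token."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


if __name__ == "__main__":
    for name, params in MODELS.items():
        share = active_fraction(**params)
        print(f"{name}: {share:.1%} of parameters active per token")
```

On the quoted figures, that works out to roughly 3% of parameters active per token for V4-Pro and under 5% for V4-Flash.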
The technical focus is long-context efficiency. A 1 million-token context window is only useful if the model can process long histories without memory and latency becoming unmanageable. The Hugging Face technical write-up says V4 eases the KV cache burden through hybrid attention, including Compressed Sparse Attention and Heavily Compressed Attention. It reports that, at 1 million tokens, V4-Pro requires 27% of the single-token inference FLOPs of DeepSeek V3.2 and 10% of the KV cache memory. V4-Flash goes further, at 10% of the FLOPs and 7% of the KV cache.
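To make those percentages concrete, here is a minimal Python sketch that converts the reported ratios into reduction factors and applies them to a baseline KV cache size. The 100 GB baseline is a made-up number for illustration only, not a measured figure for DeepSeek V3.2, and the helper names are hypothetical.

```python
# Ratios relative to DeepSeek V3.2 at 1 million tokens, as reported in the
# Hugging Face write-up cited in this article.
REPORTED_RATIOS = {
    "V4-Pro": {"flops": 0.27, "kv_cache": 0.10},
    "V4-Flash": {"flops": 0.10, "kv_cache": 0.07},
}


def reduction_factor(ratio: float) -> float:
    """Convert 'X% of baseline' into an 'N times smaller' factor."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    return 1.0 / ratio


def scaled_kv_cache_gb(baseline_gb: float, model: str) -> float:
    """Scale a baseline KV cache size by the reported ratio for `model`."""
    return baseline_gb * REPORTED_RATIOS[model]["kv_cache"]


if __name__ == "__main__":
    hypothetical_baseline_gb = 100.0  # assumption, for illustration only
    for model, ratios in REPORTED_RATIOS.items():
        print(
            f"{model}: {reduction_factor(ratios['flops']):.1f}x fewer FLOPs, "
            f"{reduction_factor(ratios['kv_cache']):.1f}x smaller KV cache, "
            f"{scaled_kv_cache_gb(hypothetical_baseline_gb, model):.0f} GB "
            f"vs a {hypothetical_baseline_gb:.0f} GB baseline"
        )
```

Read this way, the reported numbers imply a KV cache about 10 times smaller for V4-Pro and about 14 times smaller for V4-Flash than the V3.2 baseline at the same context length.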
This matters for agents. Long-running agents accumulate tool outputs, code diffs, retrieved documents, logs, browser traces, and intermediate reasoning. If the KV cache becomes too large, inference slows down or becomes uneconomic. V4’s architecture is designed to make those long traces cheaper to keep alive.
The Huawei adaptation adds another layer. NVIDIA’s advantage has never been only the chip. It has been CUDA, libraries, kernels, developer tools, serving stacks, and years of optimisation across the software ecosystem. Huawei’s equivalent stack includes Ascend hardware and the CANN software ecosystem. DeepSeek V4 running well on Ascend is therefore a software-and-systems milestone, not merely a chip compatibility note.
That said, Ascend compatibility does not mean Ascend automatically equals NVIDIA at scale. NVIDIA published day-one support for V4 on Blackwell and reported strong out-of-the-box performance on GB200 NVL72 systems. In other words, V4 is not a Huawei-only model. The strategic significance is that it can be made useful outside the NVIDIA/CUDA default path.
Why It Matters
DeepSeek V4 matters because it connects three races that are usually discussed separately: model capability, accelerator hardware, and geopolitical control over AI supply chains.
For China, Ascend support strengthens the case for AI sovereignty. If frontier-class open models can run efficiently on domestic chips, Chinese cloud providers, state-backed projects, and enterprises have a more credible path around U.S. export controls on advanced NVIDIA accelerators.
For Huawei, this is a developer ecosystem opportunity. A chip without compelling models and software support is not enough. DeepSeek gives Ascend a high-profile workload that developers actually want to run. That could help CANN, Ascend kernels, domestic cloud deployments, and Chinese AI infrastructure vendors mature faster.
For NVIDIA and U.S. policymakers, the message is uncomfortable. Export controls may slow access to the best chips, but they also create incentives for domestic substitutes. The question is not whether Ascend instantly beats Blackwell. It almost certainly does not. The question is whether the China stack becomes good enough for enough workloads that NVIDIA loses its inevitability in one of the world’s largest AI markets.
For enterprises outside China, V4 also broadens the infrastructure conversation. Open-weight models with efficient long-context inference can be deployed on different hardware paths. That weakens the assumption that serious AI always means one vendor, one software stack, and one procurement route.
The Reaction
The reaction has split along predictable lines.
Open-source AI developers focused first on the model: 1 million tokens, MIT-licensed weights, strong agent benchmarks, and a cheaper path to long-context workflows. The V4 family is being treated as one of the most important open model releases of 2026 because it makes long-context agents more practical outside closed APIs.
Infrastructure watchers focused on Huawei. TechRepublic framed the launch around Ascend support and China’s local GPU adoption. CGTN described the release as strengthening China’s independent AI computing ecosystem and challenging NVIDIA’s dominance.
NVIDIA’s reaction was also revealing. Rather than ignoring V4, NVIDIA published its own technical guide for running DeepSeek V4 on Blackwell, listing specifications, deployment routes, and performance notes. That is a reminder that open models can strengthen multiple hardware ecosystems at once.
The sceptical reaction is healthy. Some of the strongest claims around V4 and Ascend come from politically charged or vendor-adjacent sources. Developers will want independent throughput numbers, real cost per million tokens, stable serving recipes, and production references before accepting that Ascend is a drop-in alternative to NVIDIA for frontier-scale AI.
The Caveats and Open Questions
The first caveat is training transparency. DeepSeek has released weights and technical material, but there is still uncertainty around the exact training hardware, supply chain, cluster size, data mix, and optimisation process. That matters because the geopolitical interpretation changes depending on whether Ascend handled inference, partial training, or the full training run.
The second caveat is ecosystem maturity. Huawei Ascend hardware may support V4, but the surrounding software stack still has to prove itself under production pressure. CUDA’s moat is not just syntax. It is debugging tools, libraries, kernel maturity, distributed serving, community knowledge, model recipes, cloud availability, and staff familiarity. CANN and Ascend tooling need real developer adoption to close that gap.
The third caveat is scale. A model can run on a chip and still be hard to serve economically. Large MoE models need memory, networking, scheduling, routing, quantisation, and careful batching. Ascend SuperNode support is meaningful, but independent operators will need to see how V4 behaves at high concurrency, long context, and sustained production load.
There are also policy and safety questions. Open-weight frontier models are valuable for transparency and competition, but they can also be adapted for cyber, surveillance, influence, and military-adjacent workflows. When model capability becomes part of a national technology strategy, safety debates are harder to separate from industrial policy.
Finally, export controls remain a moving target. If V4 proves that China’s domestic stack is improving, U.S. policy may tighten further around chipmaking equipment, HBM, cloud access, model weights, or data-centre services. That could accelerate decoupling rather than resolve it.
What Comes Next
The next milestone is evidence. Watch for independent benchmarks of DeepSeek V4 on Ascend 950 systems versus NVIDIA H20, H200, GB200, and Blackwell Ultra platforms. The key numbers will be throughput, latency, cost per million tokens, long-context reliability, and stability under agent workloads.
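When those benchmarks arrive, cost per million tokens can be derived from two measured inputs: what a system costs per hour and how many tokens it sustains per second. The Python sketch below shows the arithmetic. The function name and the example inputs are hypothetical placeholders, not published figures for any Ascend or NVIDIA system.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Return serving cost in USD per one million generated tokens.

    hourly_cost_usd: all-in cost of running the system for one hour.
    tokens_per_second: sustained aggregate throughput across all requests.
    """
    if hourly_cost_usd < 0:
        raise ValueError("hourly_cost_usd must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Placeholder inputs; substitute independently measured values.
    example_cost = cost_per_million_tokens(hourly_cost_usd=3.6, tokens_per_second=1000.0)
    print(f"${example_cost:.2f} per million tokens")
```

The same formula applies to any accelerator, which is why throughput measured under realistic concurrency and context lengths matters more than peak vendor numbers.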
The second milestone is deployment. If Chinese cloud providers make V4 on Ascend cheap and reliable, the domestic ecosystem gets a real flywheel: more users, more kernel work, more serving recipes, more CANN expertise, and more pressure on NVIDIA’s China position.
The broader trend is clear. AI competition is no longer just about who has the best model. It is about who controls the compute stack underneath it. DeepSeek V4 is important because it shows that the model layer and the sovereignty layer are now the same story.
Gabriella Fernandez