Huawei to release breakthrough achievements in AI inference: may reduce dependence on HBM and improve the performance of domestic large models

IT Home 11 Aug 2025 18:13

On August 10th, it was reported that Huawei will unveil breakthrough technological achievements in AI inference at the 2025 Financial AI Inference Application Landing and Development Forum on August 12th. The achievement is said to potentially reduce the dependence of Chinese AI inference on HBM (high-bandwidth memory), improve the inference performance of domestic AI models, and complete a key part of China's AI inference ecosystem.

Huawei has already made technological breakthroughs in AI inference. In March 2025, Peking University and Huawei jointly released a full-stack open-source DeepSeek inference solution, built on Peking University's self-developed SCOW computing platform and Hesi scheduling system. It integrates open-source community components such as DeepSeek, openEuler, MindSpore, and vLLM/Ray, achieving efficient inference of DeepSeek models on Huawei Ascend hardware.
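The stack above centers on vLLM/Ray for serving DeepSeek on Ascend. As a rough illustration of what such a deployment looks like at the application level, the sketch below uses vLLM's standard Python API. The model name, parallelism degree, and sampling settings are assumptions for illustration only, not the configuration from the joint release; an Ascend deployment would additionally rely on the Ascend-enabled vLLM/MindSpore builds rather than the stock CUDA package.

```python
# Minimal sketch of offline DeepSeek inference with vLLM's Python API.
# Model name and parallelism settings are illustrative assumptions; the
# article does not specify the configuration used in the joint release.
from vllm import LLM, SamplingParams

# Load a DeepSeek checkpoint with tensor parallelism across several devices.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1",   # assumed checkpoint, for illustration
    tensor_parallel_size=8,            # assumed device count
    trust_remote_code=True,
)

sampling = SamplingParams(temperature=0.6, max_tokens=256)

# Submit prompts; vLLM schedules them with continuous batching.
outputs = llm.generate(
    ["Explain what expert parallelism means for MoE inference."],
    sampling,
)
for out in outputs:
    print(out.outputs[0].text)
```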

In terms of performance, Huawei Ascend has achieved multiple breakthroughs. For example, when deploying DeepSeek V3/R1 on CloudMatrix 384 supernodes, single-card decode throughput exceeded 1920 tokens/s under a 50 ms latency constraint; the Atlas 800I A2 inference server achieved a single-card throughput of 808 tokens/s under a 100 ms latency constraint. The cooperation between iFlytek and Huawei has also produced significant results: the two parties were the first to realize large-scale cross-node expert-parallel cluster inference of MoE models on domestic computing power, increasing inference throughput by 3.2 times and reducing end-to-end latency by 50%.
