AI Featured Feed · Smart score: 62

GB200 NVL72 Inference Serving Cost Cut by 2.5x

Source: Twitter following list
Author: SemiAnalysis (@SemiAnalysis_)
Published: 2026-06-22
Collected: 2026-06-22
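Why the AI recommends this
The gain came from software changes alone, which makes it instructive for lowering MoE model inference costs. The technical details are worth studying.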
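Key takeaways
In under 70 days, the NVIDIA team cut GB200 NVL72 inference serving costs by 2.5x for the Kimi architecture through software optimization alone. A key change was rewriting the NVFP4 MoE kernel in CuTe-DSL, which adds to the existing wide expert parallelism and exploits the bandwidth of NVL72's copper backplane. According to the post, the Kimi architecture is the same model architecture as xAI's Cursor Composer 2.5.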
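To make the "2.5x" figure concrete: a 2.5x cost reduction means the new cost is 1/2.5 = 40% of the baseline, a 60% saving. The sketch below works through that arithmetic. The function names and the baseline price of 1.00 per million tokens are hypothetical and chosen only for illustration; the post gives no absolute prices.

```python
def cost_after_reduction(baseline_cost: float, factor: float) -> float:
    """Return the cost after an 'N-times cheaper' reduction (factor = N)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return baseline_cost / factor


def percent_saved(factor: float) -> float:
    """Return the percentage saved relative to the baseline."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return (1.0 - 1.0 / factor) * 100.0


# Hypothetical baseline: 1.00 currency units per million tokens (not from the post).
baseline = 1.00
new_cost = cost_after_reduction(baseline, 2.5)
print(f"new cost: {new_cost:.2f} per million tokens")
print(f"saving:   {percent_saved(2.5):.0f}%")
```

The same helpers apply to any "N-times cheaper" claim: the saving approaches 100% as the factor grows but never reaches it.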
Full text
CUDA MOAT ALERT 🔥: In less than 70 days, GB200 NVL72 serving costs decreased by 2.5x through software improvements alone for the Kimi architecture, which is the same model architecture as xAI’s popular Cursor Composer 2.5. One of the key software optimizations was rewriting the NVFP4 MoE kernel using CuTe-DSL, which is additive to the existing wide-expert parallelism optimization. This takes advantage of NVL72’s copper backplane, which has 18x higher bandwidth than standard RoCEv2/InfiniBand. Great work by Xin Li, Jun Yang, & the NVIDIA team on decreasing serving costs by 2.5x in less than 70 days! 🔥 ![photo](https://pbs.twimg.com/media/HLbuimdWwAE6RCg.jpg)
#AI #Technology #TechBreakthrough