AI Featured Updates
Smart score: 60
NVIDIA inference software optimizations lower token costs
Why AI recommends this
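What sets it apart: it shows quantified cost reductions from continued post-deployment optimization of inference software on Blackwell; the real-world impact is worth watching.
Key takeaway
NVIDIA reports that within one month on Blackwell, its inference software stack improved DeepSeek V4 performance by up to 5×, cutting token costs to roughly one-fifth of previous levels. The stack as a whole delivers up to 20× higher throughput on the same GPU, and continued optimization keeps lowering costs after deployment.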
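The link between the performance gain and the cost figure is simple arithmetic. The sketch below assumes that the hourly GPU price stays fixed and that "performance" means token throughput. The function name, the $4.00 hourly price, and the 1,000 tokens/s baseline are hypothetical values chosen for illustration; none of them come from NVIDIA.

```python
def cost_per_million_tokens(gpu_hour_cost_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to generate one million tokens on a single GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical figures: same GPU, same hourly price, 5x the throughput.
baseline = cost_per_million_tokens(gpu_hour_cost_usd=4.0, tokens_per_second=1000.0)
optimized = cost_per_million_tokens(gpu_hour_cost_usd=4.0, tokens_per_second=5000.0)

# Ratio of the new cost to the old cost at a fixed hardware price.
ratio = optimized / baseline
print(f"baseline:  ${baseline:.3f} per 1M tokens")
print(f"optimized: ${optimized:.3f} per 1M tokens")
print(f"cost ratio: {ratio:.2f}")
```

With the hardware price held constant, cost per token is inversely proportional to throughput, so a 5× throughput gain corresponds to about one-fifth of the cost. This matches the "roughly one-fifth" figure in the post.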
Full text
NVIDIA inference software keeps driving down token costs, long after AI infrastructure is deployed. ⚡
In just one month on NVIDIA Blackwell, software optimizations improved DeepSeek V4 performance by up to 5×, reducing token costs to roughly one-fifth of previous levels. NVIDIA's integrated inference software stack compounds improvements across runtimes, kernels, networking, and hardware, delivering up to 20× higher throughput on the same GPU.
Co-designed with NVIDIA GPUs, CPUs, networking, and systems, and powered by CUDA-native open source frameworks, NVIDIA's inference software stack ensures new model breakthroughs and optimizations run on NVIDIA from day zero, and keep improving throughput and lowering cost after deployment.
See how @Baseten, @Cognition, @DeepInfra, @togethercompute, and @Cursor_ai are turning continuous software innovation into lower cost per token: https://t.co/jQquhFMR2m
