AI Curated Feed · Smart score: 60

DFlash: An open-source block diffusion draft model for speculative decoding

Source: Twitter following list
Author: Chubby♨️ (@kimmonismus)
Published: 2026-06-23
Collected: 2026-06-25
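Why the AI recommends it
Unlike token-by-token speculative drafting, DFlash generates an entire block of draft tokens in one pass for the main model to verify in parallel. This methodological shift makes the open-source implementation worth following and trying to integrate.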
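Key takeaways
NVIDIA AI has released DFlash, an open-source lightweight block diffusion model for speculative decoding. NVIDIA reports up to 15x higher inference throughput on NVIDIA Blackwell at the same user interactivity target, with drop-in support in SGLang, TensorRT-LLM, and vLLM.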
Full text
Chubby♨️ (@kimmonismus) reposted a post by NVIDIA AI (@NVIDIAAI): Increase inference performance by up to 15x without sacrificing responsiveness. DFlash, an open source lightweight block diffusion model designed for speculative decoding, delivers up to 15x higher throughput on NVIDIA Blackwell while maintaining the same user interactivity target. Instead of drafting tokens one at a time, it proposes a whole block in a single pass for the main model to verify in parallel. Adoption is drop-in with support in @lmsysorg SGLang, TensorRT-LLM, and @vllm_project. ![photo](https://pbs.twimg.com/media/HLgv6qXWAAANU5x.jpg)
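The "propose a whole block, verify it in parallel" loop can be illustrated with a toy sketch. It assumes greedy decoding and uses hypothetical stand-ins (`target`, `toy_drafter`, `verify_block`, `speculative_generate`), not DFlash's model or any SGLang, TensorRT-LLM, or vLLM API. In particular, the drafter below is a simple rule that returns a block in one call, not a block diffusion model; only the draft-then-verify structure is meant to carry over.

```python
from typing import Callable, List, Sequence, Tuple

Token = int
# Hypothetical stand-in: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[Sequence[Token]], Token]
# Hypothetical stand-in: a drafter proposes `k` tokens for a prefix in a single call.
BlockDrafter = Callable[[Sequence[Token], int], List[Token]]


def verify_block(target: NextToken, prefix: List[Token], draft: List[Token]) -> List[Token]:
    """Greedy verification: keep the longest run of draft tokens the target agrees
    with, then append one token chosen by the target itself."""
    # A real engine gets all len(draft) + 1 predictions from ONE batched forward
    # pass over prefix + draft; this toy computes them one prefix at a time.
    predictions = [target(prefix + draft[:i]) for i in range(len(draft) + 1)]
    accepted: List[Token] = []
    for proposed, expected in zip(draft, predictions):
        if proposed != expected:
            break
        accepted.append(proposed)
    # Correction token at the first mismatch, or a "bonus" token if all matched.
    accepted.append(predictions[len(accepted)])
    return accepted


def speculative_generate(target: NextToken, drafter: BlockDrafter, prompt: List[Token],
                         max_new: int, block_size: int) -> Tuple[List[Token], int]:
    """Draft a block, verify it, repeat. Returns the tokens and the number of
    verification steps (one target pass each in a real engine)."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new:
        draft = drafter(out, block_size)
        out.extend(verify_block(target, out, draft))
        steps += 1
    return out[: len(prompt) + max_new], steps


# Toy usage: the target always predicts (last token + 1) mod 10.
def target(seq: Sequence[Token]) -> Token:
    return (seq[-1] + 1) % 10


def toy_drafter(prefix: Sequence[Token], k: int) -> List[Token]:
    # Mostly right, but drafts 0 wherever the target would say 7.
    tokens, last = [], prefix[-1]
    for _ in range(k):
        last = (last + 1) % 10
        tokens.append(0 if last == 7 else last)
    return tokens


tokens, steps = speculative_generate(target, toy_drafter, [0], max_new=12, block_size=4)
print(tokens, steps)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2] 3
```

Here three verification steps yield the same 12 new tokens that 12 sequential greedy target calls would, because every kept token is one the target itself predicted. Systems that sample rather than decode greedily typically replace the exact-match check with a rejection-sampling acceptance rule.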
#Models #OpenSource #TechUpdates