AI Curated Feed · Smart Score: 75

JetSpec Speedup

Source: Twitter following list
Author: Hao AI Lab (@haoailab)
Published: 2026-06-26
Collected: 2026-06-28
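Why AI recommends this
Introduces JetSpec, a speculative decoding method built on causal parallel tree drafting and combined with CUDA graph and kernel optimizations. It is reported to cut inference latency substantially while keeping the output lossless.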
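Key takeaways
Hao AI Lab introduces JetSpec, which reports up to 9.64x end-to-end speedup on MATH-500 and 4.58x on open-ended chat while remaining lossless, and reaches around 1000 TPS on a single B200. The reposting author describes it as smarter and stronger than previous speculative decoding and block diffusion approaches.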
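
To make "lossless" concrete, here is a minimal sketch of generic greedy speculative decoding with a single linear draft chain. It is not JetSpec's algorithm: the post describes causal parallel tree drafting, which this sketch does not attempt. The names `toy_target`, `toy_draft`, `greedy_decode`, and `speculative_decode` are hypothetical and exist only for illustration. The point it shows is that every emitted token is the one the target model would have chosen, so the output matches plain greedy decoding exactly while using fewer target verification rounds.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def toy_target(seq: List[int]) -> int:
    """Stand-in for the large target model (deterministic toy rule)."""
    return (3 * seq[-1] + len(seq)) % 11


def toy_draft(seq: List[int]) -> int:
    """Stand-in for the cheap draft model: agrees with the target
    except when the prefix length is a multiple of 5."""
    tok = toy_target(seq)
    return (tok + 1) % 11 if len(seq) % 5 == 0 else tok


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding with a linear chain of k draft tokens.

    Returns the generated sequence and the number of verification rounds.
    A real system scores all drafted positions in ONE batched target forward
    pass per round; this sketch emulates that with per-position calls and
    counts each round as a single pass.
    """
    seq = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(seq) < goal:
        # 1. Draft k tokens with the cheap model.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))

        # 2. Verify the drafts against the target, left to right.
        target_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
        else:
            # All drafts accepted: the same pass also yields one bonus token.
            accepted.append(target(seq + accepted))

        seq.extend(accepted)
    return seq[:goal], target_passes


if __name__ == "__main__":
    out, passes = speculative_decode(toy_target, toy_draft, [1, 2], n_new=20, k=4)
    print(out == greedy_decode(toy_target, [1, 2], 20), passes)
```

The speedup in such a scheme depends on how many drafted tokens the target accepts per round and on how cheap drafting is, which is the cost and quality trade-off the post says JetSpec co-optimizes.
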
Full text
Hao AI Lab (@haoailab) reposted a post by Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex):

I want to bring your attention to JetSpec because it looks strictly smarter and stronger than previous speculative decoding and block diffusion approaches (yes, again). Avg 1000 t/s single stream with Qwen-8B on B200. Basically, you can better utilize compute at any batch size. https://t.co/OFK1dY8kmX

![photo](https://pbs.twimg.com/media/HLtwNFQWwAEv5Im.jpg)

> **Quoted post from Hao AI Lab (@haoailab):**
>
> Introducing JetSpec: we find speculative decoding can push LLM generation latency to extreme by co-optimizing drafting cost and drafting quality with causal parallel tree drafting.
>
> JetSpec reaches up to 9.64x end-to-end speedup on MATH-500 and 4.58x on open-ended chat while keeping lossless. With CUDA graph and kernel optimizations, JetSpec further translates to around 1000 TPS on a single B200. ⚡️
>
> Check out our project page for demos and a blog post on how we built it 👇
>
> https://t.co/M4T8jOBWQ8
> https://t.co/h9uipDbTuh
>
> https://x.com/haoailab/status/2070225035403694408
#TechnicalBreakthrough #ModelRelease