AI Curated Feed | Smart Score: 65

JetSpec: Extreme Acceleration of LLM Generation via Speculative Decoding

Source: Twitter following list

Author: Hao AI Lab (@haoailab)

Published: 2026-06-26

Collected: 2026-06-26

Why the AI Recommends This

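Compared with standard speculative decoding, JetSpec jointly optimizes drafting cost and drafting quality through causal parallel tree drafting, and reports substantial gains as a result. Its open-source implementation and whether its performance numbers can be reproduced are worth following.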
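Why drafting cost and drafting quality have to be optimized together can be seen from the standard analysis of chain-style speculative decoding by Leviathan et al. (2023). This is that paper's simplified model, not JetSpec's own analysis. It assumes a single chain of γ drafted tokens, each accepted independently with rate α (drafting quality), and a drafter whose per-token cost is a fraction c of one target forward pass (drafting cost):

```latex
\mathbb{E}[\text{tokens per verification step}] = \frac{1 - \alpha^{\gamma+1}}{1 - \alpha},
\qquad
\text{speedup} \approx \frac{1 - \alpha^{\gamma+1}}{(1 - \alpha)(\gamma c + 1)}
```

With illustrative values (not figures reported for JetSpec) α = 0.8, γ = 4, c = 0.05, each step yields (1 - 0.8^5)/0.2 = 3.3616 tokens on average. The step costs 1 + 4(0.05) = 1.2 target passes, for a speedup of about 2.80x. Raising α or γ increases the numerator, but a longer sequential draft also inflates the γc term. A drafter that proposes many tokens in a single parallel pass, which is the goal of parallel drafting, would shrink that term. This is a hedged reading of why the two objectives are coupled.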

Key Takeaways

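Hao AI Lab has released JetSpec, a speculative decoding method that uses causal parallel tree drafting to jointly optimize drafting cost and drafting quality. It reaches up to 9.64x end-to-end speedup on MATH-500 and 4.58x on open-ended chat, while keeping generation lossless. Combined with CUDA graph and kernel optimizations, it reaches around 1000 TPS (tokens per second) on a single B200 GPU.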
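The sketch below shows the generic draft-then-verify loop behind tree-based speculative decoding and why it is lossless. It is not JetSpec's algorithm: it does not model the "causal parallel" drafter, and it covers greedy decoding only. Lossless sampling additionally requires a rejection-sampling acceptance rule. All names (`verify_path`, `verify_tree`, `speculative_generate`, `toy_target`, `toy_drafter`) are hypothetical. The draft tree is flattened into root-to-leaf paths. Real systems verify every branch in one batched forward pass with a tree attention mask, whereas this sketch loops over the branches for clarity.

```python
from typing import Callable, List, Sequence, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
TargetFn = Callable[[Sequence[int]], int]
# Hypothetical drafter: proposes a tree of continuations as root-to-leaf paths.
DrafterFn = Callable[[Sequence[int]], List[List[int]]]


def verify_path(target: TargetFn, prefix: List[int], draft: Sequence[int]) -> List[int]:
    """Accept drafted tokens while they match the target's greedy choice.

    At the first mismatch the target's own token replaces the drafted one and the
    path ends; if every drafted token matches, the target adds one bonus token.
    """
    accepted: List[int] = []
    for tok in draft:
        expected = target(prefix + accepted)
        accepted.append(expected)  # always keep the target's token
        if tok != expected:
            return accepted  # correction token ends this path
    accepted.append(target(prefix + accepted))  # bonus token after full acceptance
    return accepted


def verify_tree(target: TargetFn, prefix: List[int], paths: List[List[int]]) -> List[int]:
    """Keep the longest accepted continuation among all drafted branches."""
    best: List[int] = []
    for path in paths or [[]]:  # an empty draft still yields one target token
        out = verify_path(target, prefix, path)
        if len(out) > len(best):
            best = out
    return best


def speculative_generate(
    target: TargetFn, drafter: DrafterFn, prompt: List[int], max_new: int
) -> Tuple[List[int], int]:
    out = list(prompt)
    rounds = 0  # number of draft-then-verify rounds
    while len(out) - len(prompt) < max_new:
        out.extend(verify_tree(target, out, drafter(out)))
        rounds += 1
    return out[: len(prompt) + max_new], rounds


def greedy_generate(target: TargetFn, prompt: List[int], max_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


# Hypothetical toy models: the target simply counts modulo 10.
def toy_target(seq: Sequence[int]) -> int:
    return (seq[-1] + 1) % 10


def toy_drafter(seq: Sequence[int]) -> List[List[int]]:
    last = seq[-1]
    good = [(last + i) % 10 for i in range(1, 4)] + [0]  # 4th token usually wrong
    bad = [(last + 2) % 10]  # wrong from the first token
    return [good, bad]


if __name__ == "__main__":
    tokens, rounds = speculative_generate(toy_target, toy_drafter, [0], 20)
    # Lossless under greedy decoding: identical to running the target alone.
    assert tokens == greedy_generate(toy_target, [0], 20)
    print(f"{len(tokens) - 1} new tokens in {rounds} verification rounds")
```

Each round costs roughly one target pass but can commit several tokens. The speedup therefore depends on both how cheaply the tree is drafted and how many of its tokens the target accepts.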

Full Text

Hao AI Lab (@haoailab) reposted a post by Kaichao You (@KaichaoYou):

wow glad to see the vLLM support with JetSpec!

> **Quoted original post by Hao AI Lab (@haoailab):**
> Introducing JetSpec: we find speculative decoding can push LLM generation latency to extreme by co-optimizing drafting cost and drafting quality with causal parallel tree drafting.
> JetSpec reaches up to 9.64x end-to-end speedup on MATH-500 and 4.58x on open-ended chat while keeping lossless. With CUDA graph and kernel optimizations, JetSpec further translates to around 1000 TPS on a single B200. ⚡️
> Check out our project page for demos and a blog post on how we built it 👇
> https://t.co/M4T8jOBWQ8
> https://t.co/h9uipDbTuh
> https://x.com/haoailab/status/2070225035403694408

#TechBreakthrough #Models #AI
