AI Curated Feed · Smart score: 65

Hao AI Lab releases JetSpec, reaching up to 9.64x inference speedup

Source: Twitter following list

Author: Hao AI Lab (@haoailab)

Published: 2026-06-26

Collected: 2026-06-26

Why the AI recommends this

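There is no need to reproduce the results, but the open-source code is worth following: the causal parallel tree drafting approach is a useful reference for reducing LLM inference latency.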

Key takeaways

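Hao AI Lab has released JetSpec, a lightweight, causality-preserving draft head that performs speculative decoding via causal parallel tree drafting. It reaches up to 9.64x end-to-end speedup on MATH-500 and 4.58x on open-ended chat while remaining lossless. Combined with CUDA graph and kernel optimizations, it reaches roughly 1000 TPS (tokens per second) on a single B200. The paper, code, checkpoints, and vLLM engine are all open source.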
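To make the "lossless" claim concrete, here is a minimal sketch of generic greedy speculative decoding. It is not JetSpec's method: it drafts a single linear chain rather than a causal parallel tree, and `target_next`, `draft_next`, and the block size `k` are hypothetical toy stand-ins, not part of any released code. It only illustrates why verifying drafted tokens against the target model leaves the output unchanged while cutting the number of target decoding rounds.

```python
from typing import Callable, List, Tuple

# A "model" here is a hypothetical toy: it maps a token prefix to the next
# token under greedy decoding. Real models return logits over a vocabulary.
Model = Callable[[List[int]], int]


def target_next(prefix: List[int]) -> int:
    """Toy stand-in for the large target model (deterministic)."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_next(prefix: List[int]) -> int:
    """Toy stand-in for a cheap draft head: agrees with the target
    except when the prefix length is a multiple of 4."""
    tok = target_next(prefix)
    return tok if len(prefix) % 4 else (tok + 1) % 50


def greedy_decode(target: Model, prompt: List[int], n: int) -> List[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(target(seq))
    return seq[len(prompt):]


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n: int, k: int = 4
) -> Tuple[List[int], int]:
    """Draft k tokens, then keep the longest prefix the target agrees with.

    Returns the generated tokens and the number of verification rounds.
    In a real system each round is a single batched target forward pass;
    this sketch calls the toy target once per position for clarity.
    """
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < n:
        # 1) Drafting: the cheap model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))

        # 2) Verification: accept drafted tokens until the first mismatch.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # target's own token replaces the miss
                break
            accepted.append(tok)
        else:
            # Every drafted token matched, so the target adds one bonus token.
            accepted.append(target(seq + accepted))

        seq.extend(accepted)
        rounds += 1
    return seq[len(prompt):len(prompt) + n], rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = greedy_decode(target_next, prompt, 20)
    spec, rounds = speculative_decode(target_next, draft_next, prompt, 20)
    print("identical output:", spec == baseline)
    print("verification rounds:", rounds, "vs baseline steps:", len(baseline))
```

Because every emitted token is either confirmed or supplied by the target model, the result matches plain greedy decoding token for token. Any speedup comes from accepting several tokens per target round, which is why draft quality and draft cost both matter.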

Full text

Hao AI Lab (@haoailab) reposted a post by Yu-Yang Qian (@YuYangQian_ai):

Very excited to be part of this work with @Lanxiang_Hu and @haoailab ! 🚀 Our JetSpec adds a lightweight, causality-preserving draft head, achieving up to 9.6× end-to-end speedup, fully plug-and-play. Paper, code, checkpoints, and vLLM engine are all open-source. Give it a try!

> **Quoted post by Hao AI Lab (@haoailab):**
>
> Introducing JetSpec: we find speculative decoding can push LLM generation latency to extreme by co-optimizing drafting cost and drafting quality with causal parallel tree drafting.
>
> JetSpec reaches up to 9.64x end-to-end speedup on MATH-500 and 4.58x on open-ended chat while keeping lossless. With CUDA graph and kernel optimizations, JetSpec further translates to around 1000 TPS on a single B200. ⚡️
>
> Check out our project page for demos and a blog post on how we built it 👇
>
> https://t.co/M4T8jOBWQ8
>
> https://t.co/h9uipDbTuh
>
> https://x.com/haoailab/status/2070225035403694408

#AI #Technology #OpenSource
