AI Curated Feed | Smart score: 60

Hao AI Lab Releases JetSpec, a Speculative Decoding Technique

Source: Twitter following list

Author: SemiAnalysis (@SemiAnalysis_)

Published: 2026-06-30

Indexed: 2026-06-30

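Why AI recommends this

Compared with standard speculative decoding work, JetSpec reports concrete speedup figures for causal parallel tree drafting along with TPS measured on real hardware. Its progress toward integration with inference engines is worth following.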

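Key takeaways

Hao AI Lab introduces JetSpec, which uses causal parallel tree drafting to co-optimize the cost and quality of drafting in speculative decoding. It reports up to 9.64x end-to-end speedup on MATH-500 and 4.58x on open-ended chat while remaining lossless. With CUDA graph and kernel optimizations, it reaches around 1000 TPS (tokens per second) on a single B200.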
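To make the "lossless" claim concrete, here is a minimal sketch of how a draft tree can be verified under greedy decoding so that the output matches what the target model would have produced on its own. This is a generic illustration from the speculative decoding literature, not JetSpec's actual implementation, which the post does not describe. The function name `verify_draft_tree` and its inputs are hypothetical: `tokens[i]` is the drafted token at node `i`, `parents[i]` is its parent node (`-1` for children of the last committed token), `target_next[i]` is the target model's greedy next token after the path ending at node `i`, and `root_next` is the target's greedy next token after the committed prefix.

```python
from typing import Dict, List, Sequence


def verify_draft_tree(
    tokens: Sequence[int],
    parents: Sequence[int],
    target_next: Sequence[int],
    root_next: int,
) -> List[int]:
    """Return the tokens to commit after one verification pass (greedy case).

    Assumption: target_next was computed by the target model in a single
    forward pass over the whole draft tree.
    """
    # Group nodes by parent so each level of the tree can be scanned quickly.
    children: Dict[int, List[int]] = {}
    for node, parent in enumerate(parents):
        children.setdefault(parent, []).append(node)

    accepted: List[int] = []
    current, expected = -1, root_next
    while True:
        # A drafted child is accepted only if it equals the target's own choice.
        match = next(
            (c for c in children.get(current, []) if tokens[c] == expected),
            None,
        )
        if match is None:
            break
        accepted.append(tokens[match])
        current, expected = match, target_next[match]

    # The target's token at the first mismatch (or past a leaf) is always kept,
    # so every step commits at least one token.
    return accepted + [expected]


# Toy tree: two root children (5, 7); node 0 has children (9, 2); node 1 has child (4).
print(verify_draft_tree([5, 7, 9, 2, 4], [-1, -1, 0, 0, 1], [2, 0, 0, 8, 0], 5))
```

In this toy case three tokens are committed from one target pass. Sampling-based decoding needs a rejection-sampling acceptance rule instead of exact matching, which this sketch does not cover.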

Full text

Parallel draft tree, tree-causal verification
Looking forward to its deeper integration with inference engines vLLM/SGLang!
Great work @Lanxiang_Hu!

> **Quoted post from Hao AI Lab (@haoailab):**
> Introducing JetSpec: we find speculative decoding can push LLM generation latency to extreme by co-optimizing drafting cost and drafting quality with causal parallel tree drafting.
>
> JetSpec reaches up to 9.64x end-to-end speedup on MATH-500 and 4.58x on open-ended chat while keeping lossless. With CUDA graph and kernel optimizations, JetSpec further translates to around 1000 TPS on a single B200. ⚡️
>
> Check out our project page for demos and a blog post on how we built it 👇
>
> https://t.co/M4T8jOBWQ8
> https://t.co/h9uipDbTuh
>
> https://x.com/haoailab/status/2070225035403694408
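The post does not spell out what "tree-causal verification" involves. One common way to verify a whole draft tree in a single forward pass is a tree attention mask, in which each drafted node attends only to itself and its ancestors. The sketch below illustrates that general idea and assumes it is related to what the post means. The function name `tree_attention_mask` and the `parents` encoding (`-1` marks a child of the committed prefix) are hypothetical.

```python
from typing import List, Sequence


def tree_attention_mask(parents: Sequence[int]) -> List[List[bool]]:
    """mask[i][j] is True when draft node i may attend to draft node j.

    Assumption: parents describes a forest with no cycles, so walking up
    from any node eventually reaches -1.
    """
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        # Walk from node i up to the root, enabling attention along the path.
        while j != -1:
            mask[i][j] = True
            j = parents[j]
    return mask


# Same toy tree shape as a two-branch draft: nodes 0 and 1 hang off the prefix.
for row in tree_attention_mask([-1, -1, 0, 0, 1]):
    print("".join("x" if allowed else "." for allowed in row))
```

Sibling branches never see each other, so each root-to-leaf path is scored as if it were the only continuation. In a real engine every node would also attend to the full committed prefix, which is omitted here.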

#TechBreakthrough #Models #Research
