AI Curated Updates
Smart score: 60
Xenova releases demo of the WebGPU kernels Fable 5 wrote for Gemma 4
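Why the AI recommends this
Provides a verifiable demo and kernels that show what agentic kernel optimization achieves in practice; worth watching as an approach to on-device inference optimization.
Key takeaways
Xenova announced the release of a demo of the custom WebGPU kernels that Fable 5 wrote for Gemma 4. The demo reaches 255 tok/s on WebGPU, roughly 3x the earlier 84 tok/s. The speedup came from agentic kernel optimization, which the post describes as the future of on-device inference.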
Full text
AK (@_akhaliq) reposted a post from Xenova (@xenovacom):
Before Fable 5 was shut down, it pushed Gemma 4 to 255 tok/s on WebGPU. Some didn't believe it was real.
Today we're releasing the demo and kernels it wrote for you to see yourself. Run it locally in your browser.
Agentic kernel optimization is the future of on-device inference https://t.co/HQYpM6aBLY
https://video.twimg.com/amplify_video/2067281202113863680/vid/avc1/2772x1774/Y-Z0x1cpoelUp_c_.mp4?tag=28
> **Quoted original post by Xenova (@xenovacom):**
> I gave Fable 5 one job: write custom WebGPU kernels for Gemma 4 inference.
> It climbed to 84 tok/s, then hit a wall, insisting further optimization was impossible.
> Hours later, Anthropic rolled back invisible LLM development safeguards, and it hit 255 tok/s.
> The next day, access to Fable 5 was suspended globally.
> https://x.com/xenovacom/status/2065656427117437213