
Sakana Fugu Technical Report

Source: Twitter following list
Author: Rohan Paul (@rohanpaul_ai)
Published: 2026-06-28
Collected: 2026-06-28
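Why the AI recommends it
Fugu's lightweight learned routing mechanism offers an approach to multi-model orchestration that differs from simple rules; its training data and implementation details are worth watching.
Key takeaways
Sakana AI has released the Fugu technical report, which describes a multi-model orchestration system in two versions: Regular (fast routing) and Ultra (dynamic workflows). The core innovation is a lightweight layer that learns from user requests to dynamically select the best-suited expert model or build a collaborative workflow, in contrast to simple voting or fixed-rule approaches.
Full text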
Sakana Fugu Technical Report

The idea is that intelligence is moving from the model to the system around it. Fugu is an orchestrator that reads the task, chooses which specialist model to use, and in the Ultra version can build small workflows where models critique, extend, or correct one another.

Most multi-model systems use simple rules, such as asking 3 models and voting, or always sending coding to 1 model and math to another. Fugu is different because the manager is trained from data to learn which model is actually best for each kind of situation, including small details like “this looks like coding, but the hard part is debugging, so bring in the model that is better at debugging.”

The mechanism has 2 versions.

Regular Fugu is the fast version: it reads the user’s request and quickly chooses 1 worker model from a pool. The user experiences it like calling 1 model, but behind the scenes Fugu picked the model it thinks is best for that exact request.

Fugu-Ultra is the slower but stronger version: it can create a small workflow, such as asking 1 model to solve, another model to check, another model to solve from a different angle, and then choosing the best model to combine the answers. The special part is that the workflow is not fixed before the task starts, because Fugu-Ultra can design a different teamwork pattern for each question.

----

Link: arxiv.org/abs/2606.21228

![photo](https://pbs.twimg.com/media/HL3sHuuaMAA0MDW.jpg)

Rohan Paul (@rohanpaul_ai): The picture shows regular Fugu’s fast routing mechanism: the router reads the user request, but does not answer it itself. The key part is the “lightweight head,” a small extra decision layer attached to the language model. That layer looks at the model’s hidden state, meaning its internal summary of what the request is about. Then it gives one score to each available worker model, and the highest score decides which outside LLM gets the task.
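
To make the scoring step concrete, here is a minimal sketch of a routing head in plain Python. It assumes the head is a single linear layer (one weight row and one bias per worker), which the post does not confirm. The worker names, the 3-dimensional hidden state, and all numbers are made up for illustration and are not taken from the report.

```python
from typing import Dict, Sequence


def score_workers(
    hidden_state: Sequence[float],
    head_weights: Sequence[Sequence[float]],
    head_bias: Sequence[float],
    worker_names: Sequence[str],
) -> Dict[str, float]:
    """Return one score per worker.

    Assumption: the head is linear, so score_i = w_i . h + b_i.
    """
    if not (len(head_weights) == len(head_bias) == len(worker_names)):
        raise ValueError("need one weight row and one bias per worker")
    scores: Dict[str, float] = {}
    for name, row, bias in zip(worker_names, head_weights, head_bias):
        if len(row) != len(hidden_state):
            raise ValueError("weight row and hidden state differ in size")
        scores[name] = sum(w * h for w, h in zip(row, hidden_state)) + bias
    return scores


def route(scores: Dict[str, float]) -> str:
    """Pick the worker with the highest score; that worker gets the task."""
    return max(scores, key=scores.get)


# Hypothetical toy values: a real hidden state would have thousands of
# dimensions and the weights would be learned from data.
WORKERS = ["coder", "debugger", "math"]
HIDDEN = [1.0, 0.0, 0.5]
WEIGHTS = [
    [0.2, 0.1, 0.0],   # coder
    [0.9, 0.0, 0.4],   # debugger
    [-0.3, 0.5, 0.1],  # math
]
BIAS = [0.0, 0.0, 0.0]

if __name__ == "__main__":
    scores = score_workers(HIDDEN, WEIGHTS, BIAS, WORKERS)
    print(route(scores))  # debugger
```

The sketch covers only the forward pass from hidden state to chosen worker. How the head is trained is not modeled here.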
The red diagonal marks a small tuning trick: only a tiny part of the model’s internal weights is adjusted, so that it gets better at choosing the right worker.
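
The Fugu-Ultra pattern described in the post (one model solves, another checks, another solves from a different angle, one combines) can be sketched as a small plan executor. This is only an illustration under stated assumptions: the `Step` type, the role names, the prompt wording, and the stub workers are all hypothetical, and the plan is passed in by hand, whereas in Fugu-Ultra the orchestrator itself is said to design the plan for each question.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Worker = Callable[[str], str]  # stand-in for a call to an external LLM


@dataclass
class Step:
    role: str    # hypothetical roles: "solve", "check", or "combine"
    worker: str  # key into the worker pool


def run_workflow(question: str, plan: List[Step], workers: Dict[str, Worker]) -> str:
    """Execute a per-question plan and return the final answer."""
    drafts: List[str] = []
    for step in plan:
        call = workers[step.worker]
        if step.role == "solve":
            # A second "solve" step with a different worker plays the role
            # of "solve from a different angle".
            drafts.append(call(f"Solve: {question}"))
        elif step.role == "check":
            if not drafts:
                raise ValueError("nothing to check yet")
            # The checker's output replaces the most recent draft.
            drafts[-1] = call(
                f"Check and correct this answer to '{question}': {drafts[-1]}"
            )
        elif step.role == "combine":
            if not drafts:
                raise ValueError("nothing to combine yet")
            return call(
                f"Combine these answers to '{question}': " + " | ".join(drafts)
            )
        else:
            raise ValueError(f"unknown role: {step.role}")
    if not drafts:
        raise ValueError("plan produced no answer")
    return drafts[-1]


if __name__ == "__main__":
    # Stub workers that just tag the prompt with their name.
    pool = {n: (lambda p, n=n: f"<{n}> {p}") for n in ("a", "b", "c")}
    plan = [Step("solve", "a"), Step("check", "b"),
            Step("solve", "c"), Step("combine", "a")]
    print(run_workflow("What is 2 + 2?", plan, pool))
```

Because the plan is ordinary data, a different question can receive a different list of steps, which is the property the post highlights for Fugu-Ultra.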
#TechnicalReport #ModelRelease #AI