AI Curated Feed
Smart score: 60
Analysis of the Upper Bound on Gains from Combining LLMs
Why the AI recommends it
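The paper proposes a simple metric, beta, for judging whether combining models can pay off. It also reports a counterintuitive finding: co-failures cluster on answer format. Practitioners should measure beta before deciding whether to use routing or mixture-of-agents (MoA).
Key takeaways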
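DAIR.AI summarizes a paper that analyzes 67 models from 21 providers. The paper proves that any routing, voting, cascading, or mixture-of-agents strategy that returns one model's answer has accuracy bounded above by 1 minus beta, where beta is the fraction of queries that every candidate model gets wrong. It argues that the commonly used decorrelation argument does not guarantee room for improvement, and it finds that real co-failures concentrate on answer format rather than on subject. It recommends measuring beta before adopting a combination strategy.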
Full text
elvis (@omarsar0) reposted a post by DAIR.AI (@dair_ai):
When does combining LLMs help?
Great analysis on combining language models, measured across 67 models from 21 providers.
Any policy that routes, votes, cascades, or runs a mixture of agents and then returns one model's answer is bounded above by 1 minus beta, where beta is the fraction of queries every candidate model gets wrong.
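The bound is a counting argument. A policy that returns one candidate's answer can be right on a query only if at least one candidate is right on it, so on the beta fraction of queries where all candidates fail, every such policy fails too. Below is a minimal Python sketch. It assumes you already have a graded correctness matrix with queries as rows and models as columns. The function names and the five-query matrix are made up for illustration and do not come from the paper.

```python
from itertools import product
from typing import Callable, Sequence

# correct[q][m] is True if model m answered query q correctly (hypothetical grading output).
Matrix = Sequence[Sequence[bool]]


def beta(correct: Matrix) -> float:
    """Fraction of queries that every candidate model gets wrong."""
    return sum(1 for row in correct if not any(row)) / len(correct)


def oracle_upper_bound(correct: Matrix) -> float:
    """1 - beta: accuracy of a perfect selector that picks a correct model whenever one exists."""
    return 1.0 - beta(correct)


def policy_accuracy(correct: Matrix, choose: Callable[[int], int]) -> float:
    """Accuracy of a policy that returns exactly one model's answer per query.

    choose(q) is the index of the model whose answer is returned for query q.
    """
    return sum(1 for q, row in enumerate(correct) if row[choose(q)]) / len(correct)


# Made-up example: 5 queries x 3 models; the last query is missed by every model.
correct = [
    [True, False, False],
    [False, True, False],
    [False, False, True],
    [True, True, True],
    [False, False, False],
]

print(beta(correct))                          # 0.2
print(policy_accuracy(correct, lambda q: 0))  # 0.4 (always route to model 0)

# Try every deterministic per-query choice (3**5 = 243 policies); none beats 1 - beta.
best_possible = max(
    policy_accuracy(correct, lambda q, picks=picks: picks[q])
    for picks in product(range(3), repeat=len(correct))
)
print(best_possible)                          # 0.8
print(oracle_upper_bound(correct))
```

The exhaustive search over per-query picks tops out at 1 minus beta. So on this data, no router, cascade, or vote that ends by returning a single model's answer can do better.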
The common justification for ensembling is diversity, usually measured as low pairwise error correlation. The paper proves that correlation cannot identify beta, so decorrelation does not establish that headroom exists. And across the 67 models, real co-failures are far more concentrated than independence-style assumptions predict.
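A toy construction (ours, not the paper's proof) shows why low pairwise correlation does not pin down beta. Three hypothetical models each err on half the queries. In all three designs below, every pair of models has exactly zero error correlation. Even so, the fraction of queries that all three get wrong is 0, 1/8, or 1/4, depending on how the errors line up. Each listed error pattern stands for one equally likely query.

```python
from itertools import combinations, product
from typing import Sequence

# Each row is one query; each column is 1 if that (hypothetical) model got the query wrong.
Errors = Sequence[Sequence[int]]


def pearson(x: Sequence[int], y: Sequence[int]) -> float:
    """Plain Pearson correlation of two equal-length 0/1 error vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def max_abs_pairwise_corr(errors: Errors) -> float:
    """Largest absolute error correlation over all pairs of models."""
    cols = list(zip(*errors))
    return max(abs(pearson(cols[i], cols[j])) for i, j in combinations(range(len(cols)), 2))


def beta(errors: Errors) -> float:
    """Fraction of queries on which every model is wrong."""
    return sum(1 for row in errors if all(row)) / len(errors)


patterns = list(product([0, 1], repeat=3))
designs = {
    "independent": patterns,                                  # all 8 error patterns
    "even_parity": [p for p in patterns if sum(p) % 2 == 0],  # all-wrong never occurs
    "odd_parity": [p for p in patterns if sum(p) % 2 == 1],   # all-wrong on 1 query in 4
}

for name, errors in designs.items():
    print(name, max_abs_pairwise_corr(errors), beta(errors))
# independent 0.0 0.125
# even_parity 0.0 0.0
# odd_parity 0.0 0.25
```

The even-parity design is the best case for an ensemble (beta = 0) and the odd-parity design is the worst. A pairwise-correlation diagnostic cannot tell them apart.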
Before assuming a router or MoA setup will help, measure beta. Co-failures cluster on the answer format rather than the subject.
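Measuring beta only requires per-query grades for every candidate. Below is a minimal sketch. It assumes a hypothetical `GradedQuery` record that tags each query with a subject and an answer format. The six rows are invented to show the output shape and are not the paper's data. Computing beta per group lets you check whether your own co-failures concentrate on format or on subject.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GradedQuery:
    # Hypothetical schema: one benchmark query graded against every candidate model.
    subject: str               # e.g. "chemistry"
    answer_format: str         # e.g. "multiple_choice", "free_form_number"
    correct: Tuple[bool, ...]  # correct[m] is True if model m got the query right


def co_failed(q: GradedQuery) -> bool:
    """True when no candidate model answered the query correctly."""
    return not any(q.correct)


def beta(queries: List[GradedQuery]) -> float:
    """Fraction of queries that every candidate gets wrong."""
    return sum(co_failed(q) for q in queries) / len(queries)


def beta_by(queries: List[GradedQuery], key: str) -> Dict[str, float]:
    """Per-group beta, grouping on an attribute such as 'subject' or 'answer_format'."""
    groups: Dict[str, List[GradedQuery]] = defaultdict(list)
    for q in queries:
        groups[getattr(q, key)].append(q)
    return {name: beta(qs) for name, qs in sorted(groups.items())}


# Invented illustration rows, not measurements.
queries = [
    GradedQuery("math", "multiple_choice", (True, False, True)),
    GradedQuery("math", "free_form_number", (False, False, False)),
    GradedQuery("chemistry", "multiple_choice", (False, True, True)),
    GradedQuery("chemistry", "free_form_number", (False, False, False)),
    GradedQuery("history", "multiple_choice", (True, True, False)),
    GradedQuery("history", "free_form_number", (False, False, False)),
]

print(beta(queries))                      # 0.5
print(beta_by(queries, "subject"))        # {'chemistry': 0.5, 'history': 0.5, 'math': 0.5}
print(beta_by(queries, "answer_format"))  # {'free_form_number': 1.0, 'multiple_choice': 0.0}
```

In this made-up data, grouping by subject shows a flat beta, while grouping by answer format isolates every co-failure in one group. That is the pattern to look for before investing in a router or MoA setup.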
Paper: https://t.co/PGO9YAoBzH
