AI 精选动态智能评分 65

OpenRouter continuously runs GPQA and TAU-Bench on most open-weight models

来源: twitter关注列表

作者: OpenRouter (@OpenRouter)

发布于: 2026-06-28

收录于: 2026-06-28

AI 推荐理由

AutoExacto 是 OpenRouter 内部的模型评测框架，提供自动化基准测试并降低开发者对开源模型的信任成本。

核心解读

OpenRouter 与 Parasail.io 和 Zai.org 合作，使用 AutoExacto meta-benchmark 对开源模型进行自动化评测并公开结果，该工具被默认用于路由模型调用，AutoExacto 基于 GPQA 和 TAU-Bench，模型排行榜显示 Parasail.io 和 Zai.org 排名靠前。

全文

Tip: OpenRouter continuously runs GPQA and TAU-Bench on most open-weight models and publishes the results publicly. This informs our AutoExacto meta-benchmark, used by default when routing tool calls. Here, @Parasail_io and @Zai_org rank first: https://openrouter.ai/z-ai/glm-5.2#performance https://t.co/0dsUuR5Tsq ![photo](https://pbs.twimg.com/media/HL6sU60WgAAwOjZ.jpg) OpenRouter (@OpenRouter): More about AutoExacto: https://t.co/x9tJDRDBst

#排行榜#AutoExacto#AutoExacto#技术更新

阅读原始全文