AI Curated Updates · Smart Score: 72

GLM-5.2 Rises to #10 on the Agent Arena Leaderboard, Closely Matching Claude-Opus-4.8 (Non-Thinking)

Source: Twitter following list

Author: Han Xiao (@hxiao)

Published: 2026-06-16

Collected: 2026-06-16

Key Takeaways

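GLM-5.2 (Max), developed by Zai_org, climbed from #13 to #10 on the new Agent Arena leaderboard. It closely matches Claude-Opus-4.8 (non-thinking) and is the #1 open model by a wide margin. Agent Arena evaluates models on millions of real-world, long-horizon agentic tasks, and the models can use web search, filesystem, and terminal tools. Compared with GLM-5.1, the clearest gains are in confirmed task success and in user praise versus complaints. Bash capability and tool hallucination are unchanged. The tradeoff is steerability: GLM-5.2 (Max) scores -6.0%, versus +1.2% for GLM-5.1 (Agent Arena reports outcomes relative to the average model). That is a swing of 7.2 percentage points. Pricing ($1.4/$4.4 per million input/output tokens) and the 1M-token context window are the same as for GLM-5.1.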

Full Text

Heard a lot of positive feedback about GLM5.2! Big congrats to @Zai_org and @jietang Clearly they have figured out sth other open weight models haven’t. May the scale continue, keep pushing the edge and make frontier tokens accessible to everyone!

> **Quoted post from Arena.ai (@arena):**
> GLM-5.2 (Max) by @Zai_org ranks #10 on the new Agent Arena leaderboard, closely matching Claude-Opus-4.8 (non-thinking) and is the #1 open model by a wide margin!
>
> In Agent Arena, we measure models on millions of real-world, long-horizon agentic tasks from a global community of users. Models can access web search, filesystem, and terminal tools to complete complex workflows. The leaderboard measures model performance on outcomes relative to the average model using a causal tracing methodology.
>
> Compared to 5.1, GLM-5.2 (Max) climbs from #13 to #10. Its clearest gains are confirmed task success, and user praise vs. complaint. Bash capabilities and tool hallucination remain stable. There is a tradeoff in steerability compared to the previous model (-6.0% vs. +1.2%).
>
> GLM-5.2 remains the same price as GLM-5.1, $1.4/$4.4 per input/output MTokens. 1M context window.
>
> Huge congrats @Zai_org for the incredible release!
>
> See thread for details on how GLM-5.2 (Max) performs across 5 different signals.
>
> https://x.com/arena/status/2066943450914943025
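The quoted pricing lends itself to a quick back-of-the-envelope cost estimate. What follows is a minimal sketch. It assumes the flat GLM-5.2 rates from the post ($1.4 per million input tokens, $4.4 per million output tokens) and ignores caching discounts or tiered pricing, which the post does not mention. The function name `estimate_cost_usd` and the token counts in the example are hypothetical.

```python
# Hypothetical cost estimator. It assumes the flat GLM-5.2 rates quoted in the post:
# $1.4 per 1M input tokens and $4.4 per 1M output tokens (no caching or tiers).
INPUT_USD_PER_MTOK = 1.4
OUTPUT_USD_PER_MTOK = 4.4
TOKENS_PER_MTOK = 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the assumed flat rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / TOKENS_PER_MTOK * INPUT_USD_PER_MTOK
    output_cost = output_tokens / TOKENS_PER_MTOK * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical agentic task: 200k tokens of context read, 20k tokens generated.
    cost = estimate_cost_usd(input_tokens=200_000, output_tokens=20_000)
    print(f"${cost:.4f}")  # prints $0.3680
```

At these rates, filling the full 1M-token context window once costs about $1.40 in input tokens alone.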

#ProductUpdate #OpenSource #IndustryNews
