AI Picks · Smart score: 60

GLM-5.2 Performs Strongly in Code Arena

Source: Twitter following list
Author: Z.ai (@Zai_org)
Published: 2026-06-16
Collected: 2026-06-16
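Why it was picked
Concrete rankings and performance data for an open model on a mainstream arena platform, showing it can compete with closed models; worth developers' attention.
Key takeaways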
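Z.ai's GLM-5.2 (Max) ranks #2 on Arena.ai's Code Arena: Frontend leaderboard, 29 points ahead of Claude Opus 4.7 (Thinking) and behind only Fable 5. It also places #2 on the React and #4 on the HTML sub-leaderboards, and is the top model in nearly all frontend subcategories, including Brand & Marketing, Data & Analytics, and Gaming. On the new Agent Arena leaderboard it ranks #10, closely matching Claude-Opus-4.8 (non-thinking), and is the best open model by a wide margin. Compared with GLM-5.1, it climbs from #13 to #10 in Agent Arena, with the clearest gains in confirmed task success and in user praise versus complaints; the tradeoff is steerability, which scores -6.0% for GLM-5.2 versus +1.2% for GLM-5.1. Pricing is unchanged from GLM-5.1 at $1.4/$4.4 per million input/output tokens, with a 1M-token context window.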
Full text
Z.ai (@Zai_org) reposted a post from Arena.ai (@arena):

Exciting news: GLM-5.2 (Max) ranks #2 in Code Arena: Frontend, with +29pt over Claude Opus 4.7 (Thinking) and only behind Fable 5! GLM-5.2 is the best open model vs Kimi-K2.6 and Minimax-M3 by a large margin.

- #2 React and #4 HTML sub-leaderboards
- Ranks as the top model in nearly all sub categories: Brand & Marketing, Reference-Based Design, Data & Analytics, Consumer Product, Gaming, and Simulations.

Congrats @Zai_org for the incredible milestone!

![photo](https://pbs.twimg.com/media/HK9O88SbwAAWoEy.jpg)

> **Quoted post from Arena.ai (@arena):**
> GLM-5.2 (Max) by @Zai_org ranks #10 on the new Agent Arena leaderboard, closely matching Claude-Opus-4.8 (non-thinking) and is the #1 open model by a wide margin!
>
> In Agent Arena, we measure models on millions of real-world, long-horizon agentic tasks from a global community of users. Models can access web search, filesystem, and terminal tools to complete complex workflows. The leaderboard measures model performance on outcomes relative to the average model using a causal tracing methodology.
>
> Compared to 5.1, GLM-5.2 (Max) climbs from #13 to #10. Its clearest gains are confirmed task success, and user praise vs. complaint. Bash capabilities and tool hallucination remain stable. There is a tradeoff in steerability compared to the previous model (-6.0% vs. +1.2%).
>
> GLM-5.2 remains the same price as GLM-5.1, $1.4/$4.4 per input/output MTokens. 1M context window.
>
> Huge congrats @Zai_org for the incredible release!
>
> See thread for details on how GLM-5.2 (Max) performs across 5 different signals.
>
> https://x.com/arena/status/2066943450914943025
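The quoted pricing reduces to simple per-token arithmetic. As a rough illustration, the Python sketch below estimates the cost of a single request from the list prices in the post. The function name `estimate_cost_usd` and the rule that input plus output tokens must fit together inside the 1M-token window are assumptions made for this example; the post says nothing about caching discounts, billing granularity, or exactly how the context window is counted.

```python
# Hypothetical helper: estimates the USD cost of one GLM-5.2 request from the
# list prices quoted in the post ($1.4 input / $4.4 output per million tokens).
# Caching, tiers, and rounding rules are NOT covered by the source.

INPUT_PRICE_PER_MTOK = 1.4   # USD per 1,000,000 input tokens
OUTPUT_PRICE_PER_MTOK = 4.4  # USD per 1,000,000 output tokens
CONTEXT_WINDOW = 1_000_000   # tokens, as stated in the post


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: input and output share the 1M-token context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the assumed 1M-token context window")
    # Price is linear in tokens: tokens * (USD per million) / one million.
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000
```

For example, a request with 200,000 input tokens and 20,000 output tokens comes to about $0.37 at these rates, and filling the whole window with input alone costs about $1.40.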
#ModelRelease #Benchmarking #LLM