AI Curated Feed · Smart score: 60

GLM-5.2 ranks third on GDPval-AA

Source: Twitter following list

Author: Chubby♨️ (@kimmonismus)

Published: 2026-06-22

Collected: 2026-06-22

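Why the AI recommends this

GLM-5.2 performs on par with GPT-5.5 on agentic tasks, which suggests that open-source models are catching up markedly faster. Its open weights and real-world performance are worth watching.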

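Key analysis

According to Artificial Analysis, GLM-5.2 scores 1524 Elo on GDPval-AA, a benchmark of real-world agentic work, and ranks third overall. It sits behind only Claude Fable 5 (1783) and Claude Opus 4.8 (1615), is roughly level with GPT-5.5 (xhigh) (1509), and is well ahead of the other open-weights models.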
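To put the Elo gaps quoted above in perspective, here is a minimal sketch that converts a rating difference into an expected head-to-head score. It assumes the conventional logistic Elo formula with a 400-point scale. The source does not say how Artificial Analysis fits its GDPval-AA ratings, so the function name `elo_expected_score` and the `scale` parameter are illustrative assumptions, not the benchmark's actual method.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional logistic Elo model.

    Assumption: a 400-point scale; GDPval-AA's exact rating fit is not
    described in the source and may differ.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Elo ratings as quoted in the post.
ratings = {
    "Claude Fable 5": 1783,
    "Claude Opus 4.8": 1615,
    "GLM-5.2": 1524,
    "GPT-5.5 (xhigh)": 1509,
    "MiniMax-M3": 1408,
}

if __name__ == "__main__":
    glm = ratings["GLM-5.2"]
    for name, rating in ratings.items():
        if name == "GLM-5.2":
            continue
        p = elo_expected_score(glm, rating)
        print(f"GLM-5.2 vs {name}: expected score {p:.2f}")
```

Under that assumption, the 15-point lead over GPT-5.5 (xhigh) corresponds to an expected score of about 0.52, close to a coin flip, which fits the description of the two models as roughly level. The 116-point lead over MiniMax-M3 corresponds to about 0.66.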

Full text

Absolutely incredible: GLM-5.2 (max) sits at #3 overall on GDPval-AA, a real-world agentic work benchmark, even ahead of GPT-5.5 (xhigh). Oh, and btw: it looks like open source is no longer 7 months behind.

GDPval-AA is a benchmark built around real professional and creative tasks. The models had to produce practical deliverables from identical briefs, including a retail supervisor’s task list, an emergency-stop circuit schematic, and a music video moodboard.

That's why we'll probably see a big leap with GPT-5.6. Even the open-source competition is catching up insanely fast.

![photo](https://pbs.twimg.com/media/HLcXrS8WEAESsAO.jpg)

> **Quoted post from Artificial Analysis (@ArtificialAnlys):**
>
> GLM-5.2 leads open weights models and sits at #3 overall on GDPval-AA, a real-world agentic work benchmark
>
> GLM-5.2 from @Zai_org scores 1524 Elo on GDPval-AA, which measures performance on real-world, economically valuable knowledge work through long-horizon, multi-turn tasks.
>
> Key takeaways:
>
> ➤ #3 overall, behind only Claude Fable 5 (1783) and Claude Opus 4.8 (1615), and level with GPT-5.5 (xhigh, 1509)
>
> ➤ The leading open weights model by a wide margin: the next open model, MiniMax-M3, scores 1408
>
> ➤ Ahead of many proprietary models, including Google's Gemini 3.5 Flash (1357), Qwen 3.7 Max (1289), Muse Spark (1158)
>
> ➤ The tasks are agentic. GLM-5.2 averaged ~31 turns per task across 1,999 matches
>
> ➤ Consistent with the rest of its launch, GLM-5.2 also leads open weights on the Artificial Analysis Intelligence Index, ranks #3 on the Agentic Index, and #3 on AA-Briefcase
>
> https://x.com/ArtificialAnlys/status/2069121548670406947

#Models #TechnicalBreakthrough #Benchmarks
