AI Featured Updates · Smart score: 60

Open-source GLM-5.2 edges past GPT-5.5 (xhigh) on the GDPval-AA agentic benchmark

Source: Twitter following list
Author: AK (@_akhaliq)
Published: 2026-06-22
Indexed: 2026-06-22
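Why the AI recommends this
GLM-5.2 performs on par with GPT-5.5 (xhigh) on agentic tasks and is fully open source (MIT license). Worth watching for real-world deployment results and evaluation details.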
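Key analysis
Zhipu AI's GLM-5.2 scores 1524 Elo on the GDPval-AA benchmark, ranking #3 overall. That puts it level with GPT-5.5 (xhigh) at 1509 Elo and well ahead of other open-weights models (for example MiniMax-M3 at 1408 Elo) as well as several proprietary models. The model is released under the MIT license and is freely available on Hugging Face.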
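To see why a 15-point gap reads as "level", the Elo figures can be converted into expected head-to-head scores. The sketch below assumes the conventional Elo logistic curve with a 400-point scale. The source does not state which rating model or scale Artificial Analysis uses for GDPval-AA, so the percentages are illustrative only. The function name `elo_expected_score` and its `scale` parameter are hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo logistic model.

    ASSUMPTION: GDPval-AA ratings follow the conventional 400-point scale;
    the source does not confirm this.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings quoted in the Artificial Analysis post.
ratings = {
    "Claude Fable 5": 1783,
    "Claude Opus 4.8": 1615,
    "GPT-5.5 (xhigh)": 1509,
    "MiniMax-M3": 1408,
    "Gemini 3.5 Flash": 1357,
}

glm = 1524  # GLM-5.2
for name, rating in ratings.items():
    p = elo_expected_score(glm, rating)
    print(f"GLM-5.2 vs {name}: rating gap {glm - rating:+d}, expected score {p:.1%}")
```

Under that assumption, the 15-point lead over GPT-5.5 (xhigh) corresponds to an expected score of roughly 52%, close to a coin flip, while the 116-point lead over MiniMax-M3 corresponds to roughly 66%.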
Full text
AK (@_akhaliq) reposted a post by Niels Rogge (@NielsRogge):

That's right, an open, MIT-licensed model beating GPT-5.5 (xhigh) on real-world agentic work! 🔥 Available for free on @huggingface for anyone to build on top of https://t.co/9TYnygzmf5

![photo](https://pbs.twimg.com/media/HLcPvEoWsAAENqy.jpg)

> **Quoted post from Artificial Analysis (@ArtificialAnlys):**
>
> GLM-5.2 leads open weights models and sits at #3 overall on GDPval-AA, a real-world agentic work benchmark
>
> GLM-5.2 from @Zai_org scores 1524 Elo on GDPval-AA, which measures performance on real-world, economically valuable knowledge work through long-horizon, multi-turn tasks.
>
> Key takeaways:
>
> ➤ #3 overall, behind only Claude Fable 5 (1783) and Claude Opus 4.8 (1615), and level with GPT-5.5 (xhigh, 1509)
>
> ➤ The leading open weights model by a wide margin: the next open model, MiniMax-M3, scores 1408
>
> ➤ Ahead of many proprietary models, including Google's Gemini 3.5 Flash (1357), Qwen 3.7 Max (1289), Muse Spark (1158)
>
> ➤ The tasks are agentic. GLM-5.2 averaged ~31 turns per task across 1,999 matches
>
> ➤ Consistent with the rest of its launch, GLM-5.2 also leads open weights on the Artificial Analysis Intelligence Index, ranks #3 on the Agentic Index, and #3 on AA-Briefcase
>
> https://x.com/ArtificialAnlys/status/2069121548670406947
#ModelRelease #OpenSource #AIModel