AI Featured Updates | Smart score: 65

GLM-5.2 Matches Claude Opus 4.8 on the CritPt Benchmark

Source: Twitter following list
Author: Artificial Analysis (@ArtificialAnlys)
Published: 2026-06-17
Collected: 2026-06-17
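Why the AI recommends this
GLM-5.2 draws level with Claude Opus 4.8, a leading proprietary model, on a demanding physics benchmark and scores roughly 4.5× its predecessor. This is a significant milestone for the scientific reasoning of open-weights models, and the original post and follow-up analysis are worth reading.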
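Core analysis
Z ai's GLM-5.2 (max reasoning effort) scores 20.9% on the CritPt benchmark, level with Claude Opus 4.8, far ahead of other open-weights models (DeepSeek V4 Pro scores 12.9%), and ahead of proprietary models including GPT-5.5 and Gemini 3.1 Pro. That is a 4.5× jump over the 4.6% that GLM-5.1 scored ten weeks earlier. CritPt was developed by Argonne and UIUC, its answer key is kept private, and models are evaluated independently by Artificial Analysis.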
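The headline figures can be checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the scores quoted in the post, and the dictionary and variable names are hypothetical, not part of any Artificial Analysis tooling.

```python
# CritPt scores in percent, copied from the post.
# The dictionary and variable names are illustrative assumptions.
scores = {
    "GPT-5.5 Pro": 30.6,
    "GLM-5.2": 20.9,
    "Claude Opus 4.8": 20.9,
    "DeepSeek V4 Pro": 12.9,
    "GLM-5.1": 4.6,
}

# Ratio of GLM-5.2 to its predecessor (the "4.5x" claim).
generational_jump = scores["GLM-5.2"] / scores["GLM-5.1"]

# Lead over the next open-weights model, in percentage points.
lead_over_next_open = scores["GLM-5.2"] - scores["DeepSeek V4 Pro"]

# Distance to the top-scoring model, in percentage points.
gap_to_top = scores["GPT-5.5 Pro"] - scores["GLM-5.2"]

# "Solves under a third of the problems" for the top model.
top_under_a_third = scores["GPT-5.5 Pro"] < 100 / 3

print(f"Generational jump: {generational_jump:.1f}x")
print(f"Lead over next open model: {lead_over_next_open:.1f} points")
print(f"Gap to top model: {gap_to_top:.1f} points")
print(f"Top model under a third: {top_under_a_third}")
```

The ratio 20.9 / 4.6 is about 4.54, which rounds to the quoted 4.5×. The differences are in percentage points, not relative percentages.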
Full text
A standout number in Z ai’s GLM-5.2 launch is CritPt, a benchmark of unpublished research-level physics problems where it ties with Claude Opus 4.8 and is well above other open weights models.

Key takeaways:
➤ @Zai_org’s GLM-5.2 (max reasoning effort) leads open weights by a wide margin: the next open model, DeepSeek V4 Pro, scores 12.9%
➤ GLM-5.2 matches Claude Opus 4.8 (20.9%) and beats several proprietary models, including GPT-5.5, Gemini 3.1 Pro, and Claude Opus 4.7
➤ Only proprietary models score higher with GPT-5.5 Pro topping the benchmark at 30.6%
➤ A 4.5× generational jump: GLM-5.1 scored just 4.6% on CritPt ten weeks ago

![photo](https://pbs.twimg.com/media/HLChCxQbcAAW7I_.jpg)

Artificial Analysis (@ArtificialAnlys): Context on the result: CritPt is hard. It focuses on frontier physics problems developed by Argonne and UIUC through contributions from 60+ researchers globally, with the answer key and grading kept private. Models are independently benchmarked by Artificial Analysis. Even the highest-scoring model, GPT-5.5 Pro, solves under a third of the problems. For an open weights model to approach leading proprietary models is a real marker of progress for open models on scientific reasoning.

https://t.co/Y55fgUEoaJ
#ModelRelease #ResearchBreakthrough