AI Curated Feed | Smart Score: 65

GLM-5.2 Matches Claude Opus 4.8 on the CritPt Benchmark

Source: Twitter following list

Author: Artificial Analysis (@ArtificialAnlys)

Published: 2026-06-17

Collected: 2026-06-17

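Why the AI Recommends This

GLM-5.2 ties a top proprietary model on a demanding physics benchmark and scores roughly 4.5× higher than its predecessor, an important milestone for the scientific reasoning ability of open-weights models. The original post and any follow-up analysis are worth reading.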

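Key Takeaways

Z ai's GLM-5.2 (max reasoning effort) scores 20.9% on the CritPt benchmark, tying Claude Opus 4.8. It is far ahead of other open-weights models (DeepSeek V4 Pro scores 12.9%) and beats proprietary models including GPT-5.5 and Gemini 3.1 Pro. Compared with GLM-5.1's 4.6% ten weeks earlier, this is a 4.5× jump. CritPt was developed jointly by Argonne and UIUC, its answer key is kept private, and models are evaluated independently by Artificial Analysis.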
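The headline multiples can be checked with simple arithmetic. The sketch below is only an illustration: the scores are copied from the summary above, while the dictionary layout and the `ratio` helper are assumptions made for this example, not part of any Artificial Analysis tooling.

```python
# CritPt scores in percent, copied from the summary above.
# The dictionary layout and helper name are illustrative assumptions.
scores = {
    "GLM-5.2": 20.9,
    "GLM-5.1": 4.6,
    "DeepSeek V4 Pro": 12.9,
}


def ratio(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


# Generational jump from GLM-5.1 to GLM-5.2.
jump = ratio(scores["GLM-5.2"], scores["GLM-5.1"])

# Gap, in percentage points, to the next open-weights model.
lead = scores["GLM-5.2"] - scores["DeepSeek V4 Pro"]

print(f"Generational jump: {jump:.1f}x")
print(f"Lead over next open model: {lead:.1f} percentage points")
```

Run as written, this reports a jump of about 4.5x and a lead of 8.0 percentage points, consistent with the figures quoted in the post.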

Full Text

A standout number in Z ai’s GLM-5.2 launch is CritPt, a benchmark of unpublished research-level physics problems where it ties with Claude Opus 4.8 and is well above other open weights models.

Key takeaways:

➤ @Zai_org’s GLM-5.2 (max reasoning effort) leads open weights by a wide margin: the next open model, DeepSeek V4 Pro, scores 12.9%

➤ GLM-5.2 matches Claude Opus 4.8 (20.9%) and beats several proprietary models, including GPT-5.5, Gemini 3.1 Pro, and Claude Opus 4.7

➤ Only proprietary models score higher, with GPT-5.5 Pro topping the benchmark at 30.6%

➤ A 4.5× generational jump: GLM-5.1 scored just 4.6% on CritPt ten weeks ago

![photo](https://pbs.twimg.com/media/HLChCxQbcAAW7I_.jpg)

Artificial Analysis (@ArtificialAnlys): Context on the result: CritPt is hard. It focuses on frontier physics problems developed by Argonne and UIUC through contributions from 60+ researchers globally, with the answer key and grading kept private. Models are independently benchmarked by Artificial Analysis. Even the highest-scoring model, GPT-5.5 Pro, solves under a third of the problems. For an open weights model to approach leading proprietary models is a real marker of progress for open models on scientific reasoning. https://t.co/Y55fgUEoaJ

#ModelRelease #ResearchBreakthrough
