AI Featured Feed | Smart score: 72

Analysis of the GLM-5.2 Anti-Hacking Module

Source: Twitter following list
Author: elvis (@omarsar0)
Published: 2026-06-20
Collected: 2026-06-20
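AI recommendation rationale
Worth reading the original post to learn what is new in this model's training method and how the anti-hacking module is expected to improve performance on long-running tasks.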
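Key takeaways
In the author's view, GLM-5.2 reaches Opus-level performance on design tasks and also does well on long-running tasks. According to the official blog, the model was trained with an anti-hacking module to address reward hacking in RL, which shows up as model laziness, intent misalignment, verbosity, sycophancy, and similar problems. With the standard /goal approach, models often take shortcuts that waste tokens and give poor results. An anti-hacking capability should, in theory, improve outcomes on long-horizon tasks, and the author notes it is rarely seen in practice, least of all in a frontier open-weight model.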
Full text
GLM-5.2 is great at design (Opus level IMO). I am also starting to see great results with long-running tasks. How is this possible? I think there are a few clever hacks. But I just came across this from the official blog: they actually trained this model with an anti-hacking module.

RL, as many know, comes with the issue of reward hacking, which often lets the model take weird and suboptimal shortcuts. Not only that, it sometimes makes models feel "lazy" or just plain "dumb," alongside other issues like intent misalignment, verbosity, sycophancy, deception, etc. And you really don't want that for long-running tasks operated by coding agents.

This is a great insight. If you use the standard /goal (in 5.5 or 4.8), you notice the models often take shortcuts that lead to long-running tasks (wasting tokens along the way) but with poor results. This is why I advocate for a focus on better verifiers.

So this anti-hacking idea is a model capability that should, in theory, lead to better results on long-horizon tasks. I've seen efforts here and there in a few research papers, but haven't seen it translated into practice much, much less in a frontier, open-weight model.

This might be contributing to some of the great results we are seeing with GLM-5.2, but I suspect there is more, of course, like better verification capabilities. It's not clear how all of these training signals lead to downstream capabilities, but this is something to look at closely with newer models.

![photo](https://pbs.twimg.com/media/HLRKn6NXIAApHm9.jpg)
#Technology #Analysis #Models