AI-curated update, smart score: 85

GLM-5.2 released: long-horizon capability on a solid 1M-token context

Source: Twitter following list

Author: jietang (@jietang)

Published: 2026-06-16

Collected: 2026-06-16

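Why AI recommends it

Its IndexShare scheme cuts per-token FLOPs by 2.9× at a 1M context length, and the model delivers long-horizon task capability on a solid 1M-token context for the first time, which makes it worth watching.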

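Key takeaways

GLM-5.2 is the latest flagship GLM model for long-horizon tasks, released under an MIT open-source license. For the first time it delivers long-horizon capability on a 1M-token context that holds up across long coding-agent trajectories. Two architecture changes stand out. IndexShare reuses the same indexer across every four sparse attention layers, which reduces per-token FLOPs by 2.9× at a 1M context length. An improved MTP layer raises the acceptance length in speculative decoding by up to 20%. On three long-horizon coding benchmarks, GLM-5.2 is the highest-ranked open-source model: on FrontierSWE it trails Opus 4.8 by only 1% while edging out GPT-5.5 by 1% and Opus 4.7 by 11%; on PostTrainBench it outperforms Opus 4.7 and GPT-5.5 and ranks second only to Opus 4.8; on SWE-Marathon it trails Opus 4.8 by 13% and remains second only to the Opus series.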
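To make the "acceptance length" figure concrete, here is a minimal sketch of how that metric is commonly counted in speculative decoding. It is an illustration under stated assumptions, not GLM-5.2's actual procedure: it assumes greedy verification (a drafted token is accepted only if it equals the main model's token at that position) and counts the main model's own next token as part of each step. The announcement does not say which counting convention it uses, and the function names `accepted_length` and `verification_passes` are hypothetical.

```python
from typing import Sequence


def accepted_length(draft: Sequence[int], target: Sequence[int]) -> int:
    """Tokens emitted by one draft-then-verify step under greedy matching.

    `draft` holds the tokens proposed by the draft head (an MTP layer here).
    `target` holds the main model's own token at each drafted position plus
    one extra position, so len(target) must equal len(draft) + 1.
    Assumption: greedy verification; sampling-based schemes differ.
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must have exactly one more token than draft")
    matched = 0
    for drafted, verified in zip(draft, target):
        if drafted != verified:
            break
        matched += 1
    # The main model's token at the first mismatch (or after the last
    # accepted draft token) is always emitted, hence the +1.
    return matched + 1


def verification_passes(num_tokens: int, mean_accepted: float) -> float:
    """Approximate main-model forward passes needed to emit num_tokens."""
    if mean_accepted <= 0:
        raise ValueError("mean_accepted must be positive")
    return num_tokens / mean_accepted
```

Under this convention, a 20% longer mean acceptance length means roughly 1/1.2 as many main-model verification passes for the same output, before accounting for the cost of drafting itself.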

Full text

We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a solid 1M-token context.

GLM-5.2's new capabilities include:

- Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work
- Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency
- Improved Architecture: We propose IndexShare, which reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at a 1M context length. We also improve GLM-5.2's MTP layer for speculative decoding, increasing the acceptance length by up to 20%
- Pure Open: An MIT open-source license: no regional limits, technical access without borders

Supporting long-horizon tasks starts with making long context engineering-usable: the model must maintain quality across long, messy coding-agent trajectories, not just accept more tokens. A 1M context is easy to claim, but much harder to keep reliable under real engineering pressure. To this end, we substantially expanded 1M-context training for coding-agent scenarios, covering large-scale implementation, automated research, performance optimization, and complex debugging. The result is a long-context system that is not only wide in scope, but solid in execution: a practical substrate for sustained engineering work.

This capability is reflected in GLM-5.2's performance on three long-horizon coding benchmarks. FrontierSWE measures whether an agent can complete open-ended technical projects at the scale of hours to tens of hours, spanning systems optimization, large-scale code construction, and applied ML research. On this benchmark, GLM-5.2 trails Opus 4.8 by only 1%, while edging out GPT-5.5 by 1% and Opus 4.7 by 11%.

On PostTrainBench, where each agent is given an H100 GPU and evaluated by how much it can improve small models through post-training, GLM-5.2 outperforms both Opus 4.7 and GPT-5.5, ranking second only to Opus 4.8.

On SWE-Marathon, an ultra-long-horizon software engineering benchmark covering tasks such as building compilers, optimizing kernels, and developing production-grade services, GLM-5.2 still has room to grow, trailing Opus 4.8 by 13% while remaining second only to the Opus series.

Across all three benchmarks, GLM-5.2 is the highest-ranked open-source model, showing that its 1M context has translated into practical long-horizon delivery capability.

![photo](https://pbs.twimg.com/media/HK-KABrbYAAuBDB.jpg)

![photo](https://pbs.twimg.com/media/HK-KHirbEAAzFaC.jpg)
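The IndexShare idea described in the announcement (one indexer reused across every four sparse attention layers) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the dot-product scoring, the `ToyIndexer` class, and the `select_per_layer` helper are hypothetical and are not the GLM-5.2 implementation. The sketch only shows the sharing pattern: the indexer runs on the first layer of each group, and the remaining layers in the group reuse its selected key positions.

```python
from typing import List, Sequence


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Positions of the k largest scores, returned in ascending position order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


class ToyIndexer:
    """Hypothetical indexer: scores every cached key against the query."""

    def __init__(self) -> None:
        self.calls = 0  # how many times the full-context scoring pass ran

    def __call__(self, query: Sequence[float],
                 keys: Sequence[Sequence[float]], k: int) -> List[int]:
        self.calls += 1
        scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
        return top_k_indices(scores, k)


def select_per_layer(queries: Sequence[Sequence[float]],
                     keys: Sequence[Sequence[float]],
                     k: int,
                     indexer: ToyIndexer,
                     share_every: int = 4) -> List[List[int]]:
    """Key positions each layer attends to, one entry per layer.

    The indexer runs only on the first layer of each group of
    `share_every` layers; the other layers reuse that selection.
    share_every=1 reproduces the unshared baseline.
    """
    selected: List[List[int]] = []
    shared: List[int] = []
    for layer, query in enumerate(queries):
        if layer % share_every == 0:
            shared = indexer(query, keys, k)
        selected.append(shared)
    return selected
```

With eight layers and `share_every=4`, the indexer runs twice instead of eight times. Since the indexer is the component that scans the whole context, that is where savings at long lengths would come from in this toy model. The sketch does not reproduce the reported 2.9× figure, which depends on the full per-token cost of the real architecture.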

#ModelRelease #TechBreakthrough #OpenSource
