AI Curated Updates
Smart score: 60
GLM-5.2 Substantially Improves Mobile Development Capability
Why AI recommends this
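This data is the first to quantify GLM-5.2's improvement on complex mobile development tasks and is worth watching.
Key takeaway
Zhipu's GLM-5.2 completed 48 of 70 trials on an internal mobile development benchmark, up from 21 of 70 for GLM-5.1. That is a more than twofold improvement and brings it close to Claude Fable 5's 56 of 70.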
Full text
Z.ai (@Zai_org) reposted a post by Cunxiang Wang (@CunxiangWang):
GLM-5.2 is not only stronger on benchmarks, but also much better in real app development scenarios — iOS, Android, WeChat Mini Programs, and more.
Behind this jump is a full loop from environment construction, evaluation, data optimization, reward design, to training.
Real tasks, real execution, real improvement.
> **Quoted original post by Zixuan Li (@ZixuanLi_):**
> GLM-5.2 delivers a substantial leap in app development capabilities, which also represent demanding long-horizon tasks.
> Results:
> - GLM-5.1: 21/70
> - GLM-5.2: 48/70
> - Claude Fable 5: 56/70
> That's more than a twofold improvement from GLM-5.1 to GLM-5.2.
> These come from an internal benchmark of 35 challenging mobile development tasks, each run twice for a total of 70 trials. We measured task completion, defined as core features working without major issues.
> https://x.com/ZixuanLi_/status/2067803136283005393
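
The "more than twofold" claim and the gap to Claude Fable 5 can be checked directly from the quoted counts. The following is a minimal sketch in Python. The names `completed`, `completion_rate`, and `improvement_ratio` are illustrative assumptions and not part of any published evaluation code. Only the three counts and the 70-trial total come from the post.

```python
from fractions import Fraction

# 35 tasks, each run twice, as stated in the quoted post.
TRIALS = 35 * 2

# Completed trials per model, copied from the quoted results.
completed = {
    "GLM-5.1": 21,
    "GLM-5.2": 48,
    "Claude Fable 5": 56,
}


def completion_rate(model: str) -> Fraction:
    """Share of the 70 trials the model completed (hypothetical helper)."""
    return Fraction(completed[model], TRIALS)


def improvement_ratio(new: str, old: str) -> Fraction:
    """How many times more trials `new` completed than `old` (hypothetical helper)."""
    return Fraction(completed[new], completed[old])


for model in completed:
    print(f"{model}: {float(completion_rate(model)):.1%}")

ratio = improvement_ratio("GLM-5.2", "GLM-5.1")
print(f"GLM-5.2 vs GLM-5.1: {float(ratio):.2f}x")
print(f"Trials behind Claude Fable 5: {completed['Claude Fable 5'] - completed['GLM-5.2']}")
```

Under these numbers the ratio is 48/21, or about 2.29, which is consistent with the "more than twofold" wording. With each task run only twice, small differences between models should be read with caution.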