AI Curated Updates
Smart score: 70
Ornith-1.0 open-source agentic coding models released
Why the AI recommends this
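Worth watching for its training strategy of jointly optimizing the scaffold, the key innovation that sets it apart from other coding-agent models.
Key takeaways
DeepReinforce has released the Ornith-1.0 open-source model family. The flagship 397B MoE (17B active) reports 82.4 on SWE-Bench Verified and 77.5 on Terminal-Bench 2.1, both reportedly ahead of Claude Opus 4.7. The models are post-trained on top of pretrained Gemma 4 and Qwen 3.5 with a self-improving training strategy that jointly optimizes the model's output and the task scaffold. The 9B version also reaches 69.4 on SWE-Bench Verified.
Full text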
Another fantastic open source release.
DeepReinforce just dropped Ornith-1.0, an MIT-licensed open-source family of agentic coding LLMs.
The flagship Ornith-1.0-397B MoE (17B-active) is the most powerful model in the release, reporting 82.4 on SWE-Bench Verified and 77.5 on Terminal-Bench 2.1 - surpassing Claude Opus 4.7 on both benchmarks.
Built on top of pretrained Gemma 4 and Qwen 3.5.
It employs a novel self-improving training strategy: Ornith changes the training target by asking the model to improve both the answer and the task scaffold, meaning the plan, memory pattern, tool rhythm, error handling, and search process that shape the answer.
During RL, the model proposes a better scaffold first, then uses it to produce solution rollouts, and the reward updates both stages together.
That makes the model less like a coder following one rigid checklist and more like a coder learning which checklist works for each type of bug, repo, or terminal task.
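To make the two-stage idea concrete, here is a minimal toy sketch, not Ornith's actual training code. It assumes a tiny tabular setup in which a scaffold policy and a solution policy are both updated with plain REINFORCE from one shared reward. The task types, scaffold names, action names, reward rule, and hyperparameters are all hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical toy environment: each task type has one scaffold and one
# solution action that together earn reward 1.0; anything else earns 0.0.
GOOD = {
    "bug_fix": ("reproduce_then_patch", "minimal_patch"),
    "terminal_task": ("inspect_then_run", "shell_script"),
}
SCAFFOLDS = ["reproduce_then_patch", "inspect_then_run"]
ACTIONS = ["minimal_patch", "shell_script"]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def reinforce_step(logits, chosen, advantage, lr):
    # Gradient of log softmax(logits)[chosen] is onehot(chosen) - probs.
    probs = softmax(logits)
    for i in range(len(logits)):
        logits[i] += lr * advantage * ((1.0 if i == chosen else 0.0) - probs[i])


def train(steps=4000, lr=0.1, seed=0):
    rng = random.Random(seed)
    tasks = list(GOOD)
    scaffold_logits = {t: [0.0] * len(SCAFFOLDS) for t in tasks}
    action_logits = {(t, s): [0.0] * len(ACTIONS)
                     for t in tasks for s in SCAFFOLDS}
    baseline = {t: 0.0 for t in tasks}  # running mean reward per task
    for _ in range(steps):
        task = rng.choice(tasks)
        # Stage 1: propose a scaffold for this task.
        s = sample(softmax(scaffold_logits[task]), rng)
        # Stage 2: produce a solution rollout conditioned on that scaffold.
        key = (task, SCAFFOLDS[s])
        a = sample(softmax(action_logits[key]), rng)
        reward = 1.0 if (SCAFFOLDS[s], ACTIONS[a]) == GOOD[task] else 0.0
        # One shared reward signal updates both stages.
        advantage = reward - baseline[task]
        reinforce_step(scaffold_logits[task], s, advantage, lr)
        reinforce_step(action_logits[key], a, advantage, lr)
        baseline[task] += 0.05 * (reward - baseline[task])
    return scaffold_logits, action_logits


def greedy(scaffold_logits, action_logits, task):
    # Pick the highest-scoring scaffold, then the best action under it.
    s_logits = scaffold_logits[task]
    scaffold = SCAFFOLDS[s_logits.index(max(s_logits))]
    a_logits = action_logits[(task, scaffold)]
    return scaffold, ACTIONS[a_logits.index(max(a_logits))]
```

In this toy, the scaffold policy is keyed by task type, so training can end up preferring a different "checklist" for bug fixes than for terminal tasks. A real system would replace the lookup tables with an LLM and the 0/1 rule with test-based rewards.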
The most interesting result is the 9B model reaching 69.4 on SWE-Bench Verified.

> **Quoted post from Ornith (@ornith_):**
> Aloha! 🌺 Meet Ornith-1.0, a family of open-source LLMs specialized for agentic coding.
> Ornith-1.0 spans the full parameter sizes including 9B Dense, 31B Dense, 35B MoE, and 397B MoE. It achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks including:
> ✅ Terminal-Bench 2.1 (77.5)
> ✅ SWE-Bench (82.4 on Verified, 62.2 on Pro, 78.9 on Multilingual)
> ✅ NL2Repo (48.2)
> ✅ SWE Atlas (41.2 on QnA, 42.6 RF, 39.1 TW)
> ✅ ClawEval (77.1)
> Post-trained on top of gemma4 and qwen3.5, Ornith-1.0 employs a novel self-improving training strategy in which reinforcement learning is used to generate not only solution rollouts, but also the task-specific scaffolds that drive those rollouts. By jointly optimizing the scaffold and the resulting solution, the model generates higher-quality solutions in agentic coding.😎
> All models are released under the MIT license, enabling full commercial and research use.
> 📖Tech Blog: https://t.co/qT9N2HYWFn
> 🤗Huggingface: https://t.co/PRrwqjeBtM
> https://x.com/ornith_/status/2070148887067963854
Rohan Paul (@rohanpaul_ai): Ornith is trained to improve the task scaffold and the code solution together, so it learns the workflow that produces better answers, not just the answer itself.
The reward signal goes back into both parts, so over many RL steps it can learn better planning, retry logic, tool use, and error handling for coding-agent tasks.
📖Tech Blog: https://t.co/nnvPfAn5Ac
🤗Huggingface: https://t.co/OvFQK4aTVN