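AI Curated Updates
Smart score: 60
DeepReinforce Releases Ornith-1.0 Open-Source Coding Model Series
Why AI recommends this
Ornith-1.0's self-improving training strategy jointly optimizes the scaffold and the solution, which sets it apart from existing methods; its real-world effectiveness is worth watching.
Key takeaways
DeepReinforce has released Ornith-1.0, an open-source model series built specifically for agentic coding, spanning 9B Dense, 31B Dense, 35B MoE, and 397B MoE. The 397B MoE flagship matches Claude Opus 4.7 on coding benchmarks, and the 9B Dense variant is optimized for edge devices. The models are post-trained on gemma4 and qwen3.5 with a self-improving training strategy that uses reinforcement learning to jointly optimize task scaffolds and solutions. They reach state-of-the-art results among open-source models of comparable size on benchmarks such as Terminal-Bench 2.1 and SWE-Bench. All models are released under the MIT license.
Full text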
DeepReinforce has released Ornith-1.0, their new self-improving family of open-source models designed for agentic coding.
> Ornith-1.0 learns to write its own task scaffolds during training rather than relying on human-designed harnesses.
> The 397B MoE flagship can match Claude Opus 4.7 on coding benchmarks, and the compact 9B Dense variant is optimized for edge devices.

> **Quoted post from Ornith (@ornith_):**
> Aloha! 🌺 Meet Ornith-1.0, a family of open-source LLMs specialized for agentic coding.
> Ornith-1.0 spans a full range of parameter sizes, including 9B Dense, 31B Dense, 35B MoE, and 397B MoE. It achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks including:
> ✅ Terminal-Bench 2.1 (77.5)
> ✅ SWE-Bench (82.4 on Verified, 62.2 on Pro, 78.9 on Multilingual)
> ✅ NL2Repo (48.2)
> ✅ SWE Atlas (41.2 on QnA, 42.6 on RF, 39.1 on TW)
> ✅ ClawEval (77.1)
> Post-trained on top of gemma4 and qwen3.5, Ornith-1.0 employs a novel self-improving training strategy in which reinforcement learning is used to generate not only solution rollouts, but also the task-specific scaffolds that drive those rollouts. By jointly optimizing the scaffold and the resulting solution, the model generates higher-quality solutions in agentic coding. 😎
> All models are released under the MIT license, enabling full commercial and research use.
> 📖 Tech Blog: https://t.co/qT9N2HYWFn
> 🤗 Hugging Face: https://t.co/PRrwqjeBtM
> https://x.com/ornith_/status/2070148887067963854
🚨 AI News | TestingCatalog (@testingcatalog): At each training step, the model proposes a refined scaffold for a task and then uses it to generate a solution, with reward flowing back to both stages. Three layers of safeguards protect against reward hacking.
According to DeepReinforce, the 9B variant scores 43.1 on Terminal-Bench 2.1 and 69.4 on SWE-Bench Verified, on par with much larger models such as Gemma 4-31B.
Pull the weights on HF for testing 👀
https://t.co/gYITOBUEmU
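
The two-stage step described in the post above (propose a scaffold, use it to generate a solution, send reward back to both stages) can be illustrated with a toy sketch. This is not DeepReinforce's implementation. It assumes tabular softmax policies, a plain REINFORCE update with a running-average baseline, and a made-up `SUCCESS_PROB` table standing in for real task outcomes. The three reward-hacking safeguards are not modeled because the posts do not describe them.

```python
import math
import random

# Hypothetical success probabilities (NOT from the source):
# rows = scaffold choice, columns = solution strategy under that scaffold.
SUCCESS_PROB = [
    [0.1, 0.2, 0.1],
    [0.5, 0.9, 0.6],
    [0.1, 0.3, 0.2],
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits, action, advantage, lr):
    """In-place REINFORCE step: grad of log softmax is one_hot(action) - probs."""
    probs = softmax(logits)
    for i in range(len(logits)):
        grad = (1.0 if i == action else 0.0) - probs[i]
        logits[i] += lr * advantage * grad


def expected_reward(scaffold_logits, solution_logits):
    """Exact expected reward of the two-stage policy under SUCCESS_PROB."""
    p_scaffold = softmax(scaffold_logits)
    total = 0.0
    for s, ps in enumerate(p_scaffold):
        for a, pa in enumerate(softmax(solution_logits[s])):
            total += ps * pa * SUCCESS_PROB[s][a]
    return total


def train(steps=3000, lr=0.1, seed=0):
    rng = random.Random(seed)
    n_scaffolds = len(SUCCESS_PROB)
    n_solutions = len(SUCCESS_PROB[0])
    scaffold_logits = [0.0] * n_scaffolds
    solution_logits = [[0.0] * n_solutions for _ in range(n_scaffolds)]
    baseline = 0.0  # running average of reward, used to reduce variance
    for _ in range(steps):
        # Stage 1: the policy proposes a scaffold for the task.
        p_s = softmax(scaffold_logits)
        s = rng.choices(range(n_scaffolds), weights=p_s, k=1)[0]
        # Stage 2: the policy generates a solution conditioned on that scaffold.
        p_a = softmax(solution_logits[s])
        a = rng.choices(range(n_solutions), weights=p_a, k=1)[0]
        # A single outcome reward for the finished rollout.
        reward = 1.0 if rng.random() < SUCCESS_PROB[s][a] else 0.0
        advantage = reward - baseline
        # The same reward signal updates both stages.
        reinforce_update(scaffold_logits, s, advantage, lr)
        reinforce_update(solution_logits[s], a, advantage, lr)
        baseline += 0.05 * (reward - baseline)
    return scaffold_logits, solution_logits


if __name__ == "__main__":
    sc, sol = train()
    print("scaffold probabilities:", [round(p, 3) for p in softmax(sc)])
    print("expected reward:", round(expected_reward(sc, sol), 3))
```

Because one reward updates both the scaffold logits and the solution logits, a scaffold is reinforced only to the extent that solutions produced under it succeed, which is the coupling the post describes. A real system would replace the lookup table with test-suite or verifier outcomes and the tabular policies with the language model itself.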
🚨 AI News | TestingCatalog (@testingcatalog): Documented 🗞️
https://t.co/BHnhcaKoek