AI Featured Updates | Smart score: 60

DeepReinforce releases the Ornith-1.0 family of open-source coding models

Source: Twitter following list
Author: 🚨 AI News | TestingCatalog (@testingcatalog)
Published: 2026-06-25
Indexed: 2026-06-25
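Why the AI recommends this
Ornith-1.0's self-improving training strategy jointly optimizes the scaffold and the solution, which sets it apart from existing methods. Its real-world effectiveness is worth watching.
Key takeaways
DeepReinforce has released Ornith-1.0, an open-source model family built specifically for agentic coding, in four sizes: 9B Dense, 31B Dense, 35B MoE, and 397B MoE. The 397B MoE flagship is reported to match Claude Opus 4.7 on coding benchmarks, and the 9B Dense variant is optimized for edge devices. The models are post-trained on top of gemma4 and qwen3.5 with a self-improving training strategy in which reinforcement learning jointly optimizes the task scaffold and the solution. According to the announcement, they reach state-of-the-art results among open-source models of comparable size on benchmarks including Terminal-Bench 2.1 and SWE-Bench. All models are released under the MIT license.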
Full text
DeepReinforce has released Ornith-1.0, their new self-improving family of open-source models designed for agentic coding.

> Ornith-1.0 learns to write its own task scaffolds during training rather than relying on human-designed harnesses.

> The 397B MoE flagship can match Claude Opus 4.7 on coding benchmarks, and the compact 9B Dense variant is optimized for edge devices.

![photo](https://pbs.twimg.com/media/HLqopb9WwAA1XqQ.jpg)

> **Quoted post, Ornith (@ornith_):**
>
> Aloha! 🌺 Meet Ornith-1.0, a family of open-source LLMs specialized for agentic coding.
>
> Ornith-1.0 spans a full range of parameter sizes, including 9B Dense, 31B Dense, 35B MoE, and 397B MoE. It achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks including:
>
> ✅ Terminal-Bench 2.1 (77.5)
>
> ✅ SWE-Bench (82.4 on Verified, 62.2 on Pro, 78.9 on Multilingual)
>
> ✅ NL2Repo (48.2)
>
> ✅ SWE Atlas (41.2 on QnA, 42.6 RF, 39.1 TW)
>
> ✅ ClawEval (77.1)
>
> Post-trained on top of gemma4 and qwen3.5, Ornith-1.0 employs a novel self-improving training strategy in which reinforcement learning is used to generate not only solution rollouts, but also the task-specific scaffolds that drive those rollouts. By jointly optimizing the scaffold and the resulting solution, the model generates higher-quality solutions in agentic coding. 😎
>
> All models are released under the MIT license, enabling full commercial and research use.
>
> 📖 Tech Blog: https://t.co/qT9N2HYWFn
>
> 🤗 Huggingface: https://t.co/PRrwqjeBtM
>
> https://x.com/ornith_/status/2070148887067963854

🚨 AI News | TestingCatalog (@testingcatalog): Each training step has the model propose a refined scaffold for a task, which it then uses to generate a solution, with reward flowing back to both stages. There are three layers of guards against reward hacking. As DeepReinforce states, the 9B variant achieves a score of 43.1 on Terminal-Bench 2.1 and 69.4 on SWE-Bench Verified, matching much heavier models like Gemma 4-31B.

Pull the weights on HF for testing 👀 https://t.co/gYITOBUEmU

🚨 AI News | TestingCatalog (@testingcatalog): Documented 🗞️ https://t.co/BHnhcaKoek
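
The training step described in the thread (propose a scaffold, generate a solution with it, and let the reward flow back to both stages) can be illustrated with a toy two-stage policy-gradient loop. This is only a sketch under stated assumptions, not DeepReinforce's implementation: the `REWARD` table, the three-by-three setup, the tabular softmax policies, and every function name are hypothetical, and plain REINFORCE with a moving-average baseline stands in for whatever algorithm the team actually uses.

```python
import math
import random

# Hypothetical toy problem: 3 candidate scaffolds, 3 candidate solutions.
# REWARD[s][a] is a made-up payoff for solution a produced under scaffold s.
REWARD = [
    [0.1, 0.2, 0.0],
    [0.0, 0.3, 1.0],
    [0.2, 0.1, 0.4],
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_update(logits, probs, chosen, advantage, lr):
    """In-place REINFORCE step; d log p(chosen) / d logit_i = 1[i == chosen] - p_i."""
    for i in range(len(logits)):
        grad = (1.0 if i == chosen else 0.0) - probs[i]
        logits[i] += lr * advantage * grad


def train(steps=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    scaffold_logits = [0.0] * 3
    # One solution policy per scaffold: the solution is conditioned on the scaffold.
    solution_logits = [[0.0] * 3 for _ in range(3)]
    baseline = 0.0  # moving-average reward, used to reduce variance
    for _ in range(steps):
        # Stage 1: propose a scaffold.
        s_probs = softmax(scaffold_logits)
        s = sample(s_probs, rng)
        # Stage 2: generate a solution under that scaffold.
        a_probs = softmax(solution_logits[s])
        a = sample(a_probs, rng)
        advantage = REWARD[s][a] - baseline
        # The same reward signal updates both stages.
        reinforce_update(scaffold_logits, s_probs, s, advantage, lr)
        reinforce_update(solution_logits[s], a_probs, a, advantage, lr)
        baseline += 0.01 * (REWARD[s][a] - baseline)
    return scaffold_logits, solution_logits


def best(logits):
    """Index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


if __name__ == "__main__":
    scaffold_logits, solution_logits = train()
    print("scaffold probabilities:", softmax(scaffold_logits))
    print("preferred scaffold:", best(scaffold_logits))
```

In this toy table the highest payoff needs both the right scaffold and the right solution under it, which is why a single shared reward is enough to shape both policies. The reward-hacking guards mentioned in the thread are not modeled here.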
#ModelRelease #OpenSource #LLM