AI Curated Updates | Smart score: 62

DeepSeek Researcher Open-Sources the AutoResearch Protocol and Publishes a Self-play Survey

Source: Twitter following list

Author: AYi (@AYi_AInotes)

Published: 2026-06-19

Collected: 2026-06-19

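Why AI recommends this

The post gives concrete implementation details of an autonomous RL research loop, plus five engineering ideas. Practitioners should open the original for the reusable methodology.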

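Key takeaways

DeepSeek researcher Deli Chen has open-sourced the AutoResearch protocol and published a survey on Self-play. For the first time, his agent autonomously ran a complete RL research loop (experiment design, code writing, GPU job submission, debugging, and conclusion summarization) on the DeepSeek 285B model, with zero human intervention.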

Full text

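This may be the open-source skills release and engineering scaffold most worth studying closely that I have seen recently, and the five engineering ideas summarized at the end are ready to use as-is.

DeepSeek researcher Deli Chen has open-sourced his AutoResearch protocol and, alongside it, released a survey on Self-play (his fourth survey paper). The most striking part: for the first time, his agent autonomously ran a complete RL research loop on a 285B model, covering experiment design, code writing, GPU job submission, debugging, and drawing conclusions, with zero human intervention throughout.

Writing code and closing a full research loop are two different things. It is like the difference between learning to cook a dish and running a restaurant that delivers consistent quality every day: the gap is not one dish but the entire kitchen workflow. As for the paper's conclusions, I have put them in the comments.

![photo](https://pbs.twimg.com/media/HLJevC0XwAArF-x.jpg)

> **Quoted post by Deli Chen (@victor207755822):**
>
> 🧵 Deli AutoResearch SKILL is now officially open source! 🎉
> https://t.co/V3lwwdyQm8
>
> Alongside it, we’re dropping our 4th survey paper - this time on Self-play.
> https://t.co/SEb2qoKCI6
>
> Inspired by AlphaZero, we got a powerful insight: prior knowledge doesn’t always lift the ceiling.
> Models can discover more globally optimal solutions just by playing against themselves.
>
> The biggest change in this paper?
> For the first time, the AutoResearch Agent autonomously planned GPU experiments - and submitted actual RL runs on the DeepSeek 285B model.
> The entire RL pipeline - experiment design, code writing, running, debugging, and conclusion summarization - was 100% automated, with zero human intervention from me.
> This was incredibly difficult, but an incredibly important step.
> https://t.co/kuZZNux5RH
>
> GRPO is the tool being called by the AutoResearch Agent here.
> We see this as the beginning of our Continual Learning research journey. 🚀
>
> As always, this is my personal research project, unaffiliated with any organization. All views are my own.
>
> #AI #ReinforcementLearning #SelfPlay #OpenSource #AutoML #ContinualLearning #DeepSeek
>
> https://x.com/victor207755822/status/2067259098584985954

AYi (@AYi_AInotes): Engineering ideas you can use directly:

1. Do not rely on conversation history alone for state. Persist it to the file system, and inject only the necessary state into each new iteration.
2. Build stall detection plus a forced-pivot mechanism. After several consecutive rounds without progress, switch frameworks instead of adding parameters.
3. Separate execution from verification. For important tasks, do not let the same system both do the work and grade it.
4. Use multi-layer watchdogs, so that an external script can restart the core agent if it dies.
5. Think in fresh sessions. For long-running projects, reset the context periodically and keep only the essentials.

The SKILL.md itself deserves a careful read. Many of its constraints were distilled from hard-won experience over tens to hundreds of hours of real runs, not from armchair theorizing.

AYi (@AYi_AInotes): In the past, humans doing research were operators, watching every step. Now that agents can run the "experiment, run, debug, summarize" loop on their own, the human role is shifting to that of a director: setting the overall direction, setting boundary conditions, defining what counts as success, and designing evaluation criteria.

Deli says this is the beginning of their Continual Learning journey. In the future, the people who truly stand out will not be those who can use AI, but those who can design stable autonomous loops, whether for research, content, or products.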
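To make ideas 1, 2, and 5 from the list of engineering ideas concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the AutoResearch implementation: the `LoopState` schema, the `state.json` file, the approach names, and the `patience` threshold are all hypothetical. Each iteration loads a small state file, records one score, forces a pivot to a different approach after `patience` rounds without improvement, and writes the state back, so no iteration depends on earlier conversation history.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List, Optional, Sequence


@dataclass
class LoopState:
    """Hypothetical minimal state handed from one iteration to the next."""
    iteration: int = 0
    best_score: Optional[float] = None
    stalled_for: int = 0
    approach: str = "baseline"
    notes: List[str] = field(default_factory=list)


def load_state(path: Path) -> LoopState:
    """Read state from disk, or start fresh if no file exists yet."""
    if path.exists():
        return LoopState(**json.loads(path.read_text(encoding="utf-8")))
    return LoopState()


def save_state(path: Path, state: LoopState) -> None:
    """Write to a temp file, then rename it over the real one, so a crash
    mid-write is less likely to leave a corrupted state file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(asdict(state), indent=2), encoding="utf-8")
    tmp.replace(path)


def record_result(state: LoopState, score: float, approaches: Sequence[str],
                  patience: int = 3, keep_notes: int = 5) -> bool:
    """Update state with a new score; return True if a pivot was forced."""
    state.iteration += 1
    if state.best_score is None or score > state.best_score:
        state.best_score = score
        state.stalled_for = 0
        return False
    state.stalled_for += 1
    if state.stalled_for < patience:
        return False
    # Stalled: move to the next approach instead of tuning the current one.
    nxt = (approaches.index(state.approach) + 1) % len(approaches)
    state.approach = approaches[nxt]
    state.stalled_for = 0
    state.notes.append(f"iter {state.iteration}: pivoted to {state.approach}")
    state.notes = state.notes[-keep_notes:]  # keep only the most recent notes
    return True


def run_iteration(path: Path, score: float, approaches: Sequence[str]) -> bool:
    """One 'fresh session': load state, record a result, persist it."""
    state = load_state(path)
    pivoted = record_result(state, score, approaches)
    save_state(path, state)
    return pivoted
```

In a real loop, `score` would come from a separate evaluator rather than from the component that ran the experiment (idea 3), and an outer supervisor process would call `run_iteration` again after a crash (idea 4). Both are left out of this sketch.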

#OpenSource #Technology #Agents
