AI Curated Updates
Smart score: 65
The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators
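Why the AI recommends it
Unlike common self-improving methods, this work lets the evaluator evolve as well. It is a new way to break through the bottleneck of fixed evaluation, and the original paper is worth a close read.
Key takeaways
Teams from the University of Cambridge, NVIDIA, and elsewhere propose the Red Queen Gödel Machine, a method in which AI agents and their evaluators co-evolve, avoiding the stagnation caused by fixed evaluation. In coding experiments, it uses 1.35× to 1.72× fewer tokens than the previous best self-improving agent. In paper-writing experiments, the co-evolved writer achieves an acceptance rate about 1.86× higher than the fixed-evaluator baseline.
Full text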
A new paper from the University of Cambridge, NVIDIA, and other top labs teaches AI agents and AI judges to improve together, so that neither side gets stuck.
It moves self-improving AI away from fixed benchmarks and toward a loop in which the judge can also get better.
The problem is that most self-improving agents train against a fixed benchmark or fixed evaluator, so the score can become stale, too easy, or easy to game.
The paper’s idea is to let the evaluator improve too, but only at safe handoff points, so each training stretch still has a stable judge.
During each stretch, agents are scored by the current, frozen evaluator, while candidate replacement evaluators are tested separately against held-out human or objective answers.
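The loop described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the paper's algorithm. The names `co_evolve`, `mutate_agent`, `mutate_evaluator`, `score_agent` and `evaluator_fidelity` are hypothetical. Agents and candidate judges are both accepted greedily. The toy demo reduces an agent to an integer skill and an evaluator to an integer "ceiling" above which it cannot tell agents apart.

```python
from typing import Callable, List, Tuple, TypeVar

A = TypeVar("A")  # hypothetical agent type
E = TypeVar("E")  # hypothetical evaluator type


def co_evolve(
    agent: A,
    evaluator: E,
    mutate_agent: Callable[[A], A],
    mutate_evaluator: Callable[[E], E],
    score_agent: Callable[[E, A], float],
    evaluator_fidelity: Callable[[E], float],
    stretches: int,
    steps_per_stretch: int,
    evaluator_candidates: int,
) -> Tuple[A, E]:
    """Toy co-evolution loop: agents improve under a frozen judge, and the
    judge is replaced only at the handoff point between stretches."""
    for _ in range(stretches):
        frozen = evaluator  # the judge stays fixed for this whole stretch
        for _ in range(steps_per_stretch):
            child = mutate_agent(agent)
            # Greedy hill-climbing: keep the child only if the frozen judge prefers it.
            if score_agent(frozen, child) > score_agent(frozen, agent):
                agent = child
        # Candidate judges are scored against held-out ground truth,
        # never against the agents' own scores.
        best, best_fid = evaluator, evaluator_fidelity(evaluator)
        for _ in range(evaluator_candidates):
            candidate = mutate_evaluator(evaluator)
            fid = evaluator_fidelity(candidate)
            if fid > best_fid:
                best, best_fid = candidate, fid
        evaluator = best  # safe handoff: swap the judge between stretches
    return agent, evaluator


# Toy instantiation (assumption): an agent is an integer skill level, and an
# evaluator is an integer ceiling above which it cannot tell skills apart.
HELD_OUT: List[int] = [1, 2, 3, 4, 5, 6]  # hypothetical ground-truth skill labels


def toy_score(ceiling: int, skill: int) -> float:
    return float(min(skill, ceiling))


def toy_fidelity(ceiling: int) -> float:
    # Fraction of held-out items that the judge scores exactly right.
    return sum(toy_score(ceiling, x) == x for x in HELD_OUT) / len(HELD_OUT)


if __name__ == "__main__":
    # Judge may improve at each handoff.
    print(co_evolve(0, 2, lambda a: a + 1, lambda e: e + 1,
                    toy_score, toy_fidelity, 3, 5, 1))  # (4, 5)
    # Judge never changes: the agent stalls at the judge's ceiling.
    print(co_evolve(0, 2, lambda a: a + 1, lambda e: e,
                    toy_score, toy_fidelity, 3, 5, 1))  # (2, 2)
```

In the toy run, a fixed judge stops rewarding progress once the agent reaches its ceiling. Letting the judge improve between stretches restores pressure on the agent. Because candidate judges are checked only against the held-out labels, agents never get to shape their own grader.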
The authors try this on coding, paper writing, paper reviewing, proof writing, and proof grading, where some tasks have clear answers and others need learned judgment.
On coding, the system outperforms the previous best self-improving coding agent while using 1.35× to 1.72× fewer tokens, because a cheap code reviewer adds useful feedback.
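The post does not define the ratio behind "1.35× to 1.72× fewer tokens". If it means the baseline's token usage divided by the new system's, then the fraction of tokens saved is $1 - 1/r$:

$$
1 - \frac{1}{1.35} \approx 0.26, \qquad 1 - \frac{1}{1.72} \approx 0.42
$$

Under that reading, the system uses roughly 26% to 42% fewer tokens than the earlier agent.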
On paper writing, the co-evolved writer gets about 1.86× higher average acceptance from a reviewer panel than the fixed-evaluator baseline.
The broader point is that as AI systems get stronger, they may need judges that grow with them, because fixed tests can stop providing useful pressure.
----
Link: arxiv.org/abs/2606.26294
Title: "The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators"
