AI Featured Updates · Smart score: 60

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

Source: Twitter following list
Author: Rohan Paul (@rohanpaul_ai)
Published: 2026-06-18
Collected: 2026-06-18
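Why AI recommends it
This result challenges the intuition that more test-time compute is automatically beneficial, and it offers concrete efficiency guidance for deploying large code models. The original paper is worth reading for the details of the parallel loop design.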
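Key Takeaways
The LoopCoder-v2 paper studies how increasing the number of loops at test time affects a code model. The authors trained 7B-parameter models (1 to 4 loops, 18T tokens) and evaluated them on code writing, code reasoning, and related tasks. Two loops performed best, raising SWE-bench Verified from 43.0 to 64.4, while 3 and 4 loops performed worse. Internal analysis shows that the second loop does the useful refinement, whereas later loops mainly introduce repetitive changes and a position-shift cost.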
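For scale, the SWE-bench Verified jump can be expressed as an absolute and a relative gain. This is simple arithmetic on the two reported numbers, not a figure stated in the paper:

```latex
\Delta_{\text{abs}} = 64.4 - 43.0 = 21.4 \text{ points}, \qquad
\Delta_{\text{rel}} = \frac{21.4}{43.0} \approx 0.498 \;(\approx 49.8\%)
```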
Full Text
Big claim in this paper: it pushes against the common idea that more test-time compute should keep helping. It claims a code model gets much better when it rethinks once inside itself (i.e., loops once), but worse when it keeps rethinking. The first loop builds context, the second loop refines it, and later loops mostly disturb it.

The paper studies a faster design called the Parallel Loop Transformer, in which loops can run almost in parallel and share memory, so the authors can ask a cleaner question: how many loops are actually useful?

They trained 7B code models with 1, 2, 3, and 4 loops on 18T tokens, then tuned and tested them on code writing, code reasoning, software engineering, and tool-use tasks. The main result is that 2 loops (the base pass plus one extra loop) worked best, raising SWE-bench Verified from 43.0 to 64.4, while 3 and 4 loops often did worse.

Their internal checks suggest that loop 2 does the genuinely useful refinement, because it changes the model's hidden states, attention patterns, and predictions in meaningful ways. After loop 2, the extra loops mostly add weaker, more repetitive changes, while a built-in position shift keeps adding the same kind of mismatch cost.

Overall, the paper offers a simple lesson for efficient test-time compute: adding 1 hidden loop can help a lot, but adding more is not automatically better.

----
Link: arxiv.org/abs/2606.18023
Title: "LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling"
![photo](https://pbs.twimg.com/media/HLDvXnhbwAA3fGn.png)
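To make the looping idea concrete, here is a minimal sketch of a weight-tied ("looped") residual block in plain Python. The same weights are applied `num_loops` times, so compute grows with the loop count while the parameter count stays fixed. This is a generic illustration under stated assumptions, not the paper's Parallel Loop Transformer. It runs loops sequentially, uses a toy `h + tanh(W h)` update in place of real attention and MLP layers, and does not model the position shift the post mentions. All names (`shared_block`, `looped_forward`) are hypothetical. It also records how far the hidden state moves on each loop, a crude analogue of the hidden-state checks described in the post.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def shared_block(w: Matrix, h: Vector) -> Vector:
    """Toy residual update h + tanh(W h); a hypothetical stand-in for a layer stack."""
    return [hi + math.tanh(ui) for hi, ui in zip(h, matvec(w, h))]


def looped_forward(w: Matrix, h0: Vector, num_loops: int) -> Tuple[Vector, List[float]]:
    """Apply the SAME weights num_loops times (weight tying across loops).

    Returns the final hidden state and the L2 norm of the change made by each loop.
    Loops run sequentially here; the paper's parallel scheme is NOT modeled.
    """
    if num_loops < 1:
        raise ValueError("num_loops must be >= 1")
    h = list(h0)  # copy so the caller's input is never mutated
    deltas: List[float] = []
    for _ in range(num_loops):
        h_next = shared_block(w, h)
        # Size of this loop's update to the hidden state.
        deltas.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(h_next, h))))
        h = h_next
    return h, deltas


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    dim = 4
    w = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]
    h0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for k in range(1, 5):  # 1 to 4 loops, mirroring the configurations in the post
        _, deltas = looped_forward(w, h0, k)
        print(f"{k} loop(s): per-loop change = {[round(d, 3) for d in deltas]}")
```

In this toy, the per-loop changes only reflect the random weights; it will not reproduce the paper's finding that 2 loops are best, which comes from training and evaluating real models at scale.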
#Research #LLM #Benchmarks