AI Curated Feed · Smart Score: 65

Next-Latent Prediction Transformers Learn Compact World Models

Source: Twitter following list

Author: Rohan Paul (@rohanpaul_ai)

Published: 2026-06-24

Added: 2026-06-24

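Why the AI recommends it

Compared with conventional next-token prediction, NextLat learns a compact world model by also predicting hidden states. This improves planning and speeds up inference. Its potential applications in world modeling and long-horizon reasoning are worth watching.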

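Key takeaways

Microsoft proposes NextLat, which adds a hidden-state prediction task on top of next-token prediction and so forces the model to learn a compact world model. In experiments, the method performs better on maze navigation, mathematical reasoning, and graph planning. It also achieves up to 3.3x faster inference, all without changing the Transformer architecture.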
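As a rough illustration of what "adding a hidden-state prediction task" to next-token training could look like, here is a minimal pure-Python sketch of a combined objective. It rests on several assumptions that this summary does not confirm:

- A small latent predictor takes the current hidden state h_t and the embedding of the next token, and guesses h_{t+1}.
- The mismatch between guess and actual state is measured with mean squared error.
- The two losses are summed with a weight `lam`.

The names `nextlat_style_loss`, `predictor`, `embed`, and `lam` are all hypothetical. The paper's actual predictor, distance, and gradient handling may differ.

```python
import math

# Illustrative sketch only: the latent predictor, the MSE distance and the
# weighting `lam` are assumptions, not the paper's specification.

def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def nextlat_style_loss(logits, tokens, hidden, predictor, embed, lam=1.0):
    """Next-token loss plus an auxiliary next-latent loss.

    logits[t] : vocabulary logits at position t, scored against tokens[t + 1]
    hidden[t] : the transformer's hidden state at position t
    predictor : hypothetical head mapping (h_t, embed(x_{t+1})) to a guess of h_{t+1}
    lam       : hypothetical weight on the latent term
    """
    n = len(tokens) - 1
    token_loss = sum(cross_entropy(logits[t], tokens[t + 1]) for t in range(n)) / n
    # With autodiff, hidden[t + 1] would usually be a fixed (stop-gradient) target;
    # plain Python lists carry no gradients, so there is nothing to detach here.
    latent_loss = sum(
        mse(predictor(hidden[t], embed(tokens[t + 1])), hidden[t + 1]) for t in range(n)
    ) / n
    return token_loss + lam * latent_loss, token_loss, latent_loss

# Toy example: 4-token vocabulary, 2-dimensional hidden states.
EMBED = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
tokens = [0, 1, 2]
logits = [[0.0] * 4, [0.0] * 4]  # uniform predictions, so each token loss is log(4)
hidden = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]

def additive_predictor(h, e):
    # Hypothetical dynamics: next state = current state + next-token embedding.
    return [hv + ev for hv, ev in zip(h, e)]

total, tok, lat = nextlat_style_loss(logits, tokens, hidden, additive_predictor, EMBED.get)
print(round(tok, 4), lat)  # prints: 1.3863 0.0
```

Because plain Python lists carry no gradients, the sketch only computes loss values. In an autodiff framework, the same expression would be minimized jointly over the transformer and the predictor.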

Full text

A new Microsoft paper argues that transformers generalize better when they learn compact internal states, not just next tokens. The problem is that a standard transformer can look back at every earlier token, so it never has to squeeze the past into a clean summary. Token prediction alone can reward shortcuts that never add up to a coherent world model. Such shortcuts can work beautifully on familiar data and still fail when the model has to plan, take a detour, reason, or carry hidden structure forward.

NextLat fixes this by adding a training task in which the model must predict its next hidden state, not just the next token. A hidden state is the model's private summary of what it has seen so far, so predicting the next one pushes the model to learn how situations change over time.

The authors tested this on map-like world modeling, math reasoning, graph planning, story prediction, and standard language modeling. The main result is that NextLat often learned more compact and useful internal states, solved planning tasks better, and sped up generation by up to 3.3x. Overall, it gives transformers some of the useful memory behavior of recurrent models without changing the transformer architecture or slowing normal inference.

Link: arxiv.org/abs/2511.05963
Title: "Next-Latent Prediction Transformers Learn Compact World Models"

![photo](https://pbs.twimg.com/media/HLi4xZ6WUAECkof.png)

> **Quoting original post by Jayden Teoh (@jayden_teoh_):**
> Next-token prediction is myopic. What if transformers learn to predict their own next latent state?
> 🌠 We present **Next-Latent Prediction (NextLat)**: a self-supervised learning method that teaches transformers to form compact world models for reasoning and planning. It also unlocks up to 3.3x faster inference via self-speculative decoding! 🚀
> https://x.com/jayden_teoh_/status/2066905213328605612
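The quoted post attributes the speedup to self-speculative decoding. The sketch below shows the generic greedy form of speculative decoding, not NextLat's specific procedure:

1. A cheap drafter proposes up to `k` tokens.
2. The full model checks them left to right.
3. The agreeing prefix is kept, plus the full model's own token at the first mismatch.

As a result, the output is identical to plain greedy decoding with the full model. In a self-speculative setup the drafter comes from the same network. For NextLat one might guess the drafter is the learned latent predictor rolled forward, but that is an assumption, not something this post states. `toy_target`, `toy_draft`, and the function names are hypothetical stand-ins.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]

def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one full-model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out

def speculative_greedy_decode(target: NextToken, draft: NextToken, prompt: List[int],
                              max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        rounds += 1
        # 1) The cheap drafter proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2) The full model verifies the proposal left to right.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # equals tok whenever the draft was right
            if expected != tok:
                break  # first mismatch: keep the full model's token, drop the rest
        out.extend(accepted)
    return out[: len(prompt) + max_new], rounds

# Hypothetical toy models over a 10-token vocabulary.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10

def toy_draft(seq: List[int]) -> int:
    # Same rule as the target, except it guesses wrong right after token 5.
    return 0 if seq[-1] == 5 else (seq[-1] + 1) % 10

tokens, rounds = speculative_greedy_decode(toy_target, toy_draft, [0], max_new=12)
print(tokens == greedy_decode(toy_target, [0], 12), rounds)  # prints: True 4
```

In this toy run, 12 new tokens need 4 verification rounds instead of 12 full-model steps. A real implementation scores all drafted positions in one batched forward pass of the full model; the per-position calls here only keep the code short. Sampling-based variants replace the exact-match check with a rejection-sampling rule.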

#Models #TechnicalBreakthrough #Research
