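AI Featured Updates
Smart score: 80
Qwen releases Qwen-AgentWorld: a native language world model
Why AI recommends this
The original post presents a breakthrough approach to world models for agents. The paper is worth reading: its zero-shot transfer results could change how agents are trained.
Key takeaways
The Qwen team has open-sourced Qwen-AgentWorld-35B-A3B (MoE architecture, 35B total parameters / 3B active, 256K context) and AgentWorldBench. The model natively simulates 7 agent environments and outperforms Claude Opus 4.8 and GPT-5.4 on AgentWorldBench. The research shows that world-modeling pre-training transfers zero-shot to agent tasks, with gains including +6.3 on Terminal-Bench 2.0 and +3.4 on SWE-Bench.
Full text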
📣📣 Meet Qwen-AgentWorld — a native language world model that simulates 7 agent environments (MCP, Search, Terminal, SWE, Web, OS, Android) within a single model. Environment modeling is the training objective from day one, not a post-hoc adaptation.
🤔 LLMs are trained to be better agents — better at acting in environments. But nobody has trained them to model the environments themselves.
🗺️ Our roadmap: investigate how language world modeling can push the boundaries of general agent capabilities, along two routes:
1️⃣ Build a foundation model for environment simulation — outperforming Claude Opus 4.8 and GPT-5.4 on AgentWorldBench
2️⃣ Investigate how world modeling enhances agent training:
🔬 Controllable Sim RL (agentic RL with LWM as environments) surpasses training in real environments
🧠 Learning to predict environments (LWM warm-up) makes agents stronger — remarkably, even without any agent-specific training, this predictive knowledge transfers to agentic tasks with zero fine-tuning
📑 Paper: https://t.co/Jx2l5RKq71
📖 Blog: https://t.co/7tVcKyhsx2
💻 GitHub: https://t.co/B5Lvb1UZCn
🤗 HuggingFace: https://t.co/Kw3QBL1TM5
🧩 ModelScope: https://t.co/YBnGYgMWWI
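To make the "LWM as environments" idea in the post above concrete, here is a minimal sketch of a rollout loop in which a text model predicts each observation instead of a real terminal or browser. Everything here is an assumption for illustration: the `SimulatedEnv` class, the `ACTION:`/`OBSERVATION:` prompt layout, and the stub model and policy are hypothetical and are not taken from the Qwen-AgentWorld paper or code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical type: any function mapping a text prompt to generated text.
# A real setup would wrap a language world model; this sketch uses a stub.
TextModel = Callable[[str], str]


@dataclass
class SimulatedEnv:
    """Environment whose observations are predicted by a text model."""
    world_model: TextModel
    history: List[Tuple[str, str]] = field(default_factory=list)

    def build_prompt(self, action: str) -> str:
        # Assumed prompt layout: past (action, observation) pairs, then the
        # new action, then an open OBSERVATION slot for the model to fill.
        lines: List[str] = []
        for past_action, past_obs in self.history:
            lines.append(f"ACTION: {past_action}")
            lines.append(f"OBSERVATION: {past_obs}")
        lines.append(f"ACTION: {action}")
        lines.append("OBSERVATION:")
        return "\n".join(lines)

    def step(self, action: str) -> str:
        observation = self.world_model(self.build_prompt(action)).strip()
        self.history.append((action, observation))
        return observation


def rollout(env: SimulatedEnv, policy: Callable[[str], str],
            max_steps: int) -> List[Tuple[str, str]]:
    """Run the policy against the simulated environment for max_steps steps."""
    observation = ""
    for _ in range(max_steps):
        action = policy(observation)
        observation = env.step(action)
    return env.history


def stub_world_model(prompt: str) -> str:
    # Stand-in for a real model: describes the most recent action.
    actions = [ln for ln in prompt.splitlines() if ln.startswith("ACTION: ")]
    return f" simulated output of `{actions[-1][len('ACTION: '):]}`"


def stub_policy(observation: str) -> str:
    # Stand-in for an agent: first lists files, then prints the directory.
    return "ls" if not observation else "pwd"


trajectory = rollout(SimulatedEnv(stub_world_model), stub_policy, max_steps=2)
```

In an RL setting the resulting `trajectory` would be scored by a reward function and used to update the policy. That part is omitted here because the post does not describe it.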

Qwen (@Alibaba_Qwen): We open-source Qwen-AgentWorld-35B-A3B (MoE, 35B/3B active, 256K context) and AgentWorldBench.
Two routes, one roadmap:
🔬 Build the simulator — scalable, controllable, surpassing real environments
🧠 Internalize world modeling — predict before you act
Qwen-AgentWorld is our attempt to investigate how language world modeling can further expand the boundaries of general agent capabilities.
Go build on it 🏃🏃‍♂️
Qwen (@Alibaba_Qwen): 🧠 Paradigm II — Agent Foundation Model: world modeling as agent capability.
Single-turn, non-agentic environment prediction → tested directly on multi-turn, tool-calling agent tasks. No agentic RL, no task-specific tuning.
Gains across 7 benchmarks, including 3 entirely out-of-domain:
- In-domain: Terminal-Bench 2.0 +6.3, SWE-Bench +3.4, WideSearch +12.8
- Out-of-domain: Claw-Eval +11.3, QwenClawBench +9.7, BFCL v4 +9.0
World modeling internalizes "predict before you act" as a transferable reasoning pattern.
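As a rough illustration of what "single-turn, non-agentic environment prediction" data could look like, the sketch below turns one logged trajectory into independent samples, each asking the model to predict a single observation. The sample format, the `to_prediction_samples` helper, and the choice to include earlier turns as context are all assumptions made for this example. The post does not specify how the training data is built.

```python
from typing import Dict, List, Tuple


def to_prediction_samples(
    trajectory: List[Tuple[str, str]],
) -> List[Dict[str, str]]:
    """Split a logged (action, observation) trajectory into one-shot samples.

    Each sample's prompt ends with an open OBSERVATION slot, and its target is
    the observation that the real environment actually returned.
    """
    samples: List[Dict[str, str]] = []
    context_lines: List[str] = []
    for action, observation in trajectory:
        prompt = "\n".join(
            context_lines + [f"ACTION: {action}", "OBSERVATION:"]
        )
        samples.append({"prompt": prompt, "target": observation})
        # Later samples see earlier turns as context (an assumption).
        context_lines += [f"ACTION: {action}", f"OBSERVATION: {observation}"]
    return samples


# Hypothetical logged terminal session used only to exercise the helper.
logged = [
    ("git status", "nothing to commit, working tree clean"),
    ("ls", "README.md"),
]
samples = to_prediction_samples(logged)
```

Each sample is a plain prompt/target pair, so the model never chooses an action during this kind of training. It only learns what the environment would return, which matches the "no agentic RL, no task-specific tuning" framing of the post.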