AI Curated Feed · Smart Score: 60

Alibaba Cloud open-sources Qwen-AgentWorld

Source: Twitter following list
Author: Alibaba Cloud (@alibaba_cloud)
Published: 2026-06-24
Collected: 2026-06-24
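Why the AI recommends it
What sets it apart: the model is trained natively for environment modeling rather than adapted to it after the fact, and both the full model and the benchmark are open-sourced, offering a new path toward agents strengthened by world modeling.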
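Key takeaways
Alibaba Cloud has open-sourced Qwen-AgentWorld-35B-A3B (MoE, 35B total / 3B active parameters, 256K context) together with AgentWorldBench. The model is trained with environment modeling as its native objective and can simulate 7 agent environments. It outperforms Claude Opus 4.8 and GPT-5.4 on AgentWorldBench and improves results on 7 benchmarks (e.g. Terminal-Bench 2.0 +6.3, SWE-Bench +3.4, WideSearch +12.8, Claw-Eval +11.3). Its predictive ability also transfers to tool-calling tasks without any agent-specific fine-tuning.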
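Treating environment modeling as the training objective amounts to teaching a model to predict what an environment returns, given the interaction so far and the agent's next action. The announcement does not describe the actual data format, so what follows is only a minimal sketch under that assumption: it converts one recorded agent trajectory into single-turn next-observation prediction examples. The `Step` and `WorldModelExample` types, the `trajectory_to_examples` helper, and the prompt markers are all hypothetical; only the seven environment names come from the source.

```python
from dataclasses import dataclass

# The seven environment types named in the announcement.
ENV_TYPES = ("MCP", "Search", "Terminal", "SWE", "Web", "OS", "Android")


@dataclass
class Step:
    action: str       # what the agent did, e.g. a shell command or a tool call
    observation: str  # what the environment returned


@dataclass
class WorldModelExample:
    prompt: str  # environment type + interaction so far + the next action
    target: str  # the observation the world model should learn to predict


def trajectory_to_examples(env_type: str, steps: list[Step]) -> list[WorldModelExample]:
    """Split one multi-turn trajectory into single-turn prediction examples.

    Hypothetical format: example i conditions on all earlier actions and
    observations plus action i; its training target is observation i.
    """
    if env_type not in ENV_TYPES:
        raise ValueError(f"unknown environment type: {env_type}")
    examples: list[WorldModelExample] = []
    history: list[str] = []
    for step in steps:
        parts = [f"[environment: {env_type}]", *history,
                 f"ACTION: {step.action}", "PREDICT OBSERVATION:"]
        examples.append(WorldModelExample(prompt="\n".join(parts), target=step.observation))
        # Record the real outcome so later examples condition on it.
        history += [f"ACTION: {step.action}", f"OBSERVATION: {step.observation}"]
    return examples


if __name__ == "__main__":
    trajectory = [
        Step(action="ls", observation="README.md"),
        Step(action="cat README.md", observation="hello"),
    ]
    for ex in trajectory_to_examples("Terminal", trajectory):
        print(ex.prompt, "->", ex.target)
        print("---")
```

Each example is a plain single-turn prediction problem, which matches the announcement's description of "single-turn, non-agentic environment prediction" as the training signal; how Qwen-AgentWorld actually serializes histories is not stated.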
Full text
📣📣 Meet Qwen-AgentWorld — a native language world model that simulates 7 agent environments (MCP, Search, Terminal, SWE, Web, OS, Android) within a single model. Environment modeling is the training objective from day one, not a post-hoc adaptation.

🤔 LLMs are trained to be better agents — better at acting in environments. But nobody has trained them to model the environments themselves.

🗺️ Our roadmap: investigate how language world modeling can push the boundaries of general agent capabilities, along two routes:
1️⃣ Build a foundation model for environment simulation — outperforming Claude Opus 4.8 and GPT-5.4 on AgentWorldBench
2️⃣ Investigate how world modeling enhances agent training:
🔬 Controllable Sim RL (agentic RL with LWM as environments) surpasses training in real environments
🧠 Learning to predict environments (LWM warm-up) makes agents stronger — remarkably, even without any agent-specific training, this predictive knowledge transfers to agentic tasks with zero fine-tuning

🔗 Model Studio: https://t.co/TY0rOHbxza
![photo](https://pbs.twimg.com/media/HLkjayBWUAAyOcl.jpg)

Alibaba Cloud (@alibaba_cloud): We open-source Qwen-AgentWorld-35B-A3B (MoE, 35B/3B active, 256K context) and AgentWorldBench.
Two routes, one roadmap:
🔬 Build the simulator — scalable, controllable, surpassing real environments
🧠 Internalize world modeling — predict before you act
Qwen-AgentWorld is our attempt to investigate how language world modeling can further expand the boundaries of general agent capabilities. Go build on it 🏃🏃‍♂️
🔗 Model Studio: https://t.co/TY0rOHbxza

Alibaba Cloud (@alibaba_cloud): 🧠 Paradigm II — Agent Foundation Model: world modeling as agent capability. Single-turn, non-agentic environment prediction → tested directly on multi-turn, tool-calling agent tasks. No agentic RL, no task-specific tuning.
Gains across 7 benchmarks, including 3 entirely out-of-domain:
- In-domain: Terminal-Bench 2.0 +6.3, SWE-Bench +3.4, WideSearch +12.8
- Out-of-domain: Claw-Eval +11.3, QwenClawBench +9.7, BFCL v4 +9.0
World modeling internalizes "predict before you act" as a transferable reasoning pattern.
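Controllable Sim RL, as the thread describes it, runs agentic RL against the language world model (LWM) instead of a real environment. The announcement gives no interface details, so this is a hypothetical sketch of the rollout side only: the agent loop talks to an `Environment` protocol, and `SimulatedEnv` satisfies that protocol by asking a world-model function for every observation. `SimulatedEnv`, `rollout`, the `"submit"` convention, and `toy_world_model` (a hard-coded stand-in for a real model call) are all assumptions; rewards and the policy update are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Environment(Protocol):
    """Anything the agent can act in: a real environment or a simulator."""

    def reset(self, task: str) -> str: ...
    def step(self, action: str) -> tuple[str, bool]: ...


@dataclass
class SimulatedEnv:
    """Hypothetical wrapper whose observations come from a language world model.

    `world_model` maps the transcript so far to the next observation; in practice
    it would be a call to a model such as Qwen-AgentWorld.
    """
    world_model: Callable[[str], str]
    max_steps: int = 8
    transcript: list[str] = field(default_factory=list)

    def reset(self, task: str) -> str:
        self.transcript = [f"TASK: {task}"]
        obs = self.world_model("\n".join(self.transcript))
        self.transcript.append(f"OBSERVATION: {obs}")
        return obs

    def step(self, action: str) -> tuple[str, bool]:
        self.transcript.append(f"ACTION: {action}")
        obs = self.world_model("\n".join(self.transcript))
        self.transcript.append(f"OBSERVATION: {obs}")
        # The episode ends when the agent submits or the step budget runs out.
        n_actions = sum(line.startswith("ACTION:") for line in self.transcript)
        done = action == "submit" or n_actions >= self.max_steps
        return obs, done


def rollout(env: Environment, policy: Callable[[list[str]], str], task: str) -> list[tuple[str, str]]:
    """Run one episode and return (action, observation) pairs.

    The loop only sees the Environment protocol, so it cannot tell whether
    the observations are real or simulated.
    """
    history = [env.reset(task)]
    steps: list[tuple[str, str]] = []
    done = False
    while not done:
        action = policy(history)
        obs, done = env.step(action)
        steps.append((action, obs))
        history += [action, obs]
    return steps


def toy_world_model(transcript: str) -> str:
    """Hard-coded stand-in for a real world-model call."""
    last = transcript.splitlines()[-1]
    return "main.py" if last == "ACTION: ls" else "$"


def toy_policy(history: list[str]) -> str:
    """List files once, then submit."""
    return "ls" if len(history) == 1 else "submit"


if __name__ == "__main__":
    print(rollout(SimulatedEnv(toy_world_model), toy_policy, "list the files"))
```

Because `rollout` depends only on the protocol, swapping a real environment for the simulator leaves the agent loop untouched, and knobs such as `max_steps` live in the simulator wrapper rather than in the agent.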
#ModelRelease #OpenSource #LLM