AI Curated Feed | Smart Score: 85

Qwen-Robot Suite: Alibaba Cloud Releases a Three-Model Suite for Embodied Intelligence

Source: Twitter following list
Author: Alibaba Cloud (@alibaba_cloud)
Published: 2026-06-16
Collected: 2026-06-16
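Why the AI Recommends This
Introduces natural language as a universal action interface, which enables co-training on physical knowledge across domains. This is a significant technical advance for embodied intelligence; the model architecture and training method are worth studying in depth.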
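Key Takeaways
Alibaba Cloud has released the Qwen-Robot Suite, which comprises three models: Qwen-RobotNav (unifies 5 navigation tasks in one model), Qwen-RobotManip (trained on a unified state-action space across heterogeneous robots and pretrained on a 38,100+ hour open-source corpus), and Qwen-RobotWorld (a single world model covering 20+ embodiments, trained on 200M+ frames). Qwen-RobotWorld uses natural language as its action interface, which enables co-training on physical knowledge across domains, and is reported to perform strongly on EWMBench, DreamGen, WorldModelBench, and PBench.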
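The announcement says Qwen-RobotManip uses camera-frame delta poses to make cross-embodiment training coherent, but it gives no formula. The following is a minimal sketch under an assumed definition: the end-effector's translation and rotation change between two timesteps, re-expressed in the camera's axes. The function `camera_frame_delta` and all helper names are hypothetical and are not taken from the Qwen-RobotManip report.

```python
import math

Matrix = list[list[float]]
Vector = list[float]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a: Matrix) -> Matrix:
    """Transpose a 3x3 matrix (the inverse, for a rotation matrix)."""
    return [[a[j][i] for j in range(3)] for i in range(3)]

def matvec(a: Matrix, v: Vector) -> Vector:
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def rot_z(theta: float) -> Matrix:
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def camera_frame_delta(r_cam: Matrix,
                       p_prev: Vector, r_prev: Matrix,
                       p_next: Vector, r_next: Matrix) -> tuple[Vector, Matrix]:
    """Assumed definition: end-effector motion between two steps, in camera axes.

    r_cam is the camera orientation in the world frame; p_* and r_* are the
    end-effector position and orientation in the world frame.
    """
    r_cam_t = transpose(r_cam)
    # Translation change in world coordinates, then rotated into camera axes.
    dp_world = [n - p for n, p in zip(p_next, p_prev)]
    dp_cam = matvec(r_cam_t, dp_world)
    # Rotation taking the previous orientation to the next one (world frame),
    # then changed to the camera basis.
    dr_world = matmul(r_next, transpose(r_prev))
    dr_cam = matmul(matmul(r_cam_t, dr_world), r_cam)
    return dp_cam, dr_cam

# Two setups whose world frames differ by a 90-degree turn about z.
# The same physical motion, seen from each setup's own camera, should match.
dp_a, dr_a = camera_frame_delta(rot_z(0.0), [0.0, 0.0, 0.0], rot_z(0.0),
                                [0.1, 0.0, 0.0], rot_z(0.2))
dp_b, dr_b = camera_frame_delta(rot_z(math.pi / 2), [1.0, 1.0, 0.0],
                                rot_z(math.pi / 2),
                                [1.0, 1.1, 0.0], rot_z(math.pi / 2 + 0.2))
```

In this toy case both setups yield approximately the same delta (about 0.1 along the camera x axis), which illustrates why a camera-relative action representation can be shared across robots whose base frames differ.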
Full Text
📣 Introducing the Qwen-Robot Suite: Qwen-RobotNav, Qwen-RobotManip, and Qwen-RobotWorld. Three foundation models, a full stack for embodied intelligence.

🧭 Qwen-RobotNav: the gateway to mobility.
• Unifies 5 navigation tasks in one model: instruction following, point-goal, object-goal, target tracking, autonomous driving
• Controllable observation protocol
• Tool interface for agentic systems

🤖 Qwen-RobotManip: the foundation of interaction.
• Unified state-action space across heterogeneous robots
• Camera-frame delta poses for coherent cross-embodiment training
• Pretrained on a 38,100+ hour open-source corpus

🌍 Qwen-RobotWorld: infinite worlds for physical agents.
• Single world model, 20+ embodiments
• Natural-language action interface
• Predicts physically grounded futures across manipulation, driving, and navigation

Each model is independently useful, and the three can be composed as physical-world tools. Together, they form the low-level toolkit for general-purpose agentic systems that don't just see the world, but act in it.

📷 Blog: https://t.co/olblKRpiBE
📖 Report:
Qwen-RobotNav: https://t.co/tySB8XRVEV
Qwen-RobotManip: https://t.co/uGnx6IpvJd
Qwen-RobotWorld: https://t.co/DSJZB2PtYm

![photo](https://pbs.twimg.com/media/HK8FiOBboAA7sbx.jpg)
![photo](https://pbs.twimg.com/media/HK8Fjtua4AEmh_n.jpg)

Alibaba Cloud (@alibaba_cloud): Qwen-Robot Suite takes AI from chatbot to physical action in the real world. For more demos, please visit our blog: https://t.co/4UgEGiE52L https://t.co/l6Jt69TAm9

Alibaba Cloud (@alibaba_cloud): By treating natural language as a universal action interface, Qwen-RobotWorld bridges the gap between general video generation models and domain-specific embodied models. It converts end-effector poses, steering commands, and navigation waypoints into a single interface, enabling 20+ embodiment types and 500+ action categories to be co-trained on the Embodied World Knowledge corpus (8.6M video-text pairs, 200M+ frames), with each domain's physical knowledge reinforcing the others. Qwen-RobotWorld performs strongly across the EWMBench, DreamGen, WorldModelBench, and PBench benchmarks. Blog link: https://t.co/GTnOP3tVyu
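To make the idea of natural language as a single action interface concrete, here is a minimal sketch of how end-effector poses, steering commands, and navigation waypoints could each be rendered as a text instruction. The thread does not publish the actual text format used by Qwen-RobotWorld, so the class names, fields, units, and sentence templates below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class EndEffectorDelta:
    """Hypothetical manipulation action: gripper displacement in centimetres."""
    dx_cm: float
    dy_cm: float
    dz_cm: float
    gripper_closed: bool

@dataclass
class SteeringCommand:
    """Hypothetical driving action: positive steering angle means left."""
    steering_deg: float
    speed_mps: float

@dataclass
class Waypoint:
    """Hypothetical navigation action: target relative to the agent."""
    forward_m: float
    left_m: float

Action = Union[EndEffectorDelta, SteeringCommand, Waypoint]

def describe_action(action: Action) -> str:
    """Render any supported action as one plain-text instruction."""
    if isinstance(action, EndEffectorDelta):
        grip = "close" if action.gripper_closed else "open"
        return (f"move the end effector {action.dx_cm:+.1f} cm in x, "
                f"{action.dy_cm:+.1f} cm in y, {action.dz_cm:+.1f} cm in z "
                f"and {grip} the gripper")
    if isinstance(action, SteeringCommand):
        direction = "left" if action.steering_deg >= 0 else "right"
        return (f"steer {abs(action.steering_deg):.1f} degrees {direction} "
                f"at {action.speed_mps:.1f} m/s")
    if isinstance(action, Waypoint):
        side = "left" if action.left_m >= 0 else "right"
        return (f"go to the waypoint {action.forward_m:.1f} m ahead and "
                f"{abs(action.left_m):.1f} m to the {side}")
    raise TypeError(f"unsupported action type: {type(action).__name__}")

# Three different embodiments, one text interface.
prompts = [
    describe_action(EndEffectorDelta(1.0, 0.0, -2.5, gripper_closed=True)),
    describe_action(SteeringCommand(steering_deg=-5.0, speed_mps=8.0)),
    describe_action(Waypoint(forward_m=2.0, left_m=-0.5)),
]
```

Because every action ends up as a string, a single text-conditioned model could in principle be trained on all three domains at once, which is the co-training benefit the thread describes.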
#Technology #ModelRelease #Multimodal