AI Curated Updates · Smart score: 85

Qwen-Robot Suite: Alibaba Cloud releases a three-model suite for embodied intelligence

Source: Twitter following list

Author: Alibaba Cloud (@alibaba_cloud)

Published: 2026-06-16

Collected: 2026-06-16

AI recommendation rationale

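Introduces natural language as a universal action interface, which enables co-training of physical knowledge across domains. This is an important technical advance for embodied intelligence; the model architecture and training method merit closer study.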

Key takeaways

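Alibaba Cloud has released the Qwen-Robot Suite, which comprises three models: Qwen-RobotNav (unifies 5 navigation tasks in one model), Qwen-RobotManip (a unified state-action space across heterogeneous robots, pretrained on a 38,100+ hour open-source corpus), and Qwen-RobotWorld (a single world model covering 20+ embodiments, trained on 200M+ frames). Qwen-RobotWorld treats natural language as its action interface, which allows physical knowledge from different domains to be co-trained. According to the announcement, Qwen-RobotWorld performs strongly on EWMBench, DreamGen, WorldModelBench, and PBench.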

Full text

📣 Introducing the Qwen-Robot Suite: Qwen-RobotNav, Qwen-RobotManip, and Qwen-RobotWorld. Three foundation models, a full stack for embodied intelligence.

🧭 Qwen-RobotNav: the gateway to mobility.
• Unifies 5 navigation tasks in one model: instruction following, point-goal, object-goal, target tracking, autonomous driving
• Controllable observation protocol
• Tool interface for agentic systems

🤖 Qwen-RobotManip: the foundation of interaction.
• Unified state-action space across heterogeneous robots
• Camera-frame delta poses for coherent cross-embodiment training
• Pretrained on a 38,100+ hour open-source corpus

🌍 Qwen-RobotWorld: infinite worlds for physical agents.
• Single world model, 20+ embodiments
• Natural-language action interface
• Predicts physically grounded futures across manipulation, driving, and navigation

Each model is independently useful, and the models can be composed as physical-world tools. Together, they form the low-level toolkit for general-purpose agentic systems that don't just see the world, but act in it.

📷 Blog: https://t.co/olblKRpiBE
📖 Reports:
Qwen-RobotNav: https://t.co/tySB8XRVEV
Qwen-RobotManip: https://t.co/uGnx6IpvJd
Qwen-RobotWorld: https://t.co/DSJZB2PtYm

![photo](https://pbs.twimg.com/media/HK8FiOBboAA7sbx.jpg)
![photo](https://pbs.twimg.com/media/HK8Fjtua4AEmh_n.jpg)

Alibaba Cloud (@alibaba_cloud): The Qwen-Robot Suite takes AI from chatbot to physical action in the real world. For more demos, visit our blog: https://t.co/4UgEGiE52L https://t.co/l6Jt69TAm9

Alibaba Cloud (@alibaba_cloud): By treating natural language as a universal action interface, Qwen-RobotWorld bridges the gap between general video generation models and domain-specific embodied models. It converts end-effector poses, steering commands, and navigation waypoints into a single interface, enabling 20+ embodiment types and 500+ action categories to be co-trained on the Embodied World Knowledge corpus (8.6M video-text pairs, 200M+ frames), with each domain's physical knowledge reinforcing the others. Qwen-RobotWorld performs strongly on the EWMBench, DreamGen, WorldModelBench, and PBench benchmarks. Blog link: https://t.co/GTnOP3tVyu
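
The last reply in the thread describes mapping end-effector poses, steering commands, and navigation waypoints onto one natural-language interface. The following is a minimal Python sketch of that general idea, under stated assumptions: the post does not give the text format Qwen-RobotWorld actually uses, so the class names (`EndEffectorDelta`, `SteeringCommand`, `Waypoint`), the function `describe_action`, the units, and the phrasing are all hypothetical illustrations and not the model's API.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical action types, one per domain named in the post.
@dataclass(frozen=True)
class EndEffectorDelta:
    """Manipulation: a small translation of the gripper, in metres."""
    dx: float
    dy: float
    dz: float
    gripper_open: bool


@dataclass(frozen=True)
class SteeringCommand:
    """Driving: steering angle in degrees (positive = left) and speed in m/s."""
    steering_deg: float
    speed_mps: float


@dataclass(frozen=True)
class Waypoint:
    """Navigation: a target relative to the agent, in metres (left is positive)."""
    forward_m: float
    left_m: float


Action = Union[EndEffectorDelta, SteeringCommand, Waypoint]


def describe_action(action: Action) -> str:
    """Render any supported action as plain text (assumed, illustrative phrasing)."""
    if isinstance(action, EndEffectorDelta):
        grip = "open" if action.gripper_open else "closed"
        return (
            f"move the gripper by ({action.dx:+.2f}, {action.dy:+.2f}, "
            f"{action.dz:+.2f}) m with the gripper {grip}"
        )
    if isinstance(action, SteeringCommand):
        side = "left" if action.steering_deg >= 0 else "right"
        return (
            f"steer {abs(action.steering_deg):.1f} degrees {side} "
            f"at {action.speed_mps:.1f} m/s"
        )
    if isinstance(action, Waypoint):
        return (
            f"go to the waypoint {action.forward_m:.1f} m ahead, "
            f"lateral offset {action.left_m:+.1f} m"
        )
    raise TypeError(f"unsupported action type: {type(action).__name__}")


if __name__ == "__main__":
    # Three different embodiments end up in the same text-only action space.
    for a in (
        EndEffectorDelta(0.05, 0.0, -0.10, gripper_open=True),
        SteeringCommand(steering_deg=-12.5, speed_mps=3.0),
        Waypoint(forward_m=2.0, left_m=-0.5),
    ):
        print(describe_action(a))
```

Because every action becomes a string, samples from different robots could in principle share one conditioning pathway in a video-text training set, which is the property the post credits for co-training across embodiments. How the real model tokenizes or phrases actions is not stated in the source.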

#Technology #ModelRelease #Multimodal
