AI Curated Feed · Smart score: 65

View-Planning Research Boosts VLM Success Rate

Source: Twitter following list
Author: Fei-Fei Li (@drfeifei)
Published: 2026-06-18
Added: 2026-06-18
Why the AI recommends this
The RL-Graph-SFT framework raises VLM view-planning success from 2.5% (RL alone, Qwen2.5-VL-7B) to 47.8%.
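Key takeaways
A Stanford and Microsoft Research team introduces ViewSuite, a benchmark with 6-DoF camera control and about 165K task instances that covers three tasks: Path-to-View, View-to-Path, and Interactive View Planning. The evaluation reveals a sharp "Planning Gap": VLMs can roughly track how a single camera action changes the view, but they cannot compose a multi-step plan toward a target view. RL training alone brings Qwen2.5-VL-7B to only a 2.5% success rate. With View Graph Distillation (the RL-Graph-SFT framework), success rises to 47.8%.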
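To make the Path-to-View direction concrete: given a starting camera pose and a sequence of camera moves, predict where the camera ends up (and therefore what it sees). The minimal Python sketch below tracks only the geometric pose, not the rendered image. The action names, the camera frame (x right, y up, z forward, right-handed), and the default step size and rotation angle are hypothetical assumptions for illustration, not ViewSuite's actual action space.

```python
import math

Vec3 = list[float]
Mat3 = list[list[float]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Mat3, v: Vec3) -> Vec3:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def axis_rotation(axis: int, theta: float) -> Mat3:
    """Right-handed rotation by theta radians about axis 0 (x), 1 (y) or 2 (z)."""
    c, s = math.cos(theta), math.sin(theta)
    if axis == 0:
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == 1:
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


# Hypothetical action set: (kind, local axis, sign), 6 DoF x 2 directions.
# "t" = translate along a camera axis, "r" = rotate about a camera axis.
ACTIONS = {
    "move_right": ("t", 0, +1), "move_left": ("t", 0, -1),
    "move_up": ("t", 1, +1), "move_down": ("t", 1, -1),
    "move_forward": ("t", 2, +1), "move_backward": ("t", 2, -1),
    "pitch_down": ("r", 0, +1), "pitch_up": ("r", 0, -1),
    "yaw_right": ("r", 1, +1), "yaw_left": ("r", 1, -1),
    "roll_pos": ("r", 2, +1), "roll_neg": ("r", 2, -1),
}


def apply_actions(R: Mat3, p: Vec3, actions: list[str],
                  step: float = 1.0, angle: float = math.radians(90)):
    """Path-to-View sketch: roll a pose forward through a list of actions.

    R is the camera-to-world rotation (its columns are the camera axes in
    world coordinates) and p is the camera position in world coordinates.
    """
    for name in actions:
        kind, axis, sign = ACTIONS[name]
        if kind == "t":
            local = [0.0, 0.0, 0.0]
            local[axis] = sign * step
            # Express the local displacement in world coordinates, then move.
            p = [pi + di for pi, di in zip(p, matvec(R, local))]
        else:
            # Right-multiplying applies the rotation in the camera's own frame.
            R = matmul(R, axis_rotation(axis, sign * angle))
    return R, p
```

For example, starting at the origin with the identity orientation, `apply_actions(I, [0.0, 0.0, 0.0], ["yaw_right", "move_forward"])` leaves the camera at approximately (1, 0, 0). Composing every move in the camera's own frame is what makes long action sequences hard to track mentally: each move's effect depends on all the rotations before it.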
Full text
Fei-Fei Li (@drfeifei) reposted a post by Manling Li (@ManlingLi_):

Planning with the views: Can VLMs predict how each camera move changes the view, and plan many such moves ahead?

We introduce ViewSuite with 6 DoF camera control and ~165K task instances, testing:
Path-to-View
View-to-Path
Interactive View Planning

A sharp Planning Gap emerges:
+ can roughly "track" how camera actions change views
- cannot "compose" a plan towards a target view at all

We then try to teach VLMs with Reinforcement Learning.
- RL cannot teach VLMs such planning ability, only 2.5% success rate with Qwen2.5-VL-7B.
+ With View Graph Distillation (our RL-Graph-SFT framework), 2.5% → 47.8%

Below, we answer these questions:
Q1. What are the failure modes?
Q2. How can we make RL work?
Q3. What has the model learned? Can we open up the model to see before/after? Can such spatial priors transfer to other view-related tasks?

Led by @James_KKW, great to work with @LINJIEFUN @zhengyuan_yang @shiqi_chen17 @wzenus @drfeifei @jiajunwu_cs Leonidas Guibas, Lijuan Wang. A joint effort with @StanfordAILab @StanfordSVL @MSFTResearch.

https://video.twimg.com/amplify_video/2067696956114407424/vid/avc1/1920x1080/qdXJ7UpSfpqPlUDJ.mp4?tag=28
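The Planning Gap concerns the reverse direction: composing many moves to reach a target view. One way to picture that problem is shortest-path search over a graph whose nodes are camera poses and whose edges are camera actions. The sketch below runs breadth-first search on a deliberately simplified, hypothetical view graph (integer grid positions plus four headings, far fewer than 6 DoF). It is not the authors' View Graph Distillation or RL-Graph-SFT method, whose details the post does not describe; the names `step`, `plan_to_view`, and the action set are illustrative assumptions.

```python
from collections import deque
from typing import Optional

# Hypothetical discretisation: grid position (x, z) plus one of four headings.
# Heading 0 faces +z; turning right cycles +z -> +x -> -z -> -x.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]
ACTIONS = ("forward", "turn_left", "turn_right")

State = tuple[int, int, int]  # (x, z, heading index)


def step(state: State, action: str) -> State:
    """Forward model: apply one camera action to one discrete pose."""
    x, z, h = state
    if action == "forward":
        dx, dz = HEADINGS[h]
        return (x + dx, z + dz, h)
    if action == "turn_right":
        return (x, z, (h + 1) % 4)
    if action == "turn_left":
        return (x, z, (h - 1) % 4)
    raise ValueError(f"unknown action: {action}")


def plan_to_view(start: State, target: State, size: int) -> Optional[list[str]]:
    """BFS over the view graph of a size x size grid.

    Returns a shortest action list from start to target, or None if the
    target pose is unreachable inside the grid.
    """
    parent: dict = {start: None}  # state -> (previous state, action), first visit only
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        if s == target:
            # Walk the parent links back to start, then reverse.
            path = []
            while parent[s] is not None:
                s, action = parent[s]
                path.append(action)
            return path[::-1]
        for action in ACTIONS:
            n = step(s, action)
            if 0 <= n[0] < size and 0 <= n[1] < size and n not in parent:
                parent[n] = (s, action)
                frontier.append(n)
    return None
```

Here `plan_to_view((0, 0, 0), (1, 1, 1), size=3)` returns `["forward", "turn_right", "forward"]`. Because every action costs the same, BFS yields a shortest plan; with real 6-DoF poses the number of graph nodes grows far faster, so this toy only illustrates the shape of the planning problem.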
#TechBreakthrough #Research #Models