AI-curated feed · Smart score: 65

View-planning research raises VLM success rate

Source: Twitter following list

Author: Fei-Fei Li (@drfeifei)

Published: 2026-06-18

Collected: 2026-06-18

Why AI recommends this

The RL-Graph-SFT framework raises the VLM view-planning success rate from 2.5% to 47.8%.

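Key takeaways

The researchers introduce ViewSuite, which provides 6-DoF camera control and about 165K task instances, and use it to test three tasks: Path-to-View, View-to-Path, and Interactive View Planning. They find that VLMs can roughly track how a camera action changes the view but cannot compose a plan toward a target view. Reinforcement learning alone reaches only a 2.5% success rate with Qwen2.5-VL-7B; with View Graph Distillation (the authors' RL-Graph-SFT framework), the success rate rises from 2.5% to 47.8%.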
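To make "6-DoF camera control" concrete, here is a minimal sketch of how a sequence of camera moves composes into a final camera pose. It is an illustration only, not ViewSuite's actual action space or API: the action layout `(dx, dy, dz, rx, ry, rz)`, the axis-angle order, and the function names `apply_action` and `rollout` are all assumptions made for this example.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def rotation(rx, ry, rz):
    """Rotation matrix Rz(rz) @ Ry(ry) @ Rx(rx), angles in radians.

    The composition order is an assumption chosen for this sketch.
    """
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    rot_x = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    rot_y = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rot_z = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    return matmul(rot_z, matmul(rot_y, rot_x))

# A pose is (R, t): camera-to-world rotation and camera position.
IDENTITY_POSE = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0.0, 0.0, 0.0])

def apply_action(pose, action):
    """Apply one hypothetical 6-DoF action expressed in the camera's own frame."""
    R, t = pose
    dx, dy, dz, rx, ry, rz = action
    step = matvec(R, [dx, dy, dz])            # local translation in world coordinates
    new_t = [t[i] + step[i] for i in range(3)]
    new_R = matmul(R, rotation(rx, ry, rz))   # local rotation composed on the right
    return new_R, new_t

def rollout(pose, actions):
    """Compose a whole sequence of actions into a final pose."""
    for action in actions:
        pose = apply_action(pose, action)
    return pose

# Walking a unit square: step along local x, then turn 90 degrees about z, four times.
# The camera ends where it started, up to floating-point error.
square = [(1, 0, 0, 0, 0, math.pi / 2)] * 4
final_R, final_t = rollout(IDENTITY_POSE, square)
```

Predicting the result of a single `apply_action` call corresponds loosely to tracking one camera move, while choosing an action list whose `rollout` reaches a target pose corresponds to planning; the post reports that VLMs manage the former roughly and fail at the latter.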

Full text

Fei-Fei Li (@drfeifei) reposted a post by Manling Li (@ManlingLi_):

Planning with the views: Can VLMs predict how each camera move changes the view, and plan many such moves ahead?

We introduce ViewSuite with 6 DoF camera control and ~165K task instances, testing:
Path-to-View
View-to-Path
Interactive View Planning

A sharp Planning Gap emerges:
+ can roughly "track" how camera action changes views
- cannot "compose" a plan towards a target view at all

We then try to teach VLMs with Reinforcement Learning.
- RL cannot teach VLMs such planning ability, only 2.5% success rate with Qwen2.5-VL-7B.
+ With View Graph Distillation (our RL-Graph-SFT framework), 2.5% → 47.8%

Below, we answer these questions:
Q1. What are the failure modes?
Q2. How can we make RL work?
Q3. What has the model learned? Can we open up the model to see before/after? Can such spatial priors transfer to other view related tasks?

Led by @James_KKW, great to work with @LINJIEFUN @zhengyuan_yang @shiqi_chen17 @wzenus @drfeifei @jiajunwu_cs Leonidas Guibas, Lijuan Wang. A joint effort with @StanfordAILab @StanfordSVL @MSFTResearch.

https://video.twimg.com/amplify_video/2067696956114407424/vid/avc1/1920x1080/qdXJ7UpSfpqPlUDJ.mp4?tag=28

#TechnicalBreakthrough #Research #Models
