AI Curated Feed · Smart Score: 66

Introducing LocateAnything-3B

Source: Twitter following list

Author: ModelScope (@ModelScope2022)

Published: 2026-05-28

Collected: 2026-05-28

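Why AI Recommends This

Worth watching for the training recipe and deployment form that merge multiple grounding capabilities into a single model, in particular the up to 2.5x throughput gain and the engineering path by which it has already been integrated into Nemotron Nano Omni.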

Key Takeaways

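NVIDIA has released LocateAnything-3B, a vision-language model for fast, precise visual grounding. The model delivers up to 2.5x higher throughput than existing methods. Its training data comprises 12 million images, more than 138 million queries, and 785 million bounding boxes, spanning natural scenes, robotics, autonomous driving, GUI, and document understanding. A single model handles object detection, phrase grounding, GUI element locating, scene text detection, document layout, and pointing. It has been integrated into NVIDIA Nemotron Nano Omni for production-grade VLM grounding. The model is licensed for non-commercial research use only.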
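To put the dataset figures in proportion, the short Python sketch below derives rough per-image and per-query averages from the three totals quoted above. The variable and function names are illustrative only and do not come from the release. Because the query count is reported as "138 million+", the results are approximations, not official statistics.

```python
# Totals as quoted in the summary; the query count is reported as "138M+",
# so every ratio that uses it is only approximate.
IMAGES = 12_000_000
QUERIES = 138_000_000
BOXES = 785_000_000


def average(total: int, units: int) -> float:
    """Return the mean number of `total` items per unit."""
    return total / units


queries_per_image = average(QUERIES, IMAGES)  # about 11.5
boxes_per_image = average(BOXES, IMAGES)      # about 65.4
boxes_per_query = average(BOXES, QUERIES)     # about 5.7

print(f"queries per image: {queries_per_image:.1f}")
print(f"boxes per image:   {boxes_per_image:.1f}")
print(f"boxes per query:   {boxes_per_query:.1f}")
```

On these numbers, an average image carries roughly a dozen queries and several dozen boxes, which fits a dataset that includes dense tasks such as scene text detection and document layout.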

#ModelRelease #Multimodal #TechnicalBreakthrough
