AI Curated Updates
Smart score: 60
Baidu Open-Sources Unlimited OCR
Why AI recommends this
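First to propose Reference Sliding Window Attention, which lets an OCR model process long documents of 40+ pages in one pass while keeping the KV cache at a constant size.
Key takeaways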
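Baidu has announced the open-source release of Unlimited OCR, a model with 3 billion total parameters, of which only 500 million are activated. It sets new end-to-end SOTA results on the OmniDocBench v1.5 and v1.6 benchmarks, scoring higher than the previous best on both. Its Reference Sliding Window Attention (R-SWA) mechanism keeps the KV cache at a constant size, so the model can transcribe 40+ pages in a single forward pass without losing context.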
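
The announcement does not spell out how R-SWA is implemented. As a rough illustration only, the sketch below assumes one plausible reading: every generated token may attend to all reference (source) tokens plus a causal window of the most recent generated tokens. The function name `rswa_mask` and its parameters `num_ref`, `num_gen`, and `window` are hypothetical and do not come from Baidu's code.

```python
def rswa_mask(num_ref: int, num_gen: int, window: int) -> list[list[bool]]:
    """Build an illustrative attention mask (an assumption, not Baidu's code).

    Positions 0..num_ref-1 hold reference (source) tokens; positions
    num_ref..num_ref+num_gen-1 hold generated tokens. mask[i][j] is True
    when the i-th generated token may attend to position j.
    """
    total = num_ref + num_gen
    mask = []
    for i in range(num_gen):
        q = num_ref + i  # absolute position of the i-th generated token
        row = [False] * total
        # Assumption: the reference tokens are always visible.
        for j in range(num_ref):
            row[j] = True
        # Assumption: only the last `window` generated tokens (including
        # the current one) are visible; older ones are dropped.
        lo = max(num_ref, q - window + 1)
        for j in range(lo, q + 1):
            row[j] = True
        mask.append(row)
    return mask


if __name__ == "__main__":
    for row in rswa_mask(num_ref=3, num_gen=5, window=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Under this assumption, no row ever has more than `num_ref + window` visible positions, however many tokens have been generated, which is the property that would keep attention cost per step bounded.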
Full text
3B total parameters & 500M activated, yet powerful enough to transcribe 40+ pages in one pass while keeping context intact. Meet Unlimited OCR!
> **Quoted post from Baidu AI (@BaiduAI_News):**
> We’re open-sourcing Unlimited OCR — built to read long documents in one pass.
> With 3B total parameters and only 500M activated, Unlimited OCR sets new end-to-end SOTA results on OmniDocBench v1.5 and v1.6.
> The key innovation is Reference Sliding Window Attention (R-SWA), inspired by how humans transcribe books: keeping the source, recent context, and next words in focus, while softly forgetting what’s no longer needed.
> With constant KV Cache size and lower attention cost, Unlimited OCR can transcribe 40+ pages in a single forward pass — without losing context or slowing down.
> Explore the model👇:
> --GitHub: https://t.co/5ZJBsEldKd
> --Hugging Face: https://t.co/4FKFr9EfOu
> https://x.com/BaiduAI_News/status/2069322806748410291
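
To make the "constant KV Cache size" claim concrete, here is a minimal sketch of a cache whose length stops growing once the window is full. It assumes the cache holds a fixed set of reference entries plus a rolling buffer of recent entries; the class `RollingKVCache` and its fields are hypothetical illustrations, not part of the released model.

```python
from collections import deque


class RollingKVCache:
    """Toy cache: fixed reference entries plus a bounded window of recent ones.

    This is an illustrative assumption about how a constant-size cache could
    behave; it stores opaque entries rather than real key/value tensors.
    """

    def __init__(self, ref_entries, window: int):
        self.ref = list(ref_entries)          # kept for the whole decode
        self.recent = deque(maxlen=window)    # oldest entry evicted when full

    def append(self, entry) -> None:
        self.recent.append(entry)

    def __len__(self) -> int:
        return len(self.ref) + len(self.recent)


if __name__ == "__main__":
    cache = RollingKVCache(ref_entries=range(4), window=3)
    for step in range(100):
        cache.append(step)
    print(len(cache), list(cache.recent))
```

By contrast, a full-attention cache would hold one entry per generated token, so its size would grow linearly with the length of the transcription.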
Baidu Inc. (@Baidu_Inc): Github: https://t.co/Hsu1RxFqhq
Hugging Face: https://t.co/grWysiMrFx