AI Curated Updates | Smart score: 60

Baidu Open-Sources Unlimited OCR

Source: Twitter following list
Author: Baidu Inc. (@Baidu_Inc)
Published: 2026-06-23
Collected: 2026-06-23
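AI recommendation rationale
Introduces Reference Sliding Window Attention (R-SWA) for the first time, letting an OCR model process long documents of 40+ pages in one pass while keeping the KV cache at a constant size.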
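Key takeaways
Baidu has announced the open-source release of Unlimited OCR, a model with 3 billion total parameters of which only 500 million are activated. It sets new end-to-end SOTA results on the OmniDocBench v1.5 and v1.6 benchmarks, scoring above the previous best on both. Its Reference Sliding Window Attention (R-SWA) mechanism keeps the KV cache at a constant size, which allows the model to transcribe documents of 40+ pages in a single forward pass without losing context.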
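The announcement does not spell out how R-SWA is built, so the following is only a minimal sketch of one plausible reading: each generated token attends to a fixed block of reference (source) tokens plus a sliding window of the most recent generated tokens. The function names, the hard cutoff (the post describes forgetting as "soft"), and the fixed-size reference block are all assumptions made for illustration, not the model's actual implementation.

```python
def rswa_mask(num_ref, num_gen, window):
    """Hypothetical attention mask for a reference + sliding-window scheme.

    Positions 0..num_ref-1 are reference (source) tokens; the remaining
    num_gen positions are generated tokens. mask[i][j] is True when
    generated token i may attend to absolute position j.
    """
    total = num_ref + num_gen
    mask = []
    for i in range(num_gen):
        q = num_ref + i                      # absolute position of the query
        row = [False] * total
        for j in range(num_ref):             # reference block is always visible
            row[j] = True
        start = max(num_ref, q - window + 1) # oldest generated token still in view
        for j in range(start, q + 1):        # causal window, includes the query itself
            row[j] = True
        mask.append(row)
    return mask


def keys_visible(mask):
    """Number of keys each generated token attends to."""
    return [sum(row) for row in mask]


mask = rswa_mask(num_ref=4, num_gen=6, window=3)
print(keys_visible(mask))  # [5, 6, 7, 7, 7, 7]
```

Under these assumptions the number of visible keys stops growing at `num_ref + window`, which is the property that would let a KV cache stay constant in size regardless of how many tokens have been transcribed.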
Full text
3B total parameters & 500M activated, yet powerful enough to transcribe 40+ pages in one pass while keeping context intact. Meet Unlimited OCR!

> **Quoted post, Baidu AI (@BaiduAI_News):**
> We’re open-sourcing Unlimited OCR, built to read long documents in one pass.
> With 3B total parameters and only 500M activated, Unlimited OCR sets new end-to-end SOTA results on OmniDocBench v1.5 and v1.6.
> The key innovation is Reference Sliding Window Attention (R-SWA), inspired by how humans transcribe books: keeping the source, recent context, and next words in focus, while softly forgetting what’s no longer needed.
> With constant KV Cache size and lower attention cost, Unlimited OCR can transcribe 40+ pages in a single forward pass, without losing context or slowing down.
> Explore the model👇:
> GitHub: https://t.co/5ZJBsEldKd
> Hugging Face: https://t.co/4FKFr9EfOu
> https://x.com/BaiduAI_News/status/2069322806748410291

Baidu Inc. (@Baidu_Inc):
GitHub: https://t.co/Hsu1RxFqhq
Hugging Face: https://t.co/grWysiMrFx
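To make the "constant KV Cache size" claim concrete, here is a rough sketch of a cache that pins a reference block and keeps only the most recent entries in a bounded buffer. `RollingKVCache`, its fields, and the string placeholders standing in for key/value tensors are hypothetical; the post does not describe the real cache layout, and this sketch uses hard eviction rather than the "soft" forgetting it mentions.

```python
from collections import deque


class RollingKVCache:
    """Hypothetical cache: a pinned reference block plus a bounded recent window."""

    def __init__(self, reference, window):
        self.reference = list(reference)     # assumed fixed-size, never evicted
        self.recent = deque(maxlen=window)   # deque drops the oldest entry when full

    def append(self, kv):
        self.recent.append(kv)

    def __len__(self):
        return len(self.reference) + len(self.recent)

    def visible(self):
        """Entries a new query would attend to, oldest first."""
        return self.reference + list(self.recent)


cache = RollingKVCache(reference=["r0", "r1", "r2"], window=4)
sizes = []
for step in range(10):
    cache.append(f"t{step}")                 # placeholder for one token's K/V pair
    sizes.append(len(cache))

print(sizes)            # [4, 5, 6, 7, 7, 7, 7, 7, 7, 7]
print(cache.visible())  # ['r0', 'r1', 'r2', 't6', 't7', 't8', 't9']
```

A cache that kept every generated token would instead grow by one entry per decoding step, so under this reading both memory and per-token attention cost plateau once the window fills.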
#ModelRelease #OpenSource #TechnicalBreakthrough