AI Curated Updates
Smart score: 60
RPC-Bench: A Paper Understanding Benchmark
Why the AI Recommends It
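The benchmark's data comes from real paper review exchanges, which makes it uniquely valuable for evaluating how well models understand papers. See the original post for the specific tasks and the performance of models such as GPT-5.
Key Takeaways
RPC-Bench is a long-context, multimodal document understanding benchmark containing 61.3K QA pairs drawn from 4,150 papers, of which about 15K are human-verified. GPT-5 scores 68.2% on correctness-completeness on this benchmark, dropping to 37.46% after the conciseness adjustment.
Full Text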
Check out RPC-Bench on ModelScope! Built for long-context models, paper RAG systems, and multimodal document understanding. 🚀
🔗 https://t.co/H07x1lVi4g
📄 https://t.co/C4oEptELON
🖼️ Supports both text and visual inputs, with Markdown, original PDFs, parsing outputs, and page images for VLM evaluation
✅ Scale: 61.3K QA pairs from 4,150 papers, with about 15K human-verified QA pairs for evaluation
📚 Built from real review-rebuttal exchanges, so the questions focus on methods, evidence, claims, and reviewer-style paper understanding
📊 Even GPT-5 only reaches 68.2% on correctness-completeness, dropping to 37.46% after conciseness adjustment
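
To put the numbers above in proportion, here is a minimal Python sketch that derives a few ratios from them. It uses only the figures quoted in the post. "61.3K" and "about 15K" are rounded, so the results are approximate, and the function and key names are hypothetical, not part of RPC-Bench. The post does not say how the conciseness adjustment is computed, so the sketch only compares the two reported scores.

```python
# Figures quoted in the post. "61.3K" and "about 15K" are rounded values,
# so everything derived from them is approximate.
TOTAL_QA_PAIRS = 61_300
HUMAN_VERIFIED_QA_PAIRS = 15_000
PAPERS = 4_150
GPT5_CORRECTNESS_COMPLETENESS = 68.2  # percent
GPT5_CONCISENESS_ADJUSTED = 37.46     # percent


def derived_stats() -> dict[str, float]:
    """Return simple ratios computed from the reported figures.

    The names here are illustrative only; they do not come from RPC-Bench.
    """
    absolute_drop = GPT5_CORRECTNESS_COMPLETENESS - GPT5_CONCISENESS_ADJUSTED
    return {
        # Average number of QA pairs contributed by each paper.
        "qa_pairs_per_paper": TOTAL_QA_PAIRS / PAPERS,
        # Fraction of all QA pairs that were human-verified.
        "human_verified_share": HUMAN_VERIFIED_QA_PAIRS / TOTAL_QA_PAIRS,
        # Score lost to the conciseness adjustment, in percentage points.
        "absolute_drop_points": absolute_drop,
        # The same loss as a fraction of the unadjusted score.
        "relative_drop": absolute_drop / GPT5_CORRECTNESS_COMPLETENESS,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:.3f}")
```

On these figures, each paper contributes roughly 14.8 QA pairs, about a quarter of the pairs are human-verified, and the conciseness adjustment removes 30.74 points, or about 45% of GPT-5's unadjusted score.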

