AI Curated Feed · Smart Score: 60

RPC-Bench: A Paper Understanding Benchmark

Source: Twitter following list
Author: ModelScope (@ModelScope2022)
Published: 2026-06-29
Collected: 2026-06-29
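Why the AI Recommends This
The benchmark's data comes from real peer-review exchanges on papers, which makes it distinctively valuable for evaluating how well models understand papers. The original post is worth reading for the specific tasks and for the performance of GPT-5 and other models.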
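Key Takeaways
RPC-Bench is a long-context, multimodal document understanding benchmark containing 61.3K QA pairs drawn from 4,150 papers, of which about 15K are human-verified. GPT-5 scores 68.2% on correctness-completeness on this benchmark, falling to 37.46% after the conciseness adjustment.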
Full Text
Check out RPC-Bench on ModelScope! Built for long-context models, paper RAG systems, and multimodal document understanding. 🚀
🔗 https://t.co/H07x1lVi4g
📄 https://t.co/C4oEptELON
🖼️ Supports both text and visual inputs, with Markdown, original PDFs, parsing outputs, and page images for VLM evaluation
✅ Scale: 61.3K QA pairs from 4,150 papers, with about 15K human-verified QA pairs for evaluation
📚 Built from real review-rebuttal exchanges, so the questions focus on methods, evidence, claims, and reviewer-style paper understanding
📊 Even GPT-5 only reaches 68.2% on correctness-completeness, dropping to 37.46% after conciseness adjustment
![photo](https://pbs.twimg.com/media/HL8z-jXW0AAZkyu.jpg)
![photo](https://pbs.twimg.com/media/HL8z-kHWQAAm_WN.jpg)
#Technology #Evaluation #Data