AI Curated Feed · Smart score: 60

RPC-Bench: A Paper-Understanding Benchmark

Source: Twitter following list

Author: ModelScope (@ModelScope2022)

Published: 2026-06-29

Indexed: 2026-06-29

Why the AI recommends this

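The benchmark's data comes from real paper-review exchanges, which makes it uniquely useful for evaluating how well models understand papers. See the original post for the specific tasks and the performance of models such as GPT-5.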

Key takeaways

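RPC-Bench is a long-context, multimodal document-understanding benchmark containing 61.3K QA pairs drawn from 4,150 papers, of which about 15K are human-verified. GPT-5 scores 68.2% on correctness-completeness on this benchmark, dropping to 37.46% after the conciseness adjustment.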

Full text

Check out RPC-Bench on ModelScope! Built for long-context models, paper RAG systems, and multimodal document understanding. 🚀

🔗 https://t.co/H07x1lVi4g

📄 https://t.co/C4oEptELON

🖼️ Supports both text and visual inputs, with Markdown, original PDFs, parsing outputs, and page images for VLM evaluation

✅ Scale: 61.3K QA pairs from 4,150 papers, with about 15K human-verified QA pairs for evaluation

📚 Built from real review-rebuttal exchanges, so the questions focus on methods, evidence, claims, and reviewer-style paper understanding

📊 Even GPT-5 only reaches 68.2% on correctness-completeness, dropping to 37.46% after conciseness adjustment

![photo](https://pbs.twimg.com/media/HL8z-jXW0AAZkyu.jpg)

![photo](https://pbs.twimg.com/media/HL8z-kHWQAAm_WN.jpg)

#Technology #Evaluation #Data
