AI Curated Feed · Smart score: 60

A Study of LLM Hallucination Rates in Document Q&A

Source: Twitter following list

Author: Rohan Paul (@rohanpaul_ai)

Published: 2026-06-25

Added: 2026-06-25

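Why the AI recommends this

The study reports concrete hallucination rates and shows that long context significantly increases hallucination. The original paper is worth reading for the full experimental setup.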

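Key takeaways

A 172-billion-token study measured how often LLMs hallucinate in document Q&A. The best model fabricated answers 1.19% of the time at 32K context, strong models typically landed at 5% to 7% in the best case, and the middle model fabricated about 25% of its answers to questions about facts that did not exist. At 200K context, every tested model fabricated at least 10% of the time.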

Full text

Rohan Paul (@rohanpaul_ai) reposted a post by Gary Marcus (@GaryMarcus):

Crazy how many times people have told me over the last five years that a solution to hallucinations was right around the corner, and yet here we still are.

> **Quoted original post by Rohan Paul (@rohanpaul_ai):**
>
> This study tests how often LLMs invent answers when they should rely only on supplied documents.
>
> The problem is that companies often use LLMs to answer questions from documents and they assume document-based LLM systems are safer because the model is given source material.
>
> This study shows that no model fully avoided fabrication, because even the best model made up answers 1.19% of the time at 32K context.
>
> For strong models, a more normal best-case rate was around 5% to 7%, while the middle model fabricated about 25% of answers to questions about facts that did not exist.
>
> Longer context made the problem much worse, and at 200K context every tested model fabricated at least 10% of the time.
>
> Shows that hallucination is not just a failure to retrieve the right sentence.
>
> A model can be good at finding real facts and still be too willing to answer when the requested fact is absent.
>
> ----
>
> Link: arxiv.org/abs/2603.08274
>
> Title: "How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms"
>
> https://x.com/rohanpaul_ai/status/2070193617554264542
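One way to read the percentages in the quoted post is as a fabrication rate: of the questions whose answer is absent from the supplied documents, the share where the model asserts an answer instead of abstaining. The sketch below is only an illustration under that assumption. The `Response` type, the `ABSTAIN_MARKERS` phrases, and the keyword-based abstention check are hypothetical, and the post does not describe how the study actually graded answers.

```python
from dataclasses import dataclass

# Hypothetical abstention phrases; the study's real grading method is not
# described in the post, so this keyword check is only a stand-in.
ABSTAIN_MARKERS = ("not mentioned", "cannot find", "no information")


@dataclass
class Response:
    answerable: bool  # True if the requested fact exists in the documents
    answer: str       # the model's reply


def is_abstention(answer: str) -> bool:
    """Return True if the reply declines to answer (naive keyword match)."""
    text = answer.lower()
    return any(marker in text for marker in ABSTAIN_MARKERS)


def fabrication_rate(responses: list[Response]) -> float:
    """Share of absent-fact questions that received an asserted answer."""
    unanswerable = [r for r in responses if not r.answerable]
    if not unanswerable:
        raise ValueError("need at least one question about an absent fact")
    fabricated = sum(1 for r in unanswerable if not is_abstention(r.answer))
    return fabricated / len(unanswerable)


sample = [
    Response(False, "The contract was signed in 2019."),       # fabricated
    Response(False, "That detail is not mentioned in the files."),
    Response(False, "I cannot find this in the provided text."),
    Response(False, "There is no information about that."),
    Response(True, "The report was published in March."),      # ignored
]
print(fabrication_rate(sample))  # 0.25
```

Answerable questions are excluded from the denominator on purpose. As the post notes, a model can retrieve real facts well and still answer too readily when the fact is absent, so the two behaviors are better measured separately.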

#AI #Technology #Research
