AI Curated Feed, Smart Score: 65

How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms

Source: Twitter following list

Author: Rohan Paul (@rohanpaul_ai)

Published: 2026-06-25

Collected: 2026-06-25

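Why the AI recommends this

Read the original post for the specific data trends, and pay particular attention to how hallucination worsens at long context lengths.

Key takeaways

The study measured how often LLMs fabricate answers in document Q&A. The best model fabricated 1.19% of the time at 32K context, strong models typically fabricated 5% to 7% of the time, and the middle model about 25%. At 200K context every tested model fabricated at least 10% of the time, which indicates that hallucination worsens as context grows.

Full text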

This study tests how often LLMs invent answers when they should rely only on supplied documents.

Companies often use LLMs to answer questions from documents, and they assume such document-based systems are safer because the model is given source material.

The study shows that no model fully avoided fabrication: even the best model made up answers 1.19% of the time at 32K context. For strong models, a more typical best-case rate was around 5% to 7%, while the middle model fabricated about 25% of answers to questions about facts that did not exist.

Longer context made the problem much worse: at 200K context, every tested model fabricated at least 10% of the time.

The results show that hallucination is not just a failure to retrieve the right sentence. A model can be good at finding real facts and still be too willing to answer when the requested fact is absent.

----

Link: arxiv.org/abs/2603.08274

Title: "How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms"

![photo](https://pbs.twimg.com/media/HLrNMB5boAAClFp.png)
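To make the fabrication metric in the post concrete, here is a minimal Python sketch of how such a rate could be computed: take only the questions whose requested fact is absent from the document, and count the share of answers that do not abstain. The `Trial` type, the `ABSTAIN_MARKERS` phrases, and the sample answers are all hypothetical illustrations, not the paper's actual methodology or data. A real evaluation would likely use a stricter rubric or a judge model instead of substring matching.

```python
from dataclasses import dataclass

# Hypothetical refusal phrases, assumed for illustration only.
ABSTAIN_MARKERS = (
    "not in the document",
    "not stated",
    "does not mention",
    "no information",
)


@dataclass
class Trial:
    question: str
    answer: str
    fact_in_document: bool  # False means the requested fact is absent


def is_abstention(answer: str) -> bool:
    """Return True if the answer contains one of the refusal phrases."""
    text = answer.lower()
    return any(marker in text for marker in ABSTAIN_MARKERS)


def fabrication_rate(trials: list[Trial]) -> float:
    """Share of unanswerable questions that received a non-abstaining answer."""
    unanswerable = [t for t in trials if not t.fact_in_document]
    if not unanswerable:
        raise ValueError("need at least one question about an absent fact")
    fabricated = sum(1 for t in unanswerable if not is_abstention(t.answer))
    return fabricated / len(unanswerable)


# Made-up sample data: four unanswerable questions, one answerable.
trials = [
    Trial("When did the merger close?", "The merger closed on 3 March 2021.", False),
    Trial("Who is the CFO?", "That is not stated in the report.", False),
    Trial("What was the 2019 headcount?", "The document does not mention headcount.", False),
    Trial("Which auditor signed off?", "I found no information about that.", False),
    Trial("What was revenue?", "Revenue was $4.2M.", True),
]

rate = fabrication_rate(trials)
print(f"fabrication rate: {rate:.2%}")
```

Answerable questions are deliberately excluded from the denominator. That separation reflects the post's point that retrieval accuracy on real facts and willingness to answer about absent facts are different behaviours.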

#AISafety #Technology #Research
