AI Featured Updates · Smart Score: 67

OpenAI Releases the LifeSciBench Benchmark

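Source: Twitter following list
Author: OpenAI (@OpenAI)
Published: 2026-06-17
Collected: 2026-06-17
Why AI Recommends It
The benchmark focuses on real research scenarios rather than knowledge recall, which makes it a useful signal of AI progress in scientific reasoning.
Key Takeaways
OpenAI, working with 173 scientists, has released LifeSciBench, a benchmark of 750 expert-authored tasks spanning seven biological research workflows. It evaluates how well AI reasons from evidence, handles uncertainty, and performs related skills in real-world research settings. Initial results show GPT-Rosalind scoring above GPT-5.5 on all seven workflows.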
Full Text
Introducing LifeSciBench, a benchmark for measuring and improving how well AI supports real-world life science research. Developed with 173 scientists from biotechnology and pharmaceutical research, LifeSciBench includes 750 expert-authored tasks across seven biological research workflows. https://t.co/JTk0wXHFrT

![photo](https://pbs.twimg.com/media/HLCs-hgaUAAuUAP.jpg)

OpenAI (@OpenAI): Benchmarks often test biological knowledge or narrow skills. The tasks in LifeSciBench test whether models can reason from evidence, work with scientific artifacts, handle uncertainty, and make useful decisions under real-world constraints. GPT‑Rosalind scores above GPT‑5.5 across all seven workflows. These initial results show meaningful progress—and room to improve, particularly on artifact-heavy, design-intensive, and operationally constrained work.

OpenAI (@OpenAI): LifeSciBench is a foundation for more realistic evaluation, targeted improvements, and continued partnership with the life sciences community—helping the field measure progress, identify gaps, and improve AI together for the benefit of everyone.
#AI #Benchmarking #Research