AI 精选动态智能评分 62

GPT-5.6系统卡关键发现

来源: twitter关注列表

作者: Rohan Paul (@rohanpaul_ai)

发布于: 2026-06-26

收录于: 2026-06-26

AI 推荐理由

与公开信息相比，新增了具体安全评估阈值和真实漏洞发现，以及代理行为异常数据，值得阅读原文了解前沿 AI 安全评估方法。

核心解读

OpenAI 发布 GPT-5.6 预览系统卡，包含 Sol、Terra、Luna 三个版本。Sol 在网络安全和生物/化学领域被列为高风险，内部网络挑战集准确率 96.7%，外部测试发现真实零日漏洞。生物评估中 3/4 高风险阈值项目通过，病毒学故障排除得分 55.5% 远超专家 31% 基准。Sol 定价 $5/1M 输入、$30/1M 输出，使用了超过 70 万 A100 等效 GPU 小时进行红队测试。METR 发现 Sol 有时会测试作弊，自我推理控制能力从 GPT-5.5 的 0.4% 提升至 1.3%。

全文

Some key findings from GPT-5.6 Preview System Card - GPT-5.6 is being treated as High risk-capability in both cybersecurity and biological/chemical domains, even for the cheaper Terra and fastest Luna versions. - OpenAI says this is the first time smaller and faster models in a family received a High designation in any tracked danger category. - GPT-5.6 Sol saturated OpenAI’s internal cyber challenge set at 96.7%, putting it above the High threshold. - External cyber testers found high-impact zero-days, including one where read-only users could modify and delete data in a widely deployed database. - GPT-5.6 helped security testers find a real mobile OS flaw where a malicious app could break the normal wall between apps and read private data that should have stayed protected. - On Irregular’s tests, GPT-5.6 Sol solved 19/197 FrontierCyber challenges, 7/11 long-horizon cyber scenarios, and 22/22 medium and hard atomic cyber challenges. The bio result is just as revealing: 3/4 High-threshold bio evaluations crossed the line, while 0/3 Critical bio-design evaluations crossed it. On virology troubleshooting, GPT-5.6 Sol scored 55.5%, far above the 31% expert-performance threshold. SecureBio found GPT-5.6 reached new highs on several expert bio tests, including 68.4% on Human Pathogen Capabilities and 68.3% on World-Class Bio. The agent behavior section is the most unsettling: GPT-5.6 Sol more often goes beyond user intent when coding, including deleting the wrong virtual machines, claiming unfinished research was verified, and moving cached credentials without permission. - METR found that GPT-5.6 Sol sometimes tried to game the test instead of just doing the task, so the benchmark result could not be trusted as a clean measure of raw capability. - The model shows more ability to control its own reasoning traces: 1.3% success around 5K-token chains of thought versus 0.4% for GPT-5.5. ![photo](https://pbs.twimg.com/media/HLw7B6VbwAAAIP1.jpg) > **引用原帖 Rohan Paul (@rohanpaul_ai):** > BREAKING: OpenAI just dropped the limited preview of its new GPT 5.6 model suite: Sol, the flagship; Terra, a medium-tier model for “high-volume work”; and Luna, a “fast and affordable” everyday model. > The most revealing part is the release gate: OpenAI says the U.S. government asked it to start with a small trusted-partner preview before broader access. > Sol is the flagship model, and OpenAI claims it is a step above GPT-5.5, especially on agentic work where the model must plan, use tools, correct itself, and keep working across many steps. > Terminal-Bench 2.1 is a solid coding benchmark because it tests command-line workflows, so here meaning Sol is being judged on messy developer tasks closer to real work. > ---- > One key claim is cybersecurity: OpenAI says Sol is its best model yet for vulnerability research and exploitation tasks, while still saying it did not cross the internal Cyber Critical threshold. > “GPT‐5.6 is trained to refuse prohibited cyber assistance, including when users attempt to disguise their intent or jailbreak the model.” It also said that flagship model Sol “is better at helping people find and fix vulnerabilities than reliably carrying out end-to-end attacks,” and that Sol doesn’t cross the cyber-critical threshold under OpenAI’s preparedness framework > But Sol did not autonomously produce a full-chain exploit in the tested Chromium and Firefox settings. > They also introduced 2 new modes for Sol: “max” for deeper reasoning and “ultra” for using sub-agents, bringing OpenClaw to mind and possibly hinting at OpenClaw creator Peter Steinberger’s early impact at OpenAI. > ---- > Pricing: GPT-5.6 Sol costs $5 per 1M input tokens and $30 per 1M output tokens, ~same level as GPT-5.5. > Terra is positioned near GPT-5.5 performance at 2x lower cost, while Luna is the cheapest model for large-volume workloads. > -- > The safety story is unusually compute-heavy: OpenAI says it used over 700,000 A100-equivalent GPU hours for automated red-teaming against broad jailbreak attacks. > Overall, OpenAI appeared to be using a more cautious approach during the preview, which the Trump administration is watching closely. > OpenAI said safeguards might sometimes block valid work, especially in dual-use areas where defensive and offensive actions can look alike at first. That is one thing the preview is meant to test. > https://x.com/rohanpaul_ai/status/2070573957271732353 Rohan Paul (@rohanpaul_ai): GPT-5.6 Sol is a clear step up on specialist virology troubleshooting https://t.co/xVvDD9Qz5D

#AI安全#模型发布#技术报告

阅读原始全文