AI Curated Updates · Smart score: 68

Claude Sonnet 5 System Card

Source: Twitter following list

Author: Rohan Paul (@rohanpaul_ai)

Published: 2026-06-30

Collected: 2026-06-30

Why AI recommends this

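Compared with the publicly available Opus 4.8, Sonnet 5 scores lower on coding benchmarks but is priced more competitively. It is worth watching how its upcoming price change affects the cost of agentic AI.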

Key takeaways

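Anthropic has released Claude Sonnet 5. On CyberGym it regressed to 52.7%, below Sonnet 4.6 at 65.2%, and in Firefox browser exploitation testing it produced no full exploits, while Mythos 5 reached 88.4%. It recorded the lowest MASK lying rate, 3.1%. On the SWE-bench Pro coding benchmark it scored 63.2%, above Sonnet 4.6 at 58.1% but below Opus 4.8 at 69.2%. Pricing is $2 input and $10 output per 1M tokens through August 26, rising afterwards to $3 input and $15 output. Anthropic describes it as its most agentic Sonnet model yet.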
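To make the pricing change concrete, here is a minimal sketch that applies the two quoted rate cards to one workload. The rates come from the post; the token counts and the helper name `cost_usd` are hypothetical and chosen only for illustration.

```python
from decimal import Decimal

# USD per 1M tokens, as quoted in the post.
PROMO = {"input": Decimal("2"), "output": Decimal("10")}
STANDARD = {"input": Decimal("3"), "output": Decimal("15")}


def cost_usd(input_tokens: int, output_tokens: int, prices: dict) -> Decimal:
    """Return the cost in USD for a token count under a given rate card."""
    million = Decimal(1_000_000)
    total = (Decimal(input_tokens) * prices["input"]
             + Decimal(output_tokens) * prices["output"])
    return total / million


# Hypothetical agentic workload (assumed numbers, not from the post).
run_in, run_out = 800_000, 200_000

promo_cost = cost_usd(run_in, run_out, PROMO)
standard_cost = cost_usd(run_in, run_out, STANDARD)
increase = (standard_cost - promo_cost) / promo_cost

print(f"promotional: ${promo_cost}, standard: ${standard_cost}, increase: {increase:.0%}")
```

Because both the input and output rates rise by the same factor (3/2 and 15/10 are both 1.5), the total cost increases by 50% whatever the input/output mix.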

Full text

145-page Claude Sonnet 5 System Card

- CyberGym shows the weirdest regression, with Sonnet 5 at 52.7% versus Sonnet 4.6 at 65.2%, i.e., Sonnet 5 is worse at reproducing known software bugs in this specific cyber test.
- Sonnet 5 is far behind Anthropic’s strongest model on serious browser exploitation. Firefox testing found Sonnet 5 made 0 full exploits, while Mythos 5 reached 88.4%.
- The model also seemed more willing to sacrifice helpfulness for welfare-focused changes, i.e., Sonnet 5 sometimes preferred being less useful if that better fit its stated self-treatment preferences.
- Anthropic says Sonnet 5 rarely tried to bypass a blocked network path during evaluations.
- Sonnet 5 scored the lowest MASK lying rate at 3.1% under pressure. It was less likely than other tested models to lie when pushed.

![photo](https://pbs.twimg.com/media/HMGBV3dbAAAXmAv.jpg)

> **Quoted post by Rohan Paul (@rohanpaul_ai):**
>
> And Claude Sonnet 5 just launched.
>
> Closes the gap with Opus 4.8, and is cheap until August.
>
> This makes agentic AI much cheaper, with $2 input tokens and $10 output tokens per 1M through Aug-26. Price rises after 08-26 to $3 input and $15 output per 1M.
>
> They call Sonnet 5 its “most agentic Sonnet model yet.”
>
> Its coding score hit 63.2% on SWE-bench Pro, versus 58.1% for Sonnet 4.6.
>
> Sonnet 5 gets 63.2% in agentic coding, while Opus 4.8 reaches 69.2% and Sonnet 4.6 hits 58.1%.
>
> But in knowledge work, Sonnet 5 slightly beats Opus 4.8, even though Opus is known for tough judgment and deep research tasks.
>
> https://x.com/rohanpaul_ai/status/2072032758348820881

Rohan Paul (@rohanpaul_ai): https://t.co/c7y3twaEE4

#TechBreakthrough #ModelRelease #IndustryNews
