Smart score: 66

How we contain Claude across products

Source: Anthropic-engineering

Published: 2026-05-26

Indexed: 2026-05-26

Why the AI recommends it

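The article breaks agent security into three layers of defense (environment, model, and oversight) and reports two figures: a 93% approval rate for Claude Code permission prompts and a 0.1% attack success rate for Opus 4.7 on a red-teaming benchmark. It is a good read for anyone interested in containment design that can be put into practice.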

Key takeaways

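Anthropic has published an article that summarizes how it constrains Claude's permission boundaries across products such as claude.ai, Claude Code, and Claude Cowork in order to limit an agent's "blast radius." The article describes two approaches. The first is human-in-the-loop oversight: Anthropic found that users approve about 93% of Claude Code's permission prompts, so it introduced Claude Code auto mode to reduce approval fatigue. The second is containment: sandboxes, virtual machines, filesystem boundaries, and egress controls restrict the environment an agent can reach. The article also lists three categories of risk (user misuse, model misbehavior, and external attackers) and notes that on Gray Swan's Agent Red Teaming benchmark, Claude Opus 4.7 holds the attack success rate of prompt injection to about 0.1%.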
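To make the containment idea concrete, here is a minimal Python sketch of two of the controls named in the summary: a filesystem boundary and an egress allowlist. It is an illustration only, not Anthropic's implementation. The workspace path, the host names, and the function names `path_allowed` and `egress_allowed` are all hypothetical, and a real deployment would presumably enforce these rules in the sandbox or network layer rather than in application code.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical policy values for illustration; none come from the article.
WORKSPACE_ROOT = Path("/workspace/project")
ALLOWED_HOSTS = {"api.example.com", "pypi.org"}


def path_allowed(candidate: str, root: Path = WORKSPACE_ROOT) -> bool:
    """Return True only if candidate resolves to root or a path beneath it."""
    # Joining with an absolute candidate discards root, and resolve()
    # collapses ".." segments, so traversal attempts land outside root.
    resolved = (root / candidate).resolve()
    return resolved.is_relative_to(root.resolve())


def egress_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in allowed_hosts
```

Both checks deny by default: anything outside the workspace root or off the host allowlist is rejected, which keeps the reachable environment small even when a request looks plausible.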

#AISafety #Agents #DeveloperTools
