As AI takes on longer, higher-stakes tasks, we want models to carry beneficial and safe behavior into new domains beyond their training—and maintain it under pressure.
That’s the idea behind our new research on training models to be broadly and persistently beneficial. https://t.co/6Yw45s1RRq
OpenAI (@OpenAI): We also tested whether alignment persisted under pressure.
The model was harder to steer toward harmful behavior with adversarial prompts, while remaining responsive to helpful instructions.
We saw preliminary evidence of greater resistance to harmful fine-tuning. https://t.co/dFXdWdMuDG
OpenAI (@OpenAI): This is an early step toward more robustly beneficial and aligned models: training models to carry beneficial traits into new situations, so as AI becomes more capable, it also becomes more reliable, transparent, and helpful for people.