AI Curated Updates
Smart score: 60
atomic.chat compares multiple models on physics coding performance
Why the AI recommends this
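The test gives concrete token and cost figures for several models on the same task and highlights Sonnet 5's cost-effectiveness, but note that it is not an official benchmark.
Key takeaways
atomic.chat tested Claude Sonnet 5, Claude Opus 4.8, Claude Sonnet 4.6, and GPT 5.5 on three physics coding demos. Sonnet 5 used 15,047 tokens ($0.15) and performed comparably to GPT 5.5 (31,152 tokens, $0.94), at roughly one-sixth of the cost and with the lowest token count of the four.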
Full text
atomic[.]chat, a desktop app that runs LLMs locally, ran a very revealing comparison for Claude Sonnet 5, Claude Opus 4.8, Claude Sonnet 4.6, and GPT 5.5.
Claude Sonnet 5 just matched GPT 5.5 on 3 physics coding demos at 6x lower cost.
It also used the fewest tokens of the four models.
- Sonnet 5: 15,047 tokens, $0.15
- Opus 4.8: 23,063 tokens, $0.58
- Sonnet 4.6: 25,824 tokens, $0.39
- GPT 5.5: 31,152 tokens, $0.94
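The "6x lower cost" and "fewest tokens" claims can be checked directly from the four figures listed above. The following is a minimal sketch in Python, not anything published by atomic.chat: the dictionary layout and the helper names (`cost_ratio`, `token_ratio`, `cost_per_1k_tokens`, `fewest_tokens`) are assumptions made for illustration. The post does not say how the token counts were measured or which pricing was applied, so the per-1k-token value is only an effective rate derived from the reported totals.

```python
# Figures copied from the post: model -> (tokens used, total cost in USD).
# How tokens were counted and priced is not stated in the source.
RESULTS = {
    "Sonnet 5": (15_047, 0.15),
    "Opus 4.8": (23_063, 0.58),
    "Sonnet 4.6": (25_824, 0.39),
    "GPT 5.5": (31_152, 0.94),
}


def cost_ratio(results, baseline, model):
    """How many times more the baseline run cost than the given model's run."""
    return results[baseline][1] / results[model][1]


def token_ratio(results, baseline, model):
    """How many times more tokens the baseline used than the given model."""
    return results[baseline][0] / results[model][0]


def cost_per_1k_tokens(results, model):
    """Effective USD per 1,000 tokens, derived only from the reported totals."""
    tokens, cost = results[model]
    return cost / tokens * 1000


def fewest_tokens(results):
    """Name of the model with the lowest reported token count."""
    return min(results, key=lambda name: results[name][0])


if __name__ == "__main__":
    for name in RESULTS:
        print(
            f"{name}: GPT 5.5 cost {cost_ratio(RESULTS, 'GPT 5.5', name):.1f}x, "
            f"tokens {token_ratio(RESULTS, 'GPT 5.5', name):.1f}x, "
            f"${cost_per_1k_tokens(RESULTS, name):.4f} per 1k tokens"
        )
    print("Fewest tokens:", fewest_tokens(RESULTS))
```

With these numbers, the GPT 5.5 run cost about 6.3 times as much as the Sonnet 5 run and used about 2.1 times as many tokens, which is consistent with the rounded "6x" in the post.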
https://video.twimg.com/amplify_video/2072105222772764672/vid/avc1/1920x1080/QPyRwmlWWja1IsZx.mp4?tag=28
> **Quoted post from atomic.chat (@atomic_chat_hq):**
> New Claude Sonnet 5 performs at GPT 5.5 level 6x cheaper!
> We gave 4 models the same prompt: build three self-contained HTML5 canvas scenes with real physics crash demos
> Prompts:
> - A car crashes into a brick wall
> - A wrecking ball destroys a house
> - A catapult throws a rock at a castle wall
> Outputs:
> Sonnet 5: 15,047 tokens, $0.15
> Opus 4.8: 23,063 tokens, $0.58
> Sonnet 4.6: 25,824 tokens, $0.39
> GPT 5.5: 31,152 tokens, $0.94
> Sonnet 5 did as well as Opus 4.8 and GPT 5.5 on all three tests. In the wrecking ball test, it beat Opus 4.8. The cable moves smoothly and every hit connects. In the catapult test, it beat GPT 5.5. The rock always lands inside the wall. Sonnet 5 still needs better detail and graphics. But it used fewer tokens than every other model
> https://x.com/atomic_chat_hq/status/2072099564870349267
Rohan Paul (@rohanpaul_ai): Github of Atomic-Chat.
"an open source alternative to ChatGPT that runs 100% offline on your computer."
https://t.co/rfprzJzHRO
Download it here.
https://t.co/aMAZoaXjZ4