AI Curated Updates · Smart Score: 60

Matrix Outperforms Codex and Claude Code on GDPval-Bench

Source: Twitter following list

Author: Chubby♨️ (@kimmonismus)

Published: 2026-06-29

Collected: 2026-06-29

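Why the AI Recommends This

Differentiator: according to the tweet, Matrix's lead over comparable products shows up mainly in planning and coordination on long tasks, not in raw model capability alone.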

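Key Takeaways

Matrix scored 95.45% on GDPval-Bench, ahead of Codex at 84.9% and Claude Code at 80.3%. The tweet argues that it looks more like a genuine operating layer for an AI company than a prompt orchestrator.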

Full Text

This is the first "AI company" product I've seen that doesn't feel like pure cosplay.

Two interesting points:

Matrix treats the company idea seriously. You are not just creating agents and hoping they coordinate.

Matrix beat both Codex and Claude Code on GDPval-Bench, with 95.45% against 84.9% and 80.3% respectively. That gap seems to matter most on longer tasks, where planning and coordination actually decide the outcome rather than raw model capability.

Which is maybe the point. A lot of "AI companies" are really just prompt orchestrators with a nice UI. Matrix looks like it's building something closer to an actual operating layer.

Whether that holds up beyond benchmarks, I don't know yet. But it really makes me want to find out.

![photo](https://pbs.twimg.com/media/HMALo2VaoAAxut6.jpg)

> **Quoted post, Matrix (@matrix_build):**
> what if you can run an entire 0-person company —
> without the grind of running a team?
> matrix is the runtime that makes it possible.
> in last week’s limited beta, our users created tens of thousands of new 0-person companies and started real businesses in matrix.
> today, matrix is open to everyone.
> launch yours ↓
> https://x.com/matrix_build/status/2071585973541195805

#AI #Technology #Evaluation
