AI Curated Updates
Smart Score: 60
Matrix Surpasses Codex and Claude Code on GDPval-Bench
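Why AI Recommends This
Differentiator: Matrix significantly outperforms comparable products at long-task planning and coordination, rather than merely stacking up model capability.
Key Takeaways
Matrix scored 95.45% on GDPval-Bench, ahead of Codex at 84.9% and Claude Code at 80.3%. The post argues that it looks more like a genuine operating-system layer for AI companies than a prompt orchestrator.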
Full Text
This is the first "AI company" product I've seen that doesn't feel like pure cosplay.
Two interesting points:
Matrix treats the company idea seriously. You are not just creating agents and hoping they coordinate.
Matrix beat both Codex and Claude Code on GDPval-Bench, with 95.45% against 84.9% and 80.3% respectively.
That gap seems to matter most on longer tasks, where planning and coordination actually decide the outcome rather than raw model capability.
Which is maybe the point. A lot of "AI companies" are really just prompt orchestrators with a nice UI. Matrix looks like it's building something closer to an actual operating layer. Whether that holds up beyond benchmarks, I don't know yet. But it really makes me want to find out.

> **Quoted post from Matrix (@matrix_build):**
> what if you can run an entire 0-person company —
> without the grind of running a team?
> matrix is the runtime that makes it possible.
> in last week’s limited beta, our users created tens of thousands of new 0-person companies and started real businesses in matrix.
> today, matrix is open to everyone.
> launch yours ↓
> https://x.com/matrix_build/status/2071585973541195805