AI Curated Feed · Smart Score 65

SemiAnalysis: Inference Optimization Is Driving a Structural Decline in Token Costs

Source: Twitter following list

Author: SemiAnalysis (@SemiAnalysis_)

Published: 2026-06-27

Collected: 2026-06-27

Why the AI Recommends It

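Gives concrete throughput multipliers from combined software and hardware optimizations, plus a forecast that token costs will fall structurally. Worth reading the original for the full math and the breakdown by vendor category.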

Key Takeaways

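SemiAnalysis has published an analysis of AI value capture, revealing that its employees consume just under 5 billion tokens per month on average (more than 5x Meta's figure) and that token spend now runs at roughly 30% of compensation. Software optimizations alone, namely wide expert parallelism (wideEP), disaggregated prefill/decode serving (disagg), and multi-token prediction (MTP), raise DeepSeek R1 throughput on a B300 from about 1,000 tok/s/GPU at the FP8 baseline to roughly 14,000 (14x). Adding hardware gains, the most optimized GB300 NVL72 reaches about 17x the best H100 configuration in FP8, or 32x in FP4, which points to a structural decline in token costs.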
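
The two spend figures above can be tied together with simple arithmetic. The minimal Python sketch below assumes a hypothetical monthly compensation (the source gives no salary figures) and derives the blended price per million tokens at which ~5 billion tokens/month would equal 30% of that compensation. It illustrates the ratio only; it is not SemiAnalysis's actual pricing.

```python
# Hypothetical illustration: the blended per-token price implied by
# "token spend ~ 30% of compensation" at ~5 billion tokens/month per employee.
# The compensation values used below are assumptions, not source data.

TOKENS_PER_MONTH = 5_000_000_000  # "just under 5 billion" per employee (source, rounded up)
SPEND_SHARE_OF_COMP = 0.30        # token spend ~ 30% of compensation (source)


def implied_price_per_million(monthly_comp_usd: float,
                              tokens_per_month: int = TOKENS_PER_MONTH,
                              share: float = SPEND_SHARE_OF_COMP) -> float:
    """Blended USD per 1M tokens implied by the spend share and token volume."""
    monthly_spend = monthly_comp_usd * share           # USD spent on tokens per month
    millions_of_tokens = tokens_per_month / 1_000_000  # token volume in millions
    return monthly_spend / millions_of_tokens


if __name__ == "__main__":
    for comp in (10_000, 20_000, 30_000):  # hypothetical monthly compensation, USD
        price = implied_price_per_million(comp)
        print(f"comp ${comp:,}/mo -> ~${price:.2f} per 1M tokens")
```

Under an assumed $20,000/month, the implied blended price is about $1.20 per million tokens; the implied price scales linearly with whatever compensation figure is assumed.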

Full Text

One of the more uncomfortable observations in our AI Value Capture piece is internal: our token spend at SemiAnalysis now runs at roughly 30% of employee compensation, with employees pulling just under 5 billion tokens per month on average, over 5x more than Meta, and our top contributors clearing 100 billion. We wrote about it openly because every research firm, hedge fund, and law firm we know is heading toward a similar number, just on a delay. (1/4)🧵

![photo](https://pbs.twimg.com/media/HL1efifWQAApeWQ.jpg)

SemiAnalysis (@SemiAnalysis_): The throughput math has gotten the most pushback in our reader notes, so it's worth being precise. On the same B300 running DeepSeek R1, baseline FP8 sits near 1,000 tokens/sec/GPU, adding wideEP plus disagg gets you to roughly 8,000, and layering MTP on top pushes it to about 14,000, a 14x gain from software alone. Factor in hardware too and the most optimized GB300 NVL72 hits about 17x the best H100 config in FP8, 32x in FP4. Once you accept that compression is real, model-lab gross margin expansion stops looking like a temporary pricing oddity and starts looking structural. (3/4)

SemiAnalysis (@SemiAnalysis_): If you are an operator trying to write down what tokens will cost in 2027, the answer is materially lower than today, and the firms that have already adopted are the ones setting the pace. The full math, plus a value capture breakdown across labs, hyperscalers, inference providers, neoclouds, and memory vendors, is in the piece. (4/4) https://t.co/kIjohI1jZ3
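
The B300 figures in the thread compound multiplicatively: wideEP plus disagg is an ~8x step over the FP8 baseline, and MTP adds another ~1.75x on top (14,000 / 8,000), for ~14x overall. The Python sketch below reproduces that decomposition from the approximate numbers quoted and, under a hypothetical flat GPU-hour price (an assumption, not a figure from the source), shows that cost per token falls in direct proportion to throughput. It ignores utilization, batching limits, and latency targets, so treat it as a back-of-the-envelope illustration only.

```python
# Sketch of the thread's approximate B300 / DeepSeek R1 throughput figures and
# how a fixed GPU-hour price maps to cost per million tokens.
# HYPOTHETICAL_GPU_HOUR_USD is an assumption for illustration only.

STAGES = [
    ("FP8 baseline", 1_000),        # tokens/sec/GPU (source, approximate)
    ("+ wideEP + disagg", 8_000),   # tokens/sec/GPU (source, approximate)
    ("+ MTP", 14_000),              # tokens/sec/GPU (source, approximate)
]

HYPOTHETICAL_GPU_HOUR_USD = 5.0  # assumed flat rental price per GPU-hour


def cost_per_million_tokens(tokens_per_sec_per_gpu: float,
                            gpu_hour_usd: float = HYPOTHETICAL_GPU_HOUR_USD) -> float:
    """USD per 1M tokens if one GPU sustains this throughput for a full hour."""
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return gpu_hour_usd / tokens_per_hour * 1_000_000


def stage_gains(stages):
    """Return (name, gain vs previous stage, cumulative gain vs baseline) per stage."""
    base = stages[0][1]
    out, prev = [], base
    for name, tps in stages:
        out.append((name, tps / prev, tps / base))
        prev = tps
    return out


if __name__ == "__main__":
    for (name, step, cum), (_, tps) in zip(stage_gains(STAGES), STAGES):
        print(f"{name:<20} step {step:5.2f}x  cumulative {cum:5.1f}x  "
              f"~${cost_per_million_tokens(tps):.3f}/1M tokens")
```

At the assumed $5/GPU-hour, the baseline works out to roughly $1.39 per million tokens and the MTP configuration to roughly $0.10. The absolute values depend entirely on the assumed price, but the 14x ratio between them does not.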

#AI #Technology #Analysis
