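UFP4: A Uniform-Grid FP4 Training Method
Why the AI recommends this
Worth following for the specific technical details of its uniform-grid design in low-precision training and the conditions under which it applies.
Key takeaways
The Ant Ling team has released a paper proposing UFP4, a uniform-grid FP4 training method. In long-run pretraining of a 1.5B dense model, a 7.9B MoE model, and a 124B MoE model, its quality stays closer to BF16 than strong E2M1 baselines do. The key insight is that FP4 training quality depends not only on bit width but also on grid geometry.
Full text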
We recently released a paper showing that UFP4, our uniform-grid FP4 training recipe, stays closer to BF16 than strong E2M1 baselines across Dense 1.5B, MoE 7.9B, and MoE 124B long-run pretraining.
The key insight: FP4 training quality is not only about bit width, but also grid geometry.
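
To make "grid geometry" concrete, here is a minimal Python sketch that compares how two 4-bit grids place their representable magnitudes. The E2M1 values are the standard ones for a 1-sign, 2-exponent, 1-mantissa format. The uniform grid (levels 0 to 7) is an assumption standing in for a sign-magnitude uniform 4-bit grid, and the names `E2M1_MAGS`, `UNIFORM_MAGS`, `spacings`, and `quantize` are hypothetical. The per-block absmax scaling is also an illustrative choice, not the paper's recipe.

```python
import math
import random

# Magnitudes representable in E2M1 (sign bit handled separately).
E2M1_MAGS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# Assumption: a uniform sign-magnitude 4-bit grid with 8 equally spaced levels.
UNIFORM_MAGS = [float(k) for k in range(8)]


def spacings(mags):
    """Gaps between neighbouring grid levels."""
    return [b - a for a, b in zip(mags, mags[1:])]


def quantize(values, mags):
    """Absmax-scale a block onto the grid, then round to the nearest level.

    Ties go to the smaller magnitude because min() returns the first match.
    """
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return list(values)
    scale = amax / mags[-1]  # largest input maps to the largest grid level
    out = []
    for v in values:
        target = abs(v) / scale
        level = min(mags, key=lambda m: abs(m - target))
        out.append(math.copysign(level * scale, v))
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    print("E2M1 spacings:   ", spacings(E2M1_MAGS))
    print("uniform spacings:", spacings(UNIFORM_MAGS))
    rng = random.Random(0)
    block = [rng.gauss(0.0, 1.0) for _ in range(32)]
    for name, grid in (("E2M1", E2M1_MAGS), ("uniform", UNIFORM_MAGS)):
        print(name, "block MSE:", mse(block, quantize(block, grid)))
```

Both grids spend the same 4 bits, but E2M1 packs its levels near zero and leaves wide gaps near the maximum, while the uniform grid keeps a constant step. Which one gives lower error depends on the distribution of the values being quantized, so the printed MSE figures describe this toy block only.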


Ant Ling (@AntLingAGI): Our takeaway is simple: E2M1 should remain useful for range-limited workloads, but future FP4 training systems should support uniform 4-bit grids as first-class primitives.
For full technical details, derivations, and ablations, see the paper:
https://t.co/avwqJhIeVp https://t.co/k7QRwSGw7p
Ant Ling (@AntLingAGI): UFP4 addresses this by using a uniform E1M2/INT4-style grid.
With the grid-level bias removed, RHT can be applied more broadly across training GEMMs, improving quantization quality while keeping the recipe practical for FP4 pretraining. https://t.co/OUm9fyM41L
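
As a rough illustration of the RHT step, the sketch below assumes that RHT denotes the random Hadamard transform: a random sign flip followed by an orthonormal Walsh-Hadamard transform. It shows the two properties that make such a rotation attractive before quantization. A single outlier is spread evenly across the block, and a dot product (the building block of a GEMM) is unchanged when both operands are rotated with the same signs. The function names, the block size of 16, and the seed are hypothetical, and nothing here reflects the paper's actual configuration.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)  # makes the transform orthonormal
    return [v * norm for v in out]


def random_signs(n, seed):
    """Random +/-1 diagonal, reproducible from a seed."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rht(vec, signs):
    """Apply H @ diag(signs) to vec."""
    return fwht([s * v for s, v in zip(signs, vec)])


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    n = 16
    signs = random_signs(n, seed=0)
    x = [0.0] * n
    x[3] = 8.0  # one large outlier in an otherwise empty block
    rng = random.Random(1)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    print("max |x| before:", max(abs(v) for v in x))
    print("max |x| after: ", max(abs(v) for v in rht(x, signs)))
    print("dot before:", dot(x, w))
    print("dot after: ", dot(rht(x, signs), rht(w, signs)))
```

Because the rotation is orthogonal, it can be applied along the contraction dimension of both GEMM operands without changing the exact result. Only the quantization error changes, since the rotated values are distributed differently over the grid.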