AI Curated Updates · Smart Score 85

VibeThinker Analysis

Source: Twitter following list

Author: Rohan Paul (@rohanpaul_ai)

Published: 2026-06-24

Collected: 2026-06-24

Full text

VibeThinker is a 3B-parameter model whose reasoning benchmark results are nearly head-to-head with Opus 4.5, trained with a novel SFT + GRPO recipe.

It is unusually strong for its size: with only 3B parameters, it scores 94.3 on AIME26, 80.2 Pass@1 on LiveCodeBench v6, and a 96.1% acceptance rate on recent, unseen LeetCode contests. Per the quoted claim, this "places it in the performance band of first-tier reasoning systems, matching or exceeding flagship models that are orders of magnitude larger, such as DeepSeek V3.2."

The authors start from a 3B Qwen2.5-Coder base model and then train it with carefully filtered hard examples, multi-solution supervised training, reinforcement learning with verifiable rewards on math, code, and STEM tasks, self-distillation, and instruction-focused RL. At test time they add an answer-checking method called CLR.

![photo](https://pbs.twimg.com/media/HLi8HSPXYAEBHeE.jpg)

Rohan Paul (@rohanpaul_ai): https://t.co/5YXh4vSrFZ
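The post gives no details of the reward or advantage computation. To illustrate the "RL with verifiable rewards" step in general terms, here is a minimal sketch of the generic GRPO recipe introduced in DeepSeekMath: each sampled completion gets a binary reward for matching a reference answer, and each reward is normalized against the other completions sampled for the same prompt. The `\boxed{}` answer format, exact string matching, and all function names are illustrative assumptions, not VibeThinker's actual implementation. The policy-gradient update, ratio clipping, and KL penalty are omitted.

```python
import re
from statistics import mean, pstdev
from typing import List, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a completion.

    Assumption: final answers are written as \\boxed{...} with no nested braces.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer exactly matches the reference, else 0.0."""
    answer = extract_boxed(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: (r_i - group mean) / (group std + eps).

    The group is all completions sampled for one prompt. If every reward is
    equal, all advantages are 0, so that prompt contributes no learning signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical samples for one prompt whose reference answer is "42".
    completions = [
        r"... so the answer is \boxed{42}",
        r"... therefore \boxed{41}",
        r"... giving \boxed{42}",
        "no final answer given",
    ]
    rewards = [verifiable_reward(c, "42") for c in completions]
    advantages = group_relative_advantages(rewards)
    print(rewards)                              # [1.0, 0.0, 1.0, 0.0]
    print([round(a, 3) for a in advantages])    # [1.0, -1.0, 1.0, -1.0]
```

Because the baseline is the group mean instead of a learned value network, correct samples are pushed up and incorrect ones pushed down relative to each other. This is why the filtered "hard examples" matter: prompts the model always solves, or never solves, yield zero advantage.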

#AI #Technology #Guide
