AI Featured Updates · Smart score: 85

VibeThinker Analysis

Source: Twitter following list
Author: Rohan Paul (@rohanpaul_ai)
Published: 2026-06-24
Collected: 2026-06-24
Full text
VibeThinker is a 3B-parameter model whose reasoning benchmark results are almost head to head with Opus 4.5, trained with a novel SFT+GRPO recipe.

It is unusually strong for its size: with only 3B parameters, it scores 94.3 on AIME26, 80.2 Pass@1 on LiveCodeBench v6, and a 96.1% acceptance rate on recent unseen LeetCode contests. The post quotes the claim that this "places it in the performance band of first-tier reasoning systems, matching or exceeding flagship models that are orders of magnitude larger, such as DeepSeek V3.2".

They start from a 3B Qwen2.5-Coder base model, then train it with carefully filtered hard examples, multi-solution supervised training, reinforcement learning on math/code/STEM tasks with verifiable rewards, self-distillation, instruction-focused RL, and a test-time answer-checking method called CLR.

![photo](https://pbs.twimg.com/media/HLi8HSPXYAEBHeE.jpg)

Rohan Paul (@rohanpaul_ai): https://t.co/5YXh4vSrFZ
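The post names GRPO and reinforcement learning with verifiable rewards but gives no implementation details. As a rough illustration only, the sketch below shows the generic GRPO idea of scoring a group of sampled answers with a checkable reward and normalizing each reward against its own group. The exact-match checker, the function names, the sample answers, and the use of population standard deviation are all assumptions made for this example, not VibeThinker's actual recipe.

```python
from statistics import mean, pstdev


def verifiable_reward(predicted: str, reference: str) -> float:
    """Hypothetical checker: 1.0 if the final answer matches exactly, else 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and std of its own sample group.

    This is the group-relative baseline used in generic GRPO; eps avoids
    division by zero when every sample in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative group of four sampled answers to one prompt (made-up data).
samples = ["42", "41", " 42 ", "7"]
rewards = [verifiable_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)

# With binary rewards, the group mean is also the empirical Pass@1 for this prompt.
pass_at_1 = mean(rewards)
```

Correct samples receive a positive advantage and incorrect ones a negative advantage, so the policy update favors answers that beat the group average. When every sample in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.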
#AI #Technology #Guide