AI Curated Updates · Smart score: 85

AI Team Releases Material on Progress

Source: Twitter following list
Author: Nathan Lambert (@natolambert)
Published: 2026-06-23
Collected: 2026-06-23
Why the AI recommends it
Recommended for in-depth reading
Key takeaways
Industry news report
Full text
Something I should add: on-policy distillation was the last content I got to sneak into the book before it went to print. It felt very important to have this method covered; it's growing rapidly and being used in distinct ways. So you can also read what is covered in this lecture! https://t.co/ayyy4ZweYK

![photo](https://pbs.twimg.com/media/HLgjfT4bsAACF86.jpg)

> **Quoted post from Nathan Lambert (@natolambert):**
>
> New lecture for the book! Nominally about synthetic data, but mostly a walk through the distillation literature, from the Hinton 2015 paper to today's multi-teacher on-policy distillation!
>
> At 7.4 hours of video in my post-training brain dump and counting :)
>
> It was fun to stare at the math long enough and talk through the 3-4 core changes that needed to be made to the original formulation for on-policy distillation to be ready for the mainstream, as it is today (and in RL frameworks).
>
> Otherwise, I include a bit of a history lesson on how synthetic data slowly took over post-training data research in general (it wasn't always the case)! Then I do some 101 review on constitutional AI, rubrics, and other popular methods.
>
> - 00:00 The emergence of synthetic data
> - 10:50 Background on teacher-student knowledge distillation
> - 24:47 On-policy distillation (OPD, MOPD, and OPSD)
> - 37:11 Constitutional AI & AI Feedback
> - 45:50 Rubrics as rewards & conclusions
>
> Ofc, watch on YouTube etc.
>
> https://x.com/natolambert/status/2069439017750609972
>
> Nathan Lambert (@natolambert): https://t.co/kBEE4u12XN
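The quoted post names the two endpoints of the lecture, Hinton-style knowledge distillation (2015) and on-policy distillation, without spelling either out. What follows is a minimal sketch of one common way to write each loss, in plain Python over toy logits. It is an illustration under stated assumptions, not the formulation used in the lecture or the book. The offline loss is assumed to be the temperature-softened forward KL from teacher to student (equal to the soft-target cross-entropy up to a term that does not depend on the student). The on-policy loss is assumed to sample tokens from the student and average a per-token reverse KL to the teacher. `toy_teacher`, `toy_student`, `VOCAB`, and the other helper names are hypothetical, and the code only computes loss values; a real trainer would backpropagate through the student.

```python
import math
import random
from typing import Callable, List, Sequence

Logits = List[float]
Model = Callable[[Sequence[int]], Logits]  # hypothetical: context -> next-token logits


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hinton_kd_loss(teacher_logits: Logits, student_logits: Logits,
                   temperature: float = 2.0) -> float:
    """Offline KD: forward KL(teacher || student) on softened distributions,
    times the T^2 factor Hinton et al. (2015) recommend when mixing targets."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return temperature ** 2 * kl(p_t, p_s)


def on_policy_distill_loss(student: Model, teacher: Model, prompt: Sequence[int],
                           num_tokens: int, rng: random.Random) -> float:
    """On-policy distillation sketch: the student generates the trajectory,
    and each position is scored by reverse KL(student || teacher)."""
    context = list(prompt)
    per_token = []
    for _ in range(num_tokens):
        p_s = softmax(student(context))
        p_t = softmax(teacher(context))
        per_token.append(kl(p_s, p_t))
        # The next token is sampled from the student, so the states are on-policy.
        next_token = rng.choices(range(len(p_s)), weights=p_s)[0]
        context.append(next_token)
    return sum(per_token) / len(per_token)


VOCAB = 4  # hypothetical toy vocabulary size


def toy_teacher(context: Sequence[int]) -> Logits:
    """Hypothetical teacher: confidently predicts (last token + 1) mod VOCAB."""
    target = (context[-1] + 1) % VOCAB
    return [3.0 if t == target else 0.0 for t in range(VOCAB)]


def toy_student(context: Sequence[int]) -> Logits:
    """Hypothetical student: same preference as the teacher, but less confident."""
    target = (context[-1] + 1) % VOCAB
    return [1.0 if t == target else 0.0 for t in range(VOCAB)]


if __name__ == "__main__":
    print("offline KD loss:", hinton_kd_loss(toy_teacher([0]), toy_student([0])))
    rng = random.Random(0)
    print("on-policy loss:",
          on_policy_distill_loss(toy_student, toy_teacher, [0], 5, rng))
```

The contrast the sketch is meant to show is where the data comes from: the offline loss scores the student on fixed inputs, while the on-policy loss scores it on sequences the student itself generated, so the teacher supervises the states the student actually visits.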
#AI #Technology #Innovation