Great article, thanks for the clear explanation.
A small notation suggestion: in the “A Recap on GRPO” section, using R_i for the reward score instead of r_i could help distinguish it from the importance ratio r_{i,t} and reduce confusion.
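To illustrate, here is a rough sketch of how the two quantities might read with that change. The symbols G (group size), q (prompt), o_{i,t} (the t-th token of the i-th response), and π_θ are my assumptions based on the usual GRPO formulation and may not match the article's exact notation.

```latex
% Group-normalized advantage, with the reward written as capital R_i
\hat{A}_{i} = \frac{R_i - \operatorname{mean}\left(\{R_j\}_{j=1}^{G}\right)}{\operatorname{std}\left(\{R_j\}_{j=1}^{G}\right)}

% Token-level importance ratio, kept as lowercase r_{i,t}
r_{i,t}(\theta) = \frac{\pi_{\theta}\left(o_{i,t} \mid q,\, o_{i,<t}\right)}{\pi_{\theta_{\text{old}}}\left(o_{i,t} \mid q,\, o_{i,<t}\right)}
```

With the capital letter reserved for the scalar reward, r is left to mean only the ratio.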
Zheng
AyongZheng