etemiz 
posted an update 14 days ago
Which one is better for alignment: ORPO or GSPO?

I think ORPO is pretty good and fast, but GSPO makes the model attack its own opinions, reflect on itself, and correct itself. Although GSPO is much slower, it may still be pretty effective. And for GSPO you don't have to provide the whole reasoning corpus; you just provide the end result (maybe one word that answers a binary question).
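
To make the "just provide the end result" idea concrete, here is a minimal sketch of a reward function for a binary question. It assumes the model is free to reason however it likes and that only the last "yes" or "no" in its output counts. The function name `binary_answer_reward` and the yes/no vocabulary are my own assumptions for illustration, not part of any particular trainer's API.

```python
import re


def binary_answer_reward(completion: str, target: str) -> float:
    """Hypothetical reward: 1.0 if the final yes/no in the completion matches target."""
    # Split the completion into lowercase alphabetic words.
    words = re.findall(r"[a-z]+", completion.lower())
    # Keep only the words that count as an answer (assumed vocabulary: yes/no).
    answers = [w for w in words if w in ("yes", "no")]
    if not answers:
        # No recognizable answer at all, so no reward.
        return 0.0
    # Only the last answer counts, so the model may change its mind mid-reasoning.
    return 1.0 if answers[-1] == target.lower() else 0.0
```

Scoring only the final answer means a completion that first says "no", reconsiders, and ends on "yes" still gets full reward when the target is "yes". The reasoning in between is never labeled; it is only shaped indirectly through the reward.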

And GSPO may be better than GRPO because it optimizes at the level of the whole response (the 'train of thought'), whereas GRPO applies its importance ratio and clipping to single tokens. Alignment is mostly about the train of thought, not a single token like a math answer.
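
As a rough illustration of that difference, the sketch below computes both kinds of importance ratio from per-token log-probabilities under the new and old policy. It assumes the usual formulations: GRPO uses one ratio per token, and GSPO uses a single length-normalized ratio for the whole sequence. The helper names are mine, and real trainers add clipping, advantages, and batching on top of this.

```python
import math


def token_level_ratios(new_logps, old_logps):
    """GRPO-style: one importance ratio per token."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_level_ratio(new_logps, old_logps):
    """GSPO-style: one length-normalized ratio for the whole response."""
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    # exp(mean(log-ratio)) is the geometric mean of the per-token ratios.
    return math.exp(sum(diffs) / len(diffs))


# Toy log-probabilities for a two-token response (made-up numbers).
new_logps = [-1.0, -2.0]
old_logps = [-1.5, -1.5]

per_token = token_level_ratios(new_logps, old_logps)
per_sequence = sequence_level_ratio(new_logps, old_logps)
```

In this toy case the two tokens move in opposite directions, so the per-token ratios differ from 1 while the sequence-level ratio does not. That is the sense in which the sequence-level view judges the response as a whole rather than token by token.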