TRL v1.7.0 is out‼️
+ continuous batching makes GRPO and RLOO 1.25x faster while using 16 GB less memory
+ proper MoE post-training across GRPO/RLOO/AsyncGRPO
+ new GMPO trainer
+ AsyncGRPO weight sync + padding-free
+ more
https://github.com/huggingface/trl/releases/tag/v1.7.0
I wrote a short article about the continuous batching feature for GRPO:
https://huggingface.co/blog/sergiopaniego/cb-trl-grpo