# Post-training with NeMo Gym and TRL
This integration supports training language models in NeMo-Gym environments using TRL's GRPO trainer. Both single-step and multi-step tasks are supported, as is multi-environment training. NeMo-Gym orchestrates the rollouts and returns token IDs and logprobs to TRL through the rollout function for training. Currently, this integration is supported only through TRL's vLLM server mode.
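To make the rollout hand-off concrete, here is a minimal, library-free sketch of the kind of batch a rollout function could return to the trainer. Everything in it is an assumption for illustration: `RolloutRecord`, `collate_rollouts`, and the dictionary keys (`prompt_ids`, `completion_ids`, `logprobs`, `reward`) are hypothetical names, not the integration's actual API.

```python
from dataclasses import dataclass


@dataclass
class RolloutRecord:
    """Hypothetical per-sample result of one environment rollout."""

    prompt_ids: list[int]       # token IDs of the prompt
    completion_ids: list[int]   # token IDs sampled by the policy
    logprobs: list[float]       # one log-probability per completion token
    reward: float               # scalar reward assigned by the environment


def collate_rollouts(records: list[RolloutRecord]) -> dict[str, list]:
    """Collate per-sample records into parallel lists, one entry per sample."""
    for record in records:
        # Training on sampled tokens needs exactly one logprob per token.
        if len(record.completion_ids) != len(record.logprobs):
            raise ValueError("completion_ids and logprobs must have equal length")
    return {
        "prompt_ids": [r.prompt_ids for r in records],
        "completion_ids": [r.completion_ids for r in records],
        "logprobs": [r.logprobs for r in records],
        "reward": [r.reward for r in records],
    }


# Two toy rollouts with made-up token IDs and log-probabilities.
batch = collate_rollouts(
    [
        RolloutRecord([1, 2, 3], [4, 5], [-0.1, -0.7], reward=1.0),
        RolloutRecord([1, 2], [6], [-0.3], reward=0.0),
    ]
)
```

The point of the sketch is the shape of the data: the trainer receives the sampled token IDs together with the log-probabilities recorded at generation time, so it does not have to re-tokenize the generated text.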
Check out the docs page `docs/source/nemo_gym.md` for a guide.