

# Post-training with NeMo Gym and TRL

This integration supports training language models in NeMo Gym environments using GRPO in TRL. Both single-step and multi-step tasks are supported, including multi-environment training. NeMo Gym orchestrates the rollouts and returns token IDs and logprobs to TRL through the rollout function, and TRL uses them for training. Currently, this integration is supported only through TRL's vLLM server mode.

See the docs page `docs/source/nemo_gym.md` for a guide.
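
The rollout hand-off described above can be pictured with a minimal sketch. It is not the integration's actual code: `fake_env_rollout`, the `rollout_func` signature, and the dictionary keys (`prompt_ids`, `completion_ids`, `logprobs`) are assumptions made for illustration, and the real contract is the one described in `docs/source/nemo_gym.md`. The `vllm_settings` dictionary assumes the `use_vllm` and `vllm_mode` options of TRL's GRPO configuration; verify them against your installed TRL version.

```python
from typing import Any

# Assumed settings for vLLM server mode; check the names against your TRL version.
vllm_settings = {"use_vllm": True, "vllm_mode": "server"}


def fake_env_rollout(prompt: str) -> dict[str, Any]:
    """Hypothetical stand-in for one environment rollout (not a NeMo Gym API)."""
    prompt_ids = [ord(ch) % 100 for ch in prompt]  # toy "tokenization"
    completion_ids = [1, 2, 3]                     # toy generated tokens
    logprobs = [-0.1, -0.2, -0.3]                  # one logprob per generated token
    return {
        "prompt_ids": prompt_ids,
        "completion_ids": completion_ids,
        "logprobs": logprobs,
    }


def rollout_func(prompts: list[str]) -> dict[str, list]:
    """Collect token IDs and logprobs for a batch of prompts (illustrative only)."""
    batch: dict[str, list] = {"prompt_ids": [], "completion_ids": [], "logprobs": []}
    for prompt in prompts:
        result = fake_env_rollout(prompt)
        # Each generated token needs exactly one logprob for the policy update.
        if len(result["completion_ids"]) != len(result["logprobs"]):
            raise ValueError("completion_ids and logprobs must have the same length")
        for key in batch:
            batch[key].append(result[key])
    return batch


if __name__ == "__main__":
    out = rollout_func(["hello", "hi"])
    print({key: len(value) for key, value in out.items()})
```

The point of the sketch is the shape of the data: one entry per prompt, with the completion token IDs and their logprobs kept aligned so the trainer can compute its loss without re-tokenizing the generated text.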