Notebooks

This directory contains a collection of Jupyter notebooks that demonstrate how to use the TRL library in different applications.

| Notebook | Description | Open in Colab |
|---|---|---|
| `grpo_trl_lora_qlora.ipynb` | GRPO using QLoRA on free Colab | Open In Colab |
| `grpo_agent.ipynb` | GRPO for agent training | Not available (runs out of memory on Colab GPUs) |
| `grpo_rnj_1_instruct.ipynb` | GRPO on rnj-1-instruct with QLoRA using TRL on Colab to add reasoning capabilities | Open In Colab |
| `sft_ministral3_vl.ipynb` | Supervised Fine-Tuning (SFT) of Ministral 3 with QLoRA using TRL on free Colab | Open In Colab |
| `grpo_ministral3_vl.ipynb` | GRPO on Ministral 3 with QLoRA using TRL on free Colab | Open In Colab |
| `sft_nemotron_3.ipynb` | SFT with LoRA on NVIDIA Nemotron 3 models | Open In Colab |
| `sft_trl_lora_qlora.ipynb` | Supervised Fine-Tuning (SFT) using QLoRA on free Colab | Open In Colab |
| `sft_qwen_vl.ipynb` | Supervised Fine-Tuning (SFT) of Qwen3-VL with QLoRA using TRL on free Colab | Open In Colab |
| `sft_tool_calling.ipynb` | Teaching tool calling to a model without native tool-calling support, using SFT with QLoRA | Open In Colab |
| `grpo_qwen3_vl.ipynb` | GRPO on Qwen3-VL with QLoRA using TRL on free Colab | Open In Colab |

OpenEnv Notebooks

These notebooks demonstrate GRPO training with OpenEnv environments using `environment_factory`. The BrowserGym notebook uses the lower-level `rollout_func` API instead.

| Notebook | Description | Open in Colab |
|---|---|---|
| `openenv_wordle_grpo.ipynb` | GRPO to play Wordle in an OpenEnv environment | Open In Colab |
| `openenv_sudoku_grpo.ipynb` | GRPO to play Sudoku in an OpenEnv environment | Open In Colab |
| `grpo_functiongemma_browsergym_openenv.ipynb` | GRPO on FunctionGemma in the BrowserGym environment | Open In Colab |