Notebooks

This directory contains a collection of Jupyter notebooks that demonstrate how to use the TRL library in different applications.

| Notebook | Description | Open in Colab |
| --- | --- | --- |
| `grpo_trl_lora_qlora.ipynb` | GRPO using QLoRA on free Colab | |
| `grpo_agent.ipynb` | GRPO for agent training | Not available due to OOM with Colab GPUs |
| `grpo_rnj_1_instruct.ipynb` | GRPO rnj-1-instruct with QLoRA using TRL on Colab to add reasoning capabilities | |
| `sft_ministral3_vl.ipynb` | Supervised Fine-Tuning (SFT) Ministral 3 with QLoRA using TRL on free Colab | |
| `grpo_ministral3_vl.ipynb` | GRPO Ministral 3 with QLoRA using TRL on free Colab | |
| `sft_nemotron_3.ipynb` | SFT with LoRA on NVIDIA Nemotron 3 models | |
| `sft_trl_lora_qlora.ipynb` | Supervised Fine-Tuning (SFT) using QLoRA on free Colab | |
| `sft_qwen_vl.ipynb` | Supervised Fine-Tuning (SFT) Qwen3-VL with QLoRA using TRL on free Colab | |
| `sft_tool_calling.ipynb` | Teaching tool calling to a model without native tool-calling support using SFT with QLoRA | |
| `grpo_qwen3_vl.ipynb` | GRPO Qwen3-VL with QLoRA using TRL on free Colab | |

OpenEnv Notebooks

These notebooks demonstrate GRPO training with OpenEnv environments. The Wordle and Sudoku notebooks use `environment_factory`; the BrowserGym notebook uses the lower-level `rollout_func` API instead.

| Notebook | Description | Open in Colab |
| --- | --- | --- |
| `openenv_wordle_grpo.ipynb` | GRPO to play Wordle on an OpenEnv environment | |
| `openenv_sudoku_grpo.ipynb` | GRPO to play Sudoku on an OpenEnv environment | |
| `grpo_functiongemma_browsergym_openenv.ipynb` | GRPO on FunctionGemma in the BrowserGym environment | |
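
If a Colab link is missing for a notebook listed above, one can usually be built by hand, since Colab opens notebooks hosted on GitHub through a `colab.research.google.com/github/<owner>/<repo>/blob/<branch>/<path>` URL. The sketch below is a hypothetical helper, not part of TRL. The default repository (`huggingface/trl`), branch (`main`), and directory (`examples/notebooks`) are assumptions, so adjust them to wherever these notebooks actually live. Note that `grpo_agent.ipynb` is listed as not runnable on Colab GPUs, so a link for it would open but is unlikely to train.

```python
from urllib.parse import quote

# Colab's prefix for opening a notebook that is hosted on GitHub.
COLAB_GITHUB_PREFIX = "https://colab.research.google.com/github"


def colab_url(
    notebook: str,
    repo: str = "huggingface/trl",          # assumption: owner/repo hosting the notebooks
    branch: str = "main",                   # assumption: default branch name
    directory: str = "examples/notebooks",  # assumption: path of this directory in the repo
) -> str:
    """Build an "Open in Colab" URL for a notebook file name (hypothetical helper)."""
    # Join directory and file name, dropping stray slashes and an empty directory.
    parts = [part.strip("/") for part in (directory, notebook) if part.strip("/")]
    path = quote("/".join(parts))  # percent-encode spaces and similar, keep "/"
    return f"{COLAB_GITHUB_PREFIX}/{repo}/blob/{quote(branch, safe='')}/{path}"


if __name__ == "__main__":
    for name in ("sft_trl_lora_qlora.ipynb", "openenv_wordle_grpo.ipynb"):
        print(colab_url(name))
```

The branch is encoded with `safe=''` so that a branch name containing `/` stays a single path segment in the resulting URL.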