
Examples

TRL provides notebooks for quick experimentation and scripts for production training.

Notebooks: Most run on the free tier of Google Colab. Great for learning and prototyping.
Scripts: Run on a single GPU, on multiple GPUs, or with DeepSpeed. Ready for production training.

Getting Started

pip install --upgrade "trl[quantization]"

For scripts, configure 🤗 Accelerate (recommended for multi-GPU):

accelerate config
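Before launching a notebook or script, it can help to confirm what is actually installed. The snippet below is a minimal sketch, not part of TRL: `installed_versions` is a hypothetical helper, and treating `bitsandbytes` as the package pulled in by the `quantization` extra is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages to report on: trl plus its core dependencies accelerate and transformers.
# bitsandbytes is ASSUMED to be what the `quantization` extra installs.
PACKAGES = ("trl", "accelerate", "transformers", "bitsandbytes")


def installed_versions(packages=PACKAGES):
    """Hypothetical helper: map each package name to its installed version, or None."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in the current environment.
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:>13}: {ver or 'NOT INSTALLED'}")
```

A `None` entry means the package is missing from the active environment, which is a common cause of import errors when a notebook runs in a different kernel than the one you installed into.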

📓 Notebooks

Interactive notebooks for quick experimentation. Find them in examples/notebooks/.

🚀 Getting started

Generic notebooks that work with any model. Start here!

Notebook	Method	Model	Colab
SFT a 14B model with LoRA/QLoRA on Free Colab	SFT	Qwen3-14B
GRPO a 7B model with LoRA/QLoRA on Free Colab	GRPO	Qwen2-7B

🤖 Agents

Train models for agentic tasks and tool use.

Notebook	Method	Model	Colab
Agent Training Qwen3-1.7B with Tool Calling (BioGRID SQL)	GRPO	Qwen3-1.7B	⚠️ Larger GPU

🎮 OpenEnv

Train agents in interactive environments using OpenEnv.

Notebook	Method	Model	Colab
Train Qwen3-1.7B to Play Wordle	GRPO	Qwen3-1.7B
FunctionGemma for Browser Control (BrowserGym)	GRPO	FunctionGemma-270M

🎯 Model-specific

Notebooks for specific models, including Vision Language Models (VLM) and reasoning.

Notebook	Method	Model	VLM
Add Reasoning Capabilities to rnj-1-instruct-1B with GRPO and QLoRA	GRPO	rnj-1-instruct-1B
SFT Ministral-3B VLM with QLoRA on Free Colab	SFT	Ministral-3B	✅
GRPO Ministral-3B VLM with QLoRA on Free Colab	GRPO	Ministral-3B	✅
SFT Qwen3-VL with QLoRA on Free Colab	SFT	Qwen3-VL	✅
GRPO Qwen3-VL with QLoRA on Free Colab	GRPO	Qwen3-VL	✅

📜 Scripts

Scripts are maintained in the trl/scripts and examples/scripts directories. They show how to use different trainers such as SFTTrainer, PPOTrainer, DPOTrainer, GRPOTrainer, and more.

File	Description
`examples/scripts/bco.py`	This script shows how to use the experimental.kto.KTOTrainer with the BCO loss to fine-tune a model to increase instruction-following, truthfulness, honesty, and helpfulness using the openbmb/UltraFeedback dataset.
`examples/scripts/cpo.py`	This script shows how to use the experimental.cpo.CPOTrainer to fine-tune a model to increase helpfulness and harmlessness using the Anthropic/hh-rlhf dataset.
`trl/scripts/dpo.py`	This script shows how to use the DPOTrainer to fine-tune a model.
`examples/scripts/dpo_vlm.py`	This script shows how to use the DPOTrainer to fine-tune a Vision Language Model to reduce hallucinations using the openbmb/RLAIF-V-Dataset dataset.
`examples/scripts/evals/judge_tldr.py`	This script shows how to use experimental.judges.HfPairwiseJudge or experimental.judges.OpenAIPairwiseJudge to judge model generations.
`examples/scripts/gkd.py`	This script shows how to use the experimental.gkd.GKDTrainer to fine-tune a model.
`trl/scripts/grpo.py`	This script shows how to use the GRPOTrainer to fine-tune a model.
`trl/scripts/grpo_agent.py`	This script shows how to use the GRPOTrainer to fine-tune a model to enable agentic usage.
`examples/scripts/grpo_vlm.py`	This script shows how to use the GRPOTrainer to fine-tune a multimodal model for reasoning using the lmms-lab/multimodal-open-r1-8k-verified dataset.
`examples/scripts/gspo.py`	This script shows how to use GSPO via the GRPOTrainer to fine-tune a model for reasoning using the AI-MO/NuminaMath-TIR dataset.
`examples/scripts/gspo_vlm.py`	This script shows how to use GSPO via the GRPOTrainer to fine-tune a multimodal model for reasoning using the lmms-lab/multimodal-open-r1-8k-verified dataset.
`examples/scripts/kto.py`	This script shows how to use the experimental.kto.KTOTrainer to fine-tune a model.
`examples/scripts/mpo_vlm.py`	This script shows how to use MPO via the DPOTrainer to align a model based on preferences using the HuggingFaceH4/rlaif-v_formatted dataset and a weighted combination of loss functions.
`examples/scripts/nash_md.py`	This script shows how to use the experimental.nash_md.NashMDTrainer to fine-tune a model.
`examples/scripts/online_dpo.py`	This script shows how to use the experimental.online_dpo.OnlineDPOTrainer to fine-tune a model.
`examples/scripts/online_dpo_vlm.py`	This script shows how to use the experimental.online_dpo.OnlineDPOTrainer to fine-tune a Vision Language Model.
`examples/scripts/openenv/browsergym.py`	Simple script to run GRPO training via the GRPOTrainer with OpenEnv's BrowserGym environment and vLLM for VLMs.
`examples/scripts/openenv/browsergym_llm.py`	Simple script to run GRPO training via the GRPOTrainer with OpenEnv's BrowserGym environment and vLLM for LLMs.
`examples/scripts/openenv/catch.py`	Simple script to run GRPO training via the GRPOTrainer with OpenEnv's Catch environment (OpenSpiel) and vLLM.
`examples/scripts/openenv/echo.py`	Simple script to run GRPO training via the GRPOTrainer with OpenEnv's Echo environment and vLLM.
`examples/scripts/openenv/wordle.py`	Simple script to run GRPO training via the GRPOTrainer with OpenEnv's Wordle environment and vLLM.
`examples/scripts/orpo.py`	This script shows how to use the experimental.orpo.ORPOTrainer to fine-tune a model to increase helpfulness and harmlessness using the Anthropic/hh-rlhf dataset.
`examples/scripts/ppo/ppo.py`	This script shows how to use the experimental.ppo.PPOTrainer to fine-tune a model to improve its ability to continue text with positive sentiment or physically descriptive language.
`examples/scripts/ppo/ppo_tldr.py`	This script shows how to use the experimental.ppo.PPOTrainer to fine-tune a model to improve its ability to generate TL;DR summaries.
`examples/scripts/prm.py`	This script shows how to use the experimental.prm.PRMTrainer to fine-tune a Process-supervised Reward Model (PRM).
`examples/scripts/reward_modeling.py`	This script shows how to use the RewardTrainer to train an Outcome Reward Model (ORM) on your own dataset.
`examples/scripts/rloo.py`	This script shows how to use the RLOOTrainer to fine-tune a model to improve its ability to solve math questions.
`examples/scripts/sft.py`	This script shows how to use the SFTTrainer to fine-tune a model.
`examples/scripts/sft_gemma3.py`	This script shows how to use the SFTTrainer to fine-tune a Gemma 3 model.
`examples/scripts/sft_video_llm.py`	This script shows how to use the SFTTrainer to fine-tune a Video Language Model.
`examples/scripts/sft_vlm.py`	This script shows how to use the SFTTrainer to fine-tune a Vision Language Model in a chat setting. The script has only been tested with LLaVA 1.5, LLaVA 1.6, and Llama-3.2-11B-Vision-Instruct models, so users may see unexpected behaviour in other model architectures.
`examples/scripts/sft_vlm_gemma3.py`	This script shows how to use the SFTTrainer to fine-tune a Gemma 3 model on vision-to-text tasks.
`examples/scripts/sft_vlm_smol_vlm.py`	This script shows how to use the SFTTrainer to fine-tune a SmolVLM model.
`examples/scripts/xpo.py`	This script shows how to use the experimental.xpo.XPOTrainer to fine-tune a model.
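Several trainers in the table are referenced as `experimental.<module>.<Trainer>`. Assuming these dotted names are relative to the `trl` package (so `experimental.kto.KTOTrainer` would be importable as `trl.experimental.kto.KTOTrainer`), a small helper can turn a table entry into an import. This is a sketch: `resolve_trainer_path` and `load_trainer` are hypothetical names, and only the naming pattern comes from the table.

```python
import importlib


def resolve_trainer_path(dotted: str, package: str = "trl") -> tuple[str, str]:
    """Hypothetical helper: split a table entry such as 'experimental.kto.KTOTrainer'
    into (module path, class name), prefixing the package name.

    Trainers listed without a module (e.g. 'DPOTrainer') are assumed to be
    exported from the top-level package.
    """
    *modules, class_name = dotted.split(".")
    module_path = ".".join([package, *modules])
    return module_path, class_name


def load_trainer(dotted: str, package: str = "trl"):
    """Hypothetical helper: import and return the class (the package must be installed)."""
    module_path, class_name = resolve_trainer_path(dotted, package)
    return getattr(importlib.import_module(module_path), class_name)
```

For example, `resolve_trainer_path("experimental.kto.KTOTrainer")` returns `("trl.experimental.kto", "KTOTrainer")`, while `resolve_trainer_path("DPOTrainer")` returns `("trl", "DPOTrainer")`. Keeping the string parsing separate from the import makes it easy to check a path without needing TRL installed.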

Distributed Training (for scripts)

You can run scripts on multiple GPUs with 🤗 Accelerate:

accelerate launch --config_file=examples/accelerate_configs/multi_gpu.yaml --num_processes {NUM_GPUS} path_to_script.py --all_arguments_of_the_script

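For DeepSpeed ZeRO, pick the config file for the stage you want (`deepspeed_zero1.yaml`, `deepspeed_zero2.yaml`, or `deepspeed_zero3.yaml`). For example, for ZeRO-3:

accelerate launch --config_file=examples/accelerate_configs/deepspeed_zero3.yaml --num_processes {NUM_GPUS} path_to_script.py --all_arguments_of_the_script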

Replace `{NUM_GPUS}`, `path_to_script.py`, and `--all_arguments_of_the_script` with your GPU count, the path to the script, and the script's own arguments.
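Automating these launches (for example, sweeping over GPU counts or ZeRO stages) mostly comes down to picking the right config file. The sketch below assembles the same `accelerate launch` command as an argument list. `build_launch_command` is a hypothetical helper, and it assumes you run it from the repository root so the relative `examples/accelerate_configs/` paths resolve.

```python
import shlex

CONFIG_DIR = "examples/accelerate_configs"


def build_launch_command(script, num_gpus, script_args=(), zero_stage=None):
    """Hypothetical helper: assemble the `accelerate launch` argv from this section.

    zero_stage=None selects multi_gpu.yaml; 1, 2 or 3 selects deepspeed_zero{N}.yaml.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if zero_stage is None:
        config = f"{CONFIG_DIR}/multi_gpu.yaml"
    elif zero_stage in (1, 2, 3):
        config = f"{CONFIG_DIR}/deepspeed_zero{zero_stage}.yaml"
    else:
        raise ValueError("zero_stage must be None, 1, 2, or 3")
    return [
        "accelerate", "launch",
        f"--config_file={config}",
        "--num_processes", str(num_gpus),
        script,
        *script_args,  # forwarded unchanged to the training script
    ]


if __name__ == "__main__":
    cmd = build_launch_command("trl/scripts/sft.py", 8, zero_stage=3)
    # prints: accelerate launch --config_file=examples/accelerate_configs/deepspeed_zero3.yaml --num_processes 8 trl/scripts/sft.py
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`. Building an argv list rather than a single shell string avoids quoting problems when script arguments contain spaces or shell metacharacters.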
