
Examples

TRL provides notebooks for quick experimentation and scripts for production training.

  • Notebooks: Most run on the free tier of Google Colab. Great for learning and prototyping.
  • Scripts: Run on a single GPU, on multiple GPUs, or with DeepSpeed. Ready for production.

Getting Started

pip install --upgrade "trl[quantization]"

For scripts, configure 🤗 Accelerate (recommended for multi-GPU):

accelerate config
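
A quick way to confirm that the install worked is to look up the installed package versions from Python. The sketch below is a hypothetical helper (report_versions is not part of TRL) built only on the standard library's importlib.metadata; the package list is an assumption, so adjust it to whatever your notebooks or scripts need.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Return {distribution name: installed version, or None if missing}.

    Hypothetical helper for illustration; not part of TRL.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list: TRL plus two of the libraries it builds on.
    for pkg, ver in report_versions(["trl", "transformers", "accelerate"]).items():
        print(f"{pkg}: {ver or 'NOT INSTALLED'}")
```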

📓 Notebooks

Interactive notebooks for quick experimentation. Find them in examples/notebooks/.

🚀 Getting started

Generic notebooks that work with any model. Start here!

🤖 Agents

Train models for agentic tasks and tool use.

| Notebook | Method | Model | Colab |
|---|---|---|---|
| Agent Training Qwen3-1.7B with Tool Calling (BioGRID SQL) | GRPO | Qwen3-1.7B | ⚠️ Larger GPU |

🎮 OpenEnv

Train agents in interactive environments using OpenEnv.

| Notebook | Method | Model | Colab |
|---|---|---|---|
| Train Qwen3-1.7B to Play Wordle | GRPO | Qwen3-1.7B | Open In Colab |
| FunctionGemma for Browser Control (BrowserGym) | GRPO | FunctionGemma-270M | Open In Colab |

🎯 Model-specific

Notebooks for specific models, including Vision Language Models (VLMs), and for reasoning.

📜 Scripts

Scripts are maintained in the trl/scripts and examples/scripts directories. They show how to use different trainers such as SFTTrainer, PPOTrainer, DPOTrainer, GRPOTrainer, and more.

| File | Description |
|---|---|
| examples/scripts/bco.py | This script shows how to use the experimental.kto.KTOTrainer with the BCO loss to fine-tune a model to increase instruction-following, truthfulness, honesty, and helpfulness using the openbmb/UltraFeedback dataset. |
| examples/scripts/cpo.py | This script shows how to use the experimental.cpo.CPOTrainer to fine-tune a model to increase helpfulness and harmlessness using the Anthropic/hh-rlhf dataset. |
| trl/scripts/dpo.py | This script shows how to use the DPOTrainer to fine-tune a model. |
| examples/scripts/dpo_vlm.py | This script shows how to use the DPOTrainer to fine-tune a Vision Language Model to reduce hallucinations using the openbmb/RLAIF-V-Dataset dataset. |
| examples/scripts/evals/judge_tldr.py | This script shows how to use experimental.judges.HfPairwiseJudge or experimental.judges.OpenAIPairwiseJudge to judge model generations. |
| examples/scripts/gkd.py | This script shows how to use the experimental.gkd.GKDTrainer to fine-tune a model. |
| trl/scripts/grpo.py | This script shows how to use the GRPOTrainer to fine-tune a model. |
| trl/scripts/grpo_agent.py | This script shows how to use the GRPOTrainer to fine-tune a model to enable agentic usage. |
| examples/scripts/grpo_vlm.py | This script shows how to use the GRPOTrainer to fine-tune a multimodal model for reasoning using the lmms-lab/multimodal-open-r1-8k-verified dataset. |
| examples/scripts/gspo.py | This script shows how to use GSPO via the GRPOTrainer to fine-tune a model for reasoning using the AI-MO/NuminaMath-TIR dataset. |
| examples/scripts/gspo_vlm.py | This script shows how to use GSPO via the GRPOTrainer to fine-tune a multimodal model for reasoning using the lmms-lab/multimodal-open-r1-8k-verified dataset. |
| examples/scripts/kto.py | This script shows how to use the experimental.kto.KTOTrainer to fine-tune a model. |
| examples/scripts/mpo_vlm.py | This script shows how to use MPO via the DPOTrainer to align a model based on preferences using the HuggingFaceH4/rlaif-v_formatted dataset and a set of loss types combined with weights. |
| examples/scripts/nash_md.py | This script shows how to use the experimental.nash_md.NashMDTrainer to fine-tune a model. |
| examples/scripts/online_dpo.py | This script shows how to use the experimental.online_dpo.OnlineDPOTrainer to fine-tune a model. |
| examples/scripts/online_dpo_vlm.py | This script shows how to use the experimental.online_dpo.OnlineDPOTrainer to fine-tune a Vision Language Model. |
| examples/scripts/openenv/browsergym.py | Simple script to run GRPO training via the GRPOTrainer with OpenEnv's BrowserGym environment and vLLM for VLMs. |
| examples/scripts/openenv/browsergym_llm.py | Simple script to run GRPO training via the GRPOTrainer with OpenEnv's BrowserGym environment and vLLM for LLMs. |
| examples/scripts/openenv/catch.py | Simple script to run GRPO training via the GRPOTrainer with OpenEnv's Catch environment (OpenSpiel) and vLLM. |
| examples/scripts/openenv/echo.py | Simple script to run GRPO training via the GRPOTrainer with OpenEnv's Echo environment and vLLM. |
| examples/scripts/openenv/wordle.py | Simple script to run GRPO training via the GRPOTrainer with OpenEnv's Wordle environment and vLLM. |
| examples/scripts/orpo.py | This script shows how to use the experimental.orpo.ORPOTrainer to fine-tune a model to increase helpfulness and harmlessness using the Anthropic/hh-rlhf dataset. |
| examples/scripts/ppo/ppo.py | This script shows how to use the experimental.ppo.PPOTrainer to fine-tune a model to improve its ability to continue text with positive sentiment or physically descriptive language. |
| examples/scripts/ppo/ppo_tldr.py | This script shows how to use the experimental.ppo.PPOTrainer to fine-tune a model to improve its ability to generate TL;DR summaries. |
| examples/scripts/prm.py | This script shows how to use the experimental.prm.PRMTrainer to fine-tune a Process-supervised Reward Model (PRM). |
| examples/scripts/reward_modeling.py | This script shows how to use the RewardTrainer to train an Outcome Reward Model (ORM) on your own dataset. |
| examples/scripts/rloo.py | This script shows how to use the RLOOTrainer to fine-tune a model to improve its ability to solve math questions. |
| examples/scripts/sft.py | This script shows how to use the SFTTrainer to fine-tune a model. |
| examples/scripts/sft_gemma3.py | This script shows how to use the SFTTrainer to fine-tune a Gemma 3 model. |
| examples/scripts/sft_video_llm.py | This script shows how to use the SFTTrainer to fine-tune a Video Language Model. |
| examples/scripts/sft_vlm.py | This script shows how to use the SFTTrainer to fine-tune a Vision Language Model in a chat setting. The script has only been tested with LLaVA 1.5, LLaVA 1.6, and Llama-3.2-11B-Vision-Instruct models, so users may see unexpected behaviour with other model architectures. |
| examples/scripts/sft_vlm_gemma3.py | This script shows how to use the SFTTrainer to fine-tune a Gemma 3 model on vision-to-text tasks. |
| examples/scripts/sft_vlm_smol_vlm.py | This script shows how to use the SFTTrainer to fine-tune a SmolVLM model. |
| examples/scripts/xpo.py | This script shows how to use the experimental.xpo.XPOTrainer to fine-tune a model. |
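
Many of the scripts listed here (for example trl/scripts/grpo.py, examples/scripts/gspo.py, and the OpenEnv examples) train with the GRPOTrainer, which scores sampled completions with reward functions. The sketch below is a hypothetical format reward: the name answer_tag_reward and the <answer>...</answer> convention are illustrative assumptions, not what any listed script uses. It follows the callable-reward convention described in the GRPOTrainer documentation, where the function receives the batch of completions (plus prompts and dataset columns as keyword arguments) and returns one float per completion.

```python
import re

# Hypothetical convention: the model should wrap its final answer in <answer> tags.
ANSWER_PATTERN = re.compile(r"<answer>(.+?)</answer>", re.DOTALL)


def answer_tag_reward(completions, **kwargs):
    """Illustrative GRPO-style reward: 1.0 if a completion contains a
    non-empty <answer>...</answer> span, else 0.0.

    `completions` may be plain strings (standard format) or lists of chat
    messages (conversational format). Prompts and extra dataset columns
    arrive in **kwargs and are ignored here.
    """
    rewards = []
    for completion in completions:
        if isinstance(completion, list):
            # Conversational format: score the content of the last message.
            text = completion[-1]["content"] if completion else ""
        else:
            text = completion
        match = ANSWER_PATTERN.search(text)
        # Reject tags that only contain whitespace.
        rewards.append(1.0 if match and match.group(1).strip() else 0.0)
    return rewards
```

With TRL installed, a function like this would be passed to the trainer through its reward_funcs argument, either alone or in a list alongside other reward functions; check the GRPOTrainer documentation for your TRL version before relying on the exact arguments.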

Distributed Training (for scripts)

You can run scripts on multiple GPUs with 🤗 Accelerate:

accelerate launch --config_file=examples/accelerate_configs/multi_gpu.yaml --num_processes {NUM_GPUS} path_to_script.py --all_arguments_of_the_script

For DeepSpeed ZeRO-{1,2,3}:

accelerate launch --config_file=examples/accelerate_configs/deepspeed_zero{1,2,3}.yaml --num_processes {NUM_GPUS} path_to_script.py --all_arguments_of_the_script

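Replace {NUM_GPUS} with the number of GPUs, pick a single ZeRO stage (deepspeed_zero1.yaml, deepspeed_zero2.yaml, or deepspeed_zero3.yaml) instead of typing the {1,2,3} placeholder literally, and replace --all_arguments_of_the_script with the script's own arguments.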
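
To avoid typos when filling in these placeholders, the launch command can also be assembled programmatically. The sketch below uses a hypothetical helper, build_launch_command (not part of TRL or Accelerate), that picks multi_gpu.yaml or one deepspeed_zero{N}.yaml file from examples/accelerate_configs and returns an argument list; the script arguments in the example are illustrative.

```python
import shlex

# Directory holding the Accelerate config files referenced in the commands above.
CONFIG_DIR = "examples/accelerate_configs"


def build_launch_command(script, script_args, num_gpus, zero_stage=None):
    """Build an `accelerate launch` argument list for a TRL script.

    Hypothetical helper: uses multi_gpu.yaml when zero_stage is None,
    otherwise deepspeed_zero{stage}.yaml for stage 1, 2, or 3.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if zero_stage is None:
        config = f"{CONFIG_DIR}/multi_gpu.yaml"
    elif zero_stage in (1, 2, 3):
        config = f"{CONFIG_DIR}/deepspeed_zero{zero_stage}.yaml"
    else:
        raise ValueError("zero_stage must be None, 1, 2, or 3")
    return [
        "accelerate", "launch",
        f"--config_file={config}",
        "--num_processes", str(num_gpus),
        script,
        *script_args,  # forwarded unchanged to the training script
    ]


if __name__ == "__main__":
    # Illustrative arguments; real scripts need their own required flags.
    cmd = build_launch_command(
        "trl/scripts/sft.py", ["--output_dir", "sft-out"], num_gpus=4, zero_stage=3
    )
    # shlex.join quotes arguments where needed so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Returning a list rather than a single string also sidesteps shell brace expansion: typing deepspeed_zero{1,2,3}.yaml literally in Bash expands into three separate --config_file arguments. The list can be passed to subprocess.run(cmd, check=True) or printed with shlex.join(cmd) for pasting into a terminal.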
