# Examples

This directory contains a collection of examples that demonstrate how to use the TRL library for various applications. We provide both **scripts** for advanced use cases and **notebooks** for an easy start and interactive experimentation.

The notebooks are self-contained and can run on **free Colab**, while the scripts can run on **single GPU, multi-GPU, or DeepSpeed** setups.

**Getting Started**

Install TRL and additional dependencies as follows:

```bash
pip install --upgrade trl[quantization]
```

Check for additional optional dependencies [here](https://github.com/huggingface/trl/blob/main/pyproject.toml).

For scripts, you will also need an 🤗 Accelerate config (recommended for multi-GPU settings):

```bash
accelerate config  # will prompt you to define the training configuration
```

This allows you to run scripts with `accelerate launch` in single- or multi-GPU settings.
## Notebooks

These notebooks are easier to run and are designed for quick experimentation with TRL. The list of notebooks can be found in the [`trl/examples/notebooks/`](https://github.com/huggingface/trl/tree/main/examples/notebooks/) directory.

| Notebook | Description | Open in Colab |
|----------|-------------|---------------|
| [`sft_trl_lora_qlora.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_trl_lora_qlora.ipynb) | Supervised Fine-Tuning (SFT) using QLoRA on free Colab | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_trl_lora_qlora.ipynb) |
| [`sft_qwen_vl.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_qwen_vl.ipynb) | Supervised Fine-Tuning (SFT) Qwen3-VL with QLoRA using TRL on free Colab | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_qwen_vl.ipynb) |
| [`grpo_qwen3_vl.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/grpo_qwen3_vl.ipynb) | GRPO Qwen3-VL with QLoRA using TRL on free Colab | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_qwen3_vl.ipynb) |
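The QLoRA notebooks above rely on low-rank adapters (LoRA): instead of updating a full `d × k` weight matrix `W`, training learns two small matrices `B` (`d × r`) and `A` (`r × k`) and applies `W + (α/r)·B·A`. As a dependency-free sketch (not TRL code, just the parameter arithmetic behind the technique), the savings look like this:

```python
# Sketch: trainable-parameter count of a LoRA adapter vs. full
# fine-tuning for a d x k weight matrix with adapter rank r.
def lora_params(d: int, k: int, r: int) -> int:
    # B is d x r, A is r x k
    return d * r + r * k

def full_params(d: int, k: int) -> int:
    return d * k

# e.g. a 4096 x 4096 projection with rank 16:
print(lora_params(4096, 4096, 16), "vs", full_params(4096, 4096))  # 131072 vs 16777216
```

With rank 16 the adapter trains roughly 0.8% of the original matrix's parameters, which is why these notebooks fit on a free Colab GPU.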
## Scripts

Scripts are maintained in the [`trl/scripts`](https://github.com/huggingface/trl/blob/main/trl/scripts) and [`examples/scripts`](https://github.com/huggingface/trl/blob/main/examples/scripts) directories. They show how to use different trainers such as `SFTTrainer`, `PPOTrainer`, `DPOTrainer`, `GRPOTrainer`, and more.
| File | Description |
| --- | --- |
| [`examples/scripts/bco.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/bco.py) | This script shows how to use the [KTOTrainer](/docs/trl/pr_4331/en/kto_trainer#trl.KTOTrainer) with the BCO loss to fine-tune a model to increase instruction-following, truthfulness, honesty, and helpfulness using the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset. |
| [`examples/scripts/cpo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/cpo.py) | This script shows how to use the [CPOTrainer](/docs/trl/pr_4331/en/cpo_trainer#trl.CPOTrainer) to fine-tune a model to increase helpfulness and harmlessness using the [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset. |
| [`trl/scripts/dpo.py`](https://github.com/huggingface/trl/blob/main/trl/scripts/dpo.py) | This script shows how to use the [DPOTrainer](/docs/trl/pr_4331/en/dpo_trainer#trl.DPOTrainer) to fine-tune a model. |
| [`examples/scripts/dpo_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/dpo_vlm.py) | This script shows how to use the [DPOTrainer](/docs/trl/pr_4331/en/dpo_trainer#trl.DPOTrainer) to fine-tune a Vision Language Model to reduce hallucinations using the [openbmb/RLAIF-V-Dataset](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset) dataset. |
| [`examples/scripts/evals/judge_tldr.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/evals/judge_tldr.py) | This script shows how to use [HfPairwiseJudge](/docs/trl/pr_4331/en/judges#trl.HfPairwiseJudge) or [OpenAIPairwiseJudge](/docs/trl/pr_4331/en/judges#trl.OpenAIPairwiseJudge) to judge model generations. |
| [`examples/scripts/gkd.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/gkd.py) | This script shows how to use the [GKDTrainer](/docs/trl/pr_4331/en/gkd_trainer#trl.GKDTrainer) to fine-tune a model. |
| [`trl/scripts/grpo.py`](https://github.com/huggingface/trl/blob/main/trl/scripts/grpo.py) | This script shows how to use the [GRPOTrainer](/docs/trl/pr_4331/en/grpo_trainer#trl.GRPOTrainer) to fine-tune a model. |
| [`examples/scripts/grpo_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/grpo_vlm.py) | This script shows how to use the [GRPOTrainer](/docs/trl/pr_4331/en/grpo_trainer#trl.GRPOTrainer) to fine-tune a multimodal model for reasoning using the [lmms-lab/multimodal-open-r1-8k-verified](https://huggingface.co/datasets/lmms-lab/multimodal-open-r1-8k-verified) dataset. |
| [`examples/scripts/gspo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/gspo.py) | This script shows how to use GSPO via the [GRPOTrainer](/docs/trl/pr_4331/en/grpo_trainer#trl.GRPOTrainer) to fine-tune a model for reasoning using the [AI-MO/NuminaMath-TIR](https://huggingface.co/datasets/AI-MO/NuminaMath-TIR) dataset. |
| [`examples/scripts/gspo_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/gspo_vlm.py) | This script shows how to use GSPO via the [GRPOTrainer](/docs/trl/pr_4331/en/grpo_trainer#trl.GRPOTrainer) to fine-tune a multimodal model for reasoning using the [lmms-lab/multimodal-open-r1-8k-verified](https://huggingface.co/datasets/lmms-lab/multimodal-open-r1-8k-verified) dataset. |
| [`examples/scripts/kto.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/kto.py) | This script shows how to use the [KTOTrainer](/docs/trl/pr_4331/en/kto_trainer#trl.KTOTrainer) to fine-tune a model. |
| [`examples/scripts/mpo_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/mpo_vlm.py) | This script shows how to use MPO via the [DPOTrainer](/docs/trl/pr_4331/en/dpo_trainer#trl.DPOTrainer) to align a model based on preferences using the [HuggingFaceH4/rlaif-v_formatted](https://huggingface.co/datasets/HuggingFaceH4/rlaif-v_formatted) dataset and a weighted combination of losses. |
| [`examples/scripts/nash_md.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/nash_md.py) | This script shows how to use the [NashMDTrainer](/docs/trl/pr_4331/en/nash_md_trainer#trl.NashMDTrainer) to fine-tune a model. |
| [`examples/scripts/online_dpo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/online_dpo.py) | This script shows how to use the [OnlineDPOTrainer](/docs/trl/pr_4331/en/online_dpo_trainer#trl.OnlineDPOTrainer) to fine-tune a model. |
| [`examples/scripts/online_dpo_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/online_dpo_vlm.py) | This script shows how to use the [OnlineDPOTrainer](/docs/trl/pr_4331/en/online_dpo_trainer#trl.OnlineDPOTrainer) to fine-tune a Vision Language Model. |
| [`examples/scripts/orpo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/orpo.py) | This script shows how to use the [ORPOTrainer](/docs/trl/pr_4331/en/orpo_trainer#trl.ORPOTrainer) to fine-tune a model to increase helpfulness and harmlessness using the [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset. |
| [`examples/scripts/ppo/ppo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/ppo/ppo.py) | This script shows how to use the [PPOTrainer](/docs/trl/pr_4331/en/ppo_trainer#trl.PPOTrainer) to fine-tune a model to improve its ability to continue text with positive sentiment or physically descriptive language. |
| [`examples/scripts/ppo/ppo_tldr.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/ppo/ppo_tldr.py) | This script shows how to use the [PPOTrainer](/docs/trl/pr_4331/en/ppo_trainer#trl.PPOTrainer) to fine-tune a model to improve its ability to generate TL;DR summaries. |
| [`examples/scripts/prm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/prm.py) | This script shows how to use the [PRMTrainer](/docs/trl/pr_4331/en/prm_trainer#trl.PRMTrainer) to fine-tune a Process-supervised Reward Model (PRM). |
| [`examples/scripts/reward_modeling.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/reward_modeling.py) | This script shows how to use the [RewardTrainer](/docs/trl/pr_4331/en/reward_trainer#trl.RewardTrainer) to train an Outcome Reward Model (ORM) on your own dataset. |
| [`examples/scripts/rloo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/rloo.py) | This script shows how to use the [RLOOTrainer](/docs/trl/pr_4331/en/rloo_trainer#trl.RLOOTrainer) to fine-tune a model to improve its ability to solve math questions. |
| [`trl/scripts/sft.py`](https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py) | This script shows how to use the [SFTTrainer](/docs/trl/pr_4331/en/sft_trainer#trl.SFTTrainer) to fine-tune a model. |
| [`examples/scripts/sft_gemma3.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_gemma3.py) | This script shows how to use the [SFTTrainer](/docs/trl/pr_4331/en/sft_trainer#trl.SFTTrainer) to fine-tune a Gemma 3 model. |
| [`examples/scripts/sft_video_llm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_video_llm.py) | This script shows how to use the [SFTTrainer](/docs/trl/pr_4331/en/sft_trainer#trl.SFTTrainer) to fine-tune a Video Language Model. |
| [`examples/scripts/sft_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_vlm.py) | This script shows how to use the [SFTTrainer](/docs/trl/pr_4331/en/sft_trainer#trl.SFTTrainer) to fine-tune a Vision Language Model in a chat setting. The script has only been tested with [LLaVA 1.5](https://huggingface.co/llava-hf/llava-1.5-7b-hf), [LLaVA 1.6](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf), and [Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) models, so users may see unexpected behaviour with other model architectures. |
| [`examples/scripts/sft_vlm_gemma3.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_vlm_gemma3.py) | This script shows how to use the [SFTTrainer](/docs/trl/pr_4331/en/sft_trainer#trl.SFTTrainer) to fine-tune a Gemma 3 model on vision-to-text tasks. |
| [`examples/scripts/sft_vlm_smol_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_vlm_smol_vlm.py) | This script shows how to use the [SFTTrainer](/docs/trl/pr_4331/en/sft_trainer#trl.SFTTrainer) to fine-tune a SmolVLM model. |
| [`examples/scripts/xpo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/xpo.py) | This script shows how to use the [XPOTrainer](/docs/trl/pr_4331/en/xpo_trainer#trl.XPOTrainer) to fine-tune a model. |
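Many of the preference-based scripts above (DPO, ORPO, CPO) consume datasets of prompt/chosen/rejected triples. As an illustrative sketch (the field names `prompt`, `chosen`, and `rejected` follow TRL's common convention, but check the specific trainer's documentation), one record in such a dataset might be built like this:

```python
# Sketch of the preference-pair record format commonly used by
# DPO-style trainers. This is illustrative, not TRL code: field names
# should be verified against the trainer's documented dataset format.
def make_preference_example(prompt: str, chosen: str, rejected: str) -> dict:
    """Pair a prompt with a preferred ("chosen") and a dispreferred
    ("rejected") completion."""
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

example = make_preference_example(
    "What is the capital of France?",
    "The capital of France is Paris.",
    "France is a big country.",
)
print(sorted(example))  # ['chosen', 'prompt', 'rejected']
```

The trainer learns to raise the likelihood of the chosen completion relative to the rejected one for the same prompt.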
## Distributed Training (for scripts)

You can run scripts on multiple GPUs with 🤗 Accelerate:

```shell
accelerate launch --config_file=examples/accelerate_configs/multi_gpu.yaml --num_processes {NUM_GPUS} path_to_script.py --all_arguments_of_the_script
```

For DeepSpeed ZeRO-{1,2,3}:

```shell
accelerate launch --config_file=examples/accelerate_configs/deepspeed_zero{1,2,3}.yaml --num_processes {NUM_GPUS} path_to_script.py --all_arguments_of_the_script
```

Adjust `NUM_GPUS` and `--all_arguments_of_the_script` as needed.
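One practical detail when changing `NUM_GPUS`: under data-parallel training, the effective (global) batch size scales with the number of processes. A minimal sketch of the arithmetic (assuming the usual per-device batch size and gradient-accumulation arguments; the exact flag names depend on the script):

```python
# Sketch: effective global batch size under data-parallel training.
# effective = per_device_batch * gradient_accumulation_steps * num_processes
def effective_batch_size(per_device_batch: int,
                         grad_accum_steps: int,
                         num_processes: int) -> int:
    return per_device_batch * grad_accum_steps * num_processes

# e.g. 4 samples/device, 2 accumulation steps, 8 GPUs:
print(effective_batch_size(4, 2, 8))  # 64
```

If you tuned hyperparameters at one GPU count, consider lowering the per-device batch size or accumulation steps when scaling up, so the global batch size stays comparable.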