Quickstart

Get started with TRL in minutes. This guide covers the essentials for training models with SFT, GRPO, DPO, and reward modeling.

💡 Looking for ready-to-run examples? Check out our notebooks for Colab or production scripts.

Quick Examples

Copy and paste these minimal examples to start training immediately. Each uses a compact model for quick experimentation.

Supervised Fine-Tuning

from trl import SFTTrainer
from datasets import load_dataset

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
)
trainer.train()

Group Relative Policy Optimization

from trl import GRPOTrainer
from datasets import load_dataset
from trl.rewards import accuracy_reward

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # Start from SFT model
    train_dataset=load_dataset("trl-lib/DeepMath-103K", split="train"),
    reward_funcs=accuracy_reward,
)
trainer.train()
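
The `reward_funcs` argument can also take your own Python function. The sketch below is a minimal illustration, assuming a reward function receives the generated `completions` (plus extra keyword arguments) and returns one float per completion. The name `boxed_format_reward` and the `\boxed{...}` convention are hypothetical choices for this example, not part of TRL.

```python
import re


def boxed_format_reward(completions, **kwargs):
    """Hypothetical reward: 1.0 if a completion contains a non-empty \\boxed{...}, else 0.0."""
    rewards = []
    for completion in completions:
        # Assumption: conversational datasets yield a list of message dicts,
        # plain-text datasets yield strings. Handle both.
        text = completion[-1]["content"] if isinstance(completion, list) else completion
        rewards.append(1.0 if re.search(r"\\boxed\{.+?\}", text) else 0.0)
    return rewards


print(boxed_format_reward([r"The answer is \boxed{42}.", "I think it is 42."]))
# prints [1.0, 0.0]
```

If your TRL version accepts a list of reward functions, a function like this could be passed alongside `accuracy_reward`, for example `reward_funcs=[accuracy_reward, boxed_format_reward]`.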

Direct Preference Optimization

from trl import DPOTrainer
from datasets import load_dataset

trainer = DPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # Policy to train (use your SFT model)
    ref_model="Qwen/Qwen2.5-0.5B-Instruct",  # Frozen reference (the checkpoint the policy starts from)
    train_dataset=load_dataset("trl-lib/ultrafeedback_binarized", split="train"),
)
trainer.train()
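
DPO learns from preference pairs, so it helps to know what one record looks like before you bring your own data. The sketch below assumes the plain-text preference format with `prompt`, `chosen`, and `rejected` fields. The toy records and the `check_preference_record` helper are hypothetical and only illustrate a basic sanity check.

```python
# Hypothetical toy records in a plain-text preference format.
preference_records = [
    {"prompt": "What is 2 + 2?", "chosen": "4", "rejected": "5"},
    {"prompt": "Capital of France?", "chosen": "Paris", "rejected": "Lyon"},
    {"prompt": "Say hi.", "chosen": "Hi!", "rejected": "Hi!"},  # unusable: no preference signal
]


def check_preference_record(record):
    """Return True if chosen and rejected are both non-empty and differ from each other."""
    if not record.get("chosen") or not record.get("rejected"):
        return False
    return record["chosen"] != record["rejected"]


usable = [r for r in preference_records if check_preference_record(r)]
print(f"{len(usable)} of {len(preference_records)} records are usable")
# prints "2 of 3 records are usable"
```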

Reward Modeling

from trl import RewardTrainer
from datasets import load_dataset

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = RewardTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    train_dataset=dataset,
)
trainer.train()

Command Line Interface

Skip the code entirely - train directly from your terminal:

# SFT: Fine-tune on instructions
trl sft --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name trl-lib/Capybara

# DPO: Align with preferences
trl dpo --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
    --dataset_name trl-lib/ultrafeedback_binarized

# Reward: Train a reward model
trl reward --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
    --dataset_name trl-lib/ultrafeedback_binarized

Ready-to-Run Examples

Want to dive deeper? We provide a comprehensive collection of examples for all skill levels:

Notebooks (Beginner-friendly)
Self-contained notebooks for interactive learning. Many run on free Google Colab, while some require larger GPUs.

  SFT with QLoRA (free Colab ✓)
  GRPO with QLoRA (free Colab ✓)
  GRPO for Vision-Language Models (free Colab ✓)

→ See all notebooks


Scripts (Production-ready)
Full-featured scripts for single GPU, multi-GPU, and DeepSpeed setups. Ready for real-world training.

  SFT Script
  GRPO Script
  DPO Script

→ See all scripts

What's Next?

📚 Learn More

🚀 Scale Up

Community

Troubleshooting

Out of Memory?

Reduce the per-device batch size and raise gradient accumulation so the effective batch size stays the same. The same two settings apply to SFT and DPO:

from trl import DPOConfig, SFTConfig

# SFT
training_args = SFTConfig(
    per_device_train_batch_size=1,  # Start small
    gradient_accumulation_steps=8,  # Maintain effective batch size
)

# DPO
training_args = DPOConfig(
    per_device_train_batch_size=1,  # Start small
    gradient_accumulation_steps=8,  # Maintain effective batch size
)
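
To see why these two settings trade off against each other, here is a small sketch of the usual arithmetic: the effective batch size is the per-device batch size times the accumulation steps times the number of devices. The helper `effective_batch_size` is a hypothetical name used only for this illustration.

```python
def effective_batch_size(per_device_train_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of samples contributing to each optimizer step."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


# Settings from the example above, on a single GPU.
print(effective_batch_size(1, 8))  # prints 8

# Same effective batch size with a larger per-device batch, if memory allows.
print(effective_batch_size(4, 2))  # prints 8
```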

Loss not decreasing?

Try adjusting the learning rate:

from trl import SFTConfig

training_args = SFTConfig(learning_rate=2e-5)  # Good starting point

For more help, open an issue on GitHub.
