Buckets:

hf-doc-build
/

doc-dev

Files

xet

hf-doc-build/doc-dev / trl /pr_4624 /en /quickstart.md

rtrm

about 2 months ago

preview code

download

raw

4.02 kB

Quickstart

Get started with TRL in minutes. This guide shows you the essentials for training models with SFT, GRPO, and DPO.

💡 Looking for ready-to-run examples? Check out our notebooks for Colab or production scripts.

Quick Examples

Copy-paste these minimal examples to start training immediately. Each uses compact models for quick experimentation.

Supervised Fine-Tuning

from trl import SFTTrainer
from datasets import load_dataset

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
)
trainer.train()

Group Relative Policy Optimization

from trl import GRPOTrainer
from datasets import load_dataset
from trl.rewards import accuracy_reward

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # Start from SFT model
    train_dataset=load_dataset("trl-lib/DeepMath-103K", split="train"),
    reward_funcs=accuracy_reward,
)
trainer.train()

Direct Preference Optimization

from trl import DPOTrainer
from datasets import load_dataset

trainer = DPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # Use your SFT model
    ref_model="Qwen/Qwen2.5-0.5B-Instruct",  # Original base model
    train_dataset=load_dataset("trl-lib/ultrafeedback_binarized", split="train"),
)
trainer.train()

Reward Modeling

from trl import RewardTrainer
from datasets import load_dataset

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = RewardTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    train_dataset=dataset,
)
trainer.train()

Command Line Interface

Skip the code entirely - train directly from your terminal:

# SFT: Fine-tune on instructions
trl sft --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name trl-lib/Capybara

# DPO: Align with preferences  
trl dpo --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
    --dataset_name trl-lib/ultrafeedback_binarized

# Reward: Train a reward model
trl reward --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
    --dataset_name trl-lib/ultrafeedback_binarized

Ready-to-Run Examples

Want to dive deeper? We provide a comprehensive collection of examples for all skill levels:

Notebooks (Beginner-friendly)
Self-contained notebooks for interactive learning. Many run on free Google Colab, while some require larger GPUs.

  SFT with QLoRA (free Colab ✓)
  GRPO with QLoRA (free Colab ✓)
  GRPO for Vision-Language Models (free Colab ✓)

→ See all notebooks


Scripts (Production-ready)
Full-featured scripts for single GPU, multi-GPU, and DeepSpeed setups. Ready for real-world training.

  SFT Script
  GRPO Script
  DPO Script

→ See all scripts

What's Next?

📚 Learn More

SFT Trainer - Complete SFT guide
DPO Trainer - Preference alignment
GRPO Trainer - Group relative policy optimization

🚀 Scale Up

Distributed Training - Multi-GPU setups
Memory Optimization - Efficient training
PEFT Integration - LoRA and QLoRA

🌐 Community

Community Tutorials - External guides and resources

Troubleshooting

Out of Memory?

Reduce batch size and enable optimizations:

training_args = SFTConfig(
    per_device_train_batch_size=1,  # Start small
    gradient_accumulation_steps=8,  # Maintain effective batch size
)

training_args = DPOConfig(
    per_device_train_batch_size=1,  # Start small
    gradient_accumulation_steps=8,  # Maintain effective batch size
)

Loss not decreasing?

Try adjusting the learning rate:

training_args = SFTConfig(learning_rate=2e-5)  # Good starting point

For more help, open an issue on GitHub.

Xet Storage Details

Size:: 4.02 kB
Xet hash:: 4c70d6186ab8bb9db2a564a1a3389fa898fac8b810df295d660cec1bd74360cc

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.