	# Quickstart

	Get started with TRL in minutes. This guide shows you the essentials for training models with SFT, GRPO, DPO, and reward modeling.

	> 💡 Looking for ready-to-run examples? Check out our [notebooks for Colab](#ready-to-run-examples) or [production scripts](example_overview#scripts).

	## Quick Examples

	Copy and paste these minimal examples to start training immediately. Each uses a compact 0.5B-parameter model for quick experimentation.

	### Supervised Fine-Tuning

	```python
	from trl import SFTTrainer
	from datasets import load_dataset

	trainer = SFTTrainer(
	    model="Qwen/Qwen2.5-0.5B",
	    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
	)
	trainer.train()
	```

	### Group Relative Policy Optimization

	```python
	from trl import GRPOTrainer
	from datasets import load_dataset
	from trl.rewards import accuracy_reward

	trainer = GRPOTrainer(
	    model="Qwen/Qwen2.5-0.5B-Instruct",  # Start from SFT model
	    train_dataset=load_dataset("trl-lib/DeepMath-103K", split="train"),
	    reward_funcs=accuracy_reward,
	)
	trainer.train()
	```
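`reward_funcs` can also be a custom Python callable instead of a built-in reward such as `accuracy_reward`. The following is a minimal sketch, assuming completions arrive as plain strings (standard, non-conversational format). The name `brevity_reward` and the 50-character target are hypothetical and chosen only for illustration; they are not part of TRL.

```python
def brevity_reward(completions, **kwargs):
    """Hypothetical reward: score completions by closeness to a target length.

    Assumes each completion is a plain string. Returns one float per
    completion: 0.0 at the target length, more negative further away.
    """
    target_length = 50  # illustrative value, not a TRL default
    return [
        -abs(target_length - len(completion)) / target_length
        for completion in completions
    ]


# Quick check on three toy completions of length 50, 25, and 0.
print(brevity_reward(["a" * 50, "a" * 25, ""]))
```

A function like this could be passed as `reward_funcs=brevity_reward` in the example above. With a conversational dataset, each completion would be a list of messages rather than a string, so the length computation would need adapting.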

	### Direct Preference Optimization

	```python
	from trl import DPOTrainer
	from datasets import load_dataset

	trainer = DPOTrainer(
	    model="Qwen/Qwen2.5-0.5B-Instruct",  # Use your SFT model
	    ref_model="Qwen/Qwen2.5-0.5B-Instruct",  # Frozen reference copy of the starting model
	    train_dataset=load_dataset("trl-lib/ultrafeedback_binarized", split="train"),
	)
	trainer.train()
	```

	### Reward Modeling

	```python
	from trl import RewardTrainer
	from datasets import load_dataset

	dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

	trainer = RewardTrainer(
	    model="Qwen/Qwen2.5-0.5B-Instruct",
	    train_dataset=dataset,
	)
	trainer.train()
	```
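Both the DPO and reward modeling examples train on preference data, where each example pairs a preferred ("chosen") response with a less preferred ("rejected") one. The sketch below shows one such example in plain-text form and a small sanity check. The sample text and the helper `is_preference_example` are hypothetical; check the dataset card for the exact columns of `trl-lib/ultrafeedback_binarized`, which may store responses as chat messages rather than strings.

```python
# One illustrative preference example (made-up text, not taken from the dataset).
preference_example = {
    "prompt": "The sky is",
    "chosen": " blue.",
    "rejected": " green.",
}


def is_preference_example(example):
    """Hypothetical check: both fields are present and actually differ."""
    has_fields = all(key in example for key in ("chosen", "rejected"))
    return has_fields and example["chosen"] != example["rejected"]


print(is_preference_example(preference_example))
```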

	## Command Line Interface

	Skip the code entirely and train directly from your terminal:

	```bash
	# SFT: Fine-tune on instructions
	trl sft --model_name_or_path Qwen/Qwen2.5-0.5B \
	    --dataset_name trl-lib/Capybara

	# DPO: Align with preferences
	trl dpo --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
	    --dataset_name trl-lib/ultrafeedback_binarized

	# Reward: Train a reward model
	trl reward --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
	    --dataset_name trl-lib/ultrafeedback_binarized
	```

	## Ready-to-Run Examples

	Want to dive deeper? We provide a comprehensive collection of examples for all skill levels:


	### Notebooks (Beginner-friendly)

	Self-contained notebooks for interactive learning. Many run on free Google Colab, while some require larger GPUs.

	- SFT with QLoRA (free Colab ✓)
	- GRPO with QLoRA (free Colab ✓)
	- GRPO for Vision-Language Models (free Colab ✓)

	→ See all notebooks

	### Scripts (Production-ready)

	Full-featured scripts for single GPU, multi-GPU, and DeepSpeed setups. Ready for real-world training.

	- SFT Script
	- GRPO Script
	- DPO Script

	→ See all scripts


	## What's Next?

	### 📚 Learn More

	- [SFT Trainer](sft_trainer) - Complete SFT guide
	- [DPO Trainer](dpo_trainer) - Preference alignment
	- [GRPO Trainer](grpo_trainer) - Group relative policy optimization

	### 🚀 Scale Up

	- [Distributed Training](distributing_training) - Multi-GPU setups
	- [Memory Optimization](reducing_memory_usage) - Efficient training
	- [PEFT Integration](peft_integration) - LoRA and QLoRA

	### 🌐 Community

	- [Community Tutorials](community_tutorials) - External guides and resources

	## Troubleshooting

	### Out of Memory?

	Reduce the per-device batch size and use gradient accumulation to keep the effective batch size unchanged:

	```python
	from trl import SFTConfig

	training_args = SFTConfig(
	    per_device_train_batch_size=1,  # Start small
	    gradient_accumulation_steps=8,  # Maintain effective batch size
	)
	```

	The same settings apply with `DPOConfig`:

	```python
	from trl import DPOConfig

	training_args = DPOConfig(
	    per_device_train_batch_size=1,  # Start small
	    gradient_accumulation_steps=8,  # Maintain effective batch size
	)
	```
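The "Maintain effective batch size" comment refers to simple arithmetic: the number of examples contributing to each optimizer step is the per-device batch size times the number of accumulation steps times the number of devices. A minimal sketch follows; the helper `effective_batch_size` is a hypothetical name used for illustration, not a TRL function.

```python
def effective_batch_size(per_device_train_batch_size, gradient_accumulation_steps, num_devices=1):
    """Examples per optimizer step across all devices (illustrative helper)."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


# Batch size 1 with 8 accumulation steps matches batch size 8 with none.
print(effective_batch_size(1, 8))
print(effective_batch_size(8, 1))

# The same settings on 4 GPUs give a 4x larger effective batch.
print(effective_batch_size(1, 8, num_devices=4))
```

Lowering `per_device_train_batch_size` while raising `gradient_accumulation_steps` by the same factor reduces peak memory without changing this product.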

	### Loss not decreasing?

	Try adjusting the learning rate:

	```python
	from trl import SFTConfig
	training_args = SFTConfig(learning_rate=2e-5)  # Good starting point
	```

	For more help, open an [issue on GitHub](https://github.com/huggingface/trl/issues).
