Buckets:
| # Quickstart | |
| Get started with TRL in minutes. This guide shows you the essentials for training models with SFT, GRPO, and DPO. | |
| > ๐ก **Looking for ready-to-run examples?** Check out our [notebooks for Colab](#ready-to-run-examples) or [production scripts](example_overview#scripts). | |
| ## Quick Examples | |
| Copy-paste these minimal examples to start training immediately. Each uses compact models for quick experimentation. | |
| ### Supervised Fine-Tuning | |
| ```python | |
| from trl import SFTTrainer | |
| from datasets import load_dataset | |
| trainer = SFTTrainer( | |
| model="Qwen/Qwen2.5-0.5B", | |
| train_dataset=load_dataset("trl-lib/Capybara", split="train"), | |
| ) | |
| trainer.train() | |
| ``` | |
| ### Group Relative Policy Optimization | |
| ```python | |
| from trl import GRPOTrainer | |
| from datasets import load_dataset | |
| from trl.rewards import accuracy_reward | |
| trainer = GRPOTrainer( | |
| model="Qwen/Qwen2.5-0.5B-Instruct", # Start from SFT model | |
| train_dataset=load_dataset("trl-lib/DeepMath-103K", split="train"), | |
| reward_funcs=accuracy_reward, | |
| ) | |
| trainer.train() | |
| ``` | |
| ### Direct Preference Optimization | |
| ```python | |
| from trl import DPOTrainer | |
| from datasets import load_dataset | |
| trainer = DPOTrainer( | |
| model="Qwen/Qwen2.5-0.5B-Instruct", # Use your SFT model | |
| ref_model="Qwen/Qwen2.5-0.5B-Instruct", # Original base model | |
| train_dataset=load_dataset("trl-lib/ultrafeedback_binarized", split="train"), | |
| ) | |
| trainer.train() | |
| ``` | |
| ### Reward Modeling | |
| ```python | |
| from trl import RewardTrainer | |
| from datasets import load_dataset | |
| dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train") | |
| trainer = RewardTrainer( | |
| model="Qwen/Qwen2.5-0.5B-Instruct", | |
| train_dataset=dataset, | |
| ) | |
| trainer.train() | |
| ``` | |
| ## Command Line Interface | |
| Skip the code entirely - train directly from your terminal: | |
| ```bash | |
| # SFT: Fine-tune on instructions | |
| trl sft --model_name_or_path Qwen/Qwen2.5-0.5B \ | |
| --dataset_name trl-lib/Capybara | |
| # DPO: Align with preferences | |
| trl dpo --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \ | |
| --dataset_name trl-lib/ultrafeedback_binarized | |
| # Reward: Train a reward model | |
| trl reward --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \ | |
| --dataset_name trl-lib/ultrafeedback_binarized | |
| ``` | |
| ## Ready-to-Run Examples | |
| Want to dive deeper? We provide a comprehensive collection of examples for all skill levels: | |
| Notebooks (Beginner-friendly) | |
| Self-contained notebooks for interactive learning. Many run on free Google Colab, while some require larger GPUs. | |
| SFT with QLoRA (free Colab โ) | |
| GRPO with QLoRA (free Colab โ) | |
| GRPO for Vision-Language Models (free Colab โ) | |
| โ See all notebooks | |
| Scripts (Production-ready) | |
| Full-featured scripts for single GPU, multi-GPU, and DeepSpeed setups. Ready for real-world training. | |
| SFT Script | |
| GRPO Script | |
| DPO Script | |
| โ See all scripts | |
| ## What's Next? | |
| ### ๐ Learn More | |
| - [SFT Trainer](sft_trainer) - Complete SFT guide | |
| - [DPO Trainer](dpo_trainer) - Preference alignment | |
| - [GRPO Trainer](grpo_trainer) - Group relative policy optimization | |
| ### ๐ Scale Up | |
| - [Distributed Training](distributing_training) - Multi-GPU setups | |
| - [Memory Optimization](reducing_memory_usage) - Efficient training | |
| - [PEFT Integration](peft_integration) - LoRA and QLoRA | |
| ### ๐ Community | |
| - [Community Tutorials](community_tutorials) - External guides and resources | |
| ## Troubleshooting | |
| ### Out of Memory? | |
| Reduce batch size and enable optimizations: | |
| ```python | |
| training_args = SFTConfig( | |
| per_device_train_batch_size=1, # Start small | |
| gradient_accumulation_steps=8, # Maintain effective batch size | |
| ) | |
| ``` | |
| ```python | |
| training_args = DPOConfig( | |
| per_device_train_batch_size=1, # Start small | |
| gradient_accumulation_steps=8, # Maintain effective batch size | |
| ) | |
| ``` | |
| ### Loss not decreasing? | |
| Try adjusting the learning rate: | |
| ```python | |
| training_args = SFTConfig(learning_rate=2e-5) # Good starting point | |
| ``` | |
| For more help, open an [issue on GitHub](https://github.com/huggingface/trl/issues). | |
Xet Storage Details
- Size:
- 4.02 kB
- Xet hash:
- 4c70d6186ab8bb9db2a564a1a3389fa898fac8b810df295d660cec1bd74360cc
ยท
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.