# Quickstart

TRL is a comprehensive library for post-training foundation models using techniques like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), and Direct Preference Optimization (DPO).

## Quick Examples

Get started instantly with TRL's most popular trainers. Each example uses compact models for quick experimentation.

### Supervised Fine-Tuning

```python
from trl import SFTTrainer
from datasets import load_dataset

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
)
trainer.train()
```
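Under the hood, supervised fine-tuning minimizes the usual next-token cross-entropy. The following is a minimal pure-Python sketch of that objective, assuming you already have the log-probability the model assigned to each correct next token. `sft_loss` is a hypothetical name, and the trainer itself computes this on batched tensors. Positions labeled `-100` (the ignore index used by Hugging Face Transformers) are excluded, which is how prompt or padding tokens can be left out of the loss.

```python
import math

IGNORE_INDEX = -100  # label value that Hugging Face models skip in the loss


def sft_loss(token_logprobs, labels):
    """Mean negative log-likelihood over positions whose label is not ignored.

    Hypothetical helper: token_logprobs[i] is the log-probability the model
    gave the correct next token at position i, and labels[i] == IGNORE_INDEX
    masks that position (for example, a prompt token).
    """
    kept = [lp for lp, lab in zip(token_logprobs, labels) if lab != IGNORE_INDEX]
    if not kept:
        raise ValueError("all positions are masked")
    return -sum(kept) / len(kept)


# One masked prompt position, then two target tokens with p=0.25 and p=0.5.
loss = sft_loss([math.log(0.9), math.log(0.25), math.log(0.5)], [IGNORE_INDEX, 42, 7])
print(f"{loss:.4f}")
```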

### Group Relative Policy Optimization

```python
from trl import GRPOTrainer
from datasets import load_dataset
from trl.rewards import accuracy_reward

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    train_dataset=load_dataset("trl-lib/DeepMath-103K", split="train"),
    reward_funcs=accuracy_reward,
)
trainer.train()
```
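`reward_funcs` also accepts your own Python callables, or a list mixing them with built-ins such as `accuracy_reward`. As we understand TRL's convention, a reward function receives the generated `completions` plus the dataset columns as keyword arguments and returns one float per completion. The sketch below defines a hypothetical `length_reward` (the 20-character target is arbitrary). It then shows the group-relative step that gives GRPO its name: the rewards of completions sampled from the same prompt are centered on the group mean and divided by the group's standard deviation. This is a plain-Python illustration, not TRL's implementation. Implementations differ in which standard deviation they use and in whether they divide by it at all.

```python
import statistics


def length_reward(completions, **kwargs):
    """Hypothetical reward: prefer completions of about 20 characters.

    Handles plain-string completions (standard format) and lists of
    message dicts (conversational format).
    """
    rewards = []
    for completion in completions:
        if isinstance(completion, list):  # conversational: [{"role": ..., "content": ...}]
            text = completion[0]["content"]
        else:  # standard: a plain string
            text = completion
        rewards.append(float(-abs(20 - len(text))))
    return rewards


def group_relative_advantages(rewards, group_size, eps=1e-4):
    """Normalize rewards within each consecutive group of completions for one prompt."""
    if len(rewards) % group_size != 0:
        raise ValueError("len(rewards) must be a multiple of group_size")
    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mean = statistics.fmean(group)
        std = statistics.pstdev(group)  # population std; other choices are possible
        advantages.extend((r - mean) / (std + eps) for r in group)
    return advantages


# Two prompts with four sampled completions each, flattened and grouped by prompt.
completions = ["x" * n for n in (20, 25, 15, 30, 20, 20, 19, 21)]
rewards = length_reward(completions)
print(rewards)
print(group_relative_advantages(rewards, group_size=4))
```

To try such a reward with the trainer, pass `reward_funcs=[accuracy_reward, length_reward]` to `GRPOTrainer`.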

### Direct Preference Optimization

```python
from trl import DPOTrainer
from datasets import load_dataset

trainer = DPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    train_dataset=load_dataset("trl-lib/ultrafeedback_binarized", split="train"),
)
trainer.train()
```
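DPO needs no separate reward model. Instead, it compares the policy with a frozen reference model on each chosen/rejected pair. The sketch below shows the sigmoid loss from the DPO paper for a single pair, assuming you already have the summed log-probability of each response under both models. As far as we know, this matches `DPOConfig`'s default `loss_type="sigmoid"` with `beta=0.1`. The trainer works on batched tensors and supports other loss variants, and `dpo_sigmoid_loss` is a hypothetical name.

```python
import math


def dpo_sigmoid_loss(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, from summed sequence log-probabilities."""
    # How much more the policy prefers "chosen" over "rejected" than the
    # reference model does, scaled by beta.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin), computed in a numerically stable way
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# The policy has shifted probability toward the chosen response relative to the reference.
loss = dpo_sigmoid_loss(policy_chosen_logp=-10.0, policy_rejected_logp=-14.0,
                        ref_chosen_logp=-12.0, ref_rejected_logp=-12.0)
print(f"{loss:.4f}")
```

When the policy equals the reference, the margin is zero and the loss is log 2. Widening the margin in favor of the chosen response drives it toward zero.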

### Reward Modeling

```python
from trl import RewardTrainer
from datasets import load_dataset

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = RewardTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    train_dataset=dataset,
)
trainer.train()
```
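A reward model assigns one scalar score to each response, and training pushes the chosen response's score above the rejected one's. As we understand it, `RewardTrainer` optimizes the pairwise Bradley-Terry objective, -log sigmoid(r_chosen - r_rejected). The sketch below shows that loss together with a simple pairwise accuracy. The scores are made-up numbers and both function names are hypothetical.

```python
import math


def bradley_terry_loss(reward_chosen, reward_rejected):
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)) for one preference pair."""
    x = reward_chosen - reward_rejected
    # Numerically stable form of -log(sigmoid(x)), i.e. softplus(-x)
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_accuracy(pairs):
    """Fraction of (chosen, rejected) score pairs where the chosen score is higher."""
    return sum(c > r for c, r in pairs) / len(pairs)


# Made-up scores from a reward model for three (chosen, rejected) pairs.
pairs = [(1.2, -0.3), (0.4, 0.9), (2.0, 1.5)]
mean_loss = sum(bradley_terry_loss(c, r) for c, r in pairs) / len(pairs)
print(f"loss={mean_loss:.4f} accuracy={pairwise_accuracy(pairs):.2f}")
```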

## Command Line Interface

Skip the code entirely and train directly from your terminal:

```bash
# SFT: Fine-tune on instructions
trl sft --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name trl-lib/Capybara

# DPO: Align with preferences
trl dpo --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
    --dataset_name trl-lib/ultrafeedback_binarized

# Reward: Train a reward model
trl reward --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
    --dataset_name trl-lib/ultrafeedback_binarized
```

## What's Next?

### 📚 Learn More

- [SFT Trainer](sft_trainer) - Complete SFT guide
- [DPO Trainer](dpo_trainer) - Preference alignment
- [GRPO Trainer](grpo_trainer) - Group relative policy optimization

### 🚀 Scale Up

- [Distributed Training](distributing_training) - Multi-GPU setups
- [Memory Optimization](reducing_memory_usage) - Efficient training
- [PEFT Integration](peft_integration) - LoRA and QLoRA

### 💡 Examples

- [Example Scripts](https://github.com/huggingface/trl/tree/main/examples) - Production-ready code
- [Community Tutorials](community_tutorials) - External guides

## Troubleshooting

### Out of Memory?

Reduce the per-device batch size and increase gradient accumulation to compensate:

<hfoptions id="batch_size">
<hfoption id="SFT">

```python
from trl import SFTConfig

training_args = SFTConfig(
    per_device_train_batch_size=1,  # Start small
    gradient_accumulation_steps=8,  # Maintain effective batch size
)
# Pass it to the trainer with SFTTrainer(..., args=training_args)
```

</hfoption>
<hfoption id="DPO">

```python
from trl import DPOConfig

training_args = DPOConfig(
    per_device_train_batch_size=1,  # Start small
    gradient_accumulation_steps=8,  # Maintain effective batch size
)
# Pass it to the trainer with DPOTrainer(..., args=training_args)
```

</hfoption>
</hfoptions>
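The "Maintain effective batch size" comment refers to simple arithmetic. Assuming plain data parallelism, each optimizer step sees `per_device_train_batch_size` x `gradient_accumulation_steps` x (number of devices) examples. The small sketch below, with hypothetical helper names, picks the accumulation steps after you shrink the per-device batch.

```python
def effective_batch_size(per_device_train_batch_size, gradient_accumulation_steps, num_devices=1):
    """Examples contributing to each optimizer step under plain data parallelism."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


def accumulation_steps_for(target_batch_size, per_device_train_batch_size, num_devices=1):
    """Smallest gradient_accumulation_steps giving at least target_batch_size per step."""
    per_micro_step = per_device_train_batch_size * num_devices
    return -(-target_batch_size // per_micro_step)  # ceiling division


# Dropping from 8 examples per device to 1 on a single GPU:
steps = accumulation_steps_for(8, per_device_train_batch_size=1)
print(steps, effective_batch_size(1, steps))
```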

### Loss Not Decreasing?

Try adjusting the learning rate:

```python
from trl import SFTConfig

training_args = SFTConfig(learning_rate=2e-5)  # Good starting point
```

For more help, open an [issue on GitHub](https://github.com/huggingface/trl/issues).