Community Tutorials

Community tutorials are made by active members of the Hugging Face community who want to share their knowledge and expertise with others. They are a great way to learn about the library and its features, and to get started with core classes and modalities.

Language Models

Tutorials

Task	Class	Description	Author	Tutorial
Reinforcement Learning	[`GRPOTrainer`]	Efficient Online Training with GRPO and vLLM in TRL	Sergio Paniego	Link
Reinforcement Learning	[`GRPOTrainer`]	Post training an LLM for reasoning with GRPO in TRL	Sergio Paniego	Link
Reinforcement Learning	[`GRPOTrainer`]	Mini-R1: Reproduce Deepseek R1 "aha moment" a RL tutorial	Philipp Schmid	Link
Reinforcement Learning	[`GRPOTrainer`]	RL on LLaMA 3.1-8B with GRPO and Unsloth optimizations	Andrea Manzoni	Link
Instruction tuning	[`SFTTrainer`]	Fine-tuning Google Gemma LLMs using ChatML format with QLoRA	Philipp Schmid	Link
Structured Generation	[`SFTTrainer`]	Fine-tuning Llama-2-7B to generate Persian product catalogs in JSON using QLoRA and PEFT	Mohammadreza Esmaeilian	Link
Preference Optimization	[`DPOTrainer`]	Align Mistral-7b using Direct Preference Optimization for human preference alignment	Maxime Labonne	Link
Preference Optimization	[`experimental.orpo.ORPOTrainer`]	Fine-tuning Llama 3 with ORPO combining instruction tuning and preference alignment	Maxime Labonne	Link
Instruction tuning	[`SFTTrainer`]	How to fine-tune open LLMs in 2025 with Hugging Face	Philipp Schmid	Link
Step-Level Reasoning	[`GRPOTrainer`]	Supervised Reinforcement Learning (SRL) for step-by-step reasoning with vLLM	Deepak Swaminathan	Link

Videos

Task	Title	Author	Video
Instruction tuning	Fine-tuning open AI models using Hugging Face TRL	Wietse Venema
Instruction tuning	How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset	Mayurji

⚠️ Deprecated features notice for "How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset"

The tutorial uses two deprecated features:

`SFTTrainer(..., tokenizer=tokenizer)`: Use `SFTTrainer(..., processing_class=tokenizer)` instead, or omit the argument (the processing class is then inferred from the model).

`setup_chat_format(model, tokenizer)`: Use `SFTConfig(..., chat_template_path="Qwen/Qwen3-0.6B")` instead, where `chat_template_path` specifies the model whose chat template you want to copy.
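
The two replacements above can be sketched together as follows. This is a minimal sketch, assuming a recent TRL release that supports both `processing_class` and `chat_template_path`. The helper name `build_sft_trainer`, the `output_dir` value, and the default template source are illustrative choices, not taken from the tutorial.

```python
def build_sft_trainer(model_id, train_dataset, chat_template_source="Qwen/Qwen3-0.6B"):
    """Create an SFTTrainer without the two deprecated features (illustrative helper)."""
    # Imported lazily so that defining this helper does not require TRL to be installed.
    from transformers import AutoTokenizer
    from trl import SFTConfig, SFTTrainer

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Replaces setup_chat_format(model, tokenizer): the chat template is copied
    # from the model named in chat_template_path.
    config = SFTConfig(
        output_dir="smol-sft",  # hypothetical output directory
        chat_template_path=chat_template_source,
    )

    # Replaces SFTTrainer(..., tokenizer=tokenizer). The processing_class argument
    # could also be omitted, in which case it is inferred from the model.
    return SFTTrainer(
        model=model_id,
        args=config,
        train_dataset=train_dataset,
        processing_class=tokenizer,
    )
```

Calling `build_sft_trainer(model_id, dataset).train()` would then start training, given a model identifier and a dataset in a format that `SFTTrainer` accepts.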

Vision Language Models

Tutorials

Task	Class	Description	Author	Tutorial
Visual QA	[`SFTTrainer`]	Fine-tuning Qwen2-VL-7B for visual question answering on ChartQA dataset	Sergio Paniego	Link
Visual QA	[`SFTTrainer`]	Fine-tuning SmolVLM with TRL on a consumer GPU	Sergio Paniego	Link
SEO Description	[`SFTTrainer`]	Fine-tuning Qwen2-VL-7B for generating SEO-friendly descriptions from images	Philipp Schmid	Link
Visual QA	[`DPOTrainer`]	PaliGemma 🤝 Direct Preference Optimization	Merve Noyan	Link
Visual QA	[`DPOTrainer`]	Fine-tuning SmolVLM using direct preference optimization (DPO) with TRL on a consumer GPU	Sergio Paniego	Link
Object Detection Grounding	[`SFTTrainer`]	Fine tuning a VLM for Object Detection Grounding using TRL	Sergio Paniego	Link
Visual QA	[`DPOTrainer`]	Fine-Tuning a Vision Language Model with TRL using MPO	Sergio Paniego	Link
Reinforcement Learning	[`GRPOTrainer`]	Post training a VLM for reasoning with GRPO using TRL	Sergio Paniego	Link

Speech Language Models

Tutorials

Task	Class	Description	Author	Tutorial
Text-to-Speech	[`GRPOTrainer`]	Post training a Speech Language Model with GRPO using TRL	Steven Zheng	Link

Contributing

To add a tutorial to this list, please open a PR. We will review it and merge it if it is relevant to the community.