Community Tutorials

Community tutorials are made by active members of the Hugging Face community who want to share their knowledge and expertise with others. They are a great way to learn about the library and its features, and to get started with core classes and modalities.

Language Models

Tutorials

| Task | Class | Description | Author | Tutorial | Colab |
| --- | --- | --- | --- | --- | --- |
| Reinforcement Learning | [GRPOTrainer] | Efficient Online Training with GRPO and vLLM in TRL | Sergio Paniego | Link | Open In Colab |
| Reinforcement Learning | [GRPOTrainer] | Post training an LLM for reasoning with GRPO in TRL | Sergio Paniego | Link | Open In Colab |
| Reinforcement Learning | [GRPOTrainer] | Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial | Philipp Schmid | Link | Open In Colab |
| Reinforcement Learning | [GRPOTrainer] | RL on LLaMA 3.1-8B with GRPO and Unsloth optimizations | Andrea Manzoni | Link | Open In Colab |
| Instruction tuning | [SFTTrainer] | Fine-tuning Google Gemma LLMs using ChatML format with QLoRA | Philipp Schmid | Link | Open In Colab |
| Structured Generation | [SFTTrainer] | Fine-tuning Llama-2-7B to generate Persian product catalogs in JSON using QLoRA and PEFT | Mohammadreza Esmaeilian | Link | Open In Colab |
| Preference Optimization | [DPOTrainer] | Align Mistral-7b using Direct Preference Optimization for human preference alignment | Maxime Labonne | Link | Open In Colab |
| Preference Optimization | [experimental.orpo.ORPOTrainer] | Fine-tuning Llama 3 with ORPO combining instruction tuning and preference alignment | Maxime Labonne | Link | Open In Colab |
| Instruction tuning | [SFTTrainer] | How to fine-tune open LLMs in 2025 with Hugging Face | Philipp Schmid | Link | Open In Colab |
| Step-Level Reasoning | [GRPOTrainer] | Supervised Reinforcement Learning (SRL) for step-by-step reasoning with vLLM | Deepak Swaminathan | Link | Open In Colab |

Videos

| Task | Title | Author | Video |
| --- | --- | --- | --- |
| Instruction tuning | Fine-tuning open AI models using Hugging Face TRL | Wietse Venema | |
| Instruction tuning | How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset | Mayurji | |
⚠️ Deprecated features notice for "How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset"

The tutorial uses two deprecated features:

  • SFTTrainer(..., tokenizer=tokenizer): Use SFTTrainer(..., processing_class=tokenizer) instead, or simply omit it (it will be inferred from the model).
  • setup_chat_format(model, tokenizer): Use SFTConfig(..., chat_template_path="Qwen/Qwen3-0.6B"), where chat_template_path specifies the model whose chat template you want to copy.
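
The two replacements above could look roughly like the following when applied to code from the video. This is a minimal sketch, not the tutorial author's code: `migrate_sft_trainer_kwargs` and `build_trainer` are hypothetical helper names introduced here for illustration, and `output_dir="sft-output"` is an assumed placeholder. Only `processing_class` and `chat_template_path` come from the notice itself.

```python
def migrate_sft_trainer_kwargs(kwargs: dict) -> dict:
    """Hypothetical helper: rename the deprecated `tokenizer` keyword.

    Returns a copy of `kwargs` in which `tokenizer` has been moved to
    `processing_class`. An explicit `processing_class` takes precedence.
    """
    migrated = dict(kwargs)
    if "tokenizer" in migrated:
        tokenizer = migrated.pop("tokenizer")
        migrated.setdefault("processing_class", tokenizer)
    return migrated


def build_trainer(model_id, train_dataset, tokenizer=None):
    """Hypothetical helper: build an SFTTrainer without deprecated features."""
    # Imported lazily so the helper above works without TRL installed.
    from trl import SFTConfig, SFTTrainer

    # Instead of setup_chat_format(model, tokenizer), point
    # chat_template_path at the model whose chat template should be copied.
    args = SFTConfig(
        output_dir="sft-output",  # assumed placeholder path
        chat_template_path="Qwen/Qwen3-0.6B",
    )

    # Instead of tokenizer=tokenizer, pass processing_class=tokenizer,
    # or omit it so that it is inferred from the model.
    extra = {} if tokenizer is None else {"tokenizer": tokenizer}
    return SFTTrainer(
        model=model_id,
        args=args,
        train_dataset=train_dataset,
        **migrate_sft_trainer_kwargs(extra),
    )
```

Calling `build_trainer` needs TRL installed and downloads the model, while the keyword-renaming helper is plain Python and can be checked on its own.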

Vision Language Models

Tutorials

| Task | Class | Description | Author | Tutorial | Colab |
| --- | --- | --- | --- | --- | --- |
| Visual QA | [SFTTrainer] | Fine-tuning Qwen2-VL-7B for visual question answering on ChartQA dataset | Sergio Paniego | Link | Open In Colab |
| Visual QA | [SFTTrainer] | Fine-tuning SmolVLM with TRL on a consumer GPU | Sergio Paniego | Link | Open In Colab |
| SEO Description | [SFTTrainer] | Fine-tuning Qwen2-VL-7B for generating SEO-friendly descriptions from images | Philipp Schmid | Link | Open In Colab |
| Visual QA | [DPOTrainer] | PaliGemma 🤝 Direct Preference Optimization | Merve Noyan | Link | Open In Colab |
| Visual QA | [DPOTrainer] | Fine-tuning SmolVLM using direct preference optimization (DPO) with TRL on a consumer GPU | Sergio Paniego | Link | Open In Colab |
| Object Detection Grounding | [SFTTrainer] | Fine tuning a VLM for Object Detection Grounding using TRL | Sergio Paniego | Link | Open In Colab |
| Visual QA | [DPOTrainer] | Fine-Tuning a Vision Language Model with TRL using MPO | Sergio Paniego | Link | Open In Colab |
| Reinforcement Learning | [GRPOTrainer] | Post training a VLM for reasoning with GRPO using TRL | Sergio Paniego | Link | Open In Colab |

Speech Language Models

Tutorials

| Task | Class | Description | Author | Tutorial |
| --- | --- | --- | --- | --- |
| Text-to-Speech | [GRPOTrainer] | Post training a Speech Language Model with GRPO using TRL | Steven Zheng | Link |

Contributing

If you have a tutorial that you would like to add to this list, please open a PR to add it. We will review it and merge it if it is relevant to the community.