# Community Tutorials

Community tutorials are made by active members of the Hugging Face community who want to share their knowledge and expertise with others. They are a great way to learn about the library and its features, and to get started with core classes and modalities.
## Language Models

### Tutorials

| Task | Class | Description | Author | Tutorial | Colab |
| --- | --- | --- | --- | --- | --- |
| Reinforcement Learning | [GRPOTrainer](/docs/trl/pr_5607/en/gspo_token#trl.GRPOTrainer) | Efficient Online Training with GRPO and vLLM in TRL | [Sergio Paniego](https://huggingface.co/sergiopaniego) | [Link](https://huggingface.co/learn/cookbook/grpo_vllm_online_training) | [Colab](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/grpo_vllm_online_training.ipynb) |
| Reinforcement Learning | [GRPOTrainer](/docs/trl/pr_5607/en/gspo_token#trl.GRPOTrainer) | Post training an LLM for reasoning with GRPO in TRL | [Sergio Paniego](https://huggingface.co/sergiopaniego) | [Link](https://huggingface.co/learn/cookbook/fine_tuning_llm_grpo_trl) | [Colab](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/fine_tuning_llm_grpo_trl.ipynb) |
| Reinforcement Learning | [GRPOTrainer](/docs/trl/pr_5607/en/gspo_token#trl.GRPOTrainer) | Mini-R1: Reproduce Deepseek R1 "aha moment", an RL tutorial | [Philipp Schmid](https://huggingface.co/philschmid) | [Link](https://www.philschmid.de/mini-deepseek-r1) | [Colab](https://colab.research.google.com/github/philschmid/deep-learning-pytorch-huggingface/blob/main/training/mini-deepseek-r1-aha-grpo.ipynb) |
| Reinforcement Learning | [GRPOTrainer](/docs/trl/pr_5607/en/gspo_token#trl.GRPOTrainer) | RL on LLaMA 3.1-8B with GRPO and Unsloth optimizations | [Andrea Manzoni](https://huggingface.co/AManzoni) | [Link](https://colab.research.google.com/github/amanzoni1/fine_tuning/blob/main/RL_LLama3_1_8B_GRPO.ipynb) | [Colab](https://colab.research.google.com/github/amanzoni1/fine_tuning/blob/main/RL_LLama3_1_8B_GRPO.ipynb) |
| Instruction tuning | [SFTTrainer](/docs/trl/pr_5607/en/sft_trainer#trl.SFTTrainer) | Fine-tuning Google Gemma LLMs using ChatML format with QLoRA | [Philipp Schmid](https://huggingface.co/philschmid) | [Link](https://www.philschmid.de/fine-tune-google-gemma) | [Colab](https://colab.research.google.com/github/philschmid/deep-learning-pytorch-huggingface/blob/main/training/gemma-lora-example.ipynb) |
| Structured Generation | [SFTTrainer](/docs/trl/pr_5607/en/sft_trainer#trl.SFTTrainer) | Fine-tuning Llama-2-7B to generate Persian product catalogs in JSON using QLoRA and PEFT | [Mohammadreza Esmaeilian](https://huggingface.co/Mohammadreza) | [Link](https://huggingface.co/learn/cookbook/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format) | [Colab](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format.ipynb) |
| Preference Optimization | [DPOTrainer](/docs/trl/pr_5607/en/bema_for_reference_model#trl.DPOTrainer) | Align Mistral-7b using Direct Preference Optimization for human preference alignment | [Maxime Labonne](https://huggingface.co/mlabonne) | [Link](https://mlabonne.github.io/blog/posts/Fine_tune_Mistral_7b_with_DPO.html) | [Colab](https://colab.research.google.com/github/mlabonne/llm-course/blob/main/Fine_tune_a_Mistral_7b_model_with_DPO.ipynb) |
| Preference Optimization | [experimental.orpo.ORPOTrainer](/docs/trl/pr_5607/en/orpo_trainer#trl.experimental.orpo.ORPOTrainer) | Fine-tuning Llama 3 with ORPO combining instruction tuning and preference alignment | [Maxime Labonne](https://huggingface.co/mlabonne) | [Link](https://mlabonne.github.io/blog/posts/2024-04-19_Fine_tune_Llama_3_with_ORPO.html) | [Colab](https://colab.research.google.com/drive/1eHNWg9gnaXErdAa8_mcvjMupbSS6rDvi) |
| Instruction tuning | [SFTTrainer](/docs/trl/pr_5607/en/sft_trainer#trl.SFTTrainer) | How to fine-tune open LLMs in 2025 with Hugging Face | [Philipp Schmid](https://huggingface.co/philschmid) | [Link](https://www.philschmid.de/fine-tune-llms-in-2025) | [Colab](https://colab.research.google.com/github/philschmid/deep-learning-pytorch-huggingface/blob/main/training/fine-tune-llms-in-2025.ipynb) |
| Step-Level Reasoning | [GRPOTrainer](/docs/trl/pr_5607/en/gspo_token#trl.GRPOTrainer) | Supervised Reinforcement Learning (SRL) for step-by-step reasoning with vLLM | [Deepak Swaminathan](https://huggingface.co/s23deepak) | [Link](https://github.com/s23deepak/Supervised-Reinforcement-Learning) | [Colab](https://colab.research.google.com/github/s23deepak/Supervised-Reinforcement-Learning/blob/main/notebooks/srl_grpo_tutorial.ipynb) |
### Videos

| Task | Title | Author | Video |
| --- | --- | --- | --- |
| Instruction tuning | Fine-tuning open AI models using Hugging Face TRL | [Wietse Venema](https://huggingface.co/wietsevenema) | [Video](https://youtu.be/cnGyyM0vOes) |
| Instruction tuning | How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset | [Mayurji](https://huggingface.co/iammayur) | [Video](https://youtu.be/jKdXv3BiLu0) |
> [!WARNING]
> The tutorial "How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset" uses two deprecated features:
>
> - `SFTTrainer(..., tokenizer=tokenizer)`: Use `SFTTrainer(..., processing_class=tokenizer)` instead, or simply omit it (it will be inferred from the model).
> - `setup_chat_format(model, tokenizer)`: Use `SFTConfig(..., chat_template_path="Qwen/Qwen3-0.6B")`, where `chat_template_path` specifies the model whose chat template you want to copy.
## Vision Language Models

### Tutorials

| Task | Class | Description | Author | Tutorial | Colab |
| --- | --- | --- | --- | --- | --- |
| Visual QA | [SFTTrainer](/docs/trl/pr_5607/en/sft_trainer#trl.SFTTrainer) | Fine-tuning Qwen2-VL-7B for visual question answering on ChartQA dataset | [Sergio Paniego](https://huggingface.co/sergiopaniego) | [Link](https://huggingface.co/learn/cookbook/fine_tuning_vlm_trl) | [Colab](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/fine_tuning_vlm_trl.ipynb) |
| Visual QA | [SFTTrainer](/docs/trl/pr_5607/en/sft_trainer#trl.SFTTrainer) | Fine-tuning SmolVLM with TRL on a consumer GPU | [Sergio Paniego](https://huggingface.co/sergiopaniego) | [Link](https://huggingface.co/learn/cookbook/fine_tuning_smol_vlm_sft_trl) | [Colab](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/fine_tuning_smol_vlm_sft_trl.ipynb) |
| SEO Description | [SFTTrainer](/docs/trl/pr_5607/en/sft_trainer#trl.SFTTrainer) | Fine-tuning Qwen2-VL-7B for generating SEO-friendly descriptions from images | [Philipp Schmid](https://huggingface.co/philschmid) | [Link](https://www.philschmid.de/fine-tune-multimodal-llms-with-trl) | [Colab](https://colab.research.google.com/github/philschmid/deep-learning-pytorch-huggingface/blob/main/training/fine-tune-multimodal-llms-with-trl.ipynb) |
| Visual QA | [DPOTrainer](/docs/trl/pr_5607/en/bema_for_reference_model#trl.DPOTrainer) | PaliGemma 🤝 Direct Preference Optimization | [Merve Noyan](https://huggingface.co/merve) | [Link](https://github.com/merveenoyan/smol-vision/blob/main/PaliGemma_DPO.ipynb) | [Colab](https://colab.research.google.com/github/merveenoyan/smol-vision/blob/main/PaliGemma_DPO.ipynb) |
| Visual QA | [DPOTrainer](/docs/trl/pr_5607/en/bema_for_reference_model#trl.DPOTrainer) | Fine-tuning SmolVLM using direct preference optimization (DPO) with TRL on a consumer GPU | [Sergio Paniego](https://huggingface.co/sergiopaniego) | [Link](https://huggingface.co/learn/cookbook/fine_tuning_vlm_dpo_smolvlm_instruct) | [Colab](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/fine_tuning_vlm_dpo_smolvlm_instruct.ipynb) |
| Object Detection Grounding | [SFTTrainer](/docs/trl/pr_5607/en/sft_trainer#trl.SFTTrainer) | Fine-tuning a VLM for Object Detection Grounding using TRL | [Sergio Paniego](https://huggingface.co/sergiopaniego) | [Link](https://huggingface.co/learn/cookbook/fine_tuning_vlm_object_detection_grounding) | [Colab](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/fine_tuning_vlm_object_detection_grounding.ipynb) |
| Visual QA | [DPOTrainer](/docs/trl/pr_5607/en/bema_for_reference_model#trl.DPOTrainer) | Fine-Tuning a Vision Language Model with TRL using MPO | [Sergio Paniego](https://huggingface.co/sergiopaniego) | [Link](https://huggingface.co/learn/cookbook/fine_tuning_vlm_mpo) | [Colab](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/fine_tuning_vlm_mpo.ipynb) |
| Reinforcement Learning | [GRPOTrainer](/docs/trl/pr_5607/en/gspo_token#trl.GRPOTrainer) | Post training a VLM for reasoning with GRPO using TRL | [Sergio Paniego](https://huggingface.co/sergiopaniego) | [Link](https://huggingface.co/learn/cookbook/fine_tuning_vlm_grpo_trl) | [Colab](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/fine_tuning_vlm_grpo_trl.ipynb) |
## Speech Language Models

### Tutorials

| Task | Class | Description | Author | Tutorial |
| --- | --- | --- | --- | --- |
| Text-to-Speech | [GRPOTrainer](/docs/trl/pr_5607/en/gspo_token#trl.GRPOTrainer) | Post training a Speech Language Model with GRPO using TRL | [Steven Zheng](https://huggingface.co/Steveeeeeeen) | [Link](https://huggingface.co/blog/Steveeeeeeen/llasa-grpo) |
## Contributing

If you have a tutorial that you would like to add to this list, please open a PR to add it. We will review it and merge it if it is relevant to the community.