a smol course

university

https://github.com/huggingface/smol-course

Activity Feed

AI & ML interests

The smollest course on post training

Recent Activity

burtenshaw updated a dataset 1 day ago

smol-course/certificates

burtenshaw updated a dataset 5 days ago

smol-course/certificates

burtenshaw authored a paper 8 months ago

A Cartography of Open Collaboration in Open Source AI: Mapping Practices, Motivations, and Governance in 14 Open Large Language Model Projects

burtenshaw

updated a dataset 1 day ago

smol-course/certificates

Viewer • Updated 1 day ago • 369 • 148 • 4

sergiopaniego

posted an update 9 days ago

OpenEnv's collection of tutorials is growing fast. If you're looking to get started with RL environments, check them out

> evaluate your agents using OpenEnv
> learn how rewards work via rubrics
> connect agents via MCP
> many moreeeee!

anything you think is missing?

https://meta-pytorch.org/OpenEnv/tutorials/index.html

sergiopaniego

posted an update 10 days ago

OpenEnv already ships 🚢 with a ready-to-deploy RLM environment on free HF Spaces

Drop "Attention Is All You Need", write code that spawns parallel LLM calls → ✅ correct answer, reward 1.0, in 4.2s

Run GRPO (TRL) → model learns to write that search strategy itself

test it yourself → sergiopaniego/repl-env
check out OpenEnv → https://github.com/meta-pytorch/OpenEnv

sergiopaniego

posted an update about 1 month ago

Earlier this month, Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy

And… it's already supported in TRL, built by Kashif Rasul. you can really feel the pace of development in the team 🐎

Paper by Ruixiang ZHANG, He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang at Apple 🍎

How it works: the model generates completions at a training-time temperature (T_train) with top_k/top_p truncation, then fine-tunes on them with plain cross-entropy. no labels or verifier needed
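The sampling step described above can be sketched in a few lines of plain Python. This is only an illustration of temperature scaling followed by top-k and top-p truncation on a single token's logits; `sample_token` and its parameters are hypothetical names, not the TRL trainer's API, and the order of the filters (top-k, renormalize, then top-p) is an assumption.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample one token index with temperature, top-k and top-p truncation.

    Hypothetical helper for illustration; not taken from TRL.
    """
    # Temperature scaling: higher temperature flattens the distribution.
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]

    # Top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    # random.choices renormalizes the weights of the surviving tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

In the method as described, completions sampled this way become the training targets for ordinary cross-entropy fine-tuning; that training step is not shown here.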

You can try it right away with this ready-to-run example (Qwen3-4B on rStar-Coder):
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd.py
or benchmark a checkpoint with the eval script:
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd_eval.py

One neat insight from the paper: T_train and T_eval compose into an effective T_eff = T_train × T_eval, so a broad band of configs works well. even very noisy samples still help
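One way to read the T_eff formula, under idealized assumptions: if the fine-tuned model exactly reproduced the distribution it was sampled from at T_train (ignoring top_k/top_p truncation), then decoding it at T_eval would be the same as decoding the original model at T_train × T_eval. The check below verifies only that algebraic identity on made-up logits; it is not the paper's derivation, and the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def retemper(probs, temperature):
    """Apply a temperature to an existing distribution (p_i ** (1/T), renormalized)."""
    return softmax([math.log(p) for p in probs], temperature)


# Illustrative values only; none of these numbers come from the paper.
logits = [2.0, 0.5, -1.0, 0.0]
t_train, t_eval = 1.5, 0.6

# Two steps: sample distribution at T_train, then decode that at T_eval.
two_step = retemper(softmax(logits, t_train), t_eval)
# One step: decode the original logits at the product temperature.
one_step = softmax(logits, t_train * t_eval)

max_gap = max(abs(a - b) for a, b in zip(two_step, one_step))
print(f"max difference between the two distributions: {max_gap:.2e}")
```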

Want to dig deeper?

Paper: Embarrassingly Simple Self-Distillation Improves Code Generation (2604.01193)
Trainer docs: https://huggingface.co/docs/trl/main/en/ssd_trainer

sergiopaniego

posted an update about 1 month ago

Great experience yesterday at PyTorch Conf Europe in Paris 🇫🇷

We (w/ @kashif ) talked about training LLMs through interaction, using trajectories across games, browsers, or simulators

Room was packed, a clear sign of interest in where RL post-training is heading.

sharing the slides! 🤓
https://drive.google.com/file/d/16k7YRnf5EJEo0XjXGlRJ_hVeLoFWKyNP/view?usp=sharing

sergiopaniego

posted an update about 2 months ago

Gemma 4 💎 is here and it’s strong!

to celebrate, we’re rolling out in TRL:

> support for multimodal tool responses for environments (OpenEnv)
> an example to train it in CARLA for autonomous driving with image-based tool calls

go check it out 🏎️🏎️

blog: https://huggingface.co/blog/gemma4
script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/carla_vlm_gemma.py

sergiopaniego

posted an update about 2 months ago

TRL is officially an adult 🥳

excited to announce TRL v1.0❗️

head to the blog to see how we got here and what’s next for this post-training library, designed to keep pace with the field

https://huggingface.co/blog/trl-v1

2 replies

sergiopaniego

posted an update 2 months ago

ICYMI, great blog by @kashif and @stas on Ulysses Sequence Parallelism: train with million-token contexts

on 4×H100s: 12x longer sequences, 3.7x throughput

learn how to integrate it with Accelerate, Transformers, and TRL ⤵️
https://huggingface.co/blog/ulysses-sp

sergiopaniego

posted an update 2 months ago

We just released a big blog surveying 16 OSS frameworks for async RL training of LLMs!

We're building a new async GRPO trainer for TRL and, as a first step, we needed to understand how the ecosystem solves this problem today.

The problem: in synchronous RL training, generation dominates wall-clock time. 32K-token rollouts on a 32B model take hours while training GPUs sit completely idle. With reasoning models and agentic RL making rollouts longer and more variable, this only gets worse.

The ecosystem converged on the same fix: separate inference and training onto different GPU pools, connect them with a rollout buffer, and sync weights asynchronously.
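The decoupled pattern described above can be illustrated with a toy producer/consumer sketch using only the standard library. Threads stand in for GPU pools, a bounded queue stands in for the rollout buffer, and an integer version counter stands in for model weights; every name here is hypothetical and none of it reflects how TRL or any of the surveyed frameworks is implemented.

```python
import queue
import threading


def run_async_training(num_rollouts=8, batch_size=2, buffer_size=4):
    """Toy async RL loop: a generator thread fills a buffer, a trainer drains it."""
    buffer = queue.Queue(maxsize=buffer_size)  # bounded rollout buffer
    weights = {"version": 0}                   # stand-in for model weights
    lock = threading.Lock()
    staleness = []  # policy versions elapsed between generation and training

    def generator():
        for rollout_id in range(num_rollouts):
            # "Weight sync": read whatever version the trainer last published.
            with lock:
                version = weights["version"]
            buffer.put({"id": rollout_id, "policy_version": version})
        buffer.put(None)  # sentinel: no more rollouts

    def trainer():
        batch = []
        while True:
            rollout = buffer.get()
            if rollout is None:
                break  # an incomplete final batch is dropped in this sketch
            batch.append(rollout)
            if len(batch) == batch_size:
                with lock:
                    for item in batch:
                        staleness.append(weights["version"] - item["policy_version"])
                    weights["version"] += 1  # one "optimizer step"
                batch = []

    threads = [threading.Thread(target=generator), threading.Thread(target=trainer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return weights["version"], staleness


final_version, staleness = run_async_training()
print("optimizer steps:", final_version, "max staleness:", max(staleness))
```

Because generation never waits for a training step to finish, some rollouts are trained on after the policy has already moved on. The `staleness` list records that gap, which corresponds to the staleness-management axis named in the survey.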

We compared 16 frameworks across 7 axes: orchestration, buffer design, weight sync, staleness management, partial rollouts, LoRA, and MoE support.

This survey is step one. The async GRPO trainer for TRL is next!

https://huggingface.co/blog/async-rl-training-landscape

sergiopaniego

posted an update 2 months ago

Nemotron 3 Super by @nvidia is here! NVIDIA's hybrid Mamba2/Transformer models are now natively supported in transformers (no trust_remote_code needed)

Fine-tune them with TRL in just a few lines of code. Notebook + script included to get started right away. goooo!

- Notebook: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_nemotron_3.ipynb
- Script: https://github.com/huggingface/trl/blob/main/examples/scripts/sft_nemotron_3.py
- Collection with all the models: https://huggingface.co/collections/nvidia/nvidia-nemotron-v3

sergiopaniego

posted an update 3 months ago

did you know you can train agentic models with RL by deploying the environments on HF Spaces? 🤗

with TRL + OpenEnv, your training script connects to remote environments hosted as Spaces

want to train faster? → just add more Spaces (TRL handles the parallelization natively)

we used this to train a model to solve the trolley problem in CARLA. 2 HF Spaces running a full driving simulator, each on a T4 GPU

full write-up with code and results → https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl

sergiopaniego

posted an update 3 months ago

Qwen3.5 dense (smol 🤏) models just dropped

- natively multimodal
- 0.8B · 2B · 4B · 9B (+ base variants)
- 262K context extensible to 1M
- built-in thinking

fine-tune them with TRL out of the box → SFT, GRPO, DPO and more!

examples: https://huggingface.co/docs/trl/example_overview
collection: https://huggingface.co/collections/Qwen/qwen35

sergiopaniego

posted an update 3 months ago

What happens when you make an LLM drive a car where physics are real and actions can't be undone?

I ported CARLA, the autonomous driving simulator, to OpenEnv and added training support via TRL + Hugging Face Spaces.

The model interacts with the simulator through tool calls (observe, brake, change lane) and learns from a reward signal.

In 50 training steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians in emergency situations.

The project supports text and vision (VLMs can see through a camera sensor), open-world driving with traffic, and multiple driving scenarios.

This builds on the carla-env project by sinatras, which originally placed LLMs inside CARLA for evaluation. We extended it with vision, new scenarios, rubric-based rewards, and made it trainable end-to-end.

Blog: https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl/
CARLA env in OpenEnv: https://github.com/meta-pytorch/OpenEnv/tree/main/envs/carla_env
Training script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/carla.py

sergiopaniego

posted an update 3 months ago

Tiny Aya 🌿 just dropped from @CohereLabs , a really powerful multilingual small model!

To celebrate, we cooked up fresh resources to train it for tool calling 🔧

> Free Google Colab guide: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb
> Standalone training script: https://github.com/huggingface/trl/blob/main/examples/scripts/sft_tiny_aya_tool_calling.py

sergiopaniego

posted an update 3 months ago

The latest piece by @MiniMax-AI is a must-read.

It tries to break the impossible triangle of agent RL: throughput × stability × flexibility.

A lot to learn here, go read it 🫵
https://huggingface.co/blog/MiniMax-AI/forge-scalable-agent-rl-framework-and-algorithm

sergiopaniego

posted an update 3 months ago

if you're looking for a good first issue to get your open-source journey started, you could contribute to this TRL issue by documenting one impactful paper in the docs

we have a broad list to cover!! 🧐

https://github.com/huggingface/trl/issues/4407

sergiopaniego

posted an update 4 months ago

Meet the Post-Training Toolkit (PTT), which easily integrates with TRL via a single callback, by Aditya Challapally (@microsoft ):

🔍 Detects training issues early
🛠 Lets you intervene safely
📊 Keeps long training runs stable, auditable & efficient

Microsoft blog: https://devblogs.microsoft.com/engineering-at-microsoft/diagnosing-instability-in-production-scale-agent-rl/

Integration guide: https://huggingface.co/docs/trl/main/en/ptt_integration

Code: https://github.com/microsoft/post-training-toolkit

sergiopaniego

posted an update 4 months ago

New TRL + OpenEnv example! 💥

Fine-tune an LLM to play Sudoku using an RL env via OpenEnv

Includes a script that runs on 1 or multiple GPUs with vLLM, plus a Colab-ready notebook.

Enjoy!

Notebook: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/openenv_sudoku_grpo.ipynb

Script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/sudoku.py

1 reply

sergiopaniego

posted an update 4 months ago

Date idea: read the entire Transformers v5.0.0 release notes

Officially stable now: https://github.com/huggingface/transformers/releases/tag/v5.0.0

1 reply

sergiopaniego

posted an update 4 months ago

FunctionGemma Tuning Lab is a new no-code tool by @google that lets you fine-tune a model directly from the browser, using TRL behind the scenes.

blog: https://developers.googleblog.com/a-guide-to-fine-tuning-functiongemma/

try it out: google/functiongemma-tuning-lab

This example builds on a more advanced one for learning fine-tuning with SFT using TRL: https://ai.google.dev/gemma/docs/functiongemma/finetuning-with-functiongemma

1 reply

Team members 2
