Activity Feed

AI & ML interests

None defined yet.

sergiopaniegoΒ 
posted an update about 20 hours ago
view post
Post
86
ICYMI, great blog by @kashif and @stas on Ulysses Sequence Parallelism: train with million-token contexts

on 4Γ—H100s: 12x longer sequences, 3.7x throughput

learn how to integrate it with Accelerate, Transformers, and TRL ‡️
https://huggingface.co/blog/ulysses-sp
sergiopaniegoΒ 
posted an update 2 days ago
view post
Post
115
We just released a big blog surveying 16 OSS frameworks for async RL training of LLMs!

We're building a new async GRPO trainer for TRL and as first step, we needed to understand how the ecosystem solves this problem today.

The problem: in synchronous RL training, generation dominates wall-clock time. 32K-token rollouts on a 32B model take hours while training GPUs sit completely idle. With reasoning models and agentic RL making rollouts longer and more variable, this only gets worse.

The ecosystem converged on the same fix: separate inference + training onto different GPU pools, rollout buffer, and async weight sync.

We compared 16 frameworks across 7 axes: orchestration, buffer design, weight sync, staleness management, partial rollouts, LoRA, and MoE support.

This survey is step one. The async GRPO trainer for TRL is next!

https://huggingface.co/blog/async-rl-training-landscape
sergiopaniegoΒ 
posted an update 3 days ago
view post
Post
182
Nemotron 3 Super by @nvidia is here! NVIDIA's hybrid Mamba2/Transformer models are now natively supported in transformers (no trust_remote_code needed)

Fine-tune them with TRL in just a few lines of code. Notebook + script included to get started right away. goooo!

- Notebook: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_nemotron_3.ipynb
- Script: https://github.com/huggingface/trl/blob/main/examples/scripts/sft_nemotron_3.py
- Collection with all the models: https://huggingface.co/collections/nvidia/nvidia-nemotron-v3
sergiopaniegoΒ 
posted an update 11 days ago
view post
Post
469
did you know you can train agentic models with RL deploying the environments on HF Spaces? πŸ€—

with TRL + OpenEnv, your training script connects to remote environments hosted as Spaces

want to train faster? β†’ just add more Spaces (TRL handles the parallelization natively)

we used this to train a model to solve the trolley problem in CARLA. 2 HF Spaces running a full driving simulator, each on a T4 GPU

full write-up with code and results β†’ https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl
sergiopaniegoΒ 
posted an update 12 days ago
sergiopaniegoΒ 
posted an update 16 days ago
view post
Post
2336
What happens when you make an LLM drive a car where physics are real and actions can't be undone?

I ported CARLA, the autonomous driving simulator, to OpenEnv and added training support via TRL + Hugging Face Spaces.

The model interacts with the simulator through tool calls (observe, brake, change lane) and learns from a reward signal.

In 50 training steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians in emergency situations.

The project supports text and vision (VLMs can see through a camera sensor), open-world driving with traffic, and multiple driving scenarios.

This builds on the carla-env project by sinatras, which originally placed LLMs inside CARLA for evaluation. We extended it with vision, new scenarios, rubric-based rewards, and made it trainable end-to-end.

Blog: https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl/
CARLA env in OpenEnv: https://github.com/meta-pytorch/OpenEnv/tree/main/envs/carla_env
Training script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/carla.py
sergiopaniegoΒ 
posted an update 24 days ago
sergiopaniegoΒ 
posted an update 29 days ago
sergiopaniegoΒ 
posted an update about 1 month ago
view post
Post
497
if you're looking for a good first issue to get your open-source journey started, you could contribute to this TRL issue by documenting one impactful paper in the docs

we have a broad list to cover!! 🧐

https://github.com/huggingface/trl/issues/4407
sergiopaniegoΒ 
posted an update about 1 month ago
view post
Post
533
Meet the Post-Training Toolkit (PTT), which easily integrates with TRL via a single callback, by Aditya Challapally ( @microsoft ):

πŸ” Detects training issues early
πŸ›  Lets you intervene safely
πŸ“Š Keeps long training runs stable, auditable & efficient

Microsoft blog: https://devblogs.microsoft.com/engineering-at-microsoft/diagnosing-instability-in-production-scale-agent-rl/

Integration guide: https://huggingface.co/docs/trl/main/en/ptt_integration

Code: https://github.com/microsoft/post-training-toolkit
sergiopaniegoΒ 
posted an update about 1 month ago
sergiopaniegoΒ 
posted an update about 2 months ago
sergiopaniegoΒ 
posted an update about 2 months ago
view post
Post
1658
FunctionGemma Tuning Lab is a new no-code tool by @google that lets you fine-tune a model directly from the browser, with no coding knowledge required, using TRL behind the scenes.

blog: https://developers.googleblog.com/a-guide-to-fine-tuning-functiongemma/

try it out: google/functiongemma-tuning-lab

This example builds on a more advanced one for learning fine-tuning with SFT using TRL: https://ai.google.dev/gemma/docs/functiongemma/finetuning-with-functiongemma
  • 1 reply
Β·
sergiopaniegoΒ 
posted an update about 2 months ago
view post
Post
885
TRL v0.27.0 is out!! πŸ₯³

It includes GDPO, the latest variant of GRPO for multi-reward RL ✨
GDPO decouples reward normalization to avoid reward collapse and improve per-reward convergence β€” developed by
@sliuau @SimonX et al.

Explore the paper: GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization (2601.05242)

Explore the full set of changes here:
https://github.com/huggingface/trl/releases/tag/v0.27.0
sergiopaniegoΒ 
posted an update about 2 months ago
view post
Post
3067
New REPL environment in OpenEnv available! ✨
Used in the Recursive Language Models (RLM) paper by Alex Zhang.

Ready for inference & post-training using trajectories. Handles long contexts:

> Run Python code in a sandbox
> Make recursive calls to LMs
> Explore data programmatically
> Return final result

Docs: https://meta-pytorch.org/OpenEnv/environments/repl/
Inference script: https://github.com/meta-pytorch/OpenEnv/blob/main/examples/repl_oolong_simple.py
sergiopaniegoΒ 
posted an update 2 months ago
view post
Post
632
Recursive Language Models (RLM) is a new interface for LLMs with cool ideas by Alex Zhang!

⚠️ LLMs struggle with long prompts β†’ attention overload & lost info
πŸ”„ RLMs inspect, split & call themselves on chunks, then aggregate results
βœ… Handles millions of tokens, reduces noise, improves reasoning
πŸ’‘ System prompt guides recursion
🎯 RLM trajectories can be used for RL training or distillation (OpenEnv+TRL!!)

We're adding it to OpenEnv (with Kashif Rasul): https://github.com/meta-pytorch/OpenEnv/pull/282

More resources:

> Paper: Recursive Language Models (2512.24601)
> Paper blog: https://alexzhang13.github.io/blog/2025/rlm/
> RLM repo: https://github.com/alexzhang13/rlm
  • 2 replies
Β·
sergiopaniegoΒ 
posted an update 2 months ago
sergiopaniegoΒ 
posted an update 2 months ago
view post
Post
2632
The list of hands-on notebooks (some beginner-friendly!) to get started with fine-tuning using TRL keeps growing!!

β€’ SFT
β€’ GRPO
β€’ Tool calling & agents
β€’ RL environments with OpenEnv
β€’ LLMs and VLMs
✨ Many run on FREE Colab, making it super easy to get started fast!

https://github.com/huggingface/trl/tree/main/examples/notebooks
sergiopaniegoΒ 
posted an update 2 months ago
sergiopaniegoΒ 
posted an update 3 months ago