llrehf

community
Activity Feed

sergiopaniego 
posted an update 5 days ago
Meet the Post-Training Toolkit (PTT) by Aditya Challapally (@microsoft), which integrates with TRL via a single callback:

🔍 Detects training issues early
🛠 Lets you intervene safely
📊 Keeps long training runs stable, auditable & efficient

Microsoft blog: https://devblogs.microsoft.com/engineering-at-microsoft/diagnosing-instability-in-production-scale-agent-rl/

Integration guide: https://huggingface.co/docs/trl/main/en/ptt_integration

Code: https://github.com/microsoft/post-training-toolkit
sergiopaniego 
posted an update 6 days ago
sergiopaniego 
posted an update 8 days ago
sergiopaniego 
posted an update 15 days ago
FunctionGemma Tuning Lab is a new no-code tool by @google that lets you fine-tune a model directly from the browser, using TRL behind the scenes.

blog: https://developers.googleblog.com/a-guide-to-fine-tuning-functiongemma/

try it out: google/functiongemma-tuning-lab

The tool builds on a more advanced example for learning SFT fine-tuning with TRL: https://ai.google.dev/gemma/docs/functiongemma/finetuning-with-functiongemma
sergiopaniego 
posted an update 18 days ago
sergiopaniego 
posted an update 21 days ago
New REPL environment in OpenEnv available! ✨
Used in the Recursive Language Models (RLM) paper by Alex Zhang.

Ready for inference & post-training using trajectories. Handles long contexts:

> Run Python code in a sandbox
> Make recursive calls to LMs
> Explore data programmatically
> Return final result
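
The loop described in the list above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the OpenEnv API: the names `run_repl_episode`, `context`, `llm_query`, and `FINAL` are hypothetical, and the bare `exec` call used here is not a real sandbox.

```python
import contextlib
import io
from typing import Callable, List, Optional, Tuple


def run_repl_episode(
    snippets: List[str],
    context: str,
    llm_query: Callable[[str], str],
) -> Tuple[Optional[str], List[str]]:
    """Run code snippets one by one in a shared namespace (toy REPL).

    Hypothetical conventions for this sketch only:
      - `context` holds the long input as a plain variable,
      - `llm_query(prompt)` lets the code make a recursive LM call,
      - assigning to `FINAL` ends the episode with that value as the result.
    WARNING: exec() offers no isolation; a real environment needs a sandbox.
    """
    namespace = {"context": context, "llm_query": llm_query}
    transcript: List[str] = []
    for code in snippets:
        buffer = io.StringIO()
        # Capture printed output so it can be shown back to the model.
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
        transcript.append(buffer.getvalue())
        if "FINAL" in namespace:
            return str(namespace["FINAL"]), transcript
    # No snippet assigned FINAL: the episode ended without an answer.
    return None, transcript
```

In a real setup the snippets would come from the model turn by turn, with each captured output appended to its prompt; the transcript of code and outputs is what could serve as a trajectory for post-training.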

Docs: https://meta-pytorch.org/OpenEnv/environments/repl/
Inference script: https://github.com/meta-pytorch/OpenEnv/blob/main/examples/repl_oolong_simple.py
sergiopaniego 
posted an update 22 days ago
Recursive Language Models (RLM) is a new interface for LLMs with cool ideas by Alex Zhang!

⚠️ LLMs struggle with long prompts → attention overload & lost info
🔄 RLMs inspect, split & call themselves on chunks, then aggregate results
✅ Handles millions of tokens, reduces noise, improves reasoning
💡 System prompt guides recursion
🎯 RLM trajectories can be used for RL training or distillation (OpenEnv+TRL!!)
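
The split, recurse, and aggregate pattern from the list above can be illustrated with a toy example. This is a minimal sketch, not the method from the RLM paper or repo: `toy_lm` is a hypothetical stand-in for a real model call, and the halving strategy and `max_lines` threshold are assumptions made for illustration.

```python
from typing import Callable, List

LM = Callable[[str, str], str]  # (query, context) -> answer


def toy_lm(query: str, context: str) -> str:
    """Hypothetical stand-in for a model: return the first line mentioning the query."""
    for line in context.splitlines():
        if query in line:
            return line
    return ""


def recursive_answer(query: str, lines: List[str], lm: LM, max_lines: int = 4) -> str:
    """Answer a query over a long input by recursing on halves of it."""
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    # Base case: the context is short enough for a single call.
    if len(lines) <= max_lines:
        return lm(query, "\n".join(lines))
    # Split the input and let the same procedure handle each half.
    mid = len(lines) // 2
    partials = [
        recursive_answer(query, half, lm, max_lines)
        for half in (lines[:mid], lines[mid:])
    ]
    # Aggregate: keep only non-empty partial answers and make one final call.
    evidence = [p for p in partials if p]
    return lm(query, "\n".join(evidence))
```

For example, calling `recursive_answer("secret", haystack, toy_lm)` on a list of 100 log lines with one line containing "secret" never passes more than a handful of lines to a single call, which is the point of the recursion: each call sees a short, low-noise context.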

We're adding it to OpenEnv (with Kashif Rasul): https://github.com/meta-pytorch/OpenEnv/pull/282

More resources:

> Paper: Recursive Language Models (2512.24601)
> Paper blog: https://alexzhang13.github.io/blog/2025/rlm/
> RLM repo: https://github.com/alexzhang13/rlm
sergiopaniego 
posted an update 26 days ago
pcuenq 
posted an update 29 days ago
👉 What happened in AI in 2025? 👈

We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!

Play with it here:
2025-ai-timeline/2025-ai-timeline

Here's my personal quarterly TL;DR:

1️⃣ Q1 — Learning to Reason
DeepSeek not only releases a top-notch reasoning model, but also shows how to train one and compete with closed frontier models. OpenAI debuts Deep Research.

Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)

2️⃣ Q2 — Multimodality and Coding
More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.

Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4

3️⃣ Q3 — "Gold" rush, OpenAI opens up, the community goes bananas
Flagship models win gold in math olympiads and top hard benchmarks. OpenAI releases strong open-source models, and Google releases the much-anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.

Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5

4️⃣ Q4 — Mistral returns, leaderboard hill-climbing
Mistral is back with updated model families. All labs release impressive models to wrap up the year!

Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT 5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 🤯

Credits
🙏 NHLOCAL for the source data https://github.com/NHLOCAL/AiTimeline

🫡 @reach-vb for the original idea, design and recipe

🙌 @ariG23498 and yours truly for compiling and verifying the 2025 edition

🥳 Here's to 2026, wishing it becomes the best year ever for open releases and on-device-first use cases! 🥂
sergiopaniego 
posted an update about 1 month ago
The list of hands-on notebooks (some beginner-friendly!) to get started with fine-tuning using TRL keeps growing!!

• SFT
• GRPO
• Tool calling & agents
• RL environments with OpenEnv
• LLMs and VLMs
✨ Many run on FREE Colab, making it super easy to get started fast!

https://github.com/huggingface/trl/tree/main/examples/notebooks
sergiopaniego 
posted an update about 1 month ago
sergiopaniego 
posted an update about 1 month ago
sergiopaniego 
posted an update about 1 month ago
sergiopaniego 
posted an update about 1 month ago
The Christmas holidays are here! 🎄
Thinking about learning something new in AI?

@huggingface offers 12 FREE courses covering all the relevant topics, for every level of experience. A great challenge for the holidays (and worth saving for later 🙄)

Let’s explore them!

🧠 𝗟𝗟𝗠 𝗖𝗼𝘂𝗿𝘀𝗲: large language models with HF tools
https://huggingface.co/learn/llm-course

🤖 𝗔𝗴𝗲𝗻𝘁𝘀 𝗖𝗼𝘂𝗿𝘀𝗲: build and deploy AI agents
https://huggingface.co/learn/agents-course

🎨 𝗗𝗶𝗳𝗳𝘂𝘀𝗶𝗼𝗻 𝗖𝗼𝘂𝗿𝘀𝗲: diffusion models with 🤗 Diffusers
https://huggingface.co/learn/diffusion-course

🔊 𝗔𝘂𝗱𝗶𝗼 𝗖𝗼𝘂𝗿𝘀𝗲: transformers for audio tasks
https://huggingface.co/learn/audio-course

🎮 𝗗𝗲𝗲𝗽 𝗥𝗟 𝗖𝗼𝘂𝗿𝘀𝗲: deep reinforcement learning
https://huggingface.co/learn/deep-rl-course

👁️ 𝗖𝗼𝗺𝗺𝘂𝗻𝗶𝘁𝘆 𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿 𝗩𝗶𝘀𝗶𝗼𝗻 𝗖𝗼𝘂𝗿𝘀𝗲: modern computer vision with HF
https://huggingface.co/learn/computer-vision-course

🦾 𝗥𝗼𝗯𝗼𝘁𝗶𝗰𝘀 𝗖𝗼𝘂𝗿𝘀𝗲 (𝗟𝗲𝗥𝗼𝗯𝗼𝘁): learning-based robotics
https://huggingface.co/learn/robotics-course

🧩 𝗠𝗖𝗣 𝗖𝗼𝘂𝗿𝘀𝗲: Model Context Protocol explained
https://huggingface.co/learn/mcp-course

🧪 𝗔 𝗦𝗺𝗼𝗹 𝗖𝗼𝘂𝗿𝘀𝗲: post-training AI models
https://huggingface.co/learn/a-smol-course

🕹️ 𝗠𝗟 𝗳𝗼𝗿 𝗚𝗮𝗺𝗲𝘀: AI in game development
https://huggingface.co/learn/ml-for-games-course

🧊 𝗠𝗟 𝗳𝗼𝗿 𝟯𝗗: machine learning for 3D data
https://huggingface.co/learn/ml-for-3d-course

📘 𝗢𝗽𝗲𝗻-𝗦𝗼𝘂𝗿𝗰𝗲 𝗔𝗜 𝗖𝗼𝗼𝗸𝗯𝗼𝗼𝗸: practical AI notebooks
https://huggingface.co/learn/cookbook

All of them can be found here: https://huggingface.co/learn
sergiopaniego 
posted an update about 2 months ago
Google DeepMind releases FunctionGemma, a 270M model specialized in 🔧 tool calling, built for fine-tuning

TRL has day-0 support. To celebrate, we’re sharing 2 new resources:

> Colab guide to fine-tune it for 🌐 browser control with BrowserGym OpenEnv
> Standalone training script

> Colab notebook: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_functiongemma_browsergym_openenv.ipynb
> Training script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/browsergym_llm.py (the command to run it is inside the script)
> More notebooks in TRL: https://huggingface.co/docs/trl/example_overview#notebooks
sergiopaniego 
posted an update about 2 months ago
sergiopaniego 
posted an update about 2 months ago
🎄 last talk of the year about open AI and HF today at Universidad Rey Juan Carlos for undergrad students

always a pleasure to be back at my alma mater

🎅 slides: https://github.com/sergiopaniego/talks
sergiopaniego 
posted an update about 2 months ago
TRL now includes agent training support for GRPO‼️

Train 🕵️ agents with 🔧 tools, enabling interaction with external functions and APIs.

And of course, a new notebook and scripts to get you up to speed

📘 notebook tutorial: https://github.com/huggingface/trl/blob/main/examples/notebooks/grpo_agent.ipynb

📂 script examples: https://github.com/huggingface/trl/blob/main/examples/scripts/grpo_agent.py

📦 TRL v0.26.0 release: https://github.com/huggingface/trl/releases/tag/v0.26.0
sergiopaniego 
posted an update about 2 months ago
ICYMI, you can fine-tune open LLMs using Claude Code

just tell it:
“Fine-tune Qwen3-0.6B on open-r1/codeforces-cots”

and Claude submits a real training job on HF GPUs using TRL.

it handles everything:
> dataset validation
> GPU selection
> training + Trackio monitoring
> job submission + cost estimation
when it’s done, your model is on the Hub, ready to use

read more about the process: https://huggingface.co/blog/hf-skills-training
sergiopaniego 
posted an update about 2 months ago
We just released TRL v0.26.0!

It comes packed with updates:
> Agent training with tools in GRPO
> New CISPO & SAPO losses + reasoning rewards
> vLLM quantization in colocate mode
> Dataset shuffling in SFT
> Lots of NEW examples
> Tons of fixes and documentation improvements
