🏗️ Building on HF

Sergio Paniego PRO

sergiopaniego

230 192 124

https://sergiopaniego.github.io/

AI & ML interests

None yet

Recent Activity


updated 2 datasets about 13 hours ago

agents-course/final-certificates

Viewer • Updated about 1 hour ago • 8 • 1.62k • 15

agents-course/course-certificates-of-excellence

Viewer • Updated about 1 hour ago • 4.9k • 1.16k • 11

updated a dataset 3 days ago

huggingface-projects/Deep-RL-Course-Certification

Viewer • Updated 3 days ago • 1.75k • 279 • 18

updated a Space 3 days ago

README

🚀

posted an update 3 days ago

Post

154

TRL v1.7.0 is out‼️

+ continuous batching makes GRPO and RLOO 1.25x faster with 16 GB less memory
+ proper MoE post-training across GRPO/RLOO/AsyncGRPO
+ new GMPO trainer
+ AsyncGRPO weight sync + padding-free
+ more

https://github.com/huggingface/trl/releases/tag/v1.7.0

I also wrote a short article about the continuous batching feature for GRPO:

https://huggingface.co/blog/sergiopaniego/cb-trl-grpo

upvoted an article 3 days ago

Article

Run a vLLM Server on HF Jobs in One Command

qgallouedec • 4 days ago • 9

upvoted a collection 4 days ago

OpenThinker-Agent2

Collection

OpenThinker-Agent2: agentic SFT/RL datasets and 8B/32B models (cold-start SFT, RL, and the OpenThinkerAgent-32B release). • 11 items • Updated 18 days ago • 8

upvoted an article 5 days ago

Article

Building Moon Bot: A Slack-Native Coding Agent Backed by HuggingFace Buckets

huggingface • 5 days ago • 42

upvoted a paper 5 days ago

ECHO: Terminal Agents Learn World Models for Free

Paper • 2605.24517 • Published May 23 • 8

posted an update 10 days ago

Post

257

Continuous batching just landed in TRL for GRPO!

At 64 generations it runs faster and uses less VRAM than plain generate, no vLLM needed

How it works and when to reach for it, below

https://huggingface.co/blog/sergiopaniego/cb-trl-grpo

published an article 10 days ago

Article

Continuous batching for GRPO, now in TRL

sergiopaniego • 10 days ago • 8

liked a dataset 11 days ago

badlogicgames/pi-mono

Preview • Updated Apr 6 • 2.4k • 170

posted an update 12 days ago

Post

251

GLM-5.2 is open and comes with competitive performance against opus 4.8

day-0 in transformers + vllm + sglang, mit license 🤗

on the post-training side: critic-based ppo for variable-length agentic rollouts (ppo is back!) + an online anti-reward-hacking module that feeds the agent dummy info when it tries to cheat

upvoted an article 12 days ago

Article

GLM-5.2: Built for Long-Horizon Tasks

zai-org • 12 days ago • 110

upvoted an article 14 days ago

Article

I fine-tuned a model for free from one prompt, with TRL and the Google Colab CLI

sergiopaniego • 14 days ago • 4

published an article 14 days ago

Article

I fine-tuned a model for free from one prompt, with TRL and the Google Colab CLI

sergiopaniego • 14 days ago • 4

upvoted an article 14 days ago

Article

Harness, Scaffold, and the AI Agent Terms Worth Getting Right

sergiopaniego, ariG23498 • May 25 • 124

updated a dataset 14 days ago

huggingface/documentation-images

Viewer • Updated 4 days ago • 59 • 3.01M • 159

updated a model 14 days ago

sergiopaniego/Qwen2.5-0.5B-Instruct-text-to-sql-qlora

Updated 14 days ago

published a bucket 14 days ago

sergiopaniego/trl-text-to-sql-static-f7b321-bucket

0 Bytes
