Activity Feed

AI & ML interests

Tools for Bluesky 🦋

erikkaum
posted an update 11 days ago
Releasing my first kernel 🔥 MaxSim

Late-interaction retrieval (ColBERT / PyLate) bottlenecks on materializing the full similarity matrix. This kernel avoids that by using tiled scoring with simdgroup_matrix (Metal) and WMMA (CUDA).

The result is a 3–5× speedup over a naive PyTorch baseline 🔥

Benchmarks:
- SmallRerank (B=32, C=10): up to 3.2× (M3 Pro) / 2.8× (A100)
- HeavyRerank (B=32, C=100): up to 3.8× (M3 Pro) / 5.3× (A100)
- LongDocStress (Ld=1024): up to 6.2× (L4)

Try it out 👇
https://huggingface.co/kernels/erikkaum/maxsim
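
For readers new to MaxSim, the tiling idea described above can be sketched in plain Python. This is only an illustration of the math, not the kernel itself: the function names, the `tile` parameter, and the toy vectors are assumptions made for the example, and the document is assumed to be non-empty. The naive version builds the whole query-by-document similarity matrix, while the tiled version keeps one running maximum per query token and scores one slice of document tokens at a time.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_naive(query: List[Vector], doc: List[Vector]) -> float:
    """Reference version: build the full Lq x Ld similarity matrix, then reduce."""
    sim = [[dot(q, d) for d in doc] for q in query]
    return sum(max(row) for row in sim)


def maxsim_tiled(query: List[Vector], doc: List[Vector], tile: int = 2) -> float:
    """Same score, but only one tile of document tokens is scored at a time."""
    running_max = [float("-inf")] * len(query)
    for start in range(0, len(doc), tile):
        doc_tile = doc[start:start + tile]
        for i, q in enumerate(query):
            # Best match for query token i inside this tile only.
            tile_best = max(dot(q, d) for d in doc_tile)
            if tile_best > running_max[i]:
                running_max[i] = tile_best
    return sum(running_max)


# Toy data (hypothetical): 2 query tokens, 5 document tokens, dimension 2.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0], [-1.0, 0.0], [0.2, 0.1]]

print(maxsim_naive(query, doc), maxsim_tiled(query, doc, tile=2))
```

Both functions return the same score for any tile size; the tiled one simply never holds more than `len(query) * tile` similarities at once, which is the property a GPU kernel can exploit with matrix-multiply tiles.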
qgallouedec
posted an update 19 days ago
Shipped hf-sandbox! 🥑

🧪 Running an eval that executes model-generated C on a few thousand prompts? You probably don't want any of that on your laptop.
Just shipped hf-sandbox, a Modal-style sandbox API on top of Hugging Face Jobs. Spin up an isolated, ephemeral container, run untrusted code, get the result back. No Docker on your laptop, no infra to manage.

Just pip install hf-sandbox.

Early days (v0.1); feedback and issues very welcome:
👉 https://github.com/huggingface/hf-sandbox
qgallouedec
posted an update 21 days ago
**TRL v1.4 is out 🚀** Chunked NLL loss for SFT and a first-class **OpenReward** integration.

**Chunked NLL loss for SFT: drops peak VRAM by up to 14×**

Standard SFT materializes a full [batch × seq × vocab] logits tensor before computing cross-entropy, which dominates peak memory at long context lengths. The new loss_type="chunked_nll" path drops ignored-label tokens before the lm_head matmul and computes cross-entropy in checkpointed chunks of 256.

Peak GPU memory, AdamW fp32:
- Qwen3-14B, 8×H100 FSDP2, 16k seq: 58.9 GB → 38.9 GB
- Qwen3-4B, 1×H100 80GB, 16k seq: OOM → 63.8 GB
- Qwen3-32B, 8×H100 FSDP2, 8k seq: OOM → 71.2 GB

End-to-end it's consistently as fast or faster than nll, and unlocks sequence lengths that don't fit at all under the standard path.

SFTConfig(loss_type="chunked_nll")


Works with PEFT and VLMs out of the box.
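
To make the chunking idea concrete, here is a minimal pure-Python sketch of the forward computation described above. It is not TRL's implementation: the helper names, the toy `lm_head` matrix, and the tiny sizes are assumptions for illustration, and the gradient-checkpointing part (recomputing each chunk during the backward pass) is not modeled. It only shows that filtering ignored labels first and projecting to the vocabulary one chunk at a time yields the same mean loss as the full-logits path.

```python
import math
from typing import List

IGNORE_INDEX = -100  # label value conventionally skipped by cross-entropy


def logits_for(hidden: List[float], lm_head: List[List[float]]) -> List[float]:
    """Project one hidden state onto the vocabulary (one row per vocab entry)."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in lm_head]


def nll_from_logits(logits: List[float], label: int) -> float:
    """Negative log-likelihood via a numerically stable log-sum-exp."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[label]


def full_nll(hidden_states, lm_head, labels) -> float:
    """Baseline: logits for every position exist at once, ignored ones included."""
    all_logits = [logits_for(h, lm_head) for h in hidden_states]
    losses = [nll_from_logits(l, y) for l, y in zip(all_logits, labels) if y != IGNORE_INDEX]
    return sum(losses) / len(losses)


def chunked_nll(hidden_states, lm_head, labels, chunk_size: int = 256) -> float:
    """Drop ignored positions first, then project and score one chunk at a time."""
    kept = [(h, y) for h, y in zip(hidden_states, labels) if y != IGNORE_INDEX]
    total = 0.0
    for start in range(0, len(kept), chunk_size):
        chunk = kept[start:start + chunk_size]
        # Only chunk_size x vocab logits are alive here.
        chunk_logits = [logits_for(h, lm_head) for h, _ in chunk]
        total += sum(nll_from_logits(l, y) for l, (_, y) in zip(chunk_logits, chunk))
    return total / len(kept)


# Toy example (hypothetical values): 4 positions, hidden size 2, vocab size 3.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
lm_head = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
labels = [IGNORE_INDEX, 1, 2, 0]

print(full_nll(hidden_states, lm_head, labels))
print(chunked_nll(hidden_states, lm_head, labels, chunk_size=2))
```

The two printed values agree up to floating-point rounding. In a real trainer the saving comes from the logits tensor: the baseline holds seq × vocab values, the chunked path holds at most chunk_size × vocab.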

**Open Reward Standard environment adapter**

The new trl.experimental.openreward adapter plugs any environment speaking the [Open Reward Standard](https://openrewardstandard.io) protocol into any TRL trainer that takes an environment_factory. One string (a catalog name or a URL) wires the dataset, factory, and reward_func slots; tools are bound dynamically from JSON Schema, with no per-env wrapper code:

from trl import GRPOTrainer
from trl.experimental.openreward import OpenRewardSpec

spec = OpenRewardSpec("Eigent/SETA", num_tasks=64)

trainer = GRPOTrainer(
    ...,
    train_dataset=spec.train_dataset,
    environment_factory=spec.environment_factory,
    reward_funcs=spec.reward_funcs,
)


v1.4 also brings MFU helpers for dense + MoE models, GRPO support for Liger 0.8.0 (delta clipping + VESPO + KL bias correction), Tülu 3's length-normalized DPO loss, four more training chat templates (Cohere, Cohere2, Gemma 3, Qwen3-2507), and a 5+ GB CUDA memory leak fix in activation offloading.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.4.0
qgallouedec
posted an update about 1 month ago

TRL v1.3 ships day-one training support for Qwen 3.6 🚀

The new Qwen 3.6 family (Qwen/Qwen3.6-27B, Qwen/Qwen3.6-35B-A3B) reuses the Qwen3.5-MoE architecture but ships a slightly different chat template, so we updated the stack end-to-end: new training template with {% generation %} markers, tool-call response schema routing, tiny test models for the VLM matrix.

SFT with assistant-only loss works out of the box:

from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model="Qwen/Qwen3.6-27B",
    args=SFTConfig(assistant_only_loss=True),
    train_dataset=dataset,
)
trainer.train()


So does GRPO tool-calling: just pass tools=[...] to GRPOTrainer.

v1.3 also brings a new experimental TPO trainer (Triple Preference Optimization), speculative decoding in trl vllm-serve (Qwen3 MTP / Eagle3 drafts), 12 more KTO ↔ DPO alignment PRs (KTO promotion to stable is now in reach), three more {% generation %} chat templates (Gemma/Gemma 2, Phi-3, GLM-4-MoE), and a chunky SFT entropy bug fix.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.3.0
qgallouedec
posted an update about 1 month ago
TRL v1.2 introduces the SSDTrainer 🚀

Simple Self-Distillation (SSD) from Apple's paper "Embarrassingly Simple Self-Distillation Improves Code Generation" is now available as an experimental trainer in TRL.

The recipe is as minimal as the name suggests: sample completions from the model itself at a training-time temperature, then fine-tune on those raw, unverified samples with plain cross-entropy. No reward model. No verifier. No teacher model. No reinforcement learning. Just prompts and the model.

from trl.experimental.ssd import SSDConfig, SSDTrainer

trainer = SSDTrainer(
    model="Qwen/Qwen3-4B-Instruct",
    args=SSDConfig(temperature=0.6, top_k=20, top_p=0.95),
    train_dataset=dataset,
)
trainer.train()
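
For intuition about what the temperature, top_k, and top_p settings above control during the sampling step, here is a small standalone sketch. It is not TRL or SSDTrainer code: the function names and toy logits are assumptions, and the order used here (temperature, then top-k, then top-p on the renormalized top-k set) is one common convention that real generation backends may implement differently.

```python
import math
import random
from typing import Dict, List


def filtered_distribution(
    logits: List[float],
    temperature: float = 0.6,
    top_k: int = 20,
    top_p: float = 0.95,
) -> Dict[int, float]:
    """Return renormalized probabilities over the token ids that survive filtering."""
    # Temperature: divide logits before the softmax (lower means sharper).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]

    # Top-k: keep the k highest-weight token ids, then renormalize.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:top_k]
    k_mass = sum(weights[i] for i in ranked)

    # Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += weights[i] / k_mass
        if cumulative >= top_p:
            break

    mass = sum(weights[i] for i in kept)
    return {i: weights[i] / mass for i in kept}


def sample_token(logits: List[float], rng: random.Random, **kwargs) -> int:
    """Draw one token id from the filtered distribution."""
    dist = filtered_distribution(logits, **kwargs)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


# Toy logits over a 4-token vocabulary (hypothetical values).
toy_logits = [2.0, 1.0, 0.0, -1.0]
rng = random.Random(0)
print(filtered_distribution(toy_logits))
print(sample_token(toy_logits, rng))
```

In the SSD recipe, completions drawn this way from the model itself become the fine-tuning targets, so these knobs shape the training data directly.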


v1.2 also ships expanded tool-calling support (LLaMA 3.1 / 3.2, DeepSeek-V3), another round of KTO ↔ DPO alignment getting us closer to promoting KTO to stable, a big GRPO simplification for overlong tool results, deprecation of use_transformers_paged, and key fixes for VLM response parsing.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.2.0
qgallouedec
posted an update about 2 months ago
TRL v1.0 is out!

Hugging Face's TRL library is downloaded 3 million times a month. Over 130k models trained with it are public on the Hub, and major projects like @unsloth and @axolotl-ai-co build directly on top of it. v1.0 is the moment we acknowledged that responsibility explicitly, with a real stability contract.

The field hasn't settled. Building stable software in a domain that keeps invalidating its own assumptions is the actual problem we're solving. The answer is a design that can absorb the next shift without breaking what people rely on.

What's in v1.0:
Deep Hugging Face integration, low infrastructure burden
What's next: asynchronous GRPO, better scaling support, and making training legible enough that agents can inspect and steer it.

pip install --upgrade trl


Read more: hf.co/blog/trl-v1
qgallouedec
posted an update 3 months ago
@CohereLabs just released 🌿 Tiny Aya: a fully open-source 3B parameter model that speaks 70+ languages 🌍! But there’s a catch:

Tiny Aya is just a language model. It doesn’t support tool calling, the key capability that turns frontier models into powerful *agents*.
So the real question is:

How hard is it to turn Tiny Aya into an agent?

Turns out… it’s simple, thanks to Hugging Face TRL.
We’re sharing a hands-on example showing how to train Tiny Aya to turn it into a tool-calling agent using TRL, unlocking what could become the first *massively multilingual open agent*.

Small model. Global reach. Agent capabilities.

👉 https://github.com/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb
davanstrien
posted an update 9 months ago
BrigitteTousi
posted an update 10 months ago
clem
posted an update 10 months ago
BrigitteTousi
posted an update 10 months ago
New interactive viz from AI World showing OpenAI's new open model gpt-oss-120b breaking into the top 50 most liked models of all time on the Hub in under a day! ☄️☄️☄️
BrigitteTousi
posted an update 10 months ago
This is what Hugging Face is all about. We want everyone, hobbyists, researchers and industry alike, to be able to contribute to AI because everyone is affected by it. Kudos to HF's @irenesolaiman for spreading the word! 🔥🤗
erikkaum
posted an update 11 months ago
ZML just released a technical preview of their new Inference Engine: LLMD.

- Just a 2.4 GB container, which means fast startup times and efficient autoscaling
- Cross-platform GPU support: works on both NVIDIA and AMD GPUs
- Written in Zig

I just tried it out, deployed it on Hugging Face Inference Endpoints, and wrote a quick guide 👇 You can try it in about 5 minutes!

https://huggingface.co/blog/erikkaum/test-driving-llmd-inference-engine
erikkaum
posted an update 11 months ago
We just released native support for @SGLang and @vllm-project in Inference Endpoints 🔥

Inference Endpoints is becoming the central place to deploy high-performance inference engines.

It also provides the managed infra for them. Instead of spending weeks configuring infrastructure, managing servers, and debugging deployment issues, you can focus on what matters most: your AI model and your users 🙌
cfahlgren1
posted an update 11 months ago
I ran the Anthropic Misalignment Framework for a few top models and added it to a dataset: cfahlgren1/anthropic-agentic-misalignment-results

You can read the reasoning traces of the models trying to blackmail the user and perform other actions. It's very interesting!!

clem
posted an update 11 months ago
davanstrien
posted an update 12 months ago
Inspired by Hugging Face's official MCP server, I've developed a complementary tool that exposes my semantic search API to enhance discovery across the HF platform.

Key capabilities:

- AI-powered semantic search for models and datasets
- Parameter count analysis via safetensors metadata
- Trending content discovery
- Find similar models/datasets functionality
- 11 tools total for enhanced ecosystem navigation
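
As background for the parameter-count item above: a safetensors file starts with an 8-byte little-endian length, followed by a JSON header listing each tensor's dtype and shape, so parameters can be counted without reading any weights. The sketch below illustrates that idea on an in-memory demo file. It is not the code of the MCP tool (the post does not show it), and the helper names and the demo tensors are made up for the example.

```python
import json
import struct
from math import prod
from typing import Dict


def read_safetensors_header(blob: bytes) -> dict:
    """Parse the JSON header: 8-byte little-endian length, then that many bytes of JSON."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8:8 + header_len].decode("utf-8"))


def count_parameters(header: dict) -> Dict[str, int]:
    """Sum the number of elements per dtype, skipping the optional metadata entry."""
    counts: Dict[str, int] = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue
        counts[info["dtype"]] = counts.get(info["dtype"], 0) + prod(info["shape"])
    return counts


def build_demo_blob() -> bytes:
    """Hypothetical two-tensor file, built in memory so the example is self-contained."""
    header = {
        "__metadata__": {"format": "pt"},
        "embed.weight": {"dtype": "F16", "shape": [10, 4], "data_offsets": [0, 80]},
        "head.bias": {"dtype": "F32", "shape": [10], "data_offsets": [80, 120]},
    }
    payload = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(payload)) + payload + bytes(120)


counts = count_parameters(read_safetensors_header(build_demo_blob()))
print(counts, sum(counts.values()))
```

Because only the header is needed, a remote tool can in principle fetch just the first bytes of each file rather than the full checkpoint.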

The semantic search goes beyond simple keyword matching, understanding context and relationships between different models and datasets.

Example query: "Find around 10 reasoning Hugging Face datasets published in 2025 focusing on topics other than maths and science. Show a link and a short summary for each dataset." (results in video!)

https://github.com/davanstrien/hub-semantic-search-mcp
cfahlgren1
posted an update 12 months ago