Recent Activity

maxime-hcompany  updated a collection about 8 hours ago
Holo3.1
maxime-hcompany  updated a collection about 9 hours ago
Holo3.1

Articles

sergiopaniego 
posted an update about 16 hours ago
If you have a GitHub repo, you basically have an RL training environment

We're introducing Repo2RLEnv (built by @AdithyaSK ), a tool that automatically mines PRs, commits, and CVEs and turns them into verifiable, sandboxed tasks with real reward signals

It outputs to the Harbor spec, so you can plug it straight into RL training or coding-agent eval
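
To make "verifiable task with a real reward signal" concrete, here is a minimal sketch of the idea. It is not the Repo2RLEnv API or the Harbor spec: `RepoTask`, its fields, and `reward` are hypothetical names, and the subprocess call does no real sandboxing. The assumption is simply that a task carries a verification command (for example, the tests touched by a PR) and that the reward is 1.0 when that command exits cleanly.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class RepoTask:
    """Hypothetical task record; field names are illustrative, not the Harbor spec."""
    instruction: str        # what the agent is asked to do (e.g. taken from a PR description)
    verify_cmd: list[str]   # command whose exit code decides success (e.g. the PR's tests)
    timeout_s: float = 60.0


def reward(task: RepoTask, workdir: str = ".") -> float:
    """Return 1.0 if the verification command exits with code 0, otherwise 0.0."""
    try:
        result = subprocess.run(
            task.verify_cmd, cwd=workdir, capture_output=True, timeout=task.timeout_s
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or unrunnable check counts as a failed task.
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0


# Toy stand-ins for "tests pass" and "tests fail".
passing = RepoTask("toy task", [sys.executable, "-c", "raise SystemExit(0)"])
failing = RepoTask("toy task", [sys.executable, "-c", "raise SystemExit(1)"])
```

In a real setup the command would run inside an isolated container against the agent's patched checkout; the binary pass/fail reward is just the simplest choice.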

> repo: https://github.com/huggingface/Repo2RLEnv
> collection with envs: https://huggingface.co/collections/AdithyaSK/repo2rlenv-verifiable-rl-environments
sergiopaniego 
posted an update 1 day ago
sergiopaniego 
posted an update 5 days ago
Harness, Scaffold, Context Engineering, Agent... do you actually know what they mean?

We wrote an AI agent glossary and tried to make sense of it all with simple definitions and real examples

↓ go read it ↓

https://huggingface.co/blog/agent-glossary

GGUF?
#1 opened 14 days ago by Notenufftime
sergiopaniego 
posted an update 22 days ago
OpenEnv's tutorial collection is growing fast. If you're looking to get started with RL environments, check them out

> evaluate your agents using OpenEnv
> learn how rewards work via rubrics
> connect agents via MCP
> many moreeeee!

Anything you think is missing?

https://meta-pytorch.org/OpenEnv/tutorials/index.html
sergiopaniego 
posted an update 23 days ago
OpenEnv already ships 🚢 with a ready-to-deploy RLM environment on free HF Spaces

Drop "Attention Is All You Need", write code that spawns parallel LLM calls → ✅ correct answer, reward 1.0, in 4.2s

Run GRPO (TRL) → model learns to write that search strategy itself

test it yourself → sergiopaniego/repl-env
check out OpenEnv → https://github.com/meta-pytorch/OpenEnv
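
For a feel of what "code that spawns parallel LLM calls" can look like, here is a rough sketch. It does not use the repl-env or OpenEnv API: `fake_llm` is a hypothetical stand-in that answers by keyword lookup, and `chunk` and `parallel_search` are illustrative helpers. The assumed strategy is to split a long document into chunks, query each chunk in parallel, and collect the chunks that matched.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_llm(chunk_text: str, keyword: str) -> str:
    """Hypothetical stand-in for a real LLM sub-call: 'answers' by keyword lookup."""
    return "FOUND" if keyword.lower() in chunk_text.lower() else "NONE"


def chunk(text: str, size: int) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def parallel_search(document: str, keyword: str, size: int = 40) -> list[int]:
    """Fan out one sub-call per chunk and return the indices of chunks that matched."""
    chunks = chunk(document, size)
    with ThreadPoolExecutor(max_workers=4) as pool:
        answers = list(pool.map(lambda c: fake_llm(c, keyword), chunks))
    return [i for i, answer in enumerate(answers) if answer == "FOUND"]


# Toy document: the keyword only appears in the second 40-character chunk.
document = "a" * 40 + "self-attention layer" + "b" * 20
hits = parallel_search(document, "attention")
```

With a real model behind `fake_llm`, the threads would mostly wait on network calls, which is where the fan-out saves time.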
sergiopaniego 
posted an update about 1 month ago
Earlier this month, Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy

And… it's already supported in TRL (support built by Kashif Rasul). You can really feel the pace of development in the team 🐎

Paper by Ruixiang ZHANG, He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang at Apple 🍎

How it works: the model generates completions at a training-time temperature (T_train) with top_k/top_p truncation, then fine-tunes on them with plain cross-entropy. No labels or verifier needed
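
The two steps above can be sketched for a single next-token decision. This is a toy illustration in plain Python, not the TRL trainer: the logits, the hyperparameter values, and the helper names `truncated_probs` and `cross_entropy` are all assumptions made for the example.

```python
import math
import random


def truncated_probs(logits, temperature, top_k, top_p):
    """Temperature-scaled softmax, followed by top-k and top-p (nucleus) truncation."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the top_k most likely tokens, then the smallest prefix whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize over the kept tokens; everything else gets probability 0.
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / mass
    return out


def cross_entropy(logits, target):
    """Plain cross-entropy of one target token under the untempered softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0]   # toy next-token logits over a 4-token vocabulary
probs = truncated_probs(logits, temperature=1.5, top_k=3, top_p=0.9)
token = rng.choices(range(len(logits)), weights=probs)[0]   # the model's own sample
loss = cross_entropy(logits, token)   # the quantity a fine-tuning step would minimize
```

In practice this happens over whole completions and the loss is backpropagated through the model; the sketch only shows where the temperature, the truncation, and the label-free target come from.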

You can try it right away with this ready-to-run example (Qwen3-4B on rStar-Coder):
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd.py
or benchmark a checkpoint with the eval script:
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd_eval.py

One neat insight from the paper: T_train and T_eval compose into an effective T_eff = T_train × T_eval, so a broad band of configs works well. Even very noisy samples still help
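
One way to see why the temperatures would multiply, under idealized assumptions that are mine rather than the post's: suppose fine-tuning made the model exactly match the base distribution sampled at T_train, and ignore top_k/top_p truncation. Then the tuned logits equal the base logits divided by T_train (up to a constant), and decoding at T_eval divides them again. The numbers below are illustrative and not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by the given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


base_logits = [2.0, 1.0, 0.5, -1.0]   # toy logits of the base model
t_train, t_eval = 1.5, 0.6            # illustrative values only

# Idealized fine-tuned model: its log-probabilities equal the base model's at t_train.
tuned_logits = [math.log(p) for p in softmax(base_logits, t_train)]

two_step = softmax(tuned_logits, t_eval)            # decode the tuned model at t_eval
one_step = softmax(base_logits, t_train * t_eval)   # decode the base model at T_eff
max_gap = max(abs(a - b) for a, b in zip(two_step, one_step))
```

Here `max_gap` comes out at floating-point noise level, so the two distributions coincide in this idealized setting. A real fine-tuned model only approximates its own samples, so treat the identity as a rule of thumb.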

Want to dig deeper?

Paper: Embarrassingly Simple Self-Distillation Improves Code Generation (2604.01193)
Trainer docs: https://huggingface.co/docs/trl/main/en/ssd_trainer
plcedoz38 
published an article about 1 month ago
Meet HoloTab by HCompany. Your AI browser companion.

Hcompany
sergiopaniego 
posted an update about 2 months ago
hamza-hcompany 
in Hcompany/Holo3-35B-A3B about 2 months ago

Rename README.md to README.mds
#3 opened about 2 months ago by faizikhan1