H company

Team

company

Verified

https://www.hcompany.ai/

hcompany_ai

hcompai

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

maxime-hcompany updated a collection 1 day ago

Holo3.1

Papers

Surfer 2: The Next Generation of Cross-Platform Computer Use Agents

Articles

updated a collection 1 day ago

Holo3.1

Collection

Computer use - edge to cloud • 5 items • Updated 1 day ago • 1

sergiopaniego

posted an update 1 day ago

Post

If you have a GitHub repo, you basically have an RL training environment

We're introducing Repo2RLEnv (built by @AdithyaSK), a tool that automatically mines PRs, commits, and CVEs and turns them into verifiable, sandboxed tasks with real reward signals

Outputs to Harbor spec so you can plug it straight into RL training or coding-agent eval

> repo: https://github.com/huggingface/Repo2RLEnv
> collection with envs: https://huggingface.co/collections/AdithyaSK/repo2rlenv-verifiable-rl-environments

sergiopaniego

posted an update 2 days ago

Post

167

periodic reminder 🧐

some HF blog posts use a special template, long-form, animated, super deep

I keep them all in one collection that gets updated every time a new one drops so you don't lose track

https://hf.co/collections/sergiopaniego/research-and-long-form-blog-posts

spot one missing? let me know

sergiopaniego

posted an update 5 days ago

Post

9885

Harness, Scaffold, Context Engineering, Agent... do you actually know what they mean?

We wrote an AI agent glossary and tried to make sense of it all with simple definitions and real examples

↓ go read it ↓

https://huggingface.co/blog/agent-glossary

1 reply

avshalom-h

in Hcompany/Holotron-3-Nano 12 days ago

GGUF?

#1 opened 15 days ago by Notenufftime

sergiopaniego

posted an update 22 days ago

Post

1871

OpenEnv is growing fast in tutorials. If you're looking to get started with RL environments, check them out

> evaluate your agents using OpenEnv
> learn how rewards work via rubrics
> connect agents via MCP
> many moreeeee!

anything you think is missing?

https://meta-pytorch.org/OpenEnv/tutorials/index.html

sergiopaniego

posted an update 23 days ago

Post

863

OpenEnv already ships 🚢 with a ready-to-deploy RLM environment on free HF Spaces

Drop "Attention Is All You Need", write code that spawns parallel LLM calls → ✅ correct answer, reward 1.0, in 4.2s

Run GRPO (TRL) → model learns to write that search strategy itself

test it yourself → sergiopaniego/repl-env
check out OpenEnv → https://github.com/meta-pytorch/OpenEnv

hamza-hcompany

in Hcompany/Holo3-35B-A3B 25 days ago

Is there any special design in the agentic framework? Memory/planning? I only got a 23% score on OSWorld

#5 opened about 1 month ago by Wenjin0421

hamza-hcompany

updated a collection about 1 month ago

Holotron

Collection

2 items • Updated Apr 28 • 1

hamza-hcompany

published a model about 1 month ago

Hcompany/Holotron-3-Nano

Image-Text-to-Text • 33B • Updated Apr 28 • 546 • 23

hamza-hcompany

updated a model about 1 month ago

Hcompany/Holotron-3-Nano

Image-Text-to-Text • 33B • Updated Apr 28 • 546 • 23

h-aurelien-lac

updated a model about 1 month ago

Hcompany/Holotron-3-Nano

Image-Text-to-Text • 33B • Updated Apr 28 • 546 • 23

sergiopaniego

posted an update about 2 months ago

Post

1416

Earlier this month, Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy

And… it's already supported in TRL, built by Kashif Rasul. you can really feel the pace of development in the team 🐎

Paper by Ruixiang ZHANG, He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang at Apple 🍎

How it works: the model generates completions at a training-time temperature (T_train) with top_k/top_p truncation, then fine-tunes on them with plain cross-entropy. no labels or verifier needed
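
To make the sampling step concrete, here is a minimal pure-Python sketch of temperature scaling with top_k/top_p truncation, plus the plain cross-entropy loss on a sampled token. It is not TRL's implementation: the function names (`truncated_distribution`, `sample_token`, `cross_entropy`) are hypothetical, and it works on a single toy logit vector rather than a real model.

```python
import math
import random


def truncated_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Token probabilities after temperature scaling and top-k/top-p truncation."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:  # smallest prefix reaching mass top_p
                break
        order = kept

    # Renormalize over the surviving tokens; everything else gets zero mass.
    keep = set(order)
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]


def sample_token(logits, temperature, top_k=None, top_p=None, rng=random):
    """Draw one token id from the truncated distribution."""
    dist = truncated_distribution(logits, temperature, top_k, top_p)
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def cross_entropy(logits, target):
    """Plain cross-entropy (negative log-likelihood) of `target` under `logits`."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up values for illustration
token = sample_token(toy_logits, temperature=1.2, top_k=3, top_p=0.95)
loss = cross_entropy(toy_logits, token)  # loss the model would train on
```

In a real run the sampled completions come from the model itself and the loss is averaged over all tokens in each completion; the sketch only shows the per-token mechanics.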

You can try it right away with this ready-to-run example (Qwen3-4B on rStar-Coder):
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd.py
or benchmark a checkpoint with the eval script:
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd_eval.py

One neat insight from the paper: T_train and T_eval compose into an effective T_eff = T_train × T_eval, so a broad band of configs works well. even very noisy samples still help
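
One way to see why the temperatures multiply is the following sketch. It rests on an idealized assumption that is mine, not the paper's: the fine-tuned model reproduces the T_train sampling distribution exactly, and top_k/top_p truncation is ignored. Under that assumption, decoding at T_eval is the same as decoding the original logits at T_train × T_eval. The helper names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def compose_temperatures(logits, t_train, t_eval):
    """Distribution of an idealized student decoded at t_eval.

    Assumption: the student matches softmax(logits / t_train) exactly,
    so its logits equal log p_train up to an additive constant.
    """
    p_train = softmax(logits, t_train)
    student_logits = [math.log(p) for p in p_train]
    return softmax(student_logits, t_eval)


toy_logits = [2.0, 0.5, -1.0]  # made-up values for illustration
composed = compose_temperatures(toy_logits, t_train=1.5, t_eval=0.6)
direct = softmax(toy_logits, temperature=1.5 * 0.6)  # T_eff = T_train * T_eval
matches = all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(composed, direct))
```

Here `matches` checks that the two routes give the same distribution; with truncation or imperfect fitting the equality would only hold approximately.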

Want to dig deeper?

Paper: Embarrassingly Simple Self-Distillation Improves Code Generation (2604.01193)
Trainer docs: https://huggingface.co/docs/trl/main/en/ssd_trainer

plcedoz38

published an article about 2 months ago

Article

Meet HoloTab by HCompany. Your AI browser companion.

Hcompany • Apr 15 • 24

sergiopaniego

posted an update about 2 months ago

Post

498

Great experience yesterday at PyTorch Conf Europe in Paris 🇫🇷

We (w/ @kashif ) talked about training LLMs through interaction, using trajectories across games, browsers, or simulators

Room was packed, a clear sign of interest in where RL post-training is heading.

sharing the slides! 🤓
https://drive.google.com/file/d/16k7YRnf5EJEo0XjXGlRJ_hVeLoFWKyNP/view?usp=sharing