Agents Hack

community

Activity Feed Request to join this org

AI & ML interests

None defined yet.

Recent Activity

HennersBro98 authored a paper 7 days ago

Atla Selene Mini: A General Purpose Evaluation Model

HennersBro98 authored a paper 7 days ago

Counsel: A Meta-Evaluation Dataset for Agentic Tasks

spisupat authored a paper 7 days ago

Counsel: A Meta-Evaluation Dataset for Agentic Tasks

View all activity

HennersBro98

authored 2 papers 7 days ago

Atla Selene Mini: A General Purpose Evaluation Model

Paper • 2501.17195 • Published Jan 27, 2025 • 35

Counsel: A Meta-Evaluation Dataset for Agentic Tasks

Paper • 2606.21627 • Published 12 days ago • 6

spisupat

authored a paper 7 days ago

Counsel: A Meta-Evaluation Dataset for Agentic Tasks

Paper • 2606.21627 • Published 12 days ago • 6

spisupat

submitted a paper to Daily Papers 8 days ago

Counsel: A Meta-Evaluation Dataset for Agentic Tasks

Paper • 2606.21627 • Published 12 days ago • 6

stevenbucaille

posted an update 5 months ago

Post

358

LWDetr is available in 🤗 transformers !
Checkout the collection to find the original paper, model weights and a demo space : https://huggingface.co/collections/stevenbucaille/lwdetr

Jofthomas

posted an update 7 months ago

Post

4547

The new Mistral 3 models are here !

Today, we announce Mistral 3, the next generation of Mistral models. Mistral 3 includes three state-of-the-art small, dense models (14B, 8B, and 3B) and Mistral Large 3 – our most capable model to date – a sparse mixture-of-experts trained with 41B active and 675B total parameters.

All models are released under the Apache 2.0 license.

Ministrals :
https://huggingface.co/collections/mistralai/ministral-3

Mistral Large 3:
https://huggingface.co/collections/mistralai/mistral-large-3

2 replies

nouamanetazi

posted an update 8 months ago

Post

4962

After training 𝐒𝐦𝐨𝐥𝐋𝐌𝟑 on 𝟑𝟖𝟒 𝐇𝟏𝟎𝟎𝐬 for nearly a month, I've come to realize something most people overlook: 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐢𝐬 𝐭𝐡𝐞 𝐦𝐚𝐤𝐞-𝐨𝐫-𝐛𝐫𝐞𝐚𝐤 𝐟𝐚𝐜𝐭𝐨𝐫 𝐢𝐧 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠. 🔥

Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious 𝐍𝐂𝐂𝐋 𝐞𝐫𝐫𝐨𝐫𝐬, or when your expensive GPU cluster is running at 𝟔𝟎% 𝐞𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐜𝐲, the problem isn't your model. It's most probably a 𝐦𝐢𝐬𝐮𝐬𝐞 𝐨𝐟 𝐭𝐡𝐞 𝐡𝐚𝐫𝐝𝐰𝐚𝐫𝐞. 🛠️

Questions that seemed simple but had no clear answers: Why is 𝐌𝐨𝐄 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐬𝐥𝐨𝐰𝐞𝐫 𝐭𝐡𝐚𝐧 𝐝𝐞𝐧𝐬𝐞 𝐦𝐨𝐝𝐞𝐥𝐬? Which 𝐍𝐂𝐂𝐋 𝐟𝐥𝐚𝐠𝐬 should we actually set? How often should we checkpoint without killing throughput?

That's why we built 𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤 📖: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and crucially, the 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐥𝐚𝐲𝐞𝐫 that most teams get wrong.

We validated real vs theoretical bandwidth across the entire stack: 𝐇𝐁𝐌𝟑 𝐡𝐢𝐭𝐭𝐢𝐧𝐠 𝟑 𝐓𝐁/𝐬, 𝐍𝐕𝐋𝐢𝐧𝐤 𝟒.𝟎 𝐫𝐞𝐚𝐜𝐡𝐢𝐧𝐠 𝟕𝟖𝟔 𝐆𝐁/𝐬, 𝐏𝐂𝐈𝐞 𝐆𝐞𝐧𝟒 𝐚𝐭 𝟏𝟒.𝟐 𝐆𝐁/𝐬. Then we ran collective operations across 𝟏𝟐𝟖 𝐆𝐏𝐔𝐬 (16 nodes, 8xH100s each) and measured how performance degrades at scale: all-reduce drops from 𝟒𝟖𝟎 𝐆𝐁/𝐬 on a single node to 𝟑𝟐𝟎-𝟑𝟓𝟎 𝐆𝐁/𝐬 across 16 nodes.

If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.

𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤: https://lnkd.in/e5MKXUHS

Shared with ❤️ by the HuggingFace team

thomwolf

authored a paper 9 months ago

Robot Learning: A Tutorial

Paper • 2510.12403 • Published Oct 14, 2025 • 137

jdelavande

authored a paper 9 months ago

Video Killed the Energy Budget: Characterizing the Latency and Power Regimes of Open Text-to-Video Models

Paper • 2509.19222 • Published Sep 23, 2025

thomwolf

authored a paper about 1 year ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 79

m-ric

updated a Space about 1 year ago

README

🔥

Cidoyi

updated a Space about 1 year ago

README

🔥

thomwolf

authored a paper about 1 year ago

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2, 2025 • 161

m-ric

published a Space about 1 year ago

README

🔥

Jofthomas

posted an update about 1 year ago

Post

4757

Meet our new agentic model : 𝗗𝗲𝘃𝘀𝘁𝗿𝗮𝗹

Devstral is an open-source LLM built software engineering tasks built under a collaboration between Mistral AI and All Hands AI 🙌.

𝗞𝗲𝘆 𝗳𝗲𝗮𝘁𝘂𝗿𝗲𝘀 :
• 🤖 𝗔𝗴𝗲𝗻𝘁𝘀 : perfect for Agentic coding
• 🍃 𝗹𝗶𝗴𝗵𝘁𝘄𝗲𝗶𝗴𝗵𝘁: Devstral is a 𝟮𝟰𝗕 parameter based on Mistral small.
• ©️ 𝗔𝗽𝗮𝗰𝗵𝗲 𝟮.𝟬, meaning fully open-source !
• 📄 A 𝟭𝟮𝟴𝗸 context window.

📚Blog : https://mistral.ai/news/devstral
⚡API : The model is also available on our API under the name 𝗱𝗲𝘃𝘀𝘁𝗿𝗮𝗹-𝘀𝗺𝗮𝗹𝗹-𝟮𝟱𝟬𝟱
🤗 repo : mistralai/Devstral-Small-2505

Can't wait to see what you will build with it !

1 reply

thomwolf

posted an update about 1 year ago

Post

8779

If you've followed the progress of robotics in the past 18 months, you've likely noticed how robotics is increasingly becoming the next frontier that AI will unlock.

At Hugging Face—in robotics and across all AI fields—we believe in a future where AI and robots are open-source, transparent, and affordable; community-built and safe; hackable and fun. We've had so much mutual understanding and passion working with the Pollen Robotics team over the past year that we decided to join forces!

You can already find our open-source humanoid robot platform Reachy 2 on the Pollen website and the Pollen community and people here on the hub at

pollen-robotics

We're so excited to build and share more open-source robots with the world in the coming months!

1 reply

thomwolf

authored a paper about 1 year ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 210

nouamanetazi

authored a paper about 1 year ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 210

thomwolf

authored a paper about 1 year ago

YourBench: Easy Custom Evaluation Sets for Everyone

Paper • 2504.01833 • Published Apr 2, 2025 • 23

thomwolf

posted an update over 1 year ago

Post

3953

The new DeepSite space is really insane for vibe-coders
enzostvs/deepsite

With the wave of vibe-coding-optimized LLMs like the latest open-source DeepSeek model (version V3-0324), you can basically prompt out-of-the-box and create any app and game in one-shot.

It feels so powerful to me, no more complex framework or under-the-hood prompt engineering to have a working text-to-app tool.

AI is eating the world and *open-source* AI is eating AI itself!

PS: and even more meta is that the DeepSite app and DeepSeek model are both fully open-source code => time to start recursively improve?

PPS: you still need some inference hosting unless you're running the 600B param model at home, so check the very nice list of HF Inference Providers for this model: deepseek-ai/DeepSeek-V3-0324

1 reply

AI & ML interests

Recent Activity

Team members 81

Agents-Hack's activity

README

README

README