AI & ML interests

None defined yet.

Recent Activity

stevenbucailleย 
posted an update 5 months ago
Jofthomasย 
posted an update 7 months ago
view post
Post
4547
The new Mistral 3 models are here !

Today, we announce Mistral 3, the next generation of Mistral models. Mistral 3 includes three state-of-the-art small, dense models (14B, 8B, and 3B) and Mistral Large 3 โ€“ our most capable model to date โ€“ a sparse mixture-of-experts trained with 41B active and 675B total parameters.

All models are released under the Apache 2.0 license.

Ministrals :
https://huggingface.co/collections/mistralai/ministral-3

Mistral Large 3:
https://huggingface.co/collections/mistralai/mistral-large-3
  • 2 replies
ยท
nouamanetaziย 
posted an update 8 months ago
view post
Post
4962
After training ๐’๐ฆ๐จ๐ฅ๐‹๐Œ๐Ÿ‘ on ๐Ÿ‘๐Ÿ–๐Ÿ’ ๐‡๐Ÿ๐ŸŽ๐ŸŽ๐ฌ for nearly a month, I've come to realize something most people overlook: ๐ข๐ง๐Ÿ๐ซ๐š๐ฌ๐ญ๐ซ๐ฎ๐œ๐ญ๐ฎ๐ซ๐ž ๐ข๐ฌ ๐ญ๐ก๐ž ๐ฆ๐š๐ค๐ž-๐จ๐ซ-๐›๐ซ๐ž๐š๐ค ๐Ÿ๐š๐œ๐ญ๐จ๐ซ ๐ข๐ง ๐‹๐‹๐Œ ๐ญ๐ซ๐š๐ข๐ง๐ข๐ง๐ . ๐Ÿ”ฅ

Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious ๐๐‚๐‚๐‹ ๐ž๐ซ๐ซ๐จ๐ซ๐ฌ, or when your expensive GPU cluster is running at ๐Ÿ”๐ŸŽ% ๐ž๐Ÿ๐Ÿ๐ข๐œ๐ข๐ž๐ง๐œ๐ฒ, the problem isn't your model. It's most probably a ๐ฆ๐ข๐ฌ๐ฎ๐ฌ๐ž ๐จ๐Ÿ ๐ญ๐ก๐ž ๐ก๐š๐ซ๐๐ฐ๐š๐ซ๐ž. ๐Ÿ› ๏ธ

Questions that seemed simple but had no clear answers: Why is ๐Œ๐จ๐„ ๐ญ๐ซ๐š๐ข๐ง๐ข๐ง๐  ๐ฌ๐ฅ๐จ๐ฐ๐ž๐ซ ๐ญ๐ก๐š๐ง ๐๐ž๐ง๐ฌ๐ž ๐ฆ๐จ๐๐ž๐ฅ๐ฌ? Which ๐๐‚๐‚๐‹ ๐Ÿ๐ฅ๐š๐ ๐ฌ should we actually set? How often should we checkpoint without killing throughput?

That's why we built ๐“๐ก๐ž ๐’๐ฆ๐จ๐ฅ ๐“๐ซ๐š๐ข๐ง๐ข๐ง๐  ๐๐ฅ๐š๐ฒ๐›๐จ๐จ๐ค ๐Ÿ“–: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and crucially, the ๐ข๐ง๐Ÿ๐ซ๐š๐ฌ๐ญ๐ซ๐ฎ๐œ๐ญ๐ฎ๐ซ๐ž ๐ฅ๐š๐ฒ๐ž๐ซ that most teams get wrong.

We validated real vs theoretical bandwidth across the entire stack: ๐‡๐๐Œ๐Ÿ‘ ๐ก๐ข๐ญ๐ญ๐ข๐ง๐  ๐Ÿ‘ ๐“๐/๐ฌ, ๐๐•๐‹๐ข๐ง๐ค ๐Ÿ’.๐ŸŽ ๐ซ๐ž๐š๐œ๐ก๐ข๐ง๐  ๐Ÿ•๐Ÿ–๐Ÿ” ๐†๐/๐ฌ, ๐๐‚๐ˆ๐ž ๐†๐ž๐ง๐Ÿ’ ๐š๐ญ ๐Ÿ๐Ÿ’.๐Ÿ ๐†๐/๐ฌ. Then we ran collective operations across ๐Ÿ๐Ÿ๐Ÿ– ๐†๐๐”๐ฌ (16 nodes, 8xH100s each) and measured how performance degrades at scale: all-reduce drops from ๐Ÿ’๐Ÿ–๐ŸŽ ๐†๐/๐ฌ on a single node to ๐Ÿ‘๐Ÿ๐ŸŽ-๐Ÿ‘๐Ÿ“๐ŸŽ ๐†๐/๐ฌ across 16 nodes.

If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.

๐“๐ก๐ž ๐’๐ฆ๐จ๐ฅ ๐“๐ซ๐š๐ข๐ง๐ข๐ง๐  ๐๐ฅ๐š๐ฒ๐›๐จ๐จ๐ค: https://lnkd.in/e5MKXUHS

Shared with โค๏ธ by the HuggingFace team
m-ricย 
updated a Space about 1 year ago
Cidoyiย 
updated a Space about 1 year ago
m-ricย 
published a Space about 1 year ago
Jofthomasย 
posted an update about 1 year ago
view post
Post
4757
Meet our new agentic model : ๐——๐—ฒ๐˜ƒ๐˜€๐˜๐—ฟ๐—ฎ๐—น

Devstral is an open-source LLM built software engineering tasks built under a collaboration between Mistral AI and All Hands AI ๐Ÿ™Œ.

๐—ž๐—ฒ๐˜† ๐—ณ๐—ฒ๐—ฎ๐˜๐˜‚๐—ฟ๐—ฒ๐˜€ :
โ€ข ๐Ÿค– ๐—”๐—ด๐—ฒ๐—ป๐˜๐˜€ : perfect for Agentic coding
โ€ข ๐Ÿƒ ๐—น๐—ถ๐—ด๐—ต๐˜๐˜„๐—ฒ๐—ถ๐—ด๐—ต๐˜: Devstral is a ๐Ÿฎ๐Ÿฐ๐—• parameter based on Mistral small.
โ€ข ยฉ๏ธ ๐—”๐—ฝ๐—ฎ๐—ฐ๐—ต๐—ฒ ๐Ÿฎ.๐Ÿฌ, meaning fully open-source !
โ€ข ๐Ÿ“„ A ๐Ÿญ๐Ÿฎ๐Ÿด๐—ธ context window.

๐Ÿ“šBlog : https://mistral.ai/news/devstral
โšกAPI : The model is also available on our API under the name ๐—ฑ๐—ฒ๐˜ƒ๐˜€๐˜๐—ฟ๐—ฎ๐—น-๐˜€๐—บ๐—ฎ๐—น๐—น-๐Ÿฎ๐Ÿฑ๐Ÿฌ๐Ÿฑ
๐Ÿค— repo : mistralai/Devstral-Small-2505

Can't wait to see what you will build with it !
  • 1 reply
ยท
thomwolfย 
posted an update about 1 year ago
view post
Post
8779
If you've followed the progress of robotics in the past 18 months, you've likely noticed how robotics is increasingly becoming the next frontier that AI will unlock.

At Hugging Faceโ€”in robotics and across all AI fieldsโ€”we believe in a future where AI and robots are open-source, transparent, and affordable; community-built and safe; hackable and fun. We've had so much mutual understanding and passion working with the Pollen Robotics team over the past year that we decided to join forces!

You can already find our open-source humanoid robot platform Reachy 2 on the Pollen website and the Pollen community and people here on the hub at
pollen-robotics


We're so excited to build and share more open-source robots with the world in the coming months!
  • 1 reply
ยท
thomwolfย 
posted an update over 1 year ago
view post
Post
3953
The new DeepSite space is really insane for vibe-coders
enzostvs/deepsite

With the wave of vibe-coding-optimized LLMs like the latest open-source DeepSeek model (version V3-0324), you can basically prompt out-of-the-box and create any app and game in one-shot.

It feels so powerful to me, no more complex framework or under-the-hood prompt engineering to have a working text-to-app tool.

AI is eating the world and *open-source* AI is eating AI itself!

PS: and even more meta is that the DeepSite app and DeepSeek model are both fully open-source code => time to start recursively improve?

PPS: you still need some inference hosting unless you're running the 600B param model at home, so check the very nice list of HF Inference Providers for this model: deepseek-ai/DeepSeek-V3-0324
  • 1 reply
ยท