Julien Chaumond PRO
AI & ML interests
<3 ML/AI for everyone, building products to propel communities fwd
Recent Activity
liked a model about 5 hours ago: google/timesfm-2.5-200m-transformers
liked a model about 5 hours ago: google/timesfm-2.5-200m-pytorch
liked a model about 5 hours ago: onnx-community/granite-timeseries-patchtst
replied to nroggendorff's post 3 months ago
reacted to nroggendorff's post 3 months ago
reacted to erikkaum's post with 🤗 7 months ago
Post
We just released native support for @SGLang and @vllm-project in Inference Endpoints 🔥
Inference Endpoints is becoming the central place where you deploy high-performance inference engines, and it provides the managed infra for them. Instead of spending weeks configuring infrastructure, managing servers, and debugging deployment issues, you can focus on what matters most: your AI model and your users.
let's gooo!!
reacted to jsulz's post 8 months ago
Post
It's been a bit since I took a step back and looked at the xet-team's progress migrating Hugging Face from Git LFS to Xet, but every time I do, it boggles the mind.
A month ago there were 5,500 users/orgs on Xet with 150K repos and 4PB. Today?
- 700,000 users/orgs
- 350,000 repos
- 15PB
Meanwhile, our migrations have pushed throughput to numbers that are bonkers. In June, we hit upload speeds of 577Gb/s (crossing 500Gb/s for the first time).
These are hard numbers to put into context, but let's try:
The latest run of the Common Crawl from commoncrawl was 471TB.
We now have ~32 crawls stored in Xet. At peak upload speed we could move the latest crawl into Xet in about two hours.
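As a quick sanity check on that "about two hours" claim, here is the arithmetic, assuming decimal terabytes and the 577Gb/s peak figure quoted above:

```python
# Back-of-the-envelope: time to move the latest Common Crawl (471 TB)
# into Xet at the peak upload speed of 577 Gb/s.
crawl_bytes = 471e12        # 471 TB (decimal)
peak_bits_per_s = 577e9     # 577 Gb/s

seconds = crawl_bytes * 8 / peak_bits_per_s
hours = seconds / 3600
print(f"{hours:.1f} hours")  # ~1.8 hours -- "about two hours" checks out
```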
We're moving to a new phase in the process, so stay tuned.
This shift in gears means it's also time to roll up our sleeves and look at all the bytes we have and the value we're adding to the community.
I already have some homework from @RichardErkhov to look at the dedupe across their uploads, and I'll be doing the same for other early adopters, big models/datasets, and frequent uploaders (looking at you @bartowski)
Let me know if there's anything you're interested in; happy to dig in!
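The dedupe being measured here is chunk-level: identical chunks across uploads are stored once. A toy illustration of the idea (fixed-size chunks and SHA-256 for simplicity; Xet's actual implementation uses content-defined chunking, so this is only a sketch):

```python
import hashlib

def dedupe_stats(blobs, chunk_size=64 * 1024):
    """Toy chunk-level dedupe: split each blob into fixed-size chunks,
    hash them, and count how many chunks are unique across all blobs."""
    seen = set()
    total = 0
    for blob in blobs:
        for i in range(0, len(blob), chunk_size):
            total += 1
            seen.add(hashlib.sha256(blob[i:i + chunk_size]).digest())
    return total, len(seen)

# Two "uploads" sharing most of their bytes dedupe extremely well:
a = bytes(1024 * 1024)             # 1 MiB of zeros
b = bytes(1024 * 1024) + b"\x01"   # same content plus one extra byte
total, unique = dedupe_stats([a, b])
print(total, unique)  # 33 chunks seen, only 2 unique to store
```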
reacted to Narsil's post 9 months ago
Post
Me: This function is too slow. Find a faster algorithm.
Cursor: Hold my beer.
Me: *Slacking off with colleagues*
Cursor: Ping.
Me: 🤯
reacted to jsulz's post with 🔥 9 months ago
Post
Heyo @RichardErkhov, the xet-team at Hugging Face was wondering if you wanted to join the fun and jump over to Xet storage. 🤗
We've been onboarding folks (https://huggingface.co/blog/xet-on-the-hub), we know the backend can scale (Llama 4 and Qwen 3 are on Xet), it's great for working with quants (see xet-team/quantization-dedup), and we're pushing on inviting impactful orgs and users on the Hub. You fit the bill.
We'd love to onboard you, get some feedback, and create some excitement
The steps are pretty straightforward - join the waitlist at hf.co/join/xet and we'll take care of the rest.
The system is fully backward compatible, so you shouldn't notice a thing. BUT to get the best experience when uploading/downloading, make sure you have hf_xet installed alongside the latest huggingface_hub.
What do you think?
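A quick local check that both packages mentioned above are present (a minimal sketch; the function name is mine, and this only verifies importability, not versions):

```python
import importlib.util

def xet_transfer_available() -> bool:
    """True when both huggingface_hub and the hf_xet accelerator are
    importable, i.e. uploads/downloads can take the fast Xet path."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("huggingface_hub", "hf_xet")
    )

print(xet_transfer_available())
```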
reacted to reach-vb's post 9 months ago
Post
hey hey @mradermacher - VB from Hugging Face here, we'd love to onboard you over to our optimised xet backend! 🔥
as you know, we're in the process of upgrading our storage backend to xet (which helps us scale and offer blazingly fast upload/download speeds too): https://huggingface.co/blog/xet-on-the-hub. now that we're certain the backend can scale with even big models like Llama 4 / Qwen 3, we're moving to the next phase: inviting impactful orgs and users on the hub over. as you're a big part of the open source ML community, we'd love to onboard you next and create some excitement about it in the community too!
in terms of actual steps - it should be as simple as one of the org admins joining hf.co/join/xet - we'll take care of the rest.
p.s. you'd need the latest hf_xet version of the huggingface_hub lib but everything else should be the same: https://huggingface.co/docs/hub/storage-backends#using-xet-storage
p.p.s. this is fully backwards compatible so everything will work as it should! 🤗
WOOHOO!!
reacted to cbensimon's post with 🔥 10 months ago
Post
ZeroGPU medium size is now available as a power-user feature
Nothing too fancy for now (ZeroGPU Spaces still default to large, 70GB VRAM) but this paves the way for:
- 💰 size-based quotas / pricing (medium will offer significantly more usage than large)
- 🦣 the upcoming xlarge size (141GB VRAM)
You can as of now control GPU size via a Space variable. Accepted values:
- auto (future default)
- medium
- large (current default)
The auto mode checks total CUDA tensor size during startup:
- More than 30GB → large
- Otherwise → medium
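The auto-mode rule is simple enough to sketch (the function name is mine; the 30GB threshold and size names are from the post):

```python
def pick_gpu_size(total_cuda_tensor_gb: float) -> str:
    """ZeroGPU 'auto' mode as described: more than 30GB of CUDA tensors
    at startup gets 'large' (70GB VRAM); anything else gets 'medium'."""
    return "large" if total_cuda_tensor_gb > 30 else "medium"

print(pick_gpu_size(45.0))  # large
print(pick_gpu_size(12.0))  # medium
```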
replied to their post 10 months ago
did you get it to work since?
Post
Important notice 🚨
For Inference Providers who have built support for our Billing API (currently: Fal, Novita, HF-Inference, with more coming soon), we've started enabling Pay as you go (=PAYG)
What this means is that you can use those Inference Providers beyond the free included credits, and they're charged to your HF account.
You can see it on this view: any provider that does not have a "Billing disabled" badge is PAYG-compatible.
reacted to danielhanchen's post with ❤️🤗🔥 10 months ago
Post
🦥 Introducing Unsloth Dynamic v2.0 GGUFs!
Our v2.0 quants set new benchmarks on 5-shot MMLU and KL Divergence, meaning you can now run & fine-tune quantized LLMs while preserving as much accuracy as possible.
Llama 4: unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
DeepSeek-R1: unsloth/DeepSeek-R1-GGUF-UD
Gemma 3: unsloth/gemma-3-27b-it-GGUF
We made selective layer quantization much smarter. Instead of modifying only a subset of layers, we now dynamically quantize all layers so every layer can have a different bit width. Now our dynamic method can be applied to all LLM architectures, not just MoEs.
Blog with Details: https://docs.unsloth.ai/basics/dynamic-v2.0
All our future GGUF uploads will leverage Dynamic 2.0 and our hand curated 300Kβ1.5M token calibration dataset to improve conversational chat performance.
For accurate benchmarking, we built an evaluation framework to match the reported 5-shot MMLU scores of Llama 4 and Gemma 3. This allowed apples-to-apples comparisons between full-precision vs. Dynamic v2.0, QAT and standard iMatrix quants.
Dynamic v2.0 aims to minimize the performance gap between full-precision models and their quantized counterparts.
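A toy sketch of the per-layer bit-assignment idea described above (the sensitivity scores, function name, and even-spread heuristic are all illustrative assumptions, not Unsloth's actual method):

```python
def assign_layer_bits(sensitivities, budgets=(2, 3, 4, 5, 6, 8)):
    """Give each layer its own bit width: rank layers by a sensitivity
    score and map the least sensitive ones to the lowest bit widths."""
    order = sorted(range(len(sensitivities)), key=lambda i: sensitivities[i])
    bits = [0] * len(sensitivities)
    for rank, layer in enumerate(order):
        # spread the available bit widths evenly across the ranked layers
        bits[layer] = budgets[rank * len(budgets) // len(sensitivities)]
    return bits

# Most sensitive layer (0.9) gets the widest assigned width:
print(assign_layer_bits([0.9, 0.1, 0.5, 0.2]))  # [6, 2, 5, 3]
```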