We just updated GPU-fryer 🍳 to run on the Grace Hopper Superchip (GH200) - fully optimized for ARM-based systems! With this release, we switched to cuBLASLt to support running FP8 benchmarks. You can monitor GPU throttling, TFLOPS outliers, and HBM memory health, and ensure that you get the most out of your hardware setup. Perfect for stress testing and tuning datacenter GPUs.
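GPU-fryer itself is a Rust CLI, but the core idea behind throttling and TFLOPS-outlier detection can be sketched in a few lines: time repeated matmuls and flag runs whose throughput drops well below the median. This is a minimal, CPU-side illustration with made-up function names, not GPU-fryer's actual implementation:

```python
import time
import numpy as np

def measure_gflops(n=512, runs=20):
    """Time repeated matmuls and return achieved GFLOPS per run."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    flops = 2 * n**3  # multiply-adds in an n x n matmul
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        a @ b
        elapsed = time.perf_counter() - start
        samples.append(flops / elapsed / 1e9)
    return samples

def find_outliers(samples, threshold=0.9):
    """Flag runs below `threshold` x median throughput (possible throttling)."""
    median = np.median(samples)
    return [i for i, s in enumerate(samples) if s < threshold * median]
```

A sustained dip flagged by a check like this would hint at thermal or power throttling; the real tool additionally reads GPU telemetry and HBM health.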
AMD summer hackathons are here! A chance to get hands-on with MI300X GPUs and accelerate models. 🇫🇷 Paris - Station F - July 5-6 🇮🇳 Mumbai - July 12-13 🇮🇳 Bengaluru - July 19-20
Hugging Face and GPU Mode will be on site, and on July 6 in Paris @ror will share lessons learned while building new kernels to accelerate Llama 3.1 405B on ROCm.
Wrapping up a week of shipping and announcements with Dell Enterprise Hub now featuring AI Applications, on-device models for AI PCs, a new CLI and Python SDK... all you need for building AI on premises!
Enterprise orgs can now enable serverless Inference Providers for all members - includes $2 of free usage per org member (e.g. an Enterprise org with 1,000 members shares $2,000 in free credits each month) - admins can set a monthly spend limit for the entire org - works today with Together, fal, Novita, Cerebras, and HF Inference.
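The credit math above is simple pooling: per-member credit times member count, with an optional admin-set cap on total spend. A tiny sketch (function names are illustrative, not an actual Hugging Face API):

```python
def pooled_monthly_credit(members, credit_per_member=2.00):
    """Free inference credit shared by the whole org each month (USD)."""
    return members * credit_per_member

def within_spend_limit(spent_so_far, monthly_limit):
    """Admins can cap total org spend; returns whether more usage is allowed."""
    return spent_so_far < monthly_limit
```

So a 1,000-member Enterprise org pools 1,000 × $2 = $2,000 of free usage per month, on top of whatever paid limit the admin configures.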
We are introducing multi-backend support in Hugging Face Text Generation Inference! With the new TGI architecture, we can now plug in new modeling backends to get the best performance for the selected model and available hardware. This first step will very soon be followed by the integration of new backends (TRT-LLM, llama.cpp, vLLM, Neuron, and TPU).
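The idea of pluggable backends selected per model and hardware can be sketched with a small registry. TGI is written in Rust and its real backend interface looks nothing like this; the names and capability sets below are purely illustrative:

```python
# Registry mapping backend names to backend classes.
BACKENDS = {}

def register_backend(name):
    """Class decorator that adds a backend to the registry."""
    def wrap(cls):
        BACKENDS[name] = cls
        return cls
    return wrap

@register_backend("trt-llm")
class TrtLlmBackend:
    supports = {"cuda"}  # hypothetical capability set

@register_backend("llama.cpp")
class LlamaCppBackend:
    supports = {"cpu", "cuda"}

def pick_backend(hardware):
    """Return the first registered backend supporting the given hardware."""
    for name, cls in BACKENDS.items():
        if hardware in cls.supports:
            return name
    raise ValueError(f"no backend for {hardware}")
```

The benefit of this pattern is that adding a backend only requires registering it; the routing logic that matches models to hardware stays untouched.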
We are polishing the TensorRT-LLM backend, which achieves impressive performance on NVIDIA GPUs. Stay tuned 🤗!
Cosmos is a family of pre-trained models purpose-built for generating physics-aware videos and world states to advance physical AI development. The release includes the Cosmos Tokenizers: nvidia/cosmos-tokenizer-672b93023add81b66a8ff8e6
Pro Tip - if you're a Firefox user, you can set up Hugging Chat as an integrated AI Assistant, with contextual links to summarize or simplify any text - handy!
These 15 open models are available for serverless inference on Cloudflare Workers AI, powered by GPUs distributed in 150 datacenters globally - 🙏 @rita3ko @mchenco @jtkipp @nkothariCF @philschmid
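Workers AI models are reachable over Cloudflare's REST API at `/accounts/{account_id}/ai/run/{model}`. A minimal sketch that builds such a request without sending it; the account ID, token, and model slug in the usage comment are placeholders:

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4/accounts"

def build_request(account_id, api_token, model, prompt):
    """Build (but do not send) a Workers AI inference request."""
    url = f"{API_BASE}/{account_id}/ai/run/{model}"
    return urllib.request.Request(
        url,
        data=json.dumps({"prompt": prompt}).encode(),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Usage (placeholders, requires a real Cloudflare account and token):
# req = build_request("ACCOUNT_ID", "API_TOKEN",
#                     "@cf/meta/llama-3-8b-instruct", "Hello!")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())
```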