hf-dell-internal ([INTERNAL] Hugging Face + Dell)

posted an update 3 days ago

Post

156

Open agents on AWS SageMaker AI with open models from the Hugging Face Hub!

> Deploy an open model from the Hugging Face Hub on SageMaker AI
> Connect the deployed model to Strands Agents
> Add built-in and custom tools for tool calling
> Expose external capabilities through MCP integration
> Bonus: talk to your agent and visualize traces with Gradio

https://alvarobartt.com/agents-on-aws-sagemaker
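
To illustrate what tool calling involves under the hood, here is a minimal, library-free sketch of a tool registry and dispatcher. It does not use the Strands Agents, SageMaker, or MCP APIs; the `tool` decorator, the `dispatch` function, and the JSON shape of the tool call are assumptions made for illustration only.

```python
import json
from typing import Callable, Dict

# Hypothetical in-process registry mapping tool names to Python callables.
TOOLS: Dict[str, Callable[..., object]] = {}


def tool(fn: Callable[..., object]) -> Callable[..., object]:
    """Register a function as a tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    """Example custom tool: add two numbers."""
    return a + b


def dispatch(tool_call_json: str) -> str:
    """Run the tool named in a model-emitted call and return a JSON result.

    Assumed call shape: {"name": "<tool>", "arguments": {...}}.
    """
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call.get("arguments", {}))
    return json.dumps({"name": name, "result": result})


print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
```

In a real agent loop, the model would emit the tool call, the runtime would dispatch it as above, and the JSON result would be appended to the conversation for the next model turn.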

ehcalabres

updated a dataset 4 days ago

hf-dell-internal/image-checksums

Updated 4 days ago • 1.23k

alvarobartt

posted an update 6 days ago

Post

3212

Latest hf-mem release added a breakdown of Mixture-of-Experts (MoE) memory usage!

TL;DR: MoEs can be misleading to reason about from active parameters alone. Each token only activates a subset of experts, but the serving setup still needs to account for the full resident memory footprint.

🧠 hf-mem now splits MoE memory into base model weights, routed experts, and KV cache
🏗️ Dense models usually load and use most weights every forward pass, while MoEs load many experts but only route each token to a few of them
⚡ Active parameter count isn't the same as memory footprint, especially for sparse architectures
📦 Runtime memory is about what is used per request/token, while loading memory also includes the expert weights that need to be resident
📚 KV cache can still dominate depending on context length, batch size, and concurrency
🔀 Expert Parallelism (EP) helps shard experts across accelerators when expert weights dominate
🚀 Data Parallelism (DP) + EP is often a good fit for throughput-oriented MoE serving

Check the repository at https://github.com/alvarobartt/hf-mem
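
The gap between active parameters and resident memory can be made concrete with a minimal sketch. This is not hf-mem's implementation: the `MoEConfig` fields, the helper names, and the parameter counts in the example are hypothetical and chosen only for round numbers, and the sketch ignores KV cache, activations, and framework overhead.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    """Hypothetical, simplified parameter counts for a sparse MoE model."""
    base_params: int         # attention, embeddings, shared layers (always used)
    params_per_expert: int   # parameters of one routed expert, summed over layers
    num_experts: int         # routed experts that must be resident in memory
    experts_per_token: int   # experts the router activates for each token
    bytes_per_param: int = 2 # 2 bytes per parameter for bf16/fp16 weights


def active_params(cfg: MoEConfig) -> int:
    """Parameters actually used to process a single token."""
    return cfg.base_params + cfg.params_per_expert * cfg.experts_per_token


def resident_params(cfg: MoEConfig) -> int:
    """Parameters that must be loaded, regardless of routing."""
    return cfg.base_params + cfg.params_per_expert * cfg.num_experts


def weight_memory_gib(params: int, bytes_per_param: int) -> float:
    """Convert a parameter count into GiB of weight memory."""
    return params * bytes_per_param / 1024**3


# Made-up example: 2B base parameters, 64 experts of 0.5B each, 2 active per token.
cfg = MoEConfig(
    base_params=2_000_000_000,
    params_per_expert=500_000_000,
    num_experts=64,
    experts_per_token=2,
)

print(f"active params:   {active_params(cfg):,}")
print(f"resident params: {resident_params(cfg):,}")
print(f"resident weights: {weight_memory_gib(resident_params(cfg), cfg.bytes_per_param):.1f} GiB")
```

In this made-up configuration the model computes with 3B parameters per token but has to keep 34B resident, which is why sizing hardware from the active count alone underestimates the requirement.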

alvarobartt

posted an update 3 months ago

Post

3736

Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! 💥

> 🕒 60-minute single-pass processing, no chunking or stitching
> 👤 Customized hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API

https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr

juanjucm

posted an update 4 months ago

Post

327

Last week, zai-org dropped zai-org/GLM-4.7-Flash. Now, we bring it to Microsoft Foundry!

- 🏆 30B-A3B MoE, the strongest model in the 30B class. It excels at coding tasks, agentic workflows and reasoning.
- 🤏🏻 Lighter version of its 358B big brother, balancing performance and efficiency.

Not light enough for you? We are also adding unsloth/GLM-4.7-Flash-GGUF to the catalog, with GPU and CPU support powered by llama.cpp 🔥

Go join the hype and deploy them from the Hugging Face collection on Microsoft Foundry!

alvarobartt

posted an update 4 months ago

Post

3258

💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

💡 Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if preferred.
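
For intuition, the usual back-of-the-envelope KV cache estimate for standard multi-head or grouped-query attention is sketched below. This is the textbook formula, not necessarily what hf-mem computes. The function and its parameter names are hypothetical (loosely mirroring the --max-model-len, --batch-size and --kv-cache-dtype flags), and the model dimensions in the example are made up.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    max_model_len: int,
    batch_size: int,
    dtype_bytes: int = 2,
) -> int:
    """Estimate KV cache size in bytes.

    Each layer stores one key and one value vector (hence the factor 2)
    of size num_kv_heads * head_dim for every token of every sequence.
    dtype_bytes is 2 for fp16/bf16 and 1 for an 8-bit cache.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return per_token * max_model_len * batch_size


# Made-up dimensions: 32 layers, 8 KV heads of size 128, 8192-token context.
single = kv_cache_bytes(32, 8, 128, max_model_len=8192, batch_size=1)
batched = kv_cache_bytes(32, 8, 128, max_model_len=8192, batch_size=4)

print(f"batch 1: {single / 1024**3:.1f} GiB")
print(f"batch 4: {batched / 1024**3:.1f} GiB")
```

The estimate grows linearly with both context length and batch size, so long contexts or high concurrency can push the cache past the size of the weights themselves.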

ehcalabres

published a dataset 4 months ago

hf-dell-internal/image-checksums

Updated 4 days ago • 1.23k

alvarobartt

published a dataset 6 months ago

hf-dell-internal/container-scans

Updated Dec 3, 2025 • 4

pagezyhf

posted an update 7 months ago

Post

2993

🚀 Big news for AI builders!

We’re thrilled to announce that the Qwen3-VL family of vision-language models is now available on Azure AI Foundry, thanks to our collaboration with Microsoft.

We bring open-source innovation to enterprise-grade AI infrastructure, making it easier than ever for enterprises to deploy and scale the latest and greatest models from Hugging Face securely within Azure.

🔍 Highlights:

- Deploy Qwen3-VL instantly via managed endpoints
- Built-in governance, telemetry, and lifecycle management
- True multimodal reasoning — vision, language, and code understanding
- State-of-the-art performance, outperforming closed-source models like Gemini 2.5 Pro and GPT-5
- Available in both *Instruct* and *Thinking* modes, across 24 model sizes

👉 Get started today: search for Qwen3-VL in the Hugging Face Collection on Azure AI Foundry.

pagezyhf

posted an update 8 months ago

Post

873

What’s your biggest headache deploying Hugging Face models to the cloud—and how can we fix it for you?

pagezyhf

posted an update 8 months ago

Post

515

Qwen3 Next models are available in Azure AI Foundry 🚀

Qwen/qwen3-next-68c25fd6838e585db8eeea9d

pagezyhf

posted an update 8 months ago

Post

3959

🤝 Collaborating with AMD to ensure Hugging Face Transformers runs smoothly on AMD GPUs!

We run daily CI on AMD MI325 to track the health of the most important model architectures and we’ve just made our internal dashboard public.

By making this easily accessible, we hope to spark community contributions and improve support for everyone!

pagezyhf

posted an update 9 months ago

Post

3234

We've improved the Deploy button on Hugging Face model pages for Microsoft Azure

1/ no more long waits before seeing model support status

2/ ready-to-use CLI and Python snippets

3/ redirection to Azure AI Foundry rather than Azure ML

✋ if you see any bugs or have feedback, open an issue on our repo:
https://github.com/huggingface/Microsoft-Azure

pagezyhf

posted an update 10 months ago

Post

2206

Deploy GPT OSS models with Hugging Face on Azure AI!

We’re thrilled to enable OpenAI GPT OSS models in the Azure AI Model Catalog so that Azure users can try them securely on the day of their release.

In our official launch blogpost, there’s a section on how to deploy the model to your Azure AI Hub. Get started today!

https://huggingface.co/blog/welcome-openai-gpt-oss#azure

pagezyhf

posted an update 10 months ago

Post

296

We now have the newest OpenAI models available on the Dell Enterprise Hub!

We built the Dell Enterprise Hub to give our on-prem customers access to the latest and greatest models from the Hugging Face community. We’re happy to give secure access to this amazing contribution from OpenAI on the day of its launch!

https://dell.huggingface.co/

pagezyhf

posted an update 10 months ago

Post

387

🟪 Qwen/Qwen3‑235B‑A22B‑Instruct‑2507‑FP8 is now available in Microsoft Azure for one‑click deployment! 🚀

Check out their blogpost: https://qwenlm.github.io/blog/qwen3/

You can now find it in the Hugging Face Collection in Azure ML or Azure AI Foundry, along with 10k other Hugging Face models 🤗🤗
Qwen/Qwen3-235B-A22B-Instruct-2507-FP8

Bear with us for the non‑quantized version.

pagezyhf

posted an update 10 months ago

Post

1587

In our push to make more models available on Azure, we recently added SmolLM v3 to the catalog! 🚀

@juanjucm wrote a really detailed guide on how to deploy on Azure AI 🤗

https://huggingface.co/docs/microsoft-azure/azure-ai/examples/deploy-smollm3

If you want to see other models in the catalog, please let us know.

pagezyhf

posted an update 10 months ago

Post

227

🎉 New in Azure Model Catalog: NVIDIA Parakeet TDT 0.6B V2

We're excited to welcome Parakeet TDT 0.6B V2—a state-of-the-art English speech-to-text model—to the Azure Foundry Model Catalog.

What is it?

A powerful ASR model built on the FastConformer-TDT architecture, offering:
🕒 Word-level timestamps
✍️ Automatic punctuation & capitalization
🔊 Strong performance across noisy and real-world audio

It runs with NeMo, NVIDIA’s framework for building and running speech and language models.

Want to give it a try? 🎧 You can test it with your own audio (up to 3 hours) on Hugging Face Spaces before deploying. If it fits your needs, deploy easily from the Hugging Face Hub or Azure ML Studio with secure, scalable infrastructure!

📘 Learn more by following this guide written by @alvarobartt

https://huggingface.co/docs/microsoft-azure/azure-ai/examples/deploy-nvidia-parakeet-asr

pagezyhf

posted an update 11 months ago

Post

1285

If you want to dive into how the HF team worked with @seungrokj at @AMD to optimize kernels on MI300, you should give our latest blog post a read!

It's great educational material for anyone curious about low-level ML optimization.

https://huggingface.co/blog/mi300kernels

pagezyhf

posted an update 11 months ago

Post

1653

In case you missed it, Hugging Face expanded its collaboration with Azure a few weeks ago with a curated catalog of 10,000 models, accessible from Azure AI Foundry and Azure ML!

@alvarobartt cooked during these last days to prepare the one and only documentation you need if you want to deploy Hugging Face models on Azure. It comes with an FAQ, great guides, and examples on how to deploy VLMs, LLMs, and smolagents, with more to come very soon.

We need your feedback: come help us and let us know what else you want to see, which model we should add to the collection, which model task we should prioritize adding, what else we should build a tutorial for. You’re just an issue away on our GitHub repo!

https://huggingface.co/docs/microsoft-azure/index

Team members 4

hf-dell-internal's activity