RakshitAralimatti posted an update Aug 8, 2025
🤔 Ever wondered how OpenAI’s massive GPT‑OSS‑20B runs on just 16 GB of memory or how GPT‑OSS‑120B runs on a single H100 GPU?

Seems impossible, right?

The secret is native MXFP4 quantization: a 4-bit floating-point format that's making AI models faster, lighter, and more deployable than ever.

🧠 What’s MXFP4?

MXFP4, or Microscaling FP4, is a specialized 4-bit floating-point format (E2M1) standardized by the Open Compute Project under the MX (Microscaling) specification. It compresses blocks of 32 values using a shared 8-bit power-of-two scale (E8M0), dramatically lowering memory usage while preserving dynamic range, which makes it a great fit for compact AI model deployment.
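Here is a rough Python sketch of the block-quantization idea, to make it concrete. This is an illustration, not OpenAI's actual kernel; the E2M1 value table and the power-of-two scale rule follow the OCP MX spec, but the rounding details of real implementations may differ:

```python
import math

# The 8 non-negative magnitudes representable in E2M1 (FP4):
# 1 sign bit + 2 exponent bits + 1 mantissa bit.
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_mx_block(block):
    """Quantize one 32-element block to MXFP4 and dequantize it back,
    so the rounding error is visible. Illustrative sketch only."""
    assert len(block) == 32, "MX blocks hold 32 elements"
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * 32
    # Shared E8M0 scale: a power of two chosen so the largest element
    # lands near the top of E2M1's range (max magnitude 6.0 = 1.5 * 2^2).
    scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    out = []
    for v in block:
        # Clamp to the format's range, then round to the nearest
        # representable E2M1 magnitude; the sign bit is kept separately.
        mag = min(abs(v) / scale, 6.0)
        q = min(FP4_VALUES, key=lambda f: abs(f - mag))
        out.append(math.copysign(q * scale, v))
    return out
```

Each stored element costs 4 bits, and the 8-bit scale is amortized across the 32 elements of its block, so the per-weight cost stays close to 4 bits.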

💡 Think of it like this:

Instead of everyone ordering their own expensive meal (full-precision weights), a group shares a family meal (shared scaling). It’s cheaper, lighter, and still gets the job done.
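A quick back-of-envelope check of the savings (rough numbers; real checkpoints keep some layers such as embeddings at higher precision, so actual footprints differ):

```python
def mxfp4_bits_per_weight(block_size=32, elem_bits=4, scale_bits=8):
    # 32 weights share one 8-bit E8M0 scale: 4 + 8/32 = 4.25 bits/weight.
    return elem_bits + scale_bits / block_size

params = 20e9  # roughly GPT-OSS-20B's parameter count
bf16_gb = params * 16 / 8 / 1e9            # 16-bit weights
mxfp4_gb = params * mxfp4_bits_per_weight() / 8 / 1e9
print(f"bf16: ~{bf16_gb:.0f} GB, MXFP4: ~{mxfp4_gb:.1f} GB")
# bf16: ~40 GB, MXFP4: ~10.6 GB
```

That roughly 4x reduction is why a 20B-parameter model can fit alongside activations and KV cache in 16 GB of memory.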

✍️ I’ve broken all of this down in my first Medium blog:

What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware
Link - https://medium.com/@rakshitaralimatti2001/4-bit-alchemy-how-mxfp4-makes-massive-models-like-gpt-oss-feasible-for-everyone-573d6630b56c

HF - https://huggingface.co/blog/RakshitAralimatti/learn-ai-with-me

I find it more appropriate to research how to run a large model on small hardware, rather than how to run a small model on small hardware.