MedGemma 4B GGUF - Quantized for African Healthcare

Quantized versions of google/medgemma-4b-it, optimized for on-device medical AI in resource-constrained settings.

Available Models

| File | Quantization | Size | RAM | Use Case |
|------|--------------|------|-----|----------|
| medgemma-4b-iq2_xs.gguf | IQ2_XS + medical imatrix | ~0.9 GB | ~2 GB | Budget phones (<$80) |
| medgemma-4b-q2_k.gguf | Q2_K (2-bit) | ~1.4 GB | ~2.5 GB | Standard budget phones |
| medgemma-4b-q4_k_m.gguf | Q4_K_M (4-bit) | ~2.4 GB | ~4 GB | Mid-range phones |
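
If you only need one of the files, it can be pulled with the Hugging Face CLI; the command below is a minimal sketch assuming the files live in this repo (wredd/medgemma-4b-gguf):

huggingface-cli download wredd/medgemma-4b-gguf medgemma-4b-iq2_xs.gguf --local-dir .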

Medical Importance Matrix

The IQ2_XS model was quantized using a custom importance matrix (imatrix) calibrated on:

  • African primary care scenarios (malaria, typhoid, cholera, respiratory infections)
  • Maternal and child health (pregnancy complications, childhood diarrhea, nutrition)
  • Emergency triage (snake bites, severe dehydration, trauma)
  • Multilingual symptom descriptions (Twi, Hausa, Yoruba, English)

This preserves medical diagnostic accuracy while aggressively compressing general knowledge.
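
For context, a typical llama.cpp imatrix workflow looks roughly like the following; the float16 source GGUF and calibration file names are illustrative placeholders, not the exact artifacts used for this release:

./llama-imatrix -m medgemma-4b-f16.gguf -f medical_calibration.txt -o medical.imatrix
./llama-quantize --imatrix medical.imatrix medgemma-4b-f16.gguf medgemma-4b-iq2_xs.gguf IQ2_XS

The calibration text determines which weights the quantizer protects most carefully, which is how domain-specific prompts (like the clinical scenarios above) can bias a 2-bit model toward medical accuracy.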

Usage with llama.cpp

./llama-cli -m medgemma-4b-iq2_xs.gguf -p "Patient has fever, chills, and headache for 3 days. What could this be?"
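
On low-RAM phones it usually helps to cap the context window and generation length; the flag values below are illustrative starting points rather than tuned recommendations:

./llama-cli -m medgemma-4b-iq2_xs.gguf -c 2048 -n 256 --temp 0.2 -p "Patient has fever, chills, and headache for 3 days. What could this be?"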

License

Use of this model is subject to the Gemma Terms of Use.

Part of the Nku Project

Built for the Google MedGemma Impact Challenge - bringing AI-powered healthcare to underserved African communities.
