MedGemma 4B GGUF - Quantized for African Healthcare

Quantized versions of google/medgemma-1.5-4b-it optimized for on-device medical AI in resource-constrained settings.

Available Models

File	Quantization	Size	RAM	Use Case
`medgemma-4b-iq2_xs.gguf`	IQ2_XS + Medical imatrix	~0.9GB	~2GB	Budget phones (<$80)
`medgemma-4b-q2_k.gguf`	Q2_K (2-bit)	~1.4GB	~2.5GB	Standard budget phones
`medgemma-4b-q4_k_m.gguf`	Q4_K_M (4-bit)	~2.4GB	~4GB	Mid-range phones

The IQ2_XS model was quantized using a custom importance matrix (imatrix) calibrated on:

African primary care scenarios (malaria, typhoid, cholera, respiratory infections)
Maternal and child health (pregnancy complications, childhood diarrhea, nutrition)
Emergency triage (snake bites, severe dehydration, trauma)
Multi-language symptoms (Twi, Hausa, Yoruba, English)

This preserves medical diagnostic accuracy while aggressively compressing general knowledge.

./llama-cli -m medgemma-4b-iq2_xs.gguf -p "Patient has fever, chills, and headache for 3 days. What could this be?"

Built for the Google MedGemma Impact Challenge - bringing AI-powered healthcare to underserved African communities.

GGUF

Model size

4B params

Architecture

gemma3

Hardware compatibility

1-bit

2-bit

4-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Base model

Quantized

(11)

this model