MedGemma 1.5 4B IT (4-bit Quantized)
Overview
This repository contains a 4-bit quantized version of Google's MedGemma 1.5 4B, optimized for efficient deployment on resource-constrained hardware. By leveraging bitsandbytes quantization, the model significantly reduces VRAM requirements while retaining most of the base model's medical reasoning capability (see the benchmarks below).
This project is part of an initiative by Uniplexity AI to explore localized AI implementations within the Zambian healthcare and economic landscape.
Key Features
- Model Type: Instruction-Tuned (IT) Medical LLM
- Precision: 4-bit NormalFloat (NF4)
- Architecture: Gemma 2
- Optimization: Designed to run on consumer-grade GPUs and high-end CPUs.
- Context: Medical Q&A, clinical reasoning, and health information synthesis.
Technical Specifications
| Feature | Specification |
|---|---|
| Parameters | 4 Billion |
| Quantization Method | BitsAndBytes (4-bit) |
| Compute Dtype | bfloat16 |
| VRAM Requirement | ~4 GB - 8 GB |
| Base Model | google/medgemma-1.5-4b-it |
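For reference, the settings in the table correspond to a bitsandbytes configuration along the following lines. This is a minimal sketch of how the base checkpoint could be quantized with these parameters, not the exact script used for this release; the double-quantization flag in particular is an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization with bfloat16 compute, matching the table above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,  # assumption: double quantization enabled
)

model = AutoModelForCausalLM.from_pretrained(
    "google/medgemma-1.5-4b-it",   # base model from the table
    quantization_config=bnb_config,
    device_map="auto",
)

# Rough check of the quantized weight footprint in GB.
print(model.get_memory_footprint() / 1e9)
```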
📊 Benchmarking & Performance
To evaluate the impact of 4-bit quantization on medical reasoning, we performed a comparative benchmark using the PubMedQA (pqa_labeled) dataset. The test consisted of 1,000 closed-ended medical research questions requiring a "yes/no/maybe" response based on provided scientific abstracts.
Accuracy Comparison
| Model Version | Precision | Accuracy (n=1000) | VRAM Usage |
|---|---|---|---|
| MedGemma 1.5 4B (Base) | bfloat16 | 53.20% | ~9.5 GB |
| MedGemma 1.5 4B (Quantized) | NF4 (4-bit) | 45.70% | ~2.8 GB |
Analysis
- Accuracy Retention: The 4-bit model retains approximately 85.9% of the original model's accuracy.
- Efficiency Gain: We achieved a 70% reduction in VRAM requirements, allowing the model to run on consumer-grade hardware and standard CPUs in the Zambian tech ecosystem.
- Observation: Although exact-match accuracy drops by 7.5 percentage points, the quantized version remains useful for general medical screening and information synthesis in settings where high-end GPU infrastructure is unavailable.
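The comparison above can be approximated with a simple accuracy loop over the labeled split. The snippet below is a minimal sketch under assumptions: it uses the `qiaojin/PubMedQA` dataset (configuration `pqa_labeled`) from the Hugging Face Hub and a naive prompt and answer-parsing scheme; the exact prompt format behind the reported numbers may differ.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kondwani0099/medgemma-1.5-4b-4bit-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# pqa_labeled: 1,000 questions with gold yes/no/maybe answers.
ds = load_dataset("qiaojin/PubMedQA", "pqa_labeled", split="train")

correct = 0
for example in ds:
    abstract = " ".join(example["context"]["contexts"])
    prompt = (
        f"Abstract: {abstract}\n\n"
        f"Question: {example['question']}\n"
        "Answer with yes, no, or maybe:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=5, do_sample=False)
    # Decode only the newly generated tokens and compare to the gold label.
    answer = tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    ).strip().lower()
    correct += answer.startswith(example["final_decision"])

print(f"Accuracy: {correct / len(ds):.2%}")
```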
Getting Started
Installation
Ensure you have the latest versions of transformers, accelerate, and bitsandbytes installed:
```bash
pip install -U transformers accelerate bitsandbytes
```
Quick Usage Example
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kondwani0099/medgemma-1.5-4b-4bit-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# The 4-bit quantization config is stored with the checkpoint,
# so the model loads directly in NF4 precision.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

prompt = "Explain the benefits of implementing AI-driven medical screening in rural Zambia."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=500, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
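Because this is an instruction-tuned checkpoint, wrapping the request in the tokenizer's chat template generally produces better-formatted answers than a raw prompt. The sketch below continues from the example above (reusing `model` and `tokenizer`) and assumes the tokenizer ships with a Gemma-style chat template.

```python
# Format the request as a chat turn before generation.
messages = [
    {"role": "user", "content": "Summarise common malaria prevention measures for community health workers."}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```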
Use Case: Zambia & Regional Development
This model is particularly suited for regional applications where high-end data center infrastructure may be limited:
- Offline Medical Assistance: Deployable on local workstations for medical professionals in remote areas.
- Startup Integration: Provides a low-cost entry point for Zambian health-tech startups to integrate advanced LLM features.
- Education: Acts as a resource for medical students and researchers in Southern Africa.
Limitations & Disclaimer
Important: This model is for research and informational purposes only. It is not a replacement for professional medical advice, diagnosis, or treatment. Always consult with a qualified healthcare provider for medical concerns. The accuracy of the model can vary based on the complexity of the medical query.
Acknowledgments
- Google DeepMind: For the original MedGemma weights.
- Uniplexity AI: Project oversight and deployment optimization.
- Kondwani Nyirenda: Lead Developer & Quantization.