# MedGemma 1.5 4B - Quantized (INT8)
This is a quantized version of [google/medgemma-1.5-4b-it](https://huggingface.co/google/medgemma-1.5-4b-it), optimized for on-device deployment with TensorFlow Lite and MediaPipe.
## Model Details
- Base Model: MedGemma 1.5 4B (Instruction-Tuned)
- Quantization: INT8 Dynamic Quantization
- Model Size: 3.65 GB (4x reduction from FP32)
- Architecture: Gemma 3
- Deployment: MediaPipe Task Bundle + TFLite
## Files

| File | Size | Description |
|---|---|---|
| `medgemma_1.5_4b.task` | 3.65 GB | MediaPipe task bundle (ready to use) |
| `gemma3-4b_q8_ekv1024.tflite` | 3.65 GB | TFLite model with INT8 quantization |
| `tokenizer.model` | 4.5 MB | SentencePiece tokenizer |
## Quantization Details
- Scheme: Dynamic INT8
- Weights: Quantized to INT8 (171 tensors)
- Activations: FP32 (for accuracy)
- KV Cache: Up to 1024 tokens
- Verified: Weight quantization confirmed
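As a back-of-envelope sanity check, the "4x reduction from FP32" figure follows from byte widths alone: INT8 stores one byte per weight where FP32 stores four. A minimal sketch, using only the file size quoted above (the implied weight count is an inference, not an official parameter figure):

```python
# Sanity-check the "4x reduction from FP32" claim from byte widths alone.
# INT8 stores 1 byte per weight; FP32 stores 4 bytes per weight.
int8_bytes = 3_650_000_000       # ~3.65 GB, the quoted INT8 model size
n_weights = int8_bytes           # 1 byte per INT8 weight (inferred, not official)
fp32_bytes = n_weights * 4       # the same weights at 4 bytes each

print(fp32_bytes / 1e9)          # 14.6  (GB an FP32 copy would need)
print(fp32_bytes / int8_bytes)   # 4.0   (the quoted reduction factor)
```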
## Usage

### MediaPipe Web (Easiest)

- Go to MediaPipe Studio
- Upload `medgemma_1.5_4b.task`
- Test with medical prompts
### Android

```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInference

val options = LlmInference.LlmInferenceOptions.builder()
    .setModelPath("/path/to/medgemma_1.5_4b.task")
    .setMaxTokens(512)
    .setTemperature(0.7f)
    .build()

val llm = LlmInference.createFromOptions(context, options)
val response = llm.generateResponse("What are the symptoms of diabetes?")
```
### iOS

```swift
import MediaPipeTasksGenAI

let options = LlmInference.Options(modelPath: "/path/to/medgemma_1.5_4b.task")
options.maxTokens = 512

let llm = try LlmInference(options: options)
let response = try llm.generateResponse(prompt: "What are the symptoms of diabetes?")
```
## Prompt Format

```
<start_of_turn>user
{YOUR_QUESTION}<end_of_turn>
<start_of_turn>model
```
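If you assemble prompts by hand, the turn template above can be produced with a small helper. This is an illustrative sketch, not part of any API; depending on version, MediaPipe's LLM Inference runtime may apply the Gemma template for you, so check before applying it twice:

```python
def format_gemma_prompt(question: str) -> str:
    """Wrap a user question in the Gemma turn template shown above.

    Illustrative helper only: some MediaPipe versions apply this
    template internally, so verify before combining with the API.
    """
    return (
        "<start_of_turn>user\n"
        f"{question}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = format_gemma_prompt("What are the common symptoms of type 2 diabetes?")
print(prompt)
```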
## Example Prompts
- "What are the common symptoms of type 2 diabetes?"
- "Explain the difference between systolic and diastolic blood pressure."
- "What lifestyle changes can help manage hypertension?"
## Performance
- Inference Speed: ~10-40 tokens/sec on CPU
- Memory Usage: ~5-6 GB RAM
- Quantization Impact: Minimal accuracy degradation vs FP32
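The throughput range above translates directly into response-latency budgets. A quick estimate for a full-length reply, restating only the numbers from this section and the `setMaxTokens(512)` value used in the Android example:

```python
# Rough latency bounds for a full 512-token reply on CPU,
# using the quoted throughput range of ~10-40 tokens/sec.
max_tokens = 512                 # matches setMaxTokens(512) in the Android example
slow_tps, fast_tps = 10, 40      # quoted CPU throughput range, tokens/sec

best_case_s = max_tokens / fast_tps    # fastest expected generation time
worst_case_s = max_tokens / slow_tps   # slowest expected generation time
print(best_case_s, worst_case_s)       # 12.8 51.2
```

In practice, budget on the order of tens of seconds for long CPU-only answers, or cap `maxTokens` lower for interactive use.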
## Limitations
- Text-only: Vision encoder not included in this version
- Medical disclaimer: This model is for educational/research purposes only. Always consult healthcare professionals for medical advice.
## Conversion Process

Converted using ai-edge-torch:

1. Downloaded the base model from Hugging Face
2. Converted it to TFLite with INT8 dynamic quantization
3. Bundled it into the MediaPipe task format
## Citation

```bibtex
@misc{medgemma2024,
  title={MedGemma: Open medical large language models},
  author={Google DeepMind},
  year={2024},
  url={https://huggingface.co/google/medgemma-1.5-4b-it}
}
```
## License

Apache 2.0 (same as the base model)