---
library_name: mediapipe
tags:
- medical
- llm
- gemma
- quantized
- tflite
- int8
license: apache-2.0
base_model: google/medgemma-1.5-4b-it
---
# MedGemma 1.5 4B - Quantized (INT8)
This is a quantized version of [google/medgemma-1.5-4b-it](https://huggingface.co/google/medgemma-1.5-4b-it) optimized for on-device deployment using TensorFlow Lite and MediaPipe.
## Model Details
- **Base Model**: MedGemma 1.5 4B (Instruction-Tuned)
- **Quantization**: INT8 Dynamic Quantization
- **Model Size**: 3.65 GB (~4x reduction from FP32, i.e. 4 bytes per weight down to 1)
- **Architecture**: Gemma 3
- **Deployment**: MediaPipe Task Bundle + TFLite
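The stated size is consistent with simple back-of-the-envelope arithmetic (an illustrative sketch; the ~4B parameter count is an approximation, not an exact figure from the conversion):

```python
# Rough model-size estimate: bytes per weight times parameter count.
PARAMS = 4e9  # ~4 billion parameters (approximate)

fp32_gb = PARAMS * 4 / 1024**3  # FP32 stores 4 bytes per weight
int8_gb = PARAMS * 1 / 1024**3  # INT8 stores 1 byte per weight

print(f"FP32: ~{fp32_gb:.1f} GB, INT8: ~{int8_gb:.1f} GB, "
      f"reduction: ~{fp32_gb / int8_gb:.0f}x")
# -> FP32: ~14.9 GB, INT8: ~3.7 GB, reduction: ~4x
```

The INT8 estimate (~3.7 GB) lines up with the 3.65 GB artifact; the small difference comes from quantization metadata and tensors kept at higher precision.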
## Files
| File | Size | Description |
|------|------|-------------|
| `medgemma_1.5_4b.task` | 3.65 GB | MediaPipe task bundle (ready to use) |
| `gemma3-4b_q8_ekv1024.tflite` | 3.65 GB | TFLite model with INT8 quantization |
| `tokenizer.model` | 4.5 MB | SentencePiece tokenizer |
## Quantization Details
- **Scheme**: Dynamic INT8
- **Weights**: Quantized to INT8 (171 tensors)
- **Activations**: FP32 (for accuracy)
- **KV Cache**: Up to 1024 tokens
- **Verified**: Weight quantization confirmed
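The scheme above (INT8 weights, FP32 activations) can be illustrated with a minimal NumPy sketch of symmetric per-tensor quantization. This shows the general technique only, not the exact TFLite kernel behavior:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # At inference time the runtime rescales INT8 weights back to FP32,
    # so activations can remain in FP32 as this model does.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
err = float(np.abs(w - dequantize(q, scale)).max())
print(f"max abs reconstruction error: {err:.4f}")
```

The maximum error is bounded by half the quantization step (`scale / 2`), which is why weight-only INT8 typically costs little accuracy.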
## Usage
### MediaPipe Web (Easiest)
1. Go to [MediaPipe Studio](https://mediapipe-studio.webapps.google.com/demo/llm_inference)
2. Upload `medgemma_1.5_4b.task`
3. Test with medical prompts
### Android
```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInference

val options = LlmInference.LlmInferenceOptions.builder()
    .setModelPath("/path/to/medgemma_1.5_4b.task")
    .setMaxTokens(512)
    .setTemperature(0.7f)
    .build()

val llm = LlmInference.createFromOptions(context, options)
val response = llm.generateResponse("What are the symptoms of diabetes?")
```
### iOS
```swift
import MediaPipeTasksGenAI
let options = LlmInference.Options()
options.modelPath = "/path/to/medgemma_1.5_4b.task"
options.maxTokens = 512
let llm = try LlmInference(options: options)
let response = try llm.generateResponse(prompt: "What are the symptoms of diabetes?")
```
## Prompt Format
```
<start_of_turn>user
{YOUR_QUESTION}<end_of_turn>
<start_of_turn>model
```
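If you are assembling prompts programmatically, a small helper keeps the turn markers consistent (Python for illustration; `format_prompt` is a hypothetical name, not part of the MediaPipe API):

```python
def format_prompt(question: str) -> str:
    """Wrap a user question in the Gemma chat-turn format shown above."""
    return (
        "<start_of_turn>user\n"
        f"{question}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(format_prompt("What are the common symptoms of type 2 diabetes?"))
```

The model generates its answer as a continuation after the final `<start_of_turn>model` marker.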
## Example Prompts
- "What are the common symptoms of type 2 diabetes?"
- "Explain the difference between systolic and diastolic blood pressure."
- "What lifestyle changes can help manage hypertension?"
## Performance
- **Inference Speed**: ~10-40 tokens/sec on CPU
- **Memory Usage**: ~5-6 GB RAM
- **Quantization Impact**: Minimal accuracy degradation vs FP32
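The throughput range implies rough end-to-end latencies for a full-length response (illustrative arithmetic only, using the figures above):

```python
# Estimated wall-clock time for a maximum-length generation,
# using the ~10-40 tokens/sec CPU throughput range quoted above.
max_tokens = 512  # matches setMaxTokens in the usage examples

for tok_per_sec in (10, 40):
    seconds = max_tokens / tok_per_sec
    print(f"{tok_per_sec} tok/s -> ~{seconds:.0f} s for {max_tokens} tokens")
# -> 10 tok/s -> ~51 s for 512 tokens
# -> 40 tok/s -> ~13 s for 512 tokens
```

Shorter prompts and responses scale the latency down proportionally.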
## Limitations
- **Text-only**: Vision encoder not included in this version
- **Medical disclaimer**: This model is for educational/research purposes only. Always consult healthcare professionals for medical advice.
## Conversion Process
Converted using [ai-edge-torch](https://github.com/google-ai-edge/ai-edge-torch):
1. Downloaded the base model weights from Hugging Face
2. Converted to TFLite with dynamic INT8 quantization
3. Bundled the TFLite model and tokenizer into the MediaPipe `.task` format
## Citation
```bibtex
@misc{medgemma2024,
  title={MedGemma: Open medical large language models},
  author={Google DeepMind},
  year={2024},
  url={https://huggingface.co/google/medgemma-1.5-4b-it}
}
```
## License
Apache 2.0 (same as base model)