---
library_name: mediapipe
tags:
- medical
- llm
- gemma
- quantized
- tflite
- int8
license: apache-2.0
base_model: google/medgemma-1.5-4b-it
---

# MedGemma 1.5 4B - Quantized (INT8)

This is a quantized version of [google/medgemma-1.5-4b-it](https://huggingface.co/google/medgemma-1.5-4b-it) optimized for on-device deployment using TensorFlow Lite and MediaPipe.

## Model Details

- **Base Model**: MedGemma 1.5 4B (Instruction-Tuned)
- **Quantization**: INT8 Dynamic Quantization
- **Model Size**: 3.65 GB (~4x reduction from FP32)
- **Architecture**: Gemma 3
- **Deployment**: MediaPipe Task Bundle + TFLite
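
The quoted ~4x reduction follows from simple bytes-per-parameter arithmetic (a back-of-envelope sketch; the 3.65 GB figure also reflects embeddings and bundle overhead, so it is not exactly 4 GB):

```python
# Rough on-disk size estimate from the parameter count alone.
params = 4e9  # "4B" parameters (approximate)

fp32_gb = params * 4 / 1e9  # 4 bytes per FP32 weight -> ~16 GB
int8_gb = params * 1 / 1e9  # 1 byte per INT8 weight  -> ~4 GB

print(f"FP32 ~{fp32_gb:.0f} GB, INT8 ~{int8_gb:.0f} GB, {fp32_gb / int8_gb:.0f}x smaller")
```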

## Files

| File | Size | Description |
|------|------|-------------|
| `medgemma_1.5_4b.task` | 3.65 GB | MediaPipe task bundle (ready to use) |
| `gemma3-4b_q8_ekv1024.tflite` | 3.65 GB | TFLite model with INT8 quantization |
| `tokenizer.model` | 4.5 MB | SentencePiece tokenizer |

## Quantization Details

- **Scheme**: Dynamic INT8
- **Weights**: Quantized to INT8 (171 tensors)
- **Activations**: FP32 (for accuracy)
- **KV Cache**: Up to 1024 tokens
- **Verified**: INT8 weight tensors confirmed by inspecting the converted model
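
Dynamic INT8 quantization stores each weight tensor as 8-bit integers plus a floating-point scale and dequantizes on the fly, while activations stay in FP32. A minimal symmetric per-tensor sketch of the idea (illustrative only; TFLite's converter uses its own per-tensor/per-channel schemes and rounding rules):

```python
def quantize_int8(weights):
    """Map float weights symmetrically onto the signed 8-bit range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights at inference time."""
    return [qi * scale for qi in q]

w = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, f"max error {max_err:.4f}")  # error is bounded by about scale / 2
```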

## Usage

### MediaPipe Web (Easiest)

1. Go to [MediaPipe Studio](https://mediapipe-studio.webapps.google.com/demo/llm_inference)
2. Upload `medgemma_1.5_4b.task`
3. Test with medical prompts

### Android

```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Configure the engine to load the quantized .task bundle.
val options = LlmInference.LlmInferenceOptions.builder()
    .setModelPath("/path/to/medgemma_1.5_4b.task")
    .setMaxTokens(512) // prompt + response token budget
    .setTemperature(0.7f)
    .build()

val llm = LlmInference.createFromOptions(context, options)
val response = llm.generateResponse("What are the symptoms of diabetes?")
```

### iOS

```swift
import MediaPipeTasksGenAI

// Configure the engine to load the quantized .task bundle.
let options = LlmInference.Options(modelPath: "/path/to/medgemma_1.5_4b.task")
options.maxTokens = 512

let llm = try LlmInference(options: options)
let response = try llm.generateResponse(inputText: "What are the symptoms of diabetes?")
```

## Prompt Format

```
<start_of_turn>user
{YOUR_QUESTION}<end_of_turn>
<start_of_turn>model
```
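
A small helper for assembling this template programmatically (illustrative; depending on the MediaPipe version, the runtime may apply the turn template for you):

```python
def format_gemma_prompt(question: str) -> str:
    """Wrap a user question in the Gemma turn template, leaving the model turn open."""
    return (
        "<start_of_turn>user\n"
        f"{question}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = format_gemma_prompt("What are the symptoms of diabetes?")
print(prompt)
```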

## Example Prompts

- "What are the common symptoms of type 2 diabetes?"
- "Explain the difference between systolic and diastolic blood pressure."
- "What lifestyle changes can help manage hypertension?"

## Performance

- **Inference Speed**: ~10-40 tokens/sec on CPU
- **Memory Usage**: ~5-6 GB RAM
- **Quantization Impact**: Minimal accuracy degradation vs. FP32
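
Those throughput numbers translate into rough wall-clock latency: a full 512-token response works out to about 13-51 seconds (simple arithmetic, ignoring prompt prefill time):

```python
max_new_tokens = 512
for tok_per_sec in (10, 40):
    seconds = max_new_tokens / tok_per_sec
    print(f"{tok_per_sec} tok/s -> ~{seconds:.0f} s for {max_new_tokens} tokens")
```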

## Limitations

- **Text-only**: Vision encoder not included in this version
- **Medical disclaimer**: This model is for educational/research purposes only. Always consult healthcare professionals for medical advice.

## Conversion Process

Converted using [ai-edge-torch](https://github.com/google-ai-edge/ai-edge-torch):

1. Downloaded the base model from Hugging Face
2. Converted to TFLite with INT8 quantization
3. Bundled into the MediaPipe task format

## Citation

```bibtex
@misc{medgemma2025,
  title = {MedGemma: Open Medical Large Language Models},
  author = {{Google DeepMind}},
  year = {2025},
  url = {https://huggingface.co/google/medgemma-1.5-4b-it}
}
```

## License

Apache 2.0 (same as base model)