Teja321 / medgemma-quantized

---
library_name: mediapipe
tags:
  - medical
  - llm
  - gemma
  - quantized
  - tflite
  - int8
license: apache-2.0
base_model: google/medgemma-1.5-4b-it
---
# MedGemma 1.5 4B - Quantized (INT8)
This is a quantized version of [google/medgemma-1.5-4b-it](https://huggingface.co/google/medgemma-1.5-4b-it) optimized for on-device deployment using TensorFlow Lite and MediaPipe.
## Model Details
- **Base Model**: MedGemma 1.5 4B (Instruction-Tuned)
- **Quantization**: INT8 Dynamic Quantization
- **Model Size**: 3.65 GB (about 4x smaller than FP32: 1 byte per weight instead of 4)
- **Architecture**: Gemma 3
- **Deployment**: MediaPipe Task Bundle + TFLite
## Files
| File | Size | Description |
|------|------|-------------|
| `medgemma_1.5_4b.task` | 3.65 GB | MediaPipe task bundle (TFLite model + tokenizer, ready to use) |
| `gemma3-4b_q8_ekv1024.tflite` | 3.65 GB | Standalone TFLite model (INT8 weights, 1024-token KV cache) |
| `tokenizer.model` | 4.5 MB | SentencePiece tokenizer |
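## Quantization Details
- **Scheme**: Dynamic-range INT8
- **Weights**: Quantized to INT8 (171 tensors)
- **Activations**: Stored in FP32; supported kernels quantize them to INT8 on the fly at inference time
- **KV Cache**: 1024 tokens (the maximum combined length of prompt and response)
- **Verified**: Weight quantization confirmed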
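
The INT8 weight scheme above can be illustrated with a minimal sketch of symmetric, per-output-channel quantization: each row of a weight matrix gets one FP32 scale, `scale = max|w| / 127`, and each weight is stored as `round(w / scale)`. This is an assumption for illustration only; the exact recipe ai-edge-torch applied (per-channel vs per-tensor scales, rounding, clamping range) is not stated here, and `QuantizedMatrix`, `quantizeInt8`, and `dequantize` are hypothetical names.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// One INT8 weight matrix in row-major order plus one FP32 scale per row.
class QuantizedMatrix(val rows: Int, val cols: Int, val q: ByteArray, val scales: FloatArray)

// Symmetric per-row INT8 quantization: zero point is 0, values lie in [-127, 127].
fun quantizeInt8(w: FloatArray, rows: Int, cols: Int): QuantizedMatrix {
    require(w.size == rows * cols) { "expected ${rows * cols} weights, got ${w.size}" }
    val q = ByteArray(w.size)
    val scales = FloatArray(rows)
    for (r in 0 until rows) {
        // The largest magnitude in the row maps to +/-127.
        var maxAbs = 0f
        for (c in 0 until cols) maxAbs = maxOf(maxAbs, abs(w[r * cols + c]))
        // An all-zero row gets scale 1 so we never divide by zero.
        val scale = if (maxAbs == 0f) 1f else maxAbs / 127f
        scales[r] = scale
        for (c in 0 until cols) {
            val i = r * cols + c
            q[i] = (w[i] / scale).roundToInt().coerceIn(-127, 127).toByte()
        }
    }
    return QuantizedMatrix(rows, cols, q, scales)
}

// Reconstruct approximate FP32 weights: w ≈ q * (scale of its row).
fun dequantize(m: QuantizedMatrix): FloatArray =
    FloatArray(m.q.size) { i -> m.q[i] * m.scales[i / m.cols] }
```

For a row whose largest magnitude is 1.0, the scale is 1/127 and each reconstructed weight is off by at most half a step (about 0.004). The per-row FP32 scales add 4 bytes per row, which is why INT8 files come out slightly larger than one byte per weight.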
## Usage
### MediaPipe Web (Easiest)
1. Go to [MediaPipe Studio](https://mediapipe-studio.webapps.google.com/demo/llm_inference)
2. Upload `medgemma_1.5_4b.task`
3. Test with medical prompts
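### Android
```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// `context` is an Android Context (e.g. your Activity or Application).
val options = LlmInference.LlmInferenceOptions.builder()
    .setModelPath("/path/to/medgemma_1.5_4b.task")
    // Max total tokens (prompt + response); keep <= 1024, this model's KV cache size.
    .setMaxTokens(512)
    .setTemperature(0.7f)
    .build()
// Loading the 3.65 GB bundle is slow: create the engine once and reuse it.
val llm = LlmInference.createFromOptions(context, options)
// generateResponse blocks until generation finishes; call it off the main thread.
val response = llm.generateResponse("What are the symptoms of diabetes?")
// Release native memory when the engine is no longer needed.
llm.close()
```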
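### iOS
```swift
import MediaPipeTasksGenAI

// Point the options at the .task bundle.
let options = LlmInference.Options(modelPath: "/path/to/medgemma_1.5_4b.task")
// Max total tokens (prompt + response); keep <= 1024, this model's KV cache size.
options.maxTokens = 512
let llm = try LlmInference(options: options)
// generateResponse blocks until generation finishes; avoid calling it on the main thread.
let response = try llm.generateResponse(inputText: "What are the symptoms of diabetes?")
```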
## Prompt Format
```
<start_of_turn>user
{YOUR_QUESTION}<end_of_turn>
<start_of_turn>model
```
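
To send multi-turn conversations, or to apply the template yourself, a minimal Kotlin sketch such as the following assembles the format above. `Turn` and `buildGemmaPrompt` are hypothetical helpers. Whether the `.task` bundle already wraps plain input in these markers depends on how it was bundled, which is not stated here; if it does, pass the raw question instead to avoid double-wrapping.

```kotlin
// One chat turn; role must be "user" or "model" (Gemma's two turn roles).
data class Turn(val role: String, val text: String)

// Builds a Gemma-style prompt and leaves a model turn open for generation.
fun buildGemmaPrompt(history: List<Turn>): String {
    val sb = StringBuilder()
    for (turn in history) {
        require(turn.role == "user" || turn.role == "model") { "unknown role: ${turn.role}" }
        sb.append("<start_of_turn>").append(turn.role).append('\n')
        sb.append(turn.text).append("<end_of_turn>\n")
    }
    // Open the model's turn so generation continues as the assistant's reply.
    sb.append("<start_of_turn>model\n")
    return sb.toString()
}
```

For example, `buildGemmaPrompt(listOf(Turn("user", "What lifestyle changes can help manage hypertension?")))` yields exactly the single-turn template shown above, ready to pass to `generateResponse`.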
## Example Prompts
- "What are the common symptoms of type 2 diabetes?"
- "Explain the difference between systolic and diastolic blood pressure."
- "What lifestyle changes can help manage hypertension?"
## Performance
- **Inference Speed**: ~10-40 tokens/sec on CPU, depending on the device
- **Memory Usage**: ~5-6 GB RAM
- **Quantization Impact**: Minimal accuracy degradation vs FP32
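
The memory figures can be sanity-checked with a back-of-the-envelope sketch: INT8 weights take 1 byte per parameter (versus 4 for FP32, hence the 4x reduction in Model Details), and the KV cache stores one key and one value vector per layer, per KV head, per cached token. The layer count (34), KV head count (4), and head dimension (256) are assumed from the public Gemma 3 4B configuration, and an FP32 KV cache is assumed; the converted model may store it differently. The function names are hypothetical, and the weight estimate ignores quantization scales and file metadata.

```kotlin
// Bytes needed to store `params` weights at `bytesPerParam` each (4 = FP32, 1 = INT8).
fun weightBytes(params: Long, bytesPerParam: Int): Long = params * bytesPerParam

// KV cache: one key and one value vector per layer, per KV head, per cached token.
fun kvCacheBytes(layers: Int, kvHeads: Int, headDim: Int, maxTokens: Int, bytesPerValue: Int): Long =
    2L * layers * kvHeads * headDim * maxTokens * bytesPerValue

// Convert bytes to GiB (2^30 bytes).
fun gib(bytes: Long): Double = bytes / (1024.0 * 1024.0 * 1024.0)
```

Under those assumptions, `kvCacheBytes(34, 4, 256, 1024, 4)` is 285,212,672 bytes (about 0.27 GiB), small next to the 3.65 GB of INT8 weights.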
## Limitations
- **Text-only**: The vision encoder is not included in this version, so image inputs are not supported.
- **Medical disclaimer**: This model is for educational/research purposes only. Always consult healthcare professionals for medical advice.
## Conversion Process
Converted using [ai-edge-torch](https://github.com/google-ai-edge/ai-edge-torch):
1. Downloaded the base model from Hugging Face
2. Converted to TFLite with INT8 quantization and a 1024-token KV cache
3. Bundled the TFLite model and `tokenizer.model` into a MediaPipe `.task` file
## Citation
```bibtex
@misc{medgemma2024,
  title={MedGemma: Open medical large language models},
  author={{Google DeepMind}},
  year={2024},
  url={https://huggingface.co/google/medgemma-1.5-4b-it}
}
```
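## License
Apache 2.0 for this conversion. The base model is not Apache 2.0 licensed: google/medgemma-1.5-4b-it is distributed under Google's Health AI Developer Foundations terms of use, which also govern derivatives of it, including this conversion.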