---
library_name: mediapipe
tags:
- medical
- llm
- gemma
- quantized
- tflite
- int8
license: apache-2.0
base_model: google/medgemma-1.5-4b-it
---

# MedGemma 1.5 4B - Quantized (INT8)

This is a quantized version of [google/medgemma-1.5-4b-it](https://huggingface.co/google/medgemma-1.5-4b-it) optimized for on-device deployment with TensorFlow Lite and MediaPipe.

## Model Details

- **Base Model**: MedGemma 1.5 4B (Instruction-Tuned)
- **Quantization**: INT8 dynamic quantization
- **Model Size**: 3.65 GB (~4x reduction from FP32)
- **Architecture**: Gemma 3
- **Deployment**: MediaPipe task bundle + TFLite

## Files

| File | Size | Description |
|------|------|-------------|
| `medgemma_1.5_4b.task` | 3.65 GB | MediaPipe task bundle (ready to use) |
| `gemma3-4b_q8_ekv1024.tflite` | 3.65 GB | TFLite model with INT8 quantization |
| `tokenizer.model` | 4.5 MB | SentencePiece tokenizer |

## Quantization Details

- **Scheme**: Dynamic INT8
- **Weights**: Quantized to INT8 (171 tensors)
- **Activations**: FP32 (preserves accuracy)
- **KV Cache**: Up to 1024 tokens
- **Verified**: Weight quantization confirmed

## Usage

### MediaPipe Web (Easiest)

1. Go to [MediaPipe Studio](https://mediapipe-studio.webapps.google.com/demo/llm_inference)
2. Upload `medgemma_1.5_4b.task`
3. Test with medical prompts

### Android

```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInference

val options = LlmInference.LlmInferenceOptions.builder()
    .setModelPath("/path/to/medgemma_1.5_4b.task")
    .setMaxTokens(512)
    .setTemperature(0.7f)
    .build()

val llm = LlmInference.createFromOptions(context, options)
val response = llm.generateResponse("What are the symptoms of diabetes?")
```

### iOS

```swift
import MediaPipeTasksGenAI

let options = LlmInference.Options()
options.modelPath = "/path/to/medgemma_1.5_4b.task"
options.maxTokens = 512

let llm = try LlmInference(options: options)
let response = try llm.generateResponse(prompt: "What are the symptoms of diabetes?")
```

## Prompt Format

```
<start_of_turn>user
{YOUR_QUESTION}<end_of_turn>
<start_of_turn>model
```

## Example Prompts

- "What are the common symptoms of type 2 diabetes?"
- "Explain the difference between systolic and diastolic blood pressure."
- "What lifestyle changes can help manage hypertension?"

## Performance

- **Inference Speed**: ~10-40 tokens/sec on CPU
- **Memory Usage**: ~5-6 GB RAM
- **Quantization Impact**: Minimal accuracy degradation vs. FP32

## Limitations

- **Text-only**: The vision encoder is not included in this version.
- **Medical disclaimer**: This model is for educational and research purposes only. Always consult healthcare professionals for medical advice.

## Conversion Process

Converted using [ai-edge-torch](https://github.com/google-ai-edge/ai-edge-torch):

1. Downloaded the base model from Hugging Face
2. Converted to TFLite with INT8 quantization
3. Bundled into the MediaPipe task format

## Citation

```bibtex
@misc{medgemma2024,
  title={MedGemma: Open medical large language models},
  author={Google DeepMind},
  year={2024},
  url={https://huggingface.co/google/medgemma-1.5-4b-it}
}
```

## License

Apache 2.0 (same as the base model)
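## Appendix: Size Sanity Check

The ~4x reduction quoted in the quantization details follows from bytes-per-parameter arithmetic. A quick back-of-the-envelope check, assuming roughly 4 billion parameters (an approximation; the exact count differs slightly):

```python
# Approximate parameter count; the true figure for a "4B" model varies a bit.
params = 4_000_000_000

fp32_bytes = params * 4  # FP32 stores each weight in 4 bytes
int8_bytes = params * 1  # INT8 stores each weight in 1 byte

fp32_gib = fp32_bytes / (1024 ** 3)
int8_gib = int8_bytes / (1024 ** 3)

print(f"FP32 weights: ~{fp32_gib:.2f} GiB")   # ~14.90 GiB
print(f"INT8 weights: ~{int8_gib:.2f} GiB")   # ~3.73 GiB, near the 3.65 GB bundle
print(f"Reduction:    {fp32_bytes / int8_bytes:.0f}x")
```

The small gap between ~3.73 GiB and the actual 3.65 GB bundle is expected: activations, embeddings, and tokenizer overheads are not a flat one byte per parameter.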
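## Appendix: Building Prompts Programmatically

The prompt format shown earlier can be produced with a small helper. This is a sketch: `format_prompt` is a hypothetical name, not part of MediaPipe; the control tokens are the standard Gemma-family turn markers.

```python
def format_prompt(question: str) -> str:
    """Wrap a user question in the Gemma-family chat template.

    Hypothetical helper for illustration; the turn markers
    (<start_of_turn>, <end_of_turn>) are standard Gemma control tokens.
    """
    return (
        "<start_of_turn>user\n"
        f"{question}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

# Example: build a prompt for one of the sample questions above.
prompt = format_prompt("What are the common symptoms of type 2 diabetes?")
print(prompt)
```

The trailing `<start_of_turn>model` line is what cues the model to generate its reply rather than continue the user's turn.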