MedGemma 1.5 4B - Quantized (INT8)

This is an INT8-quantized version of google/medgemma-1.5-4b-it, optimized for on-device deployment with TensorFlow Lite and MediaPipe.

Model Details

  • Base Model: MedGemma 1.5 4B (Instruction-Tuned)
  • Quantization: INT8 Dynamic Quantization
  • Model Size: 3.65 GB (4x reduction from FP32)
  • Architecture: Gemma 3
  • Deployment: MediaPipe Task Bundle + TFLite
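
The size figures above can be sanity-checked with quick arithmetic. The ~3.9B parameter count used below is an assumption (a rough estimate for the Gemma 3 4B text stack), not a number stated on this card:

```python
# Back-of-envelope check of the "4x reduction from FP32" claim.
# ASSUMPTION: ~3.9e9 parameters; the exact count is not given here.
params = 3.9e9

bytes_fp32 = params * 4  # FP32 stores 4 bytes per weight
bytes_int8 = params * 1  # INT8 stores 1 byte per weight

gib = 1024 ** 3
print(f"FP32: {bytes_fp32 / gib:.2f} GB")            # roughly 14.5 GB
print(f"INT8: {bytes_int8 / gib:.2f} GB")            # roughly 3.63 GB, near the 3.65 GB file
print(f"Reduction: {bytes_fp32 / bytes_int8:.0f}x")  # 4x
```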

Files

File                          Size     Description
medgemma_1.5_4b.task          3.65 GB  MediaPipe task bundle (ready to use)
gemma3-4b_q8_ekv1024.tflite   3.65 GB  TFLite model with INT8 quantization
tokenizer.model               4.5 MB   SentencePiece tokenizer

Quantization Details

  • Scheme: Dynamic INT8
  • Weights: Quantized to INT8 (171 tensors)
  • Activations: FP32 (for accuracy)
  • KV Cache: Up to 1024 tokens (the ekv1024 suffix in the TFLite filename)
  • Verified: Weight quantization confirmed
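
"Dynamic INT8" here means the weights are stored as INT8 while activations stay FP32 and are handled at runtime. The following is a minimal illustration of per-tensor symmetric weight quantization, a common choice for this scheme; it is a sketch, not the actual converter code, and the per-tensor symmetric layout is an assumption:

```python
# Illustration of symmetric INT8 weight quantization (not the real converter).

def quantize_int8(weights):
    """Map FP32 weights to INT8 values plus one per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values at inference time."""
    return [v * scale for v in q]

w = [0.02, -0.51, 0.37, 1.27, -0.88]          # toy FP32 weights
q, scale = quantize_int8(w)                   # q holds INT8 values, scale is FP32
w_hat = dequantize(q, scale)

# Round-trip error is bounded by half the quantization step.
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
assert max_err <= scale / 2 + 1e-9
```

Each of the 171 weight tensors gets its own scale, which is why accuracy loss stays small even at 4x compression.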

Usage

MediaPipe Web (Easiest)

  1. Go to MediaPipe Studio
  2. Upload medgemma_1.5_4b.task
  3. Test with medical prompts

Android

import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Configure the LLM Inference task with the downloaded bundle.
val options = LlmInference.LlmInferenceOptions.builder()
    .setModelPath("/path/to/medgemma_1.5_4b.task") // on-device path to the .task bundle
    .setMaxTokens(512)      // combined prompt + response token budget
    .setTemperature(0.7f)   // sampling temperature
    .build()

val llm = LlmInference.createFromOptions(context, options)
val response = llm.generateResponse("What are the symptoms of diabetes?")

iOS

import MediaPipeTasksGenAI

// Configure the LLM Inference task with the downloaded bundle.
let options = LlmInference.Options(modelPath: "/path/to/medgemma_1.5_4b.task")
options.maxTokens = 512  // combined prompt + response token budget

let llm = try LlmInference(options: options)
let response = try llm.generateResponse(prompt: "What are the symptoms of diabetes?")

Prompt Format

<start_of_turn>user
{YOUR_QUESTION}<end_of_turn>
<start_of_turn>model
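
Depending on how the task bundle was packaged, the runtime may not apply this template automatically, so it can help to format prompts explicitly. A small helper (hypothetical, for illustration) that wraps a question in the turn markers shown above:

```python
# Wrap a user question in the Gemma turn format shown above.
def format_prompt(question: str) -> str:
    return (
        "<start_of_turn>user\n"
        f"{question}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = format_prompt("What are the common symptoms of type 2 diabetes?")
```

The trailing `<start_of_turn>model` line cues the model to begin its answer; if responses look truncated or off-format, check that this template matches what the bundle expects.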

Example Prompts

  • "What are the common symptoms of type 2 diabetes?"
  • "Explain the difference between systolic and diastolic blood pressure."
  • "What lifestyle changes can help manage hypertension?"

Performance

  • Inference Speed: ~10-40 tokens/sec on CPU
  • Memory Usage: ~5-6 GB RAM
  • Quantization Impact: Minimal accuracy degradation vs FP32
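
From the throughput range above, a rough response-time estimate (illustrative arithmetic only; real speed varies widely by device):

```python
# Response-time estimates at the slow and fast ends of the quoted CPU range.
max_tokens = 512
estimates = {tps: max_tokens / tps for tps in (10, 40)}
for tps, seconds in estimates.items():
    print(f"{tps} tok/s -> {seconds:.1f} s for a {max_tokens}-token response")
```

So a full 512-token answer can take anywhere from about 13 seconds to nearly a minute on CPU.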

Limitations

  • Text-only: Vision encoder not included in this version
  • Medical disclaimer: This model is for educational/research purposes only. Always consult healthcare professionals for medical advice.

Conversion Process

Converted using ai-edge-torch:

  1. Downloaded the base model from Hugging Face
  2. Converted to TFLite with INT8 dynamic quantization
  3. Bundled into the MediaPipe .task format

Citation

@misc{medgemma2024,
  title={MedGemma: Open medical large language models},
  author={Google DeepMind},
  year={2024},
  url={https://huggingface.co/google/medgemma-1.5-4b-it}
}

License

Apache 2.0 (same as base model)
