MedGemma 1.5 4B - Quantized (INT8)

This is an INT8-quantized version of google/medgemma-1.5-4b-it, optimized for on-device deployment with TensorFlow Lite and MediaPipe.

Model Details

  • Base Model: MedGemma 1.5 4B (Instruction-Tuned)
  • Quantization: INT8 Dynamic Quantization
  • Model Size: 3.65 GB (4x reduction from FP32)
  • Architecture: Gemma 3
  • Deployment: MediaPipe Task Bundle + TFLite
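
The size figures above can be sanity-checked with quick arithmetic. The ~3.9B parameter count used below is an assumption (a rough estimate for the Gemma 3 4B text stack), not a number stated on this card:

```python
# Back-of-envelope check of the "4x reduction from FP32" claim.
# ASSUMPTION: ~3.9e9 parameters; the exact count is not given here.
params = 3.9e9

bytes_fp32 = params * 4  # FP32 stores 4 bytes per weight
bytes_int8 = params * 1  # INT8 stores 1 byte per weight

gib = 1024 ** 3
print(f"FP32: {bytes_fp32 / gib:.2f} GB")            # roughly 14.5 GB
print(f"INT8: {bytes_int8 / gib:.2f} GB")            # roughly 3.63 GB, near the 3.65 GB file
print(f"Reduction: {bytes_fp32 / bytes_int8:.0f}x")  # 4x
```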

Files

File                          Size     Description
medgemma_1.5_4b.task          3.65 GB  MediaPipe task bundle (ready to use)
gemma3-4b_q8_ekv1024.tflite   3.65 GB  TFLite model with INT8 quantization
tokenizer.model               4.5 MB   SentencePiece tokenizer

Quantization Details

  • Scheme: Dynamic INT8
  • Weights: Quantized to INT8 (171 tensors)
  • Activations: FP32 (for accuracy)
  • KV Cache: Up to 1024 tokens (the ekv1024 suffix in the TFLite filename)
  • Verified: Weight quantization confirmed
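
"Dynamic INT8" here means the weights are stored as INT8 while activations stay FP32 and are handled at runtime. The following is a minimal illustration of per-tensor symmetric weight quantization, a common choice for this scheme; it is a sketch, not the actual converter code, and the per-tensor symmetric layout is an assumption:

```python
# Illustration of symmetric INT8 weight quantization (not the real converter).

def quantize_int8(weights):
    """Map FP32 weights to INT8 values plus one per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values at inference time."""
    return [v * scale for v in q]

w = [0.02, -0.51, 0.37, 1.27, -0.88]          # toy FP32 weights
q, scale = quantize_int8(w)                   # q holds INT8 values, scale is FP32
w_hat = dequantize(q, scale)

# Round-trip error is bounded by half the quantization step.
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
assert max_err <= scale / 2 + 1e-9
```

Each of the 171 weight tensors gets its own scale, which is why accuracy loss stays small even at 4x compression.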

Usage

MediaPipe Web (Easiest)

  1. Go to MediaPipe Studio
  2. Upload medgemma_1.5_4b.task
  3. Test with medical prompts

Android

import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Configure the LLM Inference task with the downloaded bundle.
val options = LlmInference.LlmInferenceOptions.builder()
    .setModelPath("/path/to/medgemma_1.5_4b.task") // on-device path to the .task bundle
    .setMaxTokens(512)      // combined prompt + response token budget
    .setTemperature(0.7f)   // sampling temperature
    .build()

val llm = LlmInference.createFromOptions(context, options)
val response = llm.generateResponse("What are the symptoms of diabetes?")

iOS

import MediaPipeTasksGenAI

// Configure the LLM Inference task with the downloaded bundle.
let options = LlmInference.Options(modelPath: "/path/to/medgemma_1.5_4b.task")
options.maxTokens = 512  // combined prompt + response token budget

let llm = try LlmInference(options: options)
let response = try llm.generateResponse(prompt: "What are the symptoms of diabetes?")

Prompt Format

<start_of_turn>user
{YOUR_QUESTION}<end_of_turn>
<start_of_turn>model
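
Depending on how the task bundle was packaged, the runtime may not apply this template automatically, so it can help to format prompts explicitly. A small helper (hypothetical, for illustration) that wraps a question in the turn markers shown above:

```python
# Wrap a user question in the Gemma turn format shown above.
def format_prompt(question: str) -> str:
    return (
        "<start_of_turn>user\n"
        f"{question}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = format_prompt("What are the common symptoms of type 2 diabetes?")
```

The trailing `<start_of_turn>model` line cues the model to begin its answer; if responses look truncated or off-format, check that this template matches what the bundle expects.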

Example Prompts

  • "What are the common symptoms of type 2 diabetes?"
  • "Explain the difference between systolic and diastolic blood pressure."
  • "What lifestyle changes can help manage hypertension?"

Performance

  • Inference Speed: ~10-40 tokens/sec on CPU
  • Memory Usage: ~5-6 GB RAM
  • Quantization Impact: Minimal accuracy degradation vs FP32
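
From the throughput range above, a rough response-time estimate (illustrative arithmetic only; real speed varies widely by device):

```python
# Response-time estimates at the slow and fast ends of the quoted CPU range.
max_tokens = 512
estimates = {tps: max_tokens / tps for tps in (10, 40)}
for tps, seconds in estimates.items():
    print(f"{tps} tok/s -> {seconds:.1f} s for a {max_tokens}-token response")
```

So a full 512-token answer can take anywhere from about 13 seconds to nearly a minute on CPU.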

Limitations

  • Text-only: Vision encoder not included in this version
  • Medical disclaimer: This model is for educational/research purposes only. Always consult healthcare professionals for medical advice.

Conversion Process

Converted using ai-edge-torch:

  1. Downloaded the base model from Hugging Face
  2. Converted to TFLite with INT8 dynamic quantization
  3. Bundled into the MediaPipe .task format

Citation

@misc{medgemma2024,
  title={MedGemma: Open medical large language models},
  author={Google DeepMind},
  year={2024},
  url={https://huggingface.co/google/medgemma-1.5-4b-it}
}

License

Apache 2.0 (same as base model)
