---
library_name: mediapipe
tags:
- medical
- llm
- gemma
- quantized
- tflite
- int8
license: apache-2.0
base_model: google/medgemma-1.5-4b-it
---
# MedGemma 1.5 4B - Quantized (INT8)
This is a quantized version of [google/medgemma-1.5-4b-it](https://huggingface.co/google/medgemma-1.5-4b-it) optimized for on-device deployment using TensorFlow Lite and MediaPipe.
## Model Details
- **Base Model**: MedGemma 1.5 4B (Instruction-Tuned)
- **Quantization**: INT8 Dynamic Quantization
- **Model Size**: 3.65 GB (~4x reduction from FP32, i.e. 4 bytes per weight down to 1)
- **Architecture**: Gemma 3
- **Deployment**: MediaPipe Task Bundle + TFLite
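The stated size is consistent with simple back-of-the-envelope arithmetic (an illustrative sketch; the ~4B parameter count is an approximation, not an exact figure from the conversion):

```python
# Rough model-size estimate: bytes per weight times parameter count.
PARAMS = 4e9  # ~4 billion parameters (approximate)

fp32_gb = PARAMS * 4 / 1024**3  # FP32 stores 4 bytes per weight
int8_gb = PARAMS * 1 / 1024**3  # INT8 stores 1 byte per weight

print(f"FP32: ~{fp32_gb:.1f} GB, INT8: ~{int8_gb:.1f} GB, "
      f"reduction: ~{fp32_gb / int8_gb:.0f}x")
# -> FP32: ~14.9 GB, INT8: ~3.7 GB, reduction: ~4x
```

The INT8 estimate (~3.7 GB) lines up with the 3.65 GB artifact; the small difference comes from quantization metadata and tensors kept at higher precision.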
## Files
| File | Size | Description |
|------|------|-------------|
| `medgemma_1.5_4b.task` | 3.65 GB | MediaPipe task bundle (ready to use) |
| `gemma3-4b_q8_ekv1024.tflite` | 3.65 GB | TFLite model with INT8 quantization |
| `tokenizer.model` | 4.5 MB | SentencePiece tokenizer |
## Quantization Details
- **Scheme**: Dynamic INT8
- **Weights**: Quantized to INT8 (171 tensors)
- **Activations**: FP32 (for accuracy)
- **KV Cache**: Up to 1024 tokens
- **Verified**: Weight quantization confirmed
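The scheme above (INT8 weights, FP32 activations) can be illustrated with a minimal NumPy sketch of symmetric per-tensor quantization. This shows the general technique only, not the exact TFLite kernel behavior:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # At inference time the runtime rescales INT8 weights back to FP32,
    # so activations can remain in FP32 as this model does.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
err = float(np.abs(w - dequantize(q, scale)).max())
print(f"max abs reconstruction error: {err:.4f}")
```

The maximum error is bounded by half the quantization step (`scale / 2`), which is why weight-only INT8 typically costs little accuracy.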
## Usage
### MediaPipe Web (Easiest)
1. Go to [MediaPipe Studio](https://mediapipe-studio.webapps.google.com/demo/llm_inference)
2. Upload `medgemma_1.5_4b.task`
3. Test with medical prompts
### Android
```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInference

val options = LlmInference.LlmInferenceOptions.builder()
    .setModelPath("/path/to/medgemma_1.5_4b.task")
    .setMaxTokens(512)
    .setTemperature(0.7f)
    .build()

val llm = LlmInference.createFromOptions(context, options)
val response = llm.generateResponse("What are the symptoms of diabetes?")
```
### iOS
```swift
import MediaPipeTasksGenAI
let options = LlmInference.Options()
options.modelPath = "/path/to/medgemma_1.5_4b.task"
options.maxTokens = 512
let llm = try LlmInference(options: options)
let response = try llm.generateResponse(prompt: "What are the symptoms of diabetes?")
```
## Prompt Format
```
<start_of_turn>user
{YOUR_QUESTION}<end_of_turn>
<start_of_turn>model
```
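If you are assembling prompts programmatically, a small helper keeps the turn markers consistent (Python for illustration; `format_prompt` is a hypothetical name, not part of the MediaPipe API):

```python
def format_prompt(question: str) -> str:
    """Wrap a user question in the Gemma chat-turn format shown above."""
    return (
        "<start_of_turn>user\n"
        f"{question}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(format_prompt("What are the common symptoms of type 2 diabetes?"))
```

The model generates its answer as a continuation after the final `<start_of_turn>model` marker.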
## Example Prompts
- "What are the common symptoms of type 2 diabetes?"
- "Explain the difference between systolic and diastolic blood pressure."
- "What lifestyle changes can help manage hypertension?"
## Performance
- **Inference Speed**: ~10-40 tokens/sec on CPU
- **Memory Usage**: ~5-6 GB RAM
- **Quantization Impact**: Minimal accuracy degradation vs FP32
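The throughput range implies rough end-to-end latencies for a full-length response (illustrative arithmetic only, using the figures above):

```python
# Estimated wall-clock time for a maximum-length generation,
# using the ~10-40 tokens/sec CPU throughput range quoted above.
max_tokens = 512  # matches setMaxTokens in the usage examples

for tok_per_sec in (10, 40):
    seconds = max_tokens / tok_per_sec
    print(f"{tok_per_sec} tok/s -> ~{seconds:.0f} s for {max_tokens} tokens")
# -> 10 tok/s -> ~51 s for 512 tokens
# -> 40 tok/s -> ~13 s for 512 tokens
```

Shorter prompts and responses scale the latency down proportionally.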
## Limitations
- **Text-only**: Vision encoder not included in this version
- **Medical disclaimer**: This model is for educational/research purposes only. Always consult healthcare professionals for medical advice.
## Conversion Process
Converted using [ai-edge-torch](https://github.com/google-ai-edge/ai-edge-torch):
1. Downloaded the base model weights from Hugging Face
2. Converted to TFLite with dynamic INT8 quantization
3. Bundled the TFLite model and tokenizer into the MediaPipe `.task` format
## Citation
```bibtex
@misc{medgemma2024,
  title={MedGemma: Open medical large language models},
  author={Google DeepMind},
  year={2024},
  url={https://huggingface.co/google/medgemma-1.5-4b-it}
}
```
## License
Apache 2.0 (same as base model)