Commit 55131ff by Teja321 (verified)
1 Parent(s): 6bc3fc3

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +124 -0
README.md ADDED
@@ -0,0 +1,124 @@
+ ---
+ library_name: mediapipe
+ tags:
+ - medical
+ - llm
+ - gemma
+ - quantized
+ - tflite
+ - int8
+ license: apache-2.0
+ base_model: google/medgemma-1.5-4b-it
+ ---
+
+ # MedGemma 1.5 4B - Quantized (INT8)
+
+ This is a quantized version of [google/medgemma-1.5-4b-it](https://huggingface.co/google/medgemma-1.5-4b-it) optimized for on-device deployment using TensorFlow Lite and MediaPipe.
+
+ ## Model Details
+
+ - **Base Model**: MedGemma 1.5 4B (Instruction-Tuned)
+ - **Quantization**: INT8 Dynamic Quantization
+ - **Model Size**: 3.65 GB (roughly 4x smaller than FP32)
+ - **Architecture**: Gemma 3
+ - **Deployment**: MediaPipe Task Bundle + TFLite
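The "4x" figure follows from storage width: FP32 uses 4 bytes per weight and INT8 uses 1. The sketch below shows that arithmetic only. It assumes every weight is quantized and ignores scales, metadata, and any tensors left in higher precision; `estimateFp32SizeGb` is a hypothetical helper name.

```kotlin
// Bytes used to store one weight in each format.
const val BYTES_PER_FP32_WEIGHT = 4
const val BYTES_PER_INT8_WEIGHT = 1

// Rough FP32 size implied by an INT8 file size.
// Assumption: all weights are INT8; real bundles also carry scales and metadata.
fun estimateFp32SizeGb(int8SizeGb: Double): Double {
    require(int8SizeGb >= 0.0) { "size must be non-negative" }
    return int8SizeGb * BYTES_PER_FP32_WEIGHT / BYTES_PER_INT8_WEIGHT
}
```

For the 3.65 GB file listed here, `estimateFp32SizeGb(3.65)` gives about 14.6 GB for an unquantized FP32 equivalent.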
+
+ ## Files
+
+ | File | Size | Description |
+ |------|------|-------------|
+ | `medgemma_1.5_4b.task` | 3.65 GB | MediaPipe task bundle (ready to use) |
+ | `gemma3-4b_q8_ekv1024.tflite` | 3.65 GB | TFLite model with INT8 quantization |
+ | `tokenizer.model` | 4.5 MB | SentencePiece tokenizer |
+
+ ## Quantization Details
+
+ - **Scheme**: Dynamic INT8
+ - **Weights**: Quantized to INT8 (171 tensors)
+ - **Activations**: Kept in FP32 to preserve accuracy
+ - **KV Cache**: Up to 1024 tokens
+ - **Verified**: Weight quantization confirmed
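To illustrate what storing weights as INT8 while computing in FP32 means, here is a minimal sketch of symmetric per-tensor quantization. It is a generic illustration, not the converter's actual implementation: the granularity used by ai-edge-torch (per-tensor or per-channel) is not stated in this README, and `QuantizedTensor`, `quantizeSymmetric`, and `dequantize` are hypothetical names.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Hypothetical container: INT8 values plus the FP32 scale needed to recover them.
class QuantizedTensor(val values: ByteArray, val scale: Float)

// Map FP32 weights onto [-127, 127] using one scale for the whole tensor.
fun quantizeSymmetric(weights: FloatArray): QuantizedTensor {
    val maxAbs = weights.maxOfOrNull { abs(it) } ?: 0f
    // An all-zero (or empty) tensor gets a neutral scale to avoid dividing by zero.
    val scale = if (maxAbs == 0f) 1f else maxAbs / 127f
    val values = ByteArray(weights.size) { i ->
        (weights[i] / scale).roundToInt().coerceIn(-127, 127).toByte()
    }
    return QuantizedTensor(values, scale)
}

// Recover approximate FP32 weights; this is what feeds the FP32 activations.
fun dequantize(q: QuantizedTensor): FloatArray =
    FloatArray(q.values.size) { i -> q.values[i] * q.scale }
```

Each recovered weight differs from the original by at most about half a quantization step (`scale / 2`), which is the source of the accuracy loss that quantization trades for size.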
+
+ ## Usage
+
+ ### MediaPipe Web (Easiest)
+
+ 1. Go to [MediaPipe Studio](https://mediapipe-studio.webapps.google.com/demo/llm_inference)
+ 2. Upload `medgemma_1.5_4b.task`
+ 3. Test with medical prompts
+
+ ### Android
+
+ ```kotlin
+ import com.google.mediapipe.tasks.genai.llminference.LlmInference
+
+ // Configure the engine with the on-device path to the task bundle.
+ // Sampling options available on this builder can vary between MediaPipe releases.
+ val options = LlmInference.LlmInferenceOptions.builder()
+     .setModelPath("/path/to/medgemma_1.5_4b.task")
+     .setMaxTokens(512)
+     .setTemperature(0.7f)
+     .build()
+
+ // Loading the model and generating are blocking calls; run them off the main thread.
+ val llm = LlmInference.createFromOptions(context, options)
+ val response = llm.generateResponse("What are the symptoms of diabetes?")
+ ```
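When choosing `setMaxTokens`, it helps to budget prompt length against the limit. The sketch below assumes that `maxTokens` counts prompt and generated tokens together and that it cannot exceed the 1024-token KV cache noted in this README; both are assumptions to check against the MediaPipe documentation. `remainingOutputTokens` is a hypothetical helper, not a MediaPipe API.

```kotlin
// Tokens left for the response once the prompt has been counted.
// Assumptions: maxTokens covers prompt + output, and is capped by the KV cache size.
fun remainingOutputTokens(
    promptTokens: Int,
    maxTokens: Int = 512,
    kvCacheSize: Int = 1024
): Int {
    require(promptTokens >= 0) { "promptTokens must be non-negative" }
    require(maxTokens in 1..kvCacheSize) { "maxTokens must fit in the KV cache" }
    // A prompt longer than the budget leaves nothing for generation.
    return (maxTokens - promptTokens).coerceAtLeast(0)
}
```

Under these assumptions, a 100-token prompt with the settings above leaves 412 tokens for the answer.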
+
+ ### iOS
+
+ ```swift
+ import MediaPipeTasksGenAI
+
+ // The options initializer takes the path to the task bundle.
+ let options = LlmInference.Options(modelPath: "/path/to/medgemma_1.5_4b.task")
+ options.maxTokens = 512
+
+ let llm = try LlmInference(options: options)
+ let response = try llm.generateResponse(inputText: "What are the symptoms of diabetes?")
+ ```
+
+ ## Prompt Format
+
+ ```
+ <start_of_turn>user
+ {YOUR_QUESTION}<end_of_turn>
+ <start_of_turn>model
+
+ ```
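The template above can be applied with a small helper before passing text to `generateResponse`. This is a minimal single-turn sketch; `formatGemmaPrompt` is a hypothetical name, and whether the task bundle already applies this template internally is not stated here, so check that the markers are not added twice.

```kotlin
// Wrap one user question in the Gemma turn markers shown in the template.
fun formatGemmaPrompt(question: String): String {
    val trimmed = question.trim()
    require(trimmed.isNotEmpty()) { "question must not be blank" }
    return "<start_of_turn>user\n" +
        trimmed + "<end_of_turn>\n" +
        "<start_of_turn>model\n" // the model's reply continues from here
}
```

For example, `formatGemmaPrompt("What are the common symptoms of type 2 diabetes?")` yields the three-line prompt ending with the open `model` turn.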
+
+ ## Example Prompts
+
+ - "What are the common symptoms of type 2 diabetes?"
+ - "Explain the difference between systolic and diastolic blood pressure."
+ - "What lifestyle changes can help manage hypertension?"
+
+ ## Performance
+
+ - **Inference Speed**: ~10-40 tokens/sec on CPU
+ - **Memory Usage**: ~5-6 GB RAM
+ - **Quantization Impact**: Minimal accuracy degradation vs FP32
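The throughput range translates directly into response latency. A rough sketch of that arithmetic, assuming a constant decode rate and ignoring prompt processing and model load time (`estimateGenerationSeconds` is a hypothetical helper):

```kotlin
// Approximate decode time for a response of the given length.
// Assumption: steady tokens/sec; prefill and model loading are not included.
fun estimateGenerationSeconds(tokens: Int, tokensPerSecond: Double): Double {
    require(tokens >= 0) { "tokens must be non-negative" }
    require(tokensPerSecond > 0.0) { "tokensPerSecond must be positive" }
    return tokens / tokensPerSecond
}
```

At the quoted 10 to 40 tokens/sec, a 512-token response would take roughly 51 seconds at the slow end and about 13 seconds at the fast end.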
+
+ ## Limitations
+
+ - **Text-only**: Vision encoder not included in this version
+ - **Medical disclaimer**: This model is for educational/research purposes only. Always consult healthcare professionals for medical advice.
+
+ ## Conversion Process
+
+ Converted using [ai-edge-torch](https://github.com/google-ai-edge/ai-edge-torch):
+ 1. Downloaded the base model from Hugging Face
+ 2. Converted it to TFLite with INT8 quantization
+ 3. Bundled it into the MediaPipe task format
+
+ ## Citation
+
+ ```bibtex
+ @misc{medgemma2024,
+   title={MedGemma: Open medical large language models},
+   author={Google DeepMind},
+   year={2024},
+   url={https://huggingface.co/google/medgemma-1.5-4b-it}
+ }
+ ```
+
+ ## License
+
+ Apache 2.0 (same as base model)