2796gauravc committed on
Commit d3fa082 · verified · 1 Parent(s): 2c1739e

Add model card

Files changed (1): README.md (added, +283 -0)
 
---
license: gemma
base_model: google/functiongemma-270m-it
tags:
- function-calling
- tflite
- mediapipe
- android
- on-device
- litert
- gemma3
- quantized
language:
- en
pipeline_tag: text-generation
---

# Artha AI — FunctionGemma 270M (MediaPipe .task)

**Version**: 9.0.0 | **Format**: MediaPipe `.task` | **Size**: ~271 MB

> ⚡ Ready-to-deploy Android model for on-device function calling.
> Drop into your app's `assets/` folder and run with MediaPipe LLM Inference API.

## Model Details

| Property | Value |
|---|---|
| Base model | `google/functiongemma-270m-it` |
| Fine-tuned weights | [`2796gauravc/artha-functiongemma-270m`](https://huggingface.co/2796gauravc/artha-functiongemma-270m) |
| Format | MediaPipe `.task` (TFLite + SentencePiece tokenizer bundled) |
| Quantization | dynamic_int8 (~271 MB) |
| Prefill sequence length | 512 tokens |
| KV cache max length | 1024 tokens |
| Architecture | Gemma 3 270M |
| Task | Structured function calling |

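The KV cache limit in the table bounds every request: MediaPipe counts input and output tokens against the same maximum, so a longer prompt leaves less room for the generated call. The arithmetic can be sketched as below. `remainingOutputTokens` and `KV_CACHE_MAX_TOKENS` are hypothetical names, and the prompt token count is assumed to come from the runtime (for example from `LlmInference.sizeInTokens`).

```kotlin
// Limit taken from the Model Details table: KV cache max length = 1024 tokens.
// Input and output tokens share this budget.
const val KV_CACHE_MAX_TOKENS = 1024

// Hypothetical helper: how many tokens remain for the model's answer.
fun remainingOutputTokens(promptTokens: Int, maxTokens: Int = KV_CACHE_MAX_TOKENS): Int {
    require(promptTokens >= 0) { "promptTokens must be non-negative" }
    // Never report a negative budget; 0 means the prompt alone fills the cache.
    return (maxTokens - promptTokens).coerceAtLeast(0)
}

fun main() {
    println(remainingOutputTokens(300))  // 724
    println(remainingOutputTokens(1100)) // 0
}
```

A result of 0 is a signal to shorten the system prompt or the notification text before calling the model.
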
## Files

| File | Description |
|---|---|
| `artha_functiongemma_v9_0_0.task` | Primary bundle; drop into Android `assets/` |
| `tflite_raw/` | Raw TFLite files (for re-bundling or iOS use) |
| `tflite_raw/tokenizer.model` | SentencePiece tokenizer |

## ⚠️ Important: Prompt Template

> **The `.task` bundle was built WITHOUT an embedded prompt prefix/suffix**
> because the MediaPipe version available at build time (`mediapipe < 0.10.22`)
> did not yet support the `prompt_prefix` parameter in `BundleConfig`.
>
> You MUST construct the full prompt manually in your Kotlin/Java code.
> See the "Android Integration" section below.

## Android Integration

### 1. Add dependency

```groovy
// build.gradle (app level, Groovy DSL)
android {
    aaptOptions { noCompress("task") }
    defaultConfig { minSdk 24 }
}

dependencies {
    implementation "com.google.mediapipe:tasks-genai:0.10.27"
    // Use 0.10.27+ for best Gemma 3 support
}
```

### 2. Push model to device (ADB, for testing)

```bash
adb shell mkdir -p /data/local/tmp/llm/
adb push artha_functiongemma_v9_0_0.task /data/local/tmp/llm/
```

For production, use the `assets/` folder (see Step 3 below).

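Because the runtime is given a file path, a model shipped in `assets/` generally has to be copied to app storage once before it can be loaded. A minimal sketch of that copy step follows. `copyModelIfMissing` is a hypothetical helper, written against plain `java.io` so it can be tried off-device.

```kotlin
import java.io.File
import java.io.InputStream

// Hypothetical helper: copy a model stream to `target` unless a non-empty
// file is already there. Returns the absolute path to pass to setModelPath().
fun copyModelIfMissing(open: () -> InputStream, target: File): String {
    if (!target.exists() || target.length() == 0L) {
        target.parentFile?.mkdirs()
        // The source is only opened when a copy is actually needed.
        open().use { input ->
            target.outputStream().use { output -> input.copyTo(output) }
        }
    }
    return target.absolutePath
}
```

On Android, the stream would typically come from `context.assets.open("artha_functiongemma_v9_0_0.task")` and the target would be a file under `context.filesDir`. The sketch does not detect a partially written file from an interrupted copy; a size or checksum comparison would be needed for that.
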
### 3. LlmInference setup (Kotlin)

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference
import com.google.mediapipe.tasks.genai.llminference.LlmInferenceSession

class ArthaLlmManager(private val context: Context) {

    private var llmInference: LlmInference? = null
    private var session: LlmInferenceSession? = null

    fun initialize() {
        // MediaPipe loads the model from a file path.
        val modelPath = "/data/local/tmp/llm/artha_functiongemma_v9_0_0.task"
        // OR, after copying the bundled asset to app storage:
        // context.getExternalFilesDir(null)?.absolutePath + "/model.task"

        // Engine-level options. Recent tasks-genai releases take the sampling
        // parameters (top-k, top-p, temperature) on the session, not here.
        val options = LlmInference.LlmInferenceOptions.builder()
            .setModelPath(modelPath)
            .setMaxTokens(1024) // input + output; must match KV cache max len
            .setMaxTopK(64)     // upper bound for the session's top-k
            .build()

        llmInference = LlmInference.createFromOptions(context, options)

        // Create a session for inference
        val sessionOptions = LlmInferenceSession.LlmInferenceSessionOptions.builder()
            .setTopK(64)
            .setTopP(0.95f)
            .setTemperature(1.0f) // FunctionGemma uses temp=1.0
            .build()
        session = LlmInferenceSession.createFromOptions(llmInference!!, sessionOptions)
    }

    /**
     * CRITICAL: Build the full FunctionGemma prompt manually.
     * The .task bundle does NOT have an embedded prompt template.
     *
     * @param appName e.g. "WhatsApp"
     * @param notificationText e.g. "Call from Mom"
     * @param systemPrompt Your full system prompt with function declarations
     */
    fun buildPrompt(
        appName: String,
        notificationText: String,
        systemPrompt: String
    ): String = buildString {
        // FunctionGemma uses a special chat format:
        //   developer role → functions declared here
        //   user role      → actual notification
        //   model role     → model completes here
        append("<bos>")
        append("<start_of_turn>developer\n")
        append(systemPrompt) // Must include JSON function declarations
        append("<end_of_turn>\n")
        append("<start_of_turn>user\n")
        append("$appName: $notificationText")
        append("<end_of_turn>\n")
        append("<start_of_turn>model\n") // Model generates from here
    }

    /**
     * Synchronous inference. Run on a background thread/coroutine.
     */
    fun runInference(prompt: String): String {
        val activeSession = session
        if (activeSession != null) {
            // A session receives input via addQueryChunk() and keeps
            // earlier chunks as context for later calls.
            activeSession.addQueryChunk(prompt)
            return activeSession.generateResponse()
        }
        // Stateless fallback on the engine itself
        return llmInference?.generateResponse(prompt) ?: ""
    }

    /**
     * Streaming inference with a callback.
     */
    fun runInferenceStreaming(
        prompt: String,
        onPartialResult: (String) -> Unit,
        onComplete: () -> Unit
    ) {
        val activeSession = session ?: return
        activeSession.addQueryChunk(prompt)
        activeSession.generateResponseAsync { partial, done ->
            onPartialResult(partial ?: "")
            if (done) onComplete()
        }
    }

    fun close() {
        session?.close()
        llmInference?.close()
    }
}
```

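The prompt layout produced by `buildPrompt()` is plain string assembly, so it can be checked on the JVM without MediaPipe or a device. The sketch below is a standalone copy of that layout; `buildFunctionGemmaPrompt` is a hypothetical name, and the one-line system prompt is only a placeholder for a real one with function declarations.

```kotlin
// Standalone copy of the prompt layout used by buildPrompt() above.
fun buildFunctionGemmaPrompt(
    systemPrompt: String,
    appName: String,
    notificationText: String
): String = buildString {
    append("<bos>")
    append("<start_of_turn>developer\n")
    append(systemPrompt)
    append("<end_of_turn>\n")
    append("<start_of_turn>user\n")
    append("$appName: $notificationText")
    append("<end_of_turn>\n")
    append("<start_of_turn>model\n")
}

fun main() {
    // Placeholder system prompt; a real one carries the JSON function declarations.
    val prompt = buildFunctionGemmaPrompt("You are Artha.", "WhatsApp", "Call from Mom")
    print(prompt)
    // Prints:
    // <bos><start_of_turn>developer
    // You are Artha.<end_of_turn>
    // <start_of_turn>user
    // WhatsApp: Call from Mom<end_of_turn>
    // <start_of_turn>model
}
```

The prompt deliberately ends with an open `model` turn, which is where generation starts.
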
### 4. Parse FunctionGemma output

FunctionGemma outputs in this format:
```
<start_function_call>call:function_name{param:<escape>value<escape>}<end_function_call>
```

```kotlin
object FunctionGemmaParser {

    // Kotlin raw strings use triple double quotes.
    private val CALL_REGEX = Regex(
        """<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>""",
        RegexOption.DOT_MATCHES_ALL
    )
    // Group 2: value wrapped in <escape> markers.
    // Group 3: bare value (assumed here for non-string parameters such as integers).
    private val PARAM_REGEX = Regex("""(\w+):(?:<escape>(.*?)<escape>|([^,{}]*))""")

    data class FunctionCall(val name: String, val params: Map<String, String>)

    fun parse(output: String): List<FunctionCall> {
        return CALL_REGEX.findAll(output).map { match ->
            val functionName = match.groupValues[1]
            val paramsStr = match.groupValues[2]
            val params = PARAM_REGEX.findAll(paramsStr).associate {
                it.groupValues[1] to (it.groups[2]?.value ?: it.groupValues[3].trim())
            }
            FunctionCall(functionName, params)
        }.toList()
    }
}
```

### 5. System prompt example (notification triage)

```kotlin
const val SYSTEM_PROMPT = """You are Artha, a notification triage assistant running on-device.
You receive Android notification text and call the appropriate function.

Available functions:
[
  {
    "function": {
      "name": "snooze_notification",
      "description": "Snooze a notification for a given duration.",
      "parameters": {
        "type": "OBJECT",
        "properties": {
          "duration_minutes": {"type": "INTEGER", "description": "Minutes to snooze"},
          "app": {"type": "STRING", "description": "App package name"}
        },
        "required": ["duration_minutes", "app"]
      }
    }
  },
  {
    "function": {
      "name": "mark_important",
      "description": "Mark notification as important and show heads-up.",
      "parameters": {
        "type": "OBJECT",
        "properties": {
          "reason": {"type": "STRING", "description": "Why it is important"}
        },
        "required": ["reason"]
      }
    }
  },
  {
    "function": {
      "name": "dismiss_notification",
      "description": "Silently dismiss the notification.",
      "parameters": {"type": "OBJECT", "properties": {}}
    }
  }
]
"""
```

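Once a call has been parsed, its string parameters still have to be validated against the declarations above before anything is executed. One way to do that is sketched below. `TriageAction`, `Snooze`, `MarkImportant`, `Dismiss` and `toAction` are hypothetical names, and `FunctionCall` is redeclared here only so the sketch compiles on its own.

```kotlin
// Mirrors FunctionGemmaParser.FunctionCall so the sketch is self-contained.
data class FunctionCall(val name: String, val params: Map<String, String>)

// Hypothetical typed actions for the three functions in the example system prompt.
sealed interface TriageAction
data class Snooze(val app: String, val minutes: Int) : TriageAction
data class MarkImportant(val reason: String) : TriageAction
object Dismiss : TriageAction

// Returns null for unknown functions or missing/invalid parameters,
// so a malformed model output never triggers an action.
fun toAction(call: FunctionCall): TriageAction? = when (call.name) {
    "snooze_notification" -> {
        val minutes = call.params["duration_minutes"]?.trim()?.toIntOrNull()
        val app = call.params["app"]
        if (minutes != null && minutes > 0 && app != null) Snooze(app, minutes) else null
    }
    "mark_important" -> call.params["reason"]?.let { MarkImportant(it) }
    "dismiss_notification" -> Dismiss
    else -> null
}
```

Treating every unrecognized or incomplete call as `null` keeps the fallback decision (for example, leaving the notification untouched) in the app rather than in the model.
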
## Known Issues & Gotchas

### 1. `prompt_prefix` missing from bundle
**Problem**: Built with `mediapipe < 0.10.22`, which didn't support `prompt_prefix`/`prompt_suffix` in `BundleConfig`. The `.task` has no embedded chat template.

**Fix**: Construct the full prompt in Kotlin (see `buildPrompt()` above). This is actually more flexible.

### 2. Fine-tuned model may lose `call:` prefix
**Problem**: A known issue (github.com/google-gemini/gemma-cookbook/issues/273): when fine-tuned on data formatted with `apply_chat_template`, the model sometimes drops the `call:` prefix from its output, producing `<start_function_call>function_name{...}` instead of `<start_function_call>call:function_name{...}`.

**Fix**: Update the parser regex to handle both formats:
```kotlin
private val CALL_REGEX = Regex(
    """<start_function_call>(?:call:)?(\w+)\{(.*?)\}<end_function_call>""",
    RegexOption.DOT_MATCHES_ALL
)
```

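A quick way to confirm that the relaxed pattern accepts both variants is to run it on one sample of each. The snippet below is a small self-contained check; `LENIENT_CALL_REGEX` and `functionNames` are hypothetical names, and the sample outputs are made up for illustration.

```kotlin
// Same pattern as the fix above: the "call:" prefix is optional.
val LENIENT_CALL_REGEX = Regex(
    """<start_function_call>(?:call:)?(\w+)\{(.*?)\}<end_function_call>""",
    RegexOption.DOT_MATCHES_ALL
)

// Returns the function names found in a model output, in order of appearance.
fun functionNames(output: String): List<String> =
    LENIENT_CALL_REGEX.findAll(output).map { it.groupValues[1] }.toList()

fun main() {
    val withPrefix = "<start_function_call>call:dismiss_notification{}<end_function_call>"
    val withoutPrefix = "<start_function_call>dismiss_notification{}<end_function_call>"
    println(functionNames(withPrefix))    // [dismiss_notification]
    println(functionNames(withoutPrefix)) // [dismiss_notification]
}
```
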
### 3. Device requirements
MediaPipe LLM Inference API requires Android 7.0+ (SDK 24) and works best on devices with 6 GB+ RAM (Pixel 8, Samsung S23+). The 270M model uses ~400-600 MB RAM at inference time.

### 4. GPU backend
By default LlmInference uses CPU (XNNPACK). For GPU acceleration, add this to the `LlmInferenceOptions` builder:
```kotlin
.setPreferredBackend(LlmInference.Backend.GPU)
```
GPU may cause issues on some devices with int8 models, so test before shipping.

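Since GPU support varies by device, one option is to try the GPU backend first and fall back to CPU if initialization throws. The generic helper below sketches that pattern. `firstWorking` is a hypothetical name, and the sketch assumes a backend failure surfaces as an exception during engine creation, which is not guaranteed on every device.

```kotlin
// Hypothetical helper: run each attempt in order and return the first result
// that does not throw. Rethrows the last failure if every attempt fails.
fun <T> firstWorking(vararg attempts: () -> T): T {
    require(attempts.isNotEmpty()) { "need at least one attempt" }
    var lastError: Exception? = null
    for (attempt in attempts) {
        try {
            return attempt()
        } catch (e: Exception) {
            lastError = e // remember the failure and try the next backend
        }
    }
    throw lastError!!
}
```

In the app, the two attempts would be lambdas that build the engine with the GPU and then the CPU backend. This only covers initialization failures; incorrect output on a particular GPU still has to be caught by testing on that device.
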
### 5. Session vs. stateless API
Use `LlmInferenceSession` for multi-turn conversations (it maintains context) and plain `LlmInference.generateResponse()` for single-shot notification triage (state resets on each call, which is fine for Artha's use case).

## Source & Credits

- Base model: [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it)
- Training: Custom fine-tune on notification data
- Conversion: litert-torch v0.8+ → mediapipe bundler
- MediaPipe Android docs: https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/android