2796gauravc/artha-functiongemma-270m-mediapipe

+---
+license: gemma
+base_model: google/functiongemma-270m-it
+tags:
+  - function-calling
+  - tflite
+  - mediapipe
+  - android
+  - on-device
+  - litert
+  - gemma3
+  - quantized
+language:
+  - en
+pipeline_tag: text-generation
+---
+# Artha AI — FunctionGemma 270M (MediaPipe .task)
+**Version**: 9.0.0 | **Format**: MediaPipe `.task` | **Size**: ~271 MB
+> ⚡ Ready-to-deploy Android model for on-device function calling.
+> Drop it into your app's `assets/` folder and run it with the MediaPipe LLM Inference API.
+## Model Details
+| Property | Value |
+|---|---|
+| Base model | `google/functiongemma-270m-it` |
+| Fine-tuned weights | [`2796gauravc/artha-functiongemma-270m`](https://huggingface.co/2796gauravc/artha-functiongemma-270m) |
+| Format | MediaPipe `.task` (TFLite + SentencePiece tokenizer bundled) |
+| Quantization | dynamic_int8 (~271 MB) |
+| Prefill sequence length | 512 tokens |
+| KV cache max length | 1024 tokens |
+| Architecture | Gemma 3 270M |
+| Task | Structured function calling |
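+The 1024-token KV cache has to hold the prompt and the generated output together, so a long system prompt leaves less room for the function call. A minimal sketch of that arithmetic, assuming the prompt's token count is already known; the helper name `remainingOutputTokens` is hypothetical and not part of MediaPipe:
+```kotlin
+// KV cache limit taken from the Model Details table.
+const val KV_CACHE_MAX_TOKENS = 1024
+// Hypothetical helper: tokens left for generation once the prompt is in the cache.
+fun remainingOutputTokens(promptTokens: Int, kvCacheMax: Int = KV_CACHE_MAX_TOKENS): Int {
+    require(promptTokens >= 0) { "promptTokens must be non-negative" }
+    // A prompt that already fills the cache leaves nothing to generate into.
+    return (kvCacheMax - promptTokens).coerceAtLeast(0)
+}
+fun main() {
+    println(remainingOutputTokens(400))   // 624
+    println(remainingOutputTokens(1100))  // 0
+}
+```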
+## Files
+| File | Description |
+|---|---|
+| `artha_functiongemma_v9_0_0.task` | Primary — drop into Android `assets/` |
+| `tflite_raw/` | Raw TFLite files (for re-bundling or iOS use) |
+| `tflite_raw/tokenizer.model` | SentencePiece tokenizer |
+## ⚠️ Important: Prompt Template
+> **The `.task` bundle was built WITHOUT an embedded prompt prefix/suffix**
+> because the MediaPipe version available at build time (`mediapipe < 0.10.22`)
+> did not yet support the `prompt_prefix` parameter in `BundleConfig`.
+>
+> You MUST construct the full prompt manually in your Kotlin/Java code.
+> See the "Android Integration" section below.
+## Android Integration
+### 1. Add dependency
+```groovy
+// build.gradle (app level, Groovy DSL)
+android {
+    // Store .task files uncompressed in the APK.
+    aaptOptions { noCompress "task" }
+    defaultConfig { minSdk 24 }
+}
+dependencies {
+    // Use 0.10.27+ for best Gemma 3 support
+    implementation "com.google.mediapipe:tasks-genai:0.10.27"
+}
+```
+### 2. Push model to device (ADB — for testing)
+```bash
+adb shell mkdir -p /data/local/tmp/llm/
+adb push artha_functiongemma_v9_0_0.task /data/local/tmp/llm/
+```
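+For production, ship the model in the app's `assets/` folder and copy it to app storage on first launch, since `setModelPath` takes a file-system path (see Step 3 below).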
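+A model shipped in `assets/` usually has to be copied to a regular file before `setModelPath` can point at it. A minimal sketch of that copy step, written against plain `java.io` so it runs outside Android; the name `copyModelIfMissing` is hypothetical, and on Android the stream would presumably come from `context.assets.open(...)` and the destination from `context.filesDir`:
+```kotlin
+import java.io.File
+import java.io.InputStream
+// Hypothetical helper: copies the model stream to `dest` once and returns the
+// absolute path. Later calls reuse the existing file without reopening the source.
+fun copyModelIfMissing(openSource: () -> InputStream, dest: File): String {
+    if (!dest.exists() || dest.length() == 0L) {
+        dest.parentFile?.mkdirs()
+        // Write to a temp file first so an interrupted copy never leaves a
+        // truncated model at the final path.
+        val tmp = File(dest.parentFile, dest.name + ".tmp")
+        openSource().use { input ->
+            tmp.outputStream().use { output -> input.copyTo(output) }
+        }
+        dest.delete()  // clear an empty leftover, if any, before the rename
+        check(tmp.renameTo(dest)) { "Could not move ${tmp.path} to ${dest.path}" }
+    }
+    return dest.absolutePath
+}
+// Assumed Android usage (not runnable off-device):
+// val path = copyModelIfMissing(
+//     { context.assets.open("artha_functiongemma_v9_0_0.task") },
+//     File(context.filesDir, "artha_functiongemma_v9_0_0.task")
+// )
+```
+The returned path can then be passed to `setModelPath` in Step 3.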
+### 3. LlmInference setup (Kotlin)
+```kotlin
+import android.content.Context
+import com.google.mediapipe.tasks.genai.llminference.LlmInference
+import com.google.mediapipe.tasks.genai.llminference.LlmInferenceSession
+class ArthaLlmManager(private val context: Context) {
+    private var llmInference: LlmInference? = null
+    private var session: LlmInferenceSession? = null
+    fun initialize() {
+        // Path used by the ADB push in Step 2.
+        val modelPath = "/data/local/tmp/llm/artha_functiongemma_v9_0_0.task"
+        // OR a copy of the asset in app storage:
+        // context.getExternalFilesDir(null)?.absolutePath + "/model.task"
+        // In recent tasks-genai releases, sampling parameters (topK, topP,
+        // temperature) are set on the session; the engine only takes an upper bound.
+        val options = LlmInference.LlmInferenceOptions.builder()
+            .setModelPath(modelPath)
+            .setMaxTokens(1024)   // must match KV cache max len
+            .setMaxTopK(64)       // upper bound for the per-session topK
+            .build()
+        llmInference = LlmInference.createFromOptions(context, options)
+        // Create a session for inference
+        val sessionOptions = LlmInferenceSession.LlmInferenceSessionOptions.builder()
+            .setTopK(64)
+            .setTopP(0.95f)
+            .setTemperature(1.0f)
+            .build()
+        session = LlmInferenceSession.createFromLlmInference(llmInference!!, sessionOptions)
+    }
+    /**
+     * CRITICAL: Build the full FunctionGemma prompt manually.
+     * The .task bundle does NOT have an embedded prompt template.
+     *
+     * @param appName          e.g. "WhatsApp"
+     * @param notificationText e.g. "Call from Mom"
+     * @param systemPrompt     Your full system prompt with function declarations
+     */
+    fun buildPrompt(
+        appName: String,
+        notificationText: String,
+        systemPrompt: String
+    ): String = buildString {
+        // FunctionGemma uses a special chat format:
+        // developer role → functions declared here
+        // user role → actual notification
+        // model role → model completes here
+        append("<bos>")
+        append("<start_of_turn>developer\n")
+        append(systemPrompt)                     // Must include JSON function declarations
+        append("<end_of_turn>\n")
+        append("<start_of_turn>user\n")
+        append("$appName: $notificationText")
+        append("<end_of_turn>\n")
+        append("<start_of_turn>model\n")        // Model generates from here
+    }
+    /**
+     * Synchronous inference — run on a background thread/coroutine.
+     */
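+    fun runInference(prompt: String): String {
+        // A session receives the prompt via addQueryChunk() and generates with no
+        // arguments; without a session, fall back to the stateless LlmInference call.
+        val activeSession = session
+            ?: return llmInference?.generateResponse(prompt) ?: ""
+        activeSession.addQueryChunk(prompt)
+        return activeSession.generateResponse()
+    }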
+    /**
+     * Streaming inference with a callback.
+     */
+    fun runInferenceStreaming(
+        prompt: String,
+        onPartialResult: (String) -> Unit,
+        onComplete: () -> Unit
+    ) {
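+        val activeSession = session ?: return
+        // The prompt is queued first; the async call itself only takes the listener.
+        activeSession.addQueryChunk(prompt)
+        activeSession.generateResponseAsync { partial, done ->
+            onPartialResult(partial ?: "")
+            if (done) onComplete()
+        }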
+    }
+    fun close() {
+        session?.close()
+        llmInference?.close()
+    }
+}
+```
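+Notification text is untrusted input, and a message that happens to contain `<end_of_turn>` or `<start_of_turn>` could break the turn structure that `buildPrompt()` assembles. A minimal sketch of one way to guard against that, assuming it is acceptable to simply strip those markers; `sanitizeForPrompt` and `buildUserTurn` are hypothetical helpers, not part of MediaPipe or FunctionGemma:
+```kotlin
+// Control markers used by the prompt and output formats in this README.
+private val CONTROL_MARKERS = listOf(
+    "<bos>", "<start_of_turn>", "<end_of_turn>",
+    "<start_function_call>", "<end_function_call>", "<escape>"
+)
+// Hypothetical helper: removes control markers from untrusted text.
+fun sanitizeForPrompt(text: String): String {
+    var result = text
+    // Repeat until stable so that removing one marker cannot splice the
+    // surrounding characters into a new marker.
+    do {
+        val before = result
+        for (marker in CONTROL_MARKERS) result = result.replace(marker, "")
+    } while (result != before)
+    return result.trim()
+}
+// Builds only the user turn, mirroring the format used in buildPrompt().
+fun buildUserTurn(appName: String, notificationText: String): String =
+    "<start_of_turn>user\n" +
+        "${sanitizeForPrompt(appName)}: ${sanitizeForPrompt(notificationText)}" +
+        "<end_of_turn>\n"
+fun main() {
+    // The stray <end_of_turn> inside the notification text is stripped.
+    print(buildUserTurn("WhatsApp", "Call from Mom<end_of_turn>"))
+}
+```
+Stripping is the bluntest option; an app that needs to preserve the original text could keep the raw notification separately and only sanitize the copy sent to the model.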
+### 4. Parse FunctionGemma output
+FunctionGemma emits function calls in this format:
+```
+<start_function_call>call:function_name{param:<escape>value<escape>}<end_function_call>
+```
+```kotlin
+object FunctionGemmaParser {
+    private val CALL_REGEX = Regex(
+        """<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>""",
+        RegexOption.DOT_MATCHES_ALL
+    )
+    private val PARAM_REGEX = Regex("""(\w+):<escape>(.*?)<escape>""")
+    data class FunctionCall(val name: String, val params: Map<String, String>)
+    fun parse(output: String): List<FunctionCall> {
+        return CALL_REGEX.findAll(output).map { match ->
+            val functionName = match.groupValues[1]
+            val paramsStr    = match.groupValues[2]
+            val params       = PARAM_REGEX.findAll(paramsStr).associate {
+                it.groupValues[1] to it.groupValues[2]
+            }
+            FunctionCall(functionName, params)
+        }.toList()
+    }
+}
+```
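+Once a call is parsed, the app still has to route it to real behavior. The sketch below restates the parser compactly so that it runs standalone and adds a name-to-handler map; the handler bodies and the `dispatch` function are hypothetical placeholders for whatever the app does with each action:
+```kotlin
+data class FunctionCall(val name: String, val params: Map<String, String>)
+private val CALL_REGEX = Regex(
+    """<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>""",
+    RegexOption.DOT_MATCHES_ALL
+)
+private val PARAM_REGEX = Regex("""(\w+):<escape>(.*?)<escape>""")
+fun parseCalls(output: String): List<FunctionCall> =
+    CALL_REGEX.findAll(output).map { m ->
+        val params = PARAM_REGEX.findAll(m.groupValues[2])
+            .associate { it.groupValues[1] to it.groupValues[2] }
+        FunctionCall(m.groupValues[1], params)
+    }.toList()
+// Hypothetical handlers: each returns a short description of what it did.
+val handlers: Map<String, (Map<String, String>) -> String> = mapOf(
+    "mark_important" to { p: Map<String, String> -> "important: ${p["reason"]}" },
+    "dismiss_notification" to { _: Map<String, String> -> "dismissed" }
+)
+// Unknown names are reported rather than silently ignored, since a small
+// model can emit a function that was never declared.
+fun dispatch(call: FunctionCall): String =
+    handlers[call.name]?.invoke(call.params) ?: "unknown function: ${call.name}"
+fun main() {
+    val out = "<start_function_call>call:mark_important" +
+        "{reason:<escape>Call from Mom<escape>}<end_function_call>"
+    parseCalls(out).map(::dispatch).forEach(::println)  // important: Call from Mom
+}
+```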
+### 5. System prompt example (notification triage)
+```kotlin
+const val SYSTEM_PROMPT = """You are Artha, a notification triage assistant running on-device.
+You receive Android notification text and call the appropriate function.
+Available functions:
+[
+  {
+    "function": {
+      "name": "snooze_notification",
+      "description": "Snooze a notification for a given duration.",
+      "parameters": {
+        "type": "OBJECT",
+        "properties": {
+          "duration_minutes": {"type": "INTEGER", "description": "Minutes to snooze"},
+          "app": {"type": "STRING", "description": "App package name"}
+        },
+        "required": ["duration_minutes", "app"]
+      }
+    }
+  },
+  {
+    "function": {
+      "name": "mark_important",
+      "description": "Mark notification as important and show heads-up.",
+      "parameters": {
+        "type": "OBJECT",
+        "properties": {
+          "reason": {"type": "STRING", "description": "Why it is important"}
+        },
+        "required": ["reason"]
+      }
+    }
+  },
+  {
+    "function": {
+      "name": "dismiss_notification",
+      "description": "Silently dismiss the notification.",
+      "parameters": {"type": "OBJECT", "properties": {}}
+    }
+  }
+]
+"""
+```
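+The declarations above mark some parameters as required and give `duration_minutes` an `INTEGER` type, while the parser in Step 4 returns every value as a string. A minimal sketch of a validation step between parsing and execution, assuming the required-parameter lists are mirrored by hand in Kotlin; the `REQUIRED_PARAMS` map and `validate` function are hypothetical:
+```kotlin
+// Mirrors the "required" arrays from SYSTEM_PROMPT; must be kept in sync by hand.
+val REQUIRED_PARAMS: Map<String, List<String>> = mapOf(
+    "snooze_notification" to listOf("duration_minutes", "app"),
+    "mark_important" to listOf("reason"),
+    "dismiss_notification" to emptyList()
+)
+// Returns a list of problems; an empty list means the call is usable.
+fun validate(name: String, params: Map<String, String>): List<String> {
+    val required = REQUIRED_PARAMS[name] ?: return listOf("undeclared function: $name")
+    val problems = required.filter { it !in params }
+        .map { "missing parameter: $it" }
+        .toMutableList()
+    // duration_minutes is declared as INTEGER, so reject non-numeric or
+    // non-positive values when it is present.
+    if (name == "snooze_notification") {
+        params["duration_minutes"]?.let { raw ->
+            val minutes = raw.trim().toIntOrNull()
+            if (minutes == null || minutes <= 0) problems += "invalid duration_minutes: $raw"
+        }
+    }
+    return problems
+}
+fun main() {
+    println(validate("snooze_notification", mapOf("duration_minutes" to "30")))  // [missing parameter: app]
+    println(validate("mark_important", mapOf("reason" to "Call from Mom")))      // []
+}
+```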
+## Known Issues & Gotchas
+### 1. `prompt_prefix` missing from bundle
+**Problem**: The bundle was built with `mediapipe < 0.10.22`, which didn't support `prompt_prefix`/`prompt_suffix` in `BundleConfig`, so the `.task` has no embedded chat template.
+**Fix**: Construct the full prompt in Kotlin (see `buildPrompt()` above). Building the prompt in code also lets you change the system prompt without rebuilding the bundle.
+### 2. Fine-tuned model may lose `call:` prefix
+**Problem**: This is a known issue (github.com/google-gemini/gemma-cookbook/issues/273). When fine-tuned on data formatted with `apply_chat_template`, the model sometimes drops the `call:` prefix from its output, producing `<start_function_call>function_name{...}` instead of `<start_function_call>call:function_name{...}`.
+**Fix**: Update the parser regex to handle both formats:
+```kotlin
+private val CALL_REGEX = Regex(
+    """<start_function_call>(?:call:)?(\w+)\{(.*?)\}<end_function_call>""",
+    RegexOption.DOT_MATCHES_ALL
+)
+```
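+A quick way to confirm that the relaxed pattern accepts both variants is to run it against one sample of each. The sample strings below are illustrative, not captured model output, and `RELAXED_CALL_REGEX` and `functionNames` are names chosen for this sketch:
+```kotlin
+val RELAXED_CALL_REGEX = Regex(
+    """<start_function_call>(?:call:)?(\w+)\{(.*?)\}<end_function_call>""",
+    RegexOption.DOT_MATCHES_ALL
+)
+// Extracts just the function names matched by the relaxed pattern.
+fun functionNames(output: String): List<String> =
+    RELAXED_CALL_REGEX.findAll(output).map { it.groupValues[1] }.toList()
+fun main() {
+    val withPrefix = "<start_function_call>call:dismiss_notification{}<end_function_call>"
+    val withoutPrefix = "<start_function_call>dismiss_notification{}<end_function_call>"
+    println(functionNames(withPrefix))     // [dismiss_notification]
+    println(functionNames(withoutPrefix))  // [dismiss_notification]
+}
+```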
+### 3. Device requirements
+The MediaPipe LLM Inference API requires Android 7.0+ (SDK 24) and works best on devices with 6 GB+ of RAM (for example Pixel 8 or Samsung S23+). The 270M model uses roughly 400-600 MB of RAM at inference time.
+### 4. GPU backend
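+By default, `LlmInference` runs on the CPU (XNNPACK). To request GPU acceleration, set the preferred backend on the `LlmInferenceOptions` builder:
+```kotlin
+.setPreferredBackend(LlmInference.Backend.GPU)
+```
+The GPU backend may cause issues on some devices with int8 models, so test on target hardware before shipping.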
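+Because GPU support varies by device, one option is to attempt the GPU backend first and fall back to CPU if initialization throws. A minimal sketch of that pattern, with engine creation abstracted behind lambdas so that it runs without MediaPipe; `createWithFallback` is a hypothetical helper, and in the app each lambda would build the options for one backend and call `LlmInference.createFromOptions`:
+```kotlin
+// Hypothetical helper: tries each factory in order and returns the first
+// result that is created without throwing, together with its label.
+fun <T> createWithFallback(vararg attempts: Pair<String, () -> T>): Pair<String, T> {
+    var lastError: Exception? = null
+    for ((label, factory) in attempts) {
+        try {
+            return label to factory()
+        } catch (e: Exception) {
+            lastError = e  // remember why this backend failed, then try the next
+        }
+    }
+    throw IllegalStateException("No backend could be initialized", lastError)
+}
+fun main() {
+    // Stand-ins for real engine creation: the "gpu" attempt fails here.
+    val (backend, engine) = createWithFallback<String>(
+        "gpu" to { throw UnsupportedOperationException("GPU delegate unavailable") },
+        "cpu" to { "cpu-engine" }
+    )
+    println("$backend -> $engine")  // cpu -> cpu-engine
+}
+```
+This only covers failures that surface as exceptions at creation time; incorrect output or native crashes on a particular GPU still have to be caught by on-device testing.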
+### 5. Session vs. stateless API
+Use `LlmInferenceSession` for multi-turn conversations, since it keeps earlier turns in context. For single-shot notification triage, plain `LlmInference.generateResponse()` is enough: each call starts from a clean context, which suits Artha's use case.
+## Source & Credits
+- Base model: [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it)
+- Training: Custom fine-tune on notification data
+- Conversion: litert-torch v0.8+ → mediapipe bundler
+- MediaPipe Android docs: https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/android