Artha AI β€” FunctionGemma 270M (MediaPipe .task)

Version: 9.0.0 | Format: MediaPipe .task | Size: ~271 MB

⚡ Ready-to-deploy Android model for on-device function calling. Ship it in your app's assets/ folder and run it with the MediaPipe LLM Inference API.

Model Details

Property Value
Base model google/functiongemma-270m-it
Fine-tuned weights 2796gauravc/artha-functiongemma-270m
Format MediaPipe .task (TFLite + SentencePiece tokenizer bundled)
Quantization dynamic_int8 (~271 MB)
Prefill sequence length 512 tokens
KV cache max length 1024 tokens
Architecture Gemma 3 270M
Task Structured function calling

Files

File Description
artha_functiongemma_v9_0_0.task Primary: drop into Android assets/
tflite_raw/ Raw TFLite files (for re-bundling or iOS use)
tflite_raw/tokenizer.model SentencePiece tokenizer

⚠️ Important: Prompt Template

The .task bundle was built WITHOUT an embedded prompt prefix/suffix because the MediaPipe version available at build time (mediapipe < 0.10.22) did not yet support the prompt_prefix parameter in BundleConfig.

You MUST construct the full prompt manually in your Kotlin/Java code. See the "Android Integration" section below.

Android Integration

1. Add dependency

// build.gradle (app level)
android {
    aaptOptions { noCompress("task") }
    defaultConfig { minSdk 24 }
}

dependencies {
    implementation "com.google.mediapipe:tasks-genai:0.10.27"
    // Use 0.10.27+ for best Gemma 3 support
}

2. Push model to device (ADB, for testing)

adb shell mkdir -p /data/local/tmp/llm/
adb push artha_functiongemma_v9_0_0.task /data/local/tmp/llm/

For production, ship the .task in assets/ (uncompressed, per the noCompress("task") setting above) and copy it to app storage on first launch, since LlmInference opens the model from a regular file path (see Step 3 below).
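MediaPipe's LlmInference opens the model by file path, so an asset bundled in the APK has to be streamed out to app storage first. The helper below is a minimal sketch of that copy step; the name copyModelTo and the skip-if-present check are choices made here, not part of any API, and the stream-handling core is plain java.io, so it behaves the same outside Android.

```kotlin
import java.io.File
import java.io.InputStream

// Stream the bundled .task out to a real file path that LlmInference can open.
// On Android you would pass context.assets.open("artha_functiongemma_v9_0_0.task")
// as `input` and File(context.filesDir, "artha_functiongemma_v9_0_0.task") as `dest`.
fun copyModelTo(input: InputStream, dest: File): File {
    if (!dest.exists()) {                       // skip the copy on later launches
        input.use { src ->
            dest.outputStream().use { out -> src.copyTo(out) }
        }
    }
    return dest
}
```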

3. LlmInference setup (Kotlin)

import com.google.mediapipe.tasks.genai.llminference.LlmInference
import com.google.mediapipe.tasks.genai.llminference.LlmInferenceSession

class ArthaLlmManager(private val context: Context) {

    private var llmInference: LlmInference? = null
    private var session: LlmInferenceSession? = null

    fun initialize() {
        val modelPath = "/data/local/tmp/llm/artha_functiongemma_v9_0_0.task"
        // In production, point at the copy made from assets/, e.g.:
        // File(context.filesDir, "artha_functiongemma_v9_0_0.task").absolutePath

        val options = LlmInference.LlmInferenceOptions.builder()
            .setModelPath(modelPath)
            .setMaxTokens(1024)   // must match KV cache max len
            .setTopK(64)
            .setTopP(0.95f)
            .setTemperature(1.0f)               // FunctionGemma uses temp=1.0
            .build()

        llmInference = LlmInference.createFromOptions(context, options)

        // Create a session for inference
        val sessionOptions = LlmInferenceSession.LlmInferenceSessionOptions.builder()
            .setTopK(64)
            .setTopP(0.95f)
            .setTemperature(1.0f)
            .build()
        session = LlmInferenceSession.createFromLlmInference(llmInference!!, sessionOptions)
    }

    /**
     * CRITICAL: Build the full FunctionGemma prompt manually.
     * The .task bundle does NOT have an embedded prompt template.
     *
     * @param appName          e.g. "WhatsApp"
     * @param notificationText e.g. "Call from Mom"
     * @param systemPrompt     Your full system prompt with function declarations
     */
    fun buildPrompt(
        appName: String,
        notificationText: String,
        systemPrompt: String
    ): String = buildString {
        // FunctionGemma uses a special chat format:
        //   developer role -> functions declared here
        //   user role      -> the actual notification
        //   model role     -> model completes from here
        append("<bos>")
        append("<start_of_turn>developer\n")
        append(systemPrompt)                     // Must include JSON function declarations
        append("<end_of_turn>\n")
        append("<start_of_turn>user\n")
        append("$appName: $notificationText")
        append("<end_of_turn>\n")
        append("<start_of_turn>model\n")        // Model generates from here
    }

    /**
     * Synchronous inference: run on a background thread or coroutine.
     */
    fun runInference(prompt: String): String {
        val s = session
        return if (s != null) {
            s.addQueryChunk(prompt)   // sessions take input via addQueryChunk...
            s.generateResponse()      // ...then generate with no arguments
        } else {
            llmInference?.generateResponse(prompt) ?: ""
        }
    }

    /**
     * Streaming inference with a callback.
     */
    fun runInferenceStreaming(
        prompt: String,
        onPartialResult: (String) -> Unit,
        onComplete: () -> Unit
    ) {
        val s = session ?: return
        s.addQueryChunk(prompt)
        s.generateResponseAsync { partial, done ->
            onPartialResult(partial)
            if (done) onComplete()
        }
    }

    fun close() {
        session?.close()
        llmInference?.close()
    }
}

4. Parse FunctionGemma output

FunctionGemma outputs in this format:

<start_function_call>call:function_name{param:<escape>value<escape>}<end_function_call>

A Kotlin parser for this format:

object FunctionGemmaParser {

    private val CALL_REGEX = Regex(
        """<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>""",
        RegexOption.DOT_MATCHES_ALL
    )
    private val PARAM_REGEX = Regex("""(\w+):<escape>(.*?)<escape>""")

    data class FunctionCall(val name: String, val params: Map<String, String>)

    fun parse(output: String): List<FunctionCall> {
        return CALL_REGEX.findAll(output).map { match ->
            val functionName = match.groupValues[1]
            val paramsStr    = match.groupValues[2]
            val params       = PARAM_REGEX.findAll(paramsStr).associate {
                it.groupValues[1] to it.groupValues[2]
            }
            FunctionCall(functionName, params)
        }.toList()
    }
}
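A quick self-contained check of the grammar the parser expects. The sample output string is illustrative (in particular, the comma as a parameter separator is an assumption here), and the regexes are restated inline so the snippet runs on its own:

```kotlin
// Restated inline so this runs standalone; mirrors FunctionGemmaParser above.
val callRegex = Regex(
    """<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>""",
    RegexOption.DOT_MATCHES_ALL
)
val paramRegex = Regex("""(\w+):<escape>(.*?)<escape>""")

// Illustrative model output for a "snooze WhatsApp for 30 minutes" decision.
val sample = "<start_function_call>call:snooze_notification" +
    "{duration_minutes:<escape>30<escape>,app:<escape>com.whatsapp<escape>}" +
    "<end_function_call>"

val match = checkNotNull(callRegex.find(sample))
val name = match.groupValues[1]
val params = paramRegex.findAll(match.groupValues[2])
    .associate { it.groupValues[1] to it.groupValues[2] }
```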

5. System prompt example (notification triage)

const val SYSTEM_PROMPT = """You are Artha, a notification triage assistant running on-device.
You receive Android notification text and call the appropriate function.

Available functions:
[
  {
    "function": {
      "name": "snooze_notification",
      "description": "Snooze a notification for a given duration.",
      "parameters": {
        "type": "OBJECT",
        "properties": {
          "duration_minutes": {"type": "INTEGER", "description": "Minutes to snooze"},
          "app": {"type": "STRING", "description": "App package name"}
        },
        "required": ["duration_minutes", "app"]
      }
    }
  },
  {
    "function": {
      "name": "mark_important",
      "description": "Mark notification as important and show heads-up.",
      "parameters": {
        "type": "OBJECT",
        "properties": {
          "reason": {"type": "STRING", "description": "Why it is important"}
        },
        "required": ["reason"]
      }
    }
  },
  {
    "function": {
      "name": "dismiss_notification",
      "description": "Silently dismiss the notification.",
      "parameters": {"type": "OBJECT", "properties": {}}
    }
  }
]
"""

Known Issues & Gotchas

1. prompt_prefix missing from bundle

Problem: The bundle was built with mediapipe < 0.10.22, which didn't support prompt_prefix/prompt_suffix in BundleConfig, so the .task carries no embedded chat template. Fix: Construct the full prompt in Kotlin (see buildPrompt() above). In practice this is more flexible anyway.

2. Fine-tuned model may lose call: prefix

Problem: A known issue (github.com/google-gemini/gemma-cookbook/issues/273): when fine-tuned on data formatted with apply_chat_template, the model sometimes drops the call: prefix from its output, producing <start_function_call>function_name{...} instead of <start_function_call>call:function_name{...}. Fix: Update the parser regex to handle both formats:

private val CALL_REGEX = Regex(
    """<start_function_call>(?:call:)?(\w+)\{(.*?)\}<end_function_call>""",
    RegexOption.DOT_MATCHES_ALL
)
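A two-line check that the relaxed pattern accepts both output shapes:

```kotlin
// The relaxed pattern: `call:` is optional, so both forms yield the same name.
val relaxed = Regex(
    """<start_function_call>(?:call:)?(\w+)\{(.*?)\}<end_function_call>""",
    RegexOption.DOT_MATCHES_ALL
)
val specForm    = "<start_function_call>call:dismiss_notification{}<end_function_call>"
val droppedForm = "<start_function_call>dismiss_notification{}<end_function_call>"
val a = relaxed.find(specForm)?.groupValues?.get(1)
val b = relaxed.find(droppedForm)?.groupValues?.get(1)
```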

3. Device requirements

MediaPipe LLM Inference API requires Android 7.0+ (SDK 24) and works best on devices with 6GB+ RAM (Pixel 8, Samsung S23+). The 270M model uses ~400-600 MB RAM at inference time.

4. GPU backend

By default LlmInference uses CPU (XNNPACK). For GPU acceleration add:

.setAcceleratorName("gpu")

But GPU may cause issues on some devices with int8 models β€” test before shipping.

5. Session vs. stateless API

Use LlmInferenceSession for multi-turn interactions (it keeps context across calls); use the stateless LlmInference.generateResponse() for single-shot notification triage, where each call starts fresh (fine for Artha's use case).

Source & Credits

Base model: google/functiongemma-270m-it (Google). Fine-tuned weights: 2796gauravc/artha-functiongemma-270m, bundled here as a MediaPipe .task.