Artha AI β€” FunctionGemma 270M (MediaPipe .task)

Version: 9.0.0 | Format: MediaPipe .task | Size: ~271 MB

⚡ Ready-to-deploy Android model for on-device function calling. Ship it in your app's assets/ folder and run it with the MediaPipe LLM Inference API.

Model Details

Property Value
Base model google/functiongemma-270m-it
Fine-tuned weights 2796gauravc/artha-functiongemma-270m
Format MediaPipe .task (TFLite + SentencePiece tokenizer bundled)
Quantization dynamic_int8 (~271 MB)
Prefill sequence length 512 tokens
KV cache max length 1024 tokens
Architecture Gemma 3 270M
Task Structured function calling

Files

File Description
artha_functiongemma_v9_0_0.task Primary: drop into Android assets/
tflite_raw/ Raw TFLite files (for re-bundling or iOS use)
tflite_raw/tokenizer.model SentencePiece tokenizer

⚠️ Important: Prompt Template

The .task bundle was built WITHOUT an embedded prompt prefix/suffix because the MediaPipe version available at build time (mediapipe < 0.10.22) did not yet support the prompt_prefix parameter in BundleConfig.

You MUST construct the full prompt manually in your Kotlin/Java code. See the "Android Integration" section below.

Android Integration

1. Add dependency

// build.gradle (app level)
android {
    aaptOptions { noCompress("task") }
    defaultConfig { minSdk 24 }
}

dependencies {
    implementation "com.google.mediapipe:tasks-genai:0.10.27"
    // Use 0.10.27+ for best Gemma 3 support
}

2. Push model to device (ADB, for testing)

adb shell mkdir -p /data/local/tmp/llm/
adb push artha_functiongemma_v9_0_0.task /data/local/tmp/llm/

For production, ship the .task in assets/ (uncompressed, per the noCompress("task") setting above) and copy it to app storage on first launch, since LlmInference opens the model from a regular file path (see Step 3 below).
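MediaPipe's LlmInference opens the model by file path, so an asset bundled in the APK has to be streamed out to app storage first. The helper below is a minimal sketch of that copy step; the name copyModelTo and the skip-if-present check are choices made here, not part of any API, and the stream-handling core is plain java.io, so it behaves the same outside Android.

```kotlin
import java.io.File
import java.io.InputStream

// Stream the bundled .task out to a real file path that LlmInference can open.
// On Android you would pass context.assets.open("artha_functiongemma_v9_0_0.task")
// as `input` and File(context.filesDir, "artha_functiongemma_v9_0_0.task") as `dest`.
fun copyModelTo(input: InputStream, dest: File): File {
    if (!dest.exists()) {                       // skip the copy on later launches
        input.use { src ->
            dest.outputStream().use { out -> src.copyTo(out) }
        }
    }
    return dest
}
```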

3. LlmInference setup (Kotlin)

import com.google.mediapipe.tasks.genai.llminference.LlmInference
import com.google.mediapipe.tasks.genai.llminference.LlmInferenceSession

class ArthaLlmManager(private val context: Context) {

    private var llmInference: LlmInference? = null
    private var session: LlmInferenceSession? = null

    fun initialize() {
        val modelPath = "/data/local/tmp/llm/artha_functiongemma_v9_0_0.task"
        // In production, point at the copy made from assets/, e.g.:
        // File(context.filesDir, "artha_functiongemma_v9_0_0.task").absolutePath

        val options = LlmInference.LlmInferenceOptions.builder()
            .setModelPath(modelPath)
            .setMaxTokens(1024)   // must match KV cache max len
            .setTopK(64)
            .setTopP(0.95f)
            .setTemperature(1.0f)               // FunctionGemma uses temp=1.0
            .build()

        llmInference = LlmInference.createFromOptions(context, options)

        // Create a session for inference
        val sessionOptions = LlmInferenceSession.LlmInferenceSessionOptions.builder()
            .setTopK(64)
            .setTopP(0.95f)
            .setTemperature(1.0f)
            .build()
        session = LlmInferenceSession.createFromLlmInference(llmInference!!, sessionOptions)
    }

    /**
     * CRITICAL: Build the full FunctionGemma prompt manually.
     * The .task bundle does NOT have an embedded prompt template.
     *
     * @param appName          e.g. "WhatsApp"
     * @param notificationText e.g. "Call from Mom"
     * @param systemPrompt     Your full system prompt with function declarations
     */
    fun buildPrompt(
        appName: String,
        notificationText: String,
        systemPrompt: String
    ): String = buildString {
        // FunctionGemma uses a special chat format:
        //   developer role -> functions declared here
        //   user role      -> the actual notification
        //   model role     -> model completes from here
        append("<bos>")
        append("<start_of_turn>developer\n")
        append(systemPrompt)                     // Must include JSON function declarations
        append("<end_of_turn>\n")
        append("<start_of_turn>user\n")
        append("$appName: $notificationText")
        append("<end_of_turn>\n")
        append("<start_of_turn>model\n")        // Model generates from here
    }

    /**
     * Synchronous inference: run on a background thread or coroutine.
     */
    fun runInference(prompt: String): String {
        val s = session
        return if (s != null) {
            s.addQueryChunk(prompt)   // sessions take input via addQueryChunk...
            s.generateResponse()      // ...then generate with no arguments
        } else {
            llmInference?.generateResponse(prompt) ?: ""
        }
    }

    /**
     * Streaming inference with a callback.
     */
    fun runInferenceStreaming(
        prompt: String,
        onPartialResult: (String) -> Unit,
        onComplete: () -> Unit
    ) {
        val s = session ?: return
        s.addQueryChunk(prompt)
        s.generateResponseAsync { partial, done ->
            onPartialResult(partial)
            if (done) onComplete()
        }
    }

    fun close() {
        session?.close()
        llmInference?.close()
    }
}

4. Parse FunctionGemma output

FunctionGemma outputs in this format:

<start_function_call>call:function_name{param:<escape>value<escape>}<end_function_call>

A Kotlin parser for this format:

object FunctionGemmaParser {

    private val CALL_REGEX = Regex(
        """<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>""",
        RegexOption.DOT_MATCHES_ALL
    )
    private val PARAM_REGEX = Regex("""(\w+):<escape>(.*?)<escape>""")

    data class FunctionCall(val name: String, val params: Map<String, String>)

    fun parse(output: String): List<FunctionCall> {
        return CALL_REGEX.findAll(output).map { match ->
            val functionName = match.groupValues[1]
            val paramsStr    = match.groupValues[2]
            val params       = PARAM_REGEX.findAll(paramsStr).associate {
                it.groupValues[1] to it.groupValues[2]
            }
            FunctionCall(functionName, params)
        }.toList()
    }
}
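A quick self-contained check of the grammar the parser expects. The sample output string is illustrative (in particular, the comma as a parameter separator is an assumption here), and the regexes are restated inline so the snippet runs on its own:

```kotlin
// Restated inline so this runs standalone; mirrors FunctionGemmaParser above.
val callRegex = Regex(
    """<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>""",
    RegexOption.DOT_MATCHES_ALL
)
val paramRegex = Regex("""(\w+):<escape>(.*?)<escape>""")

// Illustrative model output for a "snooze WhatsApp for 30 minutes" decision.
val sample = "<start_function_call>call:snooze_notification" +
    "{duration_minutes:<escape>30<escape>,app:<escape>com.whatsapp<escape>}" +
    "<end_function_call>"

val match = checkNotNull(callRegex.find(sample))
val name = match.groupValues[1]
val params = paramRegex.findAll(match.groupValues[2])
    .associate { it.groupValues[1] to it.groupValues[2] }
```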

5. System prompt example (notification triage)

const val SYSTEM_PROMPT = """You are Artha, a notification triage assistant running on-device.
You receive Android notification text and call the appropriate function.

Available functions:
[
  {
    "function": {
      "name": "snooze_notification",
      "description": "Snooze a notification for a given duration.",
      "parameters": {
        "type": "OBJECT",
        "properties": {
          "duration_minutes": {"type": "INTEGER", "description": "Minutes to snooze"},
          "app": {"type": "STRING", "description": "App package name"}
        },
        "required": ["duration_minutes", "app"]
      }
    }
  },
  {
    "function": {
      "name": "mark_important",
      "description": "Mark notification as important and show heads-up.",
      "parameters": {
        "type": "OBJECT",
        "properties": {
          "reason": {"type": "STRING", "description": "Why it is important"}
        },
        "required": ["reason"]
      }
    }
  },
  {
    "function": {
      "name": "dismiss_notification",
      "description": "Silently dismiss the notification.",
      "parameters": {"type": "OBJECT", "properties": {}}
    }
  }
]
"""

Known Issues & Gotchas

1. prompt_prefix missing from bundle

Problem: The bundle was built with mediapipe < 0.10.22, which didn't support prompt_prefix/prompt_suffix in BundleConfig, so the .task carries no embedded chat template. Fix: Construct the full prompt in Kotlin (see buildPrompt() above). In practice this is more flexible anyway.

2. Fine-tuned model may lose call: prefix

Problem: A known issue (github.com/google-gemini/gemma-cookbook/issues/273): when fine-tuned on data formatted with apply_chat_template, the model sometimes drops the call: prefix from its output, producing <start_function_call>function_name{...} instead of <start_function_call>call:function_name{...}. Fix: Update the parser regex to handle both formats:

private val CALL_REGEX = Regex(
    """<start_function_call>(?:call:)?(\w+)\{(.*?)\}<end_function_call>""",
    RegexOption.DOT_MATCHES_ALL
)
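A two-line check that the relaxed pattern accepts both output shapes:

```kotlin
// The relaxed pattern: `call:` is optional, so both forms yield the same name.
val relaxed = Regex(
    """<start_function_call>(?:call:)?(\w+)\{(.*?)\}<end_function_call>""",
    RegexOption.DOT_MATCHES_ALL
)
val specForm    = "<start_function_call>call:dismiss_notification{}<end_function_call>"
val droppedForm = "<start_function_call>dismiss_notification{}<end_function_call>"
val a = relaxed.find(specForm)?.groupValues?.get(1)
val b = relaxed.find(droppedForm)?.groupValues?.get(1)
```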

3. Device requirements

MediaPipe LLM Inference API requires Android 7.0+ (SDK 24) and works best on devices with 6GB+ RAM (Pixel 8, Samsung S23+). The 270M model uses ~400-600 MB RAM at inference time.

4. GPU backend

By default LlmInference uses CPU (XNNPACK). For GPU acceleration add:

.setAcceleratorName("gpu")

But GPU may cause issues on some devices with int8 models β€” test before shipping.

5. Session vs. stateless API

Use LlmInferenceSession for multi-turn interactions (it keeps context across calls); use the stateless LlmInference.generateResponse() for single-shot notification triage, where each call starts fresh (fine for Artha's use case).

Source & Credits

Base model: google/functiongemma-270m-it (Google). Fine-tuned weights: 2796gauravc/artha-functiongemma-270m, bundled here as a MediaPipe .task.