Artha AI - FunctionGemma 270M (MediaPipe .task)
Version: 9.0.0 | Format: MediaPipe .task | Size: ~271 MB
Ready-to-deploy Android model for on-device function calling. Drop it into your app's
`assets/` folder and run it with the MediaPipe LLM Inference API.
Model Details
| Property | Value |
|---|---|
| Base model | google/functiongemma-270m-it |
| Fine-tuned weights | 2796gauravc/artha-functiongemma-270m |
| Format | MediaPipe .task (TFLite + SentencePiece tokenizer bundled) |
| Quantization | dynamic_int8 (~271 MB) |
| Prefill sequence length | 512 tokens |
| KV cache max length | 1024 tokens |
| Architecture | Gemma 3 270M |
| Task | Structured function calling |
Files
| File | Description |
|---|---|
| `artha_functiongemma_v9_0_0.task` | Primary bundle; drop into Android `assets/` |
| `tflite_raw/` | Raw TFLite files (for re-bundling or iOS use) |
| `tflite_raw/tokenizer.model` | SentencePiece tokenizer |
⚠️ Important: Prompt Template
The `.task` bundle was built WITHOUT an embedded prompt prefix/suffix because the MediaPipe version available at build time (mediapipe < 0.10.22) did not yet support the `prompt_prefix` parameter in `BundleConfig`. You MUST construct the full prompt manually in your Kotlin/Java code. See the "Android Integration" section below.
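Concretely, the string your code hands to the model should end up looking like this (placeholders in braces are illustrative, not literal tokens):

```
<bos><start_of_turn>developer
{system prompt, including JSON function declarations}<end_of_turn>
<start_of_turn>user
{app name}: {notification text}<end_of_turn>
<start_of_turn>model
```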
Android Integration
1. Add dependency
```groovy
// build.gradle (app level)
android {
    aaptOptions { noCompress("task") }
    defaultConfig { minSdk 24 }
}

dependencies {
    implementation "com.google.mediapipe:tasks-genai:0.10.27"
    // Use 0.10.27+ for best Gemma 3 support
}
```
2. Push model to device (ADB, for testing)

```bash
adb shell mkdir -p /data/local/tmp/llm/
adb push artha_functiongemma_v9_0_0.task /data/local/tmp/llm/
```
For production, use assets/ folder (see Step 3 below).
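MediaPipe loads the model from a plain file path, so a bundle shipped in `assets/` has to be copied into app storage once before `LlmInference` can open it. A minimal sketch; the asset and file names are assumptions matching this repo:

```kotlin
import java.io.File
import java.io.InputStream

// Copies a stream to a destination file and returns its absolute path.
// Kept Android-free so the copy logic is easy to unit test.
fun copyToFile(input: InputStream, dest: File): String {
    dest.parentFile?.mkdirs()
    input.use { src ->
        dest.outputStream().use { out -> src.copyTo(out) }
    }
    return dest.absolutePath
}

// Android usage (sketch):
// val dest = File(context.filesDir, "artha_functiongemma_v9_0_0.task")
// if (!dest.exists()) {
//     context.assets.open("artha_functiongemma_v9_0_0.task").use { copyToFile(it, dest) }
// }
// pass dest.absolutePath to LlmInferenceOptions.setModelPath(...)
```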
3. LlmInference setup (Kotlin)
```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference
import com.google.mediapipe.tasks.genai.llminference.LlmInferenceSession

class ArthaLlmManager(private val context: Context) {

    private var llmInference: LlmInference? = null
    private var session: LlmInferenceSession? = null

    fun initialize() {
        val modelPath = "/data/local/tmp/llm/artha_functiongemma_v9_0_0.task"
        // For production, copy the model out of assets/ into app storage first
        // and point modelPath at that file (MediaPipe needs a plain file path).

        val options = LlmInference.LlmInferenceOptions.builder()
            .setModelPath(modelPath)
            .setMaxTokens(1024) // must match KV cache max length
            .build()
        llmInference = LlmInference.createFromOptions(context, options)

        // Sampling parameters (topK/topP/temperature) are configured per session.
        val sessionOptions = LlmInferenceSession.LlmInferenceSessionOptions.builder()
            .setTopK(64)
            .setTopP(0.95f)
            .setTemperature(1.0f) // FunctionGemma uses temp=1.0
            .build()
        session = LlmInferenceSession.createFromOptions(llmInference!!, sessionOptions)
    }

    /**
     * CRITICAL: Build the full FunctionGemma prompt manually.
     * The .task bundle does NOT have an embedded prompt template.
     *
     * @param appName e.g. "WhatsApp"
     * @param notificationText e.g. "Call from Mom"
     * @param systemPrompt Your full system prompt with function declarations
     */
    fun buildPrompt(
        appName: String,
        notificationText: String,
        systemPrompt: String
    ): String = buildString {
        // FunctionGemma uses a special chat format:
        //   developer role - functions declared here
        //   user role      - the actual notification
        //   model role     - the model completes from here
        append("<bos>")
        append("<start_of_turn>developer\n")
        append(systemPrompt) // must include JSON function declarations
        append("<end_of_turn>\n")
        append("<start_of_turn>user\n")
        append("$appName: $notificationText")
        append("<end_of_turn>\n")
        append("<start_of_turn>model\n") // model generates from here
    }

    /**
     * Synchronous inference; run on a background thread/coroutine.
     */
    fun runInference(prompt: String): String {
        val s = session
        return if (s != null) {
            // Sessions take the prompt as a query chunk, then generate.
            s.addQueryChunk(prompt)
            s.generateResponse()
        } else {
            llmInference?.generateResponse(prompt) ?: ""
        }
    }

    /**
     * Streaming inference with a callback.
     */
    fun runInferenceStreaming(
        prompt: String,
        onPartialResult: (String) -> Unit,
        onComplete: () -> Unit
    ) {
        val s = session ?: return
        s.addQueryChunk(prompt)
        s.generateResponseAsync { partial, done ->
            onPartialResult(partial)
            if (done) onComplete()
        }
    }

    fun close() {
        session?.close()
        llmInference?.close()
    }
}
```
4. Parse FunctionGemma output
FunctionGemma outputs in this format:
```
<start_function_call>call:function_name{param:<escape>value<escape>}<end_function_call>
```
```kotlin
object FunctionGemmaParser {

    // Note: Kotlin raw strings use triple double-quotes.
    private val CALL_REGEX = Regex(
        """<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>""",
        RegexOption.DOT_MATCHES_ALL
    )
    private val PARAM_REGEX = Regex("""(\w+):<escape>(.*?)<escape>""")

    data class FunctionCall(val name: String, val params: Map<String, String>)

    fun parse(output: String): List<FunctionCall> {
        return CALL_REGEX.findAll(output).map { match ->
            val functionName = match.groupValues[1]
            val paramsStr = match.groupValues[2]
            val params = PARAM_REGEX.findAll(paramsStr).associate {
                it.groupValues[1] to it.groupValues[2]
            }
            FunctionCall(functionName, params)
        }.toList()
    }
}
```
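A quick standalone check of the parsing logic against a sample model output (the regexes from `FunctionGemmaParser` are repeated here so the snippet runs on its own):

```kotlin
val callRegex = Regex(
    """<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>""",
    RegexOption.DOT_MATCHES_ALL
)
val paramRegex = Regex("""(\w+):<escape>(.*?)<escape>""")

// Sample output in the documented FunctionGemma format
val sample = "<start_function_call>call:snooze_notification" +
    "{duration_minutes:<escape>30<escape>,app:<escape>com.whatsapp<escape>}" +
    "<end_function_call>"

val match = callRegex.find(sample)!!
val name = match.groupValues[1]
val params = paramRegex.findAll(match.groupValues[2])
    .associate { it.groupValues[1] to it.groupValues[2] }
// name   == "snooze_notification"
// params == {duration_minutes=30, app=com.whatsapp}
```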
5. System prompt example (notification triage)
```kotlin
const val SYSTEM_PROMPT = """You are Artha, a notification triage assistant running on-device.
You receive Android notification text and call the appropriate function.

Available functions:
[
  {
    "function": {
      "name": "snooze_notification",
      "description": "Snooze a notification for a given duration.",
      "parameters": {
        "type": "OBJECT",
        "properties": {
          "duration_minutes": {"type": "INTEGER", "description": "Minutes to snooze"},
          "app": {"type": "STRING", "description": "App package name"}
        },
        "required": ["duration_minutes", "app"]
      }
    }
  },
  {
    "function": {
      "name": "mark_important",
      "description": "Mark notification as important and show heads-up.",
      "parameters": {
        "type": "OBJECT",
        "properties": {
          "reason": {"type": "STRING", "description": "Why it is important"}
        },
        "required": ["reason"]
      }
    }
  },
  {
    "function": {
      "name": "dismiss_notification",
      "description": "Silently dismiss the notification.",
      "parameters": {"type": "OBJECT", "properties": {}}
    }
  }
]
"""
```
Known Issues & Gotchas
1. prompt_prefix missing from bundle
Problem: Built with mediapipe < 0.10.22, which didn't support `prompt_prefix`/`prompt_suffix` in `BundleConfig`, so the `.task` has no embedded chat template.
Fix: Construct the full prompt in Kotlin (see `buildPrompt()` above). This is actually more flexible.
2. Fine-tuned model may lose call: prefix
Problem: A known issue (github.com/google-gemini/gemma-cookbook/issues/273): when fine-tuned on data formatted with `apply_chat_template`, the model sometimes drops the `call:` prefix from its output, producing `<start_function_call>function_name{...}` instead of `<start_function_call>call:function_name{...}`.
Fix: Update the parser regex to handle both formats:
```kotlin
private val CALL_REGEX = Regex(
    """<start_function_call>(?:call:)?(\w+)\{(.*?)\}<end_function_call>""",
    RegexOption.DOT_MATCHES_ALL
)
```
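A standalone check that the relaxed pattern accepts both output variants:

```kotlin
val lenient = Regex(
    """<start_function_call>(?:call:)?(\w+)\{(.*?)\}<end_function_call>""",
    RegexOption.DOT_MATCHES_ALL
)

val withPrefix =
    "<start_function_call>call:mark_important{reason:<escape>urgent<escape>}<end_function_call>"
val withoutPrefix =
    "<start_function_call>mark_important{reason:<escape>urgent<escape>}<end_function_call>"

// Extract the function name from each variant
val names = listOf(withPrefix, withoutPrefix).map { lenient.find(it)!!.groupValues[1] }
// both variants yield "mark_important"
```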
3. Device requirements
MediaPipe LLM Inference API requires Android 7.0+ (SDK 24) and works best on devices with 6GB+ RAM (Pixel 8, Samsung S23+). The 270M model uses ~400-600 MB RAM at inference time.
4. GPU backend
By default `LlmInference` runs on CPU (XNNPACK). For GPU acceleration, set the preferred backend on the engine options:

```kotlin
.setPreferredBackend(LlmInference.Backend.GPU)
```

But GPU may cause issues on some devices with int8 models; test before shipping.
5. Session vs. stateless API
Use `LlmInferenceSession` for multi-turn use (it maintains context); use plain `LlmInference.generateResponse()` for single-shot notification triage (state resets each call, which is fine for Artha's use case).
Source & Credits
- Base model: google/functiongemma-270m-it
- Training: Custom fine-tune on notification data
- Conversion: litert-torch v0.8+ → mediapipe bundler
- MediaPipe Android docs: https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/android