FunctionGemma Mobile Models
Available Formats
Two mobile formats are available:
- PyTorch Mobile (functiongemma_mobile.pt) for Android and iOS
- ONNX (functiongemma.onnx) for cross-platform use with ONNX Runtime
Usage Examples
PyTorch Mobile (Android)
import org.pytorch.IValue;
import org.pytorch.Module;
import org.pytorch.Tensor;

// Load the model (assetFilePath is the standard PyTorch Android demo helper
// that copies the bundled asset to a readable file path)
Module module = Module.load(assetFilePath(this, "functiongemma_mobile.pt"));

// Prepare input: token IDs at the recommended length of 128
long[] inputIds = new long[128];
// Fill with tokenized text

// Create a 1x128 input tensor (batch size 1)
Tensor inputTensor = Tensor.fromBlob(inputIds, new long[]{1, 128});

// Run inference
IValue output = module.forward(IValue.from(inputTensor));
Tensor outputTensor = output.toTensor();
PyTorch Mobile (iOS)
// Load the model. TorchModule here is the Objective-C++ wrapper around
// LibTorch-Lite, bridged into Swift via a bridging header.
guard let filePath = Bundle.main.path(forResource: "functiongemma_mobile", ofType: "pt") else {
    return
}
let module = try TorchModule(fileAtPath: filePath)

// Prepare input: token IDs at the recommended length of 128
var inputIds: [Int64] = Array(repeating: 0, count: 128)
// Fill with tokenized text

// Create a 1x128 input tensor (batch size 1) using the wrapper's tensor type
let inputTensor = try Tensor(shape: [1, 128], data: inputIds)

// Run inference
let outputTensor = try module.forward([inputTensor])
ONNX Runtime (Cross-platform)
import numpy as np
import onnxruntime as ort

# Load the model
session = ort.InferenceSession("functiongemma.onnx")

# Prepare input: shape (1, 128), dtype int64
input_ids = np.array([[...]], dtype=np.int64)  # fill with token IDs

# Run inference
outputs = session.run(None, {"input_ids": input_ids})
logits = outputs[0]
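The logits returned above can be reduced to a next-token prediction with a simple greedy argmax. A minimal sketch, assuming the usual causal-LM output shape of (batch, sequence, vocab):

import numpy as np

# logits: (1, 128, vocab_size); take the distribution at the last position
next_token_id = int(np.argmax(logits[0, -1]))
# A full decoder would append next_token_id, re-run the session, and
# detokenize once an end-of-sequence token appears.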
Model Details
- Base Model: {mobile_info['base_model']}
- Vocab Size: {mobile_info['vocab_size']:,}
- Max Sequence: {mobile_info['max_seq_length']} tokens
- Recommended: 128 tokens (mobile)
- Fine-tuned on: {mobile_info['fine_tuned_on']}
Performance
- Inference Time: 50-300ms on mobile devices
- Memory Usage: 300-800 MB RAM
- Quantized Version: 2-4x faster, ~75% smaller (a sketch of producing one follows)
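How a quantized mobile export might be produced is sketched below. This is a hedged example, not the project's actual export script: it assumes model is the loaded float checkpoint in eval mode, applies PyTorch dynamic quantization to the linear layers, then traces the result for mobile.

import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# 'model' is assumed to be the float FunctionGemma checkpoint, already loaded
model.eval()

# Dynamic quantization: int8 weights for Linear layers, float activations
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Trace with a dummy (1, 128) int64 input and optimize for mobile
example = torch.zeros(1, 128, dtype=torch.long)
traced = torch.jit.trace(quantized, example)
traced = optimize_for_mobile(traced)
traced.save("functiongemma_mobile_quantized.pt")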
Requirements
PyTorch Mobile
- Android: Min SDK 21, PyTorch Mobile library
- iOS: Min iOS 12.0, LibTorch-Lite
ONNX Runtime
- ONNX Runtime Mobile (see the quantization sketch after this list)
- Android/iOS/Web/Desktop support
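For the ONNX path, ONNX Runtime ships its own dynamic quantizer. A minimal sketch producing an int8 copy of the export used in the example above:

from onnxruntime.quantization import quantize_dynamic, QuantType

# Write an int8-weight copy of the float model; activations stay float
quantize_dynamic(
    "functiongemma.onnx",
    "functiongemma_quantized.onnx",
    weight_type=QuantType.QInt8,
)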
Notes
- Use quantized version for better mobile performance
- Recommended sequence length: 128 tokens (see the padding sketch below)
- Batch size: 1 (mobile optimization)
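Preparing input_ids at the recommended shape is shown below. The tokenizer call is hypothetical (substitute whatever tokenizer ships with the model); the point is the fixed (1, 128) int64 shape with truncation, padding, and batch size 1.

import numpy as np

SEQ_LEN = 128  # recommended mobile sequence length

token_ids = tokenizer.encode("Turn on the kitchen lights")  # hypothetical tokenizer
token_ids = token_ids[:SEQ_LEN]                    # truncate to 128 tokens
token_ids += [0] * (SEQ_LEN - len(token_ids))      # pad with 0 (assumed pad id)
input_ids = np.array([token_ids], dtype=np.int64)  # batch size 1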