FunctionGemma Mobile Models
Available Formats
Two mobile formats are available:
- PyTorch Mobile (functiongemma_mobile.pt) for Android and iOS
- ONNX (functiongemma.onnx) for cross-platform use with ONNX Runtime
Usage Examples
PyTorch Mobile (Android)
import org.pytorch.IValue;
import org.pytorch.Module;
import org.pytorch.Tensor;

// Load the model (assetFilePath is the standard PyTorch Android demo helper
// that copies the bundled asset to a readable file path)
Module module = Module.load(assetFilePath(this, "functiongemma_mobile.pt"));

// Prepare input: token IDs at the recommended length of 128
long[] inputIds = new long[128];
// Fill with tokenized text

// Create a 1x128 input tensor (batch size 1)
Tensor inputTensor = Tensor.fromBlob(inputIds, new long[]{1, 128});

// Run inference
IValue output = module.forward(IValue.from(inputTensor));
Tensor outputTensor = output.toTensor();
PyTorch Mobile (iOS)
// Load the model. TorchModule here is the Objective-C++ wrapper around
// LibTorch-Lite, bridged into Swift via a bridging header.
guard let filePath = Bundle.main.path(forResource: "functiongemma_mobile", ofType: "pt") else {
    return
}
let module = try TorchModule(fileAtPath: filePath)

// Prepare input: token IDs at the recommended length of 128
var inputIds: [Int64] = Array(repeating: 0, count: 128)
// Fill with tokenized text

// Create a 1x128 input tensor (batch size 1) using the wrapper's tensor type
let inputTensor = try Tensor(shape: [1, 128], data: inputIds)

// Run inference
let outputTensor = try module.forward([inputTensor])
ONNX Runtime (Cross-platform)
import numpy as np
import onnxruntime as ort

# Load the model
session = ort.InferenceSession("functiongemma.onnx")

# Prepare input: shape (1, 128), dtype int64
input_ids = np.array([[...]], dtype=np.int64)  # fill with token IDs

# Run inference
outputs = session.run(None, {"input_ids": input_ids})
logits = outputs[0]
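The logits returned above can be reduced to a next-token prediction with a simple greedy argmax. A minimal sketch, assuming the usual causal-LM output shape of (batch, sequence, vocab):

import numpy as np

# logits: (1, 128, vocab_size); take the distribution at the last position
next_token_id = int(np.argmax(logits[0, -1]))
# A full decoder would append next_token_id, re-run the session, and
# detokenize once an end-of-sequence token appears.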
Model Details
- Base Model: {mobile_info['base_model']}
- Vocab Size: {mobile_info['vocab_size']:,}
- Max Sequence: {mobile_info['max_seq_length']} tokens
- Recommended: 128 tokens (mobile)
- Fine-tuned on: {mobile_info['fine_tuned_on']}
Performance
- Inference Time: 50-300ms on mobile devices
- Memory Usage: 300-800 MB RAM
- Quantized Version: 2-4x faster, ~75% smaller (a sketch of producing one follows)
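How a quantized mobile export might be produced is sketched below. This is a hedged example, not the project's actual export script: it assumes model is the loaded float checkpoint in eval mode, applies PyTorch dynamic quantization to the linear layers, then traces the result for mobile.

import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# 'model' is assumed to be the float FunctionGemma checkpoint, already loaded
model.eval()

# Dynamic quantization: int8 weights for Linear layers, float activations
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Trace with a dummy (1, 128) int64 input and optimize for mobile
example = torch.zeros(1, 128, dtype=torch.long)
traced = torch.jit.trace(quantized, example)
traced = optimize_for_mobile(traced)
traced.save("functiongemma_mobile_quantized.pt")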
Requirements
PyTorch Mobile
- Android: Min SDK 21, PyTorch Mobile library
- iOS: Min iOS 12.0, LibTorch-Lite
ONNX Runtime
- ONNX Runtime Mobile (see the quantization sketch after this list)
- Android/iOS/Web/Desktop support
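For the ONNX path, ONNX Runtime ships its own dynamic quantizer. A minimal sketch producing an int8 copy of the export used in the example above:

from onnxruntime.quantization import quantize_dynamic, QuantType

# Write an int8-weight copy of the float model; activations stay float
quantize_dynamic(
    "functiongemma.onnx",
    "functiongemma_quantized.onnx",
    weight_type=QuantType.QInt8,
)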
Notes
- Use quantized version for better mobile performance
- Recommended sequence length: 128 tokens (see the padding sketch below)
- Batch size: 1 (mobile optimization)
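Preparing input_ids at the recommended shape is shown below. The tokenizer call is hypothetical (substitute whatever tokenizer ships with the model); the point is the fixed (1, 128) int64 shape with truncation, padding, and batch size 1.

import numpy as np

SEQ_LEN = 128  # recommended mobile sequence length

token_ids = tokenizer.encode("Turn on the kitchen lights")  # hypothetical tokenizer
token_ids = token_ids[:SEQ_LEN]                    # truncate to 128 tokens
token_ids += [0] * (SEQ_LEN - len(token_ids))      # pad with 0 (assumed pad id)
input_ids = np.array([token_ids], dtype=np.int64)  # batch size 1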