# MiniMind Android Deployment Guide

Deploy MiniMind (Mind2) models on Android devices using multiple runtime options.

## Deployment Options

| Runtime | Size | Speed | Ease of Use |
|---------|------|-------|-------------|
| **llama.cpp** | ★★★★★ | ★★★★☆ | ★★★★☆ |
| **ONNX Runtime** | ★★★★☆ | ★★★☆☆ | ★★★★★ |
| **MLC-LLM** | ★★★★☆ | ★★★★★ | ★★★☆☆ |
| **TensorFlow Lite** | ★★★★★ | ★★★☆☆ | ★★★★☆ |

## Quick Start

### Option 1: llama.cpp (Recommended)

```bash
# 1. Export the model to GGUF format
python scripts/export_gguf.py --model mind2-lite --output models/mind2-lite.gguf

# 2. Build llama.cpp for Android
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build-android && cd build-android
cmake .. -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28
make -j

# 3. Copy the shared library into the Android project
cp libllama.so ../android/app/src/main/jniLibs/arm64-v8a/
```

### Option 2: ONNX Runtime

```bash
# 1. Export the model to ONNX
python scripts/export_onnx.py --model mind2-lite --output models/mind2-lite.onnx
```

Then add ONNX Runtime as a dependency in `app/build.gradle`:

```groovy
dependencies {
    implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'
}
```

### Option 3: MLC-LLM

```bash
# 1. Install MLC-LLM
pip install mlc-llm

# 2. Compile the model for Android
mlc_llm compile mind2-lite --target android

# 3. Package for deployment
mlc_llm package mind2-lite --target android --output ./android/app/src/main/assets/
```

## Project Structure

```
android/
├── app/
│   ├── src/main/
│   │   ├── java/com/minimind/
│   │   │   ├── Mind2Model.java      # Model wrapper
│   │   │   ├── Mind2Tokenizer.java  # Tokenizer
│   │   │   └── Mind2Chat.java       # Chat interface
│   │   ├── jniLibs/
│   │   │   └── arm64-v8a/
│   │   │       └── libllama.so
│   │   └── assets/
│   │       ├── mind2-lite.gguf
│   │       └── tokenizer.json
│   └── build.gradle
├── jni/
│   ├── mind2_jni.cpp                # JNI bridge
│   └── CMakeLists.txt
└── README.md
```

## Memory Requirements

| Model | RAM (INT4) | RAM (FP16) | Storage |
|-------|------------|------------|---------|
| mind2-nano | ~400MB | ~800MB | ~300MB |
| mind2-lite | ~1.2GB | ~2.4GB | ~900MB |
| mind2-pro | ~2.4GB | ~4.8GB | ~1.8GB |

## Performance Benchmarks

Tested on common Android devices:

| Device | Model | Tokens/sec |
|--------|-------|------------|
| Pixel 8 Pro | mind2-nano | 45 |
| Pixel 8 Pro | mind2-lite | 22 |
| Samsung S24 | mind2-nano | 52 |
| Samsung S24 | mind2-lite | 28 |

## Best Practices

1. **Use INT4 quantization** for the best size/performance balance
2. **Limit context length** to 512-1024 tokens on mobile
3. **Enable the KV-cache** for faster generation
4. **Use streaming** for a responsive UI
5. **Handle memory pressure** gracefully

## Troubleshooting

### Out of Memory

- Use a smaller model (nano instead of lite)
- Reduce the context length
- Enable swap if available

### Slow Inference

- Check the CPU governor (set it to `performance`)
- Ensure NEON/ARM optimizations are enabled
- Consider GPU acceleration (MLC-LLM)

### Model Loading Failed

- Verify the GGUF file's integrity
- Check storage permissions
- Ensure there is enough free space
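
The "limit context length" best practice can be enforced by trimming the prompt's token list to a window of recent tokens before each generation call. Below is a minimal sketch in plain Java; `ContextWindow` and its token representation are illustrative helpers (not part of any MiniMind API), and real code would operate on the tokenizer's actual output:

```java
import java.util.ArrayList;
import java.util.List;

public final class ContextWindow {
    private ContextWindow() {}

    /**
     * Keeps at most maxTokens token IDs: the system prefix is always
     * preserved at the front, and the oldest conversation tokens are
     * dropped until the total fits the window.
     */
    public static List<Integer> trim(List<Integer> systemPrefix,
                                     List<Integer> conversation,
                                     int maxTokens) {
        int room = maxTokens - systemPrefix.size();
        if (room < 0) {
            throw new IllegalArgumentException("system prefix exceeds window");
        }
        // Keep only the most recent tokens that still fit.
        int start = Math.max(0, conversation.size() - room);
        List<Integer> result = new ArrayList<>(systemPrefix);
        result.addAll(conversation.subList(start, conversation.size()));
        return result;
    }
}
```

With a 512- or 1024-token window, this keeps memory and prefill time bounded on mobile; a real chat app would usually also truncate at message boundaries so the model never sees half a turn.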
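
One way to avoid the out-of-memory case up front is to pick the largest model variant that fits the device's available RAM before loading anything. The sketch below uses the INT4 figures from the memory-requirements table; `chooseModel` is a hypothetical helper, not part of any MiniMind API, and the 25% headroom factor is an assumption:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ModelSelector {
    // Approximate INT4 RAM requirements (MB), largest first,
    // taken from the memory-requirements table.
    private static final Map<String, Long> RAM_MB = new LinkedHashMap<>();
    static {
        RAM_MB.put("mind2-pro", 2400L);
        RAM_MB.put("mind2-lite", 1200L);
        RAM_MB.put("mind2-nano", 400L);
    }

    /**
     * Returns the largest model whose INT4 footprint fits within the
     * available RAM (keeping ~25% headroom for the app, KV-cache, and
     * the OS), or null if no variant fits.
     */
    public static String chooseModel(long availableRamMb) {
        long budget = (long) (availableRamMb * 0.75);
        for (Map.Entry<String, Long> e : RAM_MB.entrySet()) {
            if (e.getValue() <= budget) {
                return e.getKey();
            }
        }
        return null; // Not enough memory for any variant.
    }
}
```

On Android, the available-RAM figure would typically come from `ActivityManager.getMemoryInfo()` (its `MemoryInfo.availMem` field is in bytes); falling back to a smaller variant at load time is usually a better user experience than crashing with an OOM mid-generation.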