# MiniMind Android Deployment Guide

Deploy MiniMind (Mind2) models on Android devices using multiple runtime options.

## Deployment Options
| Runtime | Size | Speed | Ease of Use |
|---------|------|-------|-------------|
| **llama.cpp** | Small | Fast | Easy |
| **ONNX Runtime** | Medium | Moderate | Easy |
| **MLC-LLM** | Medium | Fast (GPU) | Moderate |
| **TensorFlow Lite** | Small | Moderate | Easy |
## Quick Start

### Option 1: llama.cpp (Recommended)

```bash
# 1. Export model to GGUF format
python scripts/export_gguf.py --model mind2-lite --output models/mind2-lite.gguf

# 2. Build llama.cpp for Android
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build-android && cd build-android
cmake .. -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28
make -j

# 3. Copy to Android project
cp libllama.so ../android/app/src/main/jniLibs/arm64-v8a/
```
### Option 2: ONNX Runtime

```bash
# 1. Export model to ONNX
python scripts/export_onnx.py --model mind2-lite --output models/mind2-lite.onnx
```

```groovy
// 2. Add ONNX Runtime to the Android project, in app/build.gradle:
dependencies {
    implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'
}
```
### Option 3: MLC-LLM

```bash
# 1. Install MLC-LLM
pip install mlc-llm

# 2. Compile model for Android
mlc_llm compile mind2-lite --target android

# 3. Package for deployment
mlc_llm package mind2-lite --target android --output ./android/app/src/main/assets/
```
## Project Structure

```
android/
├── app/
│   ├── src/main/
│   │   ├── java/com/minimind/
│   │   │   ├── Mind2Model.java      # Model wrapper
│   │   │   ├── Mind2Tokenizer.java  # Tokenizer
│   │   │   └── Mind2Chat.java       # Chat interface
│   │   ├── jniLibs/
│   │   │   └── arm64-v8a/
│   │   │       └── libllama.so
│   │   └── assets/
│   │       ├── mind2-lite.gguf
│   │       └── tokenizer.json
│   └── build.gradle
├── jni/
│   ├── mind2_jni.cpp                # JNI bridge
│   └── CMakeLists.txt
└── README.md
```
## Memory Requirements

| Model | RAM (INT4) | RAM (FP16) | Storage |
|-------|------------|------------|---------|
| mind2-nano | ~400MB | ~800MB | ~300MB |
| mind2-lite | ~1.2GB | ~2.4GB | ~900MB |
| mind2-pro | ~2.4GB | ~4.8GB | ~1.8GB |
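The RAM figures above can be approximated from a model's parameter count: quantized weight bytes plus a runtime overhead for the KV-cache, activations, and buffers. A minimal sketch; the 1.3x overhead factor and the 1B-parameter example are assumptions for illustration, not measured figures:

```python
def estimate_ram_bytes(n_params: int, bits_per_weight: int, overhead: float = 1.3) -> int:
    """Rough RAM estimate: quantized weight bytes times a runtime
    overhead factor (KV-cache, activations, buffers). The 1.3x
    overhead is an assumption, not a measured figure."""
    weight_bytes = n_params * bits_per_weight // 8
    return int(weight_bytes * overhead)

# Hypothetical 1B-parameter model:
int4 = estimate_ram_bytes(1_000_000_000, 4)   # ~0.65 GB
fp16 = estimate_ram_bytes(1_000_000_000, 16)  # ~2.6 GB
```

This also explains the 2x gap between the INT4 and FP16 columns scaling linearly with bits per weight.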
## Performance Benchmarks

Tested on common Android devices:

| Device | Model | Tokens/sec |
|--------|-------|------------|
| Pixel 8 Pro | mind2-nano | 45 |
| Pixel 8 Pro | mind2-lite | 22 |
| Samsung S24 | mind2-nano | 52 |
| Samsung S24 | mind2-lite | 28 |
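For UI budgeting it is often easier to think in per-token latency than throughput; a quick sketch converting the figures above:

```python
def ms_per_token(tokens_per_sec: float) -> float:
    """Convert generation throughput to average per-token latency."""
    return 1000.0 / tokens_per_sec

# mind2-nano on a Pixel 8 Pro (45 tok/s): ~22 ms per token
# mind2-lite on a Pixel 8 Pro (22 tok/s): ~45 ms per token
print(round(ms_per_token(45), 1))  # 22.2
print(round(ms_per_token(22), 1))  # 45.5
```

At ~22-45 ms per token, streaming output token-by-token keeps the UI visibly responsive even on the slower model.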
## Best Practices

1. **Use INT4 quantization** for the best size/performance balance
2. **Limit context length** to 512-1024 tokens on mobile
3. **Enable the KV-cache** for faster generation
4. **Use streaming** for a responsive UI
5. **Handle memory pressure** gracefully (e.g., release the model in `onTrimMemory`)
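Practices 2 and 4 combine naturally in the generation loop: keep only the most recent tokens as context and hand each new token to the UI as it arrives. A minimal sketch with a stand-in token source (`fake_generate` and the callback are hypothetical; a real app would consume the runtime's streaming API):

```python
from collections import deque
from typing import Callable, Iterable, List

MAX_CONTEXT = 512  # tokens kept on mobile (practice 2)

def stream_generate(token_source: Iterable[str],
                    on_token: Callable[[str], None],
                    max_context: int = MAX_CONTEXT) -> List[str]:
    """Consume tokens one at a time (practice 4) while keeping a
    bounded sliding window of context (practice 2)."""
    context: deque = deque(maxlen=max_context)
    for tok in token_source:
        context.append(tok)  # oldest tokens fall off the left
        on_token(tok)        # update the UI immediately
    return list(context)

# Stand-in for a real runtime's token stream:
def fake_generate(n: int):
    for i in range(n):
        yield f"tok{i}"

received = []
final_ctx = stream_generate(fake_generate(600), received.append, max_context=512)
# received sees all 600 tokens; final_ctx keeps only the last 512
```

The bounded `deque` caps context memory no matter how long the conversation runs, which is exactly the behavior you want under mobile memory pressure.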
## Troubleshooting

### Out of Memory
- Use a smaller model (nano instead of lite)
- Reduce the context length
- Enable swap if available

### Slow Inference
- Check the CPU governor (set it to performance)
- Ensure NEON/ARM optimizations are enabled
- Consider GPU acceleration (MLC-LLM)

### Model Loading Failed
- Verify GGUF file integrity
- Check storage permissions
- Ensure enough free space
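A quick way to verify GGUF file integrity before packaging: every GGUF file begins with the 4-byte magic `GGUF` followed by a little-endian `uint32` format version. A minimal sketch:

```python
import struct

def check_gguf_header(path: str) -> int:
    """Verify the GGUF magic bytes and return the format version.
    Raises ValueError if the file is truncated or not GGUF."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} is not a valid GGUF file")
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Running this on the exported `models/mind2-lite.gguf` before copying it into `assets/` catches truncated or corrupted exports early, without loading the full model.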