# MiniMind Android Deployment Guide
Deploy MiniMind (Mind2) models on Android devices using multiple runtime options.
## Deployment Options
| Runtime | Size | Speed | Ease of Use |
|---------|------|-------|-------------|
| **llama.cpp** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| **ONNX Runtime** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐☆☆ | ⭐⭐⭐⭐⭐ |
| **MLC-LLM** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐☆☆ |
| **TensorFlow Lite** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐☆☆ | ⭐⭐⭐⭐⭐ |
## Quick Start
### Option 1: llama.cpp (Recommended)
```bash
# 1. Export model to GGUF format
python scripts/export_gguf.py --model mind2-lite --output models/mind2-lite.gguf

# 2. Build llama.cpp for Android (requires the Android NDK; $NDK points at its root)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build-android && cd build-android
cmake .. -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28 \
    -DBUILD_SHARED_LIBS=ON
cmake --build . -j

# 3. Copy the shared library into the Android project
#    (recent llama.cpp versions emit it under ./src/ rather than the build root)
cp libllama.so ../../android/app/src/main/jniLibs/arm64-v8a/
```
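On the Java side, a thin wrapper such as the `Mind2Model.java` shown in the project structure below can load the library and call into the JNI bridge. A minimal sketch, assuming hypothetical native method names (the actual entry points are whatever `jni/mind2_jni.cpp` exports):

```java
// Hypothetical sketch of Mind2Model.java; the native signatures are
// assumptions and must match the functions defined in jni/mind2_jni.cpp.
package com.minimind;

public class Mind2Model implements AutoCloseable {
    static {
        // libllama.so from jniLibs/arm64-v8a/; the JNI bridge is assumed
        // to be built as its own libmind2_jni.so alongside it.
        System.loadLibrary("llama");
        System.loadLibrary("mind2_jni");
    }

    private long ctx; // opaque handle to the native llama context

    public Mind2Model(String ggufPath, int contextLength) {
        ctx = nativeLoad(ggufPath, contextLength);
        if (ctx == 0) throw new IllegalStateException("failed to load " + ggufPath);
    }

    public String generate(String prompt, int maxTokens) {
        return nativeGenerate(ctx, prompt, maxTokens);
    }

    @Override
    public void close() {
        if (ctx != 0) { nativeFree(ctx); ctx = 0; }
    }

    private static native long nativeLoad(String path, int nCtx);
    private static native String nativeGenerate(long ctx, String prompt, int maxTokens);
    private static native void nativeFree(long ctx);
}
```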
### Option 2: ONNX Runtime
```bash
# 1. Export model to ONNX
python scripts/export_onnx.py --model mind2-lite --output models/mind2-lite.onnx
```

Then add the runtime dependency in `app/build.gradle`:

```groovy
// 2. Add ONNX Runtime to the Android project
dependencies {
    implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'
}
```
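From there the model can be opened with ONNX Runtime's Java API (`ai.onnxruntime`). A sketch, assuming the `.onnx` file is bundled in `assets/` (the `OnnxLoader` name is made up for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import android.content.Context;

import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtSession;

public final class OnnxLoader {
    /** Reads the bundled model out of assets/ and creates an inference session. */
    public static OrtSession load(Context context, String assetName)
            throws IOException, OrtException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (InputStream in = context.getAssets().open(assetName)) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        }
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        return env.createSession(buf.toByteArray(), new OrtSession.SessionOptions());
    }
}
```

Decoding then runs through `session.run(...)` with the model's input tensors; the exact input names depend on how `export_onnx.py` exported the graph.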
### Option 3: MLC-LLM
```bash
# 1. Install MLC-LLM
pip install mlc-llm

# 2. Compile model for Android
mlc_llm compile mind2-lite --target android

# 3. Package for deployment
mlc_llm package mind2-lite --target android --output ./android/app/src/main/assets/
```
## Project Structure
```
android/
├── app/
│   ├── src/main/
│   │   ├── java/com/minimind/
│   │   │   ├── Mind2Model.java      # Model wrapper
│   │   │   ├── Mind2Tokenizer.java  # Tokenizer
│   │   │   └── Mind2Chat.java       # Chat interface
│   │   ├── jniLibs/
│   │   │   └── arm64-v8a/
│   │   │       └── libllama.so
│   │   └── assets/
│   │       ├── mind2-lite.gguf
│   │       └── tokenizer.json
│   └── build.gradle
├── jni/
│   ├── mind2_jni.cpp                # JNI bridge
│   └── CMakeLists.txt
└── README.md
```
## Memory Requirements
| Model | RAM (INT4) | RAM (FP16) | Storage |
|-------|-----------|-----------|---------|
| mind2-nano | ~400MB | ~800MB | ~300MB |
| mind2-lite | ~1.2GB | ~2.4GB | ~900MB |
| mind2-pro | ~2.4GB | ~4.8GB | ~1.8GB |
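These figures can drive model selection at runtime. A sketch using Android's `ActivityManager`, assuming the INT4 numbers above (the `ModelSelector` helper is hypothetical):

```java
import android.app.ActivityManager;
import android.content.Context;

public final class ModelSelector {
    /** Picks a model variant that fits the device's currently free RAM. */
    public static String pick(Context context) {
        ActivityManager am =
                (ActivityManager) context.getSystemService(Context.ACTIVITY_SERVICE);
        ActivityManager.MemoryInfo info = new ActivityManager.MemoryInfo();
        am.getMemoryInfo(info);
        long availMb = info.availMem / (1024 * 1024);
        // Leave headroom for the KV-cache and the rest of the app.
        if (availMb > 2000) return "mind2-lite.gguf"; // ~1.2 GB at INT4
        return "mind2-nano.gguf";                     // ~400 MB at INT4
    }
}
```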
## Performance Benchmarks
Tested on common Android devices:
| Device | Model | Tokens/sec |
|--------|-------|-----------|
| Pixel 8 Pro | mind2-nano | 45 |
| Pixel 8 Pro | mind2-lite | 22 |
| Samsung S24 | mind2-nano | 52 |
| Samsung S24 | mind2-lite | 28 |
## Best Practices
1. **Use INT4 quantization** for the best size/performance balance
2. **Limit context length** to 512-1024 tokens on mobile
3. **Enable KV-cache** for faster generation
4. **Use streaming** for a responsive UI (see the sketch after this list)
5. **Handle memory pressure** gracefully (see Troubleshooting below)
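For item 4, the essential pattern is to run generation off the main thread and post each token back to it. A runtime-agnostic sketch (`TokenSource` is a hypothetical stand-in for whichever callback the chosen backend exposes):

```java
import java.util.function.Consumer;

import android.os.Handler;
import android.os.Looper;
import android.widget.TextView;

public final class StreamingUi {
    /** Hypothetical hook: a backend that emits tokens one at a time. */
    public interface TokenSource {
        void generate(String prompt, Consumer<String> onToken);
    }

    /** Generates off the main thread; appends each token on the UI thread. */
    public static void streamInto(TextView output, TokenSource source, String prompt) {
        Handler main = new Handler(Looper.getMainLooper());
        new Thread(() -> source.generate(prompt,
                token -> main.post(() -> output.append(token)))).start();
    }
}
```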
## Troubleshooting
### Out of Memory
- Use a smaller model (nano instead of lite), or release it under memory pressure (see the sketch below)
- Reduce the context length
- Enable swap if available
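Android reports memory pressure through the standard `onTrimMemory()` callback. A sketch that frees the native context when pressure becomes critical, using the hypothetical `Mind2Model` wrapper from Option 1:

```java
import android.app.Activity;
import android.content.ComponentCallbacks2;

public class ChatActivity extends Activity {
    private Mind2Model model; // loaded lazily elsewhere

    @Override
    public void onTrimMemory(int level) {
        super.onTrimMemory(level);
        if (level >= ComponentCallbacks2.TRIM_MEMORY_RUNNING_CRITICAL && model != null) {
            model.close(); // frees the native llama context
            model = null;  // reload on the next generate() call
        }
    }
}
```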
### Slow Inference
- Check the CPU governor (set it to `performance`)
- Ensure the build uses NEON/Arm optimizations
- Consider GPU acceleration (MLC-LLM)
### Model Loading Failed
- Verify GGUF file integrity
- Check storage permissions
- Ensure enough free space