# MiniMind Android Deployment Guide

Deploy MiniMind (Mind2) models on Android devices using multiple runtime options.

## Deployment Options

| Runtime | Size | Speed | Ease of Use |
|---------|------|-------|-------------|
| **llama.cpp** | β˜…β˜…β˜…β˜…β˜… | β˜…β˜…β˜…β˜…β˜† | β˜…β˜…β˜…β˜…β˜† |
| **ONNX Runtime** | β˜…β˜…β˜…β˜…β˜† | β˜…β˜…β˜…β˜†β˜† | β˜…β˜…β˜…β˜…β˜… |
| **MLC-LLM** | β˜…β˜…β˜…β˜…β˜† | β˜…β˜…β˜…β˜…β˜… | β˜…β˜…β˜…β˜†β˜† |
| **TensorFlow Lite** | β˜…β˜…β˜…β˜…β˜… | β˜…β˜…β˜…β˜†β˜† | β˜…β˜…β˜…β˜…β˜† |

More stars are better in every column: smaller footprint, faster inference, easier integration.

## Quick Start

### Option 1: llama.cpp (Recommended)

```bash
# 1. Export model to GGUF format
python scripts/export_gguf.py --model mind2-lite --output models/mind2-lite.gguf

# 2. Build llama.cpp for Android
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build-android && cd build-android
cmake .. -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28
make -j

# 3. Copy to Android project
cp libllama.so ../android/app/src/main/jniLibs/arm64-v8a/
```

### Option 2: ONNX Runtime

```bash
# 1. Export model to ONNX
python scripts/export_onnx.py --model mind2-lite --output models/mind2-lite.onnx
```

```groovy
// 2. Add ONNX Runtime to the Android project, in app/build.gradle:
dependencies {
    implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'
}
```

### Option 3: MLC-LLM

```bash
# 1. Install MLC-LLM
pip install mlc-llm

# 2. Compile model for Android
mlc_llm compile mind2-lite --target android

# 3. Package for deployment
mlc_llm package mind2-lite --target android --output ./android/app/src/main/assets/
```

## Project Structure

```
android/
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ src/main/
β”‚   β”‚   β”œβ”€β”€ java/com/minimind/
β”‚   β”‚   β”‚   β”œβ”€β”€ Mind2Model.java      # Model wrapper
β”‚   β”‚   β”‚   β”œβ”€β”€ Mind2Tokenizer.java  # Tokenizer
β”‚   β”‚   β”‚   └── Mind2Chat.java       # Chat interface
β”‚   β”‚   β”œβ”€β”€ jniLibs/
β”‚   β”‚   β”‚   └── arm64-v8a/
β”‚   β”‚   β”‚       └── libllama.so
β”‚   β”‚   └── assets/
β”‚   β”‚       β”œβ”€β”€ mind2-lite.gguf
β”‚   β”‚       └── tokenizer.json
β”‚   └── build.gradle
β”œβ”€β”€ jni/
β”‚   β”œβ”€β”€ mind2_jni.cpp               # JNI bridge
β”‚   └── CMakeLists.txt
└── README.md
```
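The wrapper classes in the layout above are not shown in full. A minimal sketch of what `Mind2Model.java` might look like — the native method names, the chat template, and the `contextLength` parameter are assumptions for illustration, not the actual `mind2_jni.cpp` interface:

```java
// Sketch of the Mind2Model wrapper from the project layout above.
// The JNI method names and the chat template are assumed, not taken
// from the real mind2_jni.cpp.
public class Mind2Model {
    private long ctx; // opaque handle to the native llama context

    // Hypothetical JNI entry points implemented in mind2_jni.cpp.
    private native long nativeLoad(String ggufPath, int contextLength);
    private native String nativeGenerate(long ctx, String prompt, int maxTokens);
    private native void nativeFree(long ctx);

    public void load(String ggufPath) {
        System.loadLibrary("llama"); // resolves jniLibs/arm64-v8a/libllama.so
        ctx = nativeLoad(ggufPath, 1024);
    }

    // Pure-Java helper: wrap a user message in a simple chat template.
    // The tags below are illustrative; use the model's real template.
    public static String formatPrompt(String userMessage) {
        return "<|user|>\n" + userMessage + "\n<|assistant|>\n";
    }
}
```

Keeping template formatting on the Java side keeps the JNI surface small: the native layer only ever sees a finished prompt string.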

## Memory Requirements

| Model | RAM (INT4) | RAM (FP16) | Storage |
|-------|-----------|-----------|---------|
| mind2-nano | ~400MB | ~800MB | ~300MB |
| mind2-lite | ~1.2GB | ~2.4GB | ~900MB |
| mind2-pro | ~2.4GB | ~4.8GB | ~1.8GB |
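The RAM columns follow from bytes-per-weight: INT4 stores roughly 0.5 bytes per parameter and FP16 stores 2, with the table's figures additionally covering KV-cache and runtime overhead. A sketch of the lower-bound calculation — the parameter count used here is an assumed example, not a published size:

```java
// Lower-bound weight memory: parameters * bytes-per-weight.
// Real usage (the table above) is higher because it also includes
// the KV-cache, activations, and runtime overhead.
public class WeightMemory {
    public static long bytes(long params, double bytesPerWeight) {
        return (long) (params * bytesPerWeight);
    }

    public static void main(String[] args) {
        long params = 600_000_000L; // assumed count, for illustration only
        System.out.println("INT4: " + bytes(params, 0.5) / (1 << 20) + " MiB");
        System.out.println("FP16: " + bytes(params, 2.0) / (1 << 20) + " MiB");
    }
}
```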

## Performance Benchmarks

Tested on common Android devices:

| Device | Model | Tokens/sec |
|--------|-------|-----------|
| Pixel 8 Pro | mind2-nano | 45 |
| Pixel 8 Pro | mind2-lite | 22 |
| Samsung S24 | mind2-nano | 52 |
| Samsung S24 | mind2-lite | 28 |
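Figures like these can be reproduced with a small timing harness; the `IntSupplier` below is a stand-in for whichever runtime's decode step you are measuring:

```java
import java.util.function.IntSupplier;

public class Throughput {
    // Measures decode throughput for any token source. The IntSupplier
    // stands in for one real decode step (llama.cpp, ONNX Runtime, etc.).
    public static double tokensPerSec(IntSupplier nextToken, int n) {
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            nextToken.getAsInt(); // decode one token
        }
        double secs = (System.nanoTime() - t0) / 1e9;
        return n / secs;
    }
}
```

For meaningful numbers, run a short warm-up first and measure several hundred tokens, since the first few decode steps pay one-off initialization costs.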

## Best Practices

1. **Use INT4 quantization** for best size/performance balance
2. **Limit context length** to 512-1024 tokens on mobile
3. **Enable KV-cache** for faster generation
4. **Use streaming** for responsive UI
5. **Handle memory pressure** gracefully
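Streaming (practice 4) means pushing tokens to the UI as they are decoded instead of waiting for generation to finish. A minimal callback shape, with hypothetical names and a stubbed decode loop:

```java
public class Streaming {
    // Hypothetical listener interface for token-by-token delivery.
    public interface TokenListener {
        void onToken(String token);   // called once per decoded token
        void onDone(String fullText); // called once when generation ends
    }

    // Drives a stubbed decode loop, pushing each token to the listener.
    // In a real app, run this off the main thread and post onToken
    // back to the UI thread.
    public static void generate(String[] decodedTokens, TokenListener l) {
        StringBuilder sb = new StringBuilder();
        for (String t : decodedTokens) {
            sb.append(t);
            l.onToken(t);
        }
        l.onDone(sb.toString());
    }
}
```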

## Troubleshooting

### Out of Memory
- Use smaller model (nano instead of lite)
- Reduce context length
- Enable swap if available

### Slow Inference
- Check CPU governor (set to performance)
- Ensure using NEON/ARM optimizations
- Consider GPU acceleration (MLC-LLM)

### Model Loading Failed
- Verify GGUF file integrity
- Check storage permissions
- Ensure enough free space
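One cheap integrity check before loading: GGUF files begin with the 4-byte ASCII magic `GGUF`, so a truncated or mis-named download can be rejected up front. A sketch (this checks only the magic, not full file validity):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class GgufCheck {
    // Returns true if the file starts with the GGUF magic bytes "GGUF".
    // Catches truncated or wrong-format files, not deeper corruption.
    public static boolean hasGgufMagic(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] magic = new byte[4];
            if (in.read(magic) != 4) return false;
            return magic[0] == 'G' && magic[1] == 'G'
                && magic[2] == 'U' && magic[3] == 'F';
        }
    }
}
```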