# MiniMind Android Deployment Guide

Deploy MiniMind (Mind2) models on Android devices using multiple runtime options.

## Deployment Options

| Runtime | Size | Speed | Ease of Use |
|---------|------|-------|-------------|
| **llama.cpp** | ★★★★★ | ★★★★☆ | ★★★★☆ |
| **ONNX Runtime** | ★★★★☆ | ★★★☆☆ | ★★★★★ |
| **MLC-LLM** | ★★★★☆ | ★★★★★ | ★★★☆☆ |
| **TensorFlow Lite** | ★★★★★ | ★★★☆☆ | ★★★★☆ |

## Quick Start

### Option 1: llama.cpp (Recommended)

```bash
# 1. Export the model to GGUF format
python scripts/export_gguf.py --model mind2-lite --output models/mind2-lite.gguf

# 2. Build llama.cpp for Android
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build-android && cd build-android
cmake .. -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28
make -j

# 3. Copy the shared library into the Android project
cp libllama.so ../android/app/src/main/jniLibs/arm64-v8a/
```

### Option 2: ONNX Runtime

```bash
# 1. Export the model to ONNX
python scripts/export_onnx.py --model mind2-lite --output models/mind2-lite.onnx
```

Then add ONNX Runtime as a dependency in `app/build.gradle`:

```groovy
dependencies {
    implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'
}
```

### Option 3: MLC-LLM

```bash
# 1. Install MLC-LLM
pip install mlc-llm

# 2. Compile the model for Android
mlc_llm compile mind2-lite --target android

# 3. Package for deployment
mlc_llm package mind2-lite --target android --output ./android/app/src/main/assets/
```

## Project Structure

```
android/
├── app/
│   ├── src/main/
│   │   ├── java/com/minimind/
│   │   │   ├── Mind2Model.java      # Model wrapper
│   │   │   ├── Mind2Tokenizer.java  # Tokenizer
│   │   │   └── Mind2Chat.java       # Chat interface
│   │   ├── jniLibs/
│   │   │   └── arm64-v8a/
│   │   │       └── libllama.so
│   │   └── assets/
│   │       ├── mind2-lite.gguf
│   │       └── tokenizer.json
│   └── build.gradle
├── jni/
│   ├── mind2_jni.cpp                # JNI bridge
│   └── CMakeLists.txt
└── README.md
```

## Memory Requirements

| Model | RAM (INT4) | RAM (FP16) | Storage |
|-------|------------|------------|---------|
| mind2-nano | ~400MB | ~800MB | ~300MB |
| mind2-lite | ~1.2GB | ~2.4GB | ~900MB |
| mind2-pro | ~2.4GB | ~4.8GB | ~1.8GB |

## Performance Benchmarks

Tested on common Android devices:

| Device | Model | Tokens/sec |
|--------|-------|------------|
| Pixel 8 Pro | mind2-nano | 45 |
| Pixel 8 Pro | mind2-lite | 22 |
| Samsung S24 | mind2-nano | 52 |
| Samsung S24 | mind2-lite | 28 |

## Best Practices

1. **Use INT4 quantization** for the best size/performance balance
2. **Limit context length** to 512-1024 tokens on mobile
3. **Enable the KV-cache** for faster generation
4. **Use streaming** for a responsive UI
5. **Handle memory pressure** gracefully

## Troubleshooting

### Out of Memory

- Use a smaller model (nano instead of lite)
- Reduce the context length
- Enable swap if available

### Slow Inference

- Check the CPU governor (set it to `performance`)
- Ensure NEON/ARM optimizations are enabled
- Consider GPU acceleration (MLC-LLM)

### Model Loading Failed

- Verify the GGUF file's integrity
- Check storage permissions
- Ensure there is enough free space
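
The "limit context length" best practice can be enforced by trimming the prompt's token list to a window of recent tokens before each generation call. Below is a minimal sketch in plain Java; `ContextWindow` and its token representation are illustrative helpers (not part of any MiniMind API), and real code would operate on the tokenizer's actual output:

```java
import java.util.ArrayList;
import java.util.List;

public final class ContextWindow {
    private ContextWindow() {}

    /**
     * Keeps at most maxTokens token IDs: the system prefix is always
     * preserved at the front, and the oldest conversation tokens are
     * dropped until the total fits the window.
     */
    public static List<Integer> trim(List<Integer> systemPrefix,
                                     List<Integer> conversation,
                                     int maxTokens) {
        int room = maxTokens - systemPrefix.size();
        if (room < 0) {
            throw new IllegalArgumentException("system prefix exceeds window");
        }
        // Keep only the most recent tokens that still fit.
        int start = Math.max(0, conversation.size() - room);
        List<Integer> result = new ArrayList<>(systemPrefix);
        result.addAll(conversation.subList(start, conversation.size()));
        return result;
    }
}
```

With a 512- or 1024-token window, this keeps memory and prefill time bounded on mobile; a real chat app would usually also truncate at message boundaries so the model never sees half a turn.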
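
One way to avoid the out-of-memory case up front is to pick the largest model variant that fits the device's available RAM before loading anything. The sketch below uses the INT4 figures from the memory-requirements table; `chooseModel` is a hypothetical helper, not part of any MiniMind API, and the 25% headroom factor is an assumption:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ModelSelector {
    // Approximate INT4 RAM requirements (MB), largest first,
    // taken from the memory-requirements table.
    private static final Map<String, Long> RAM_MB = new LinkedHashMap<>();
    static {
        RAM_MB.put("mind2-pro", 2400L);
        RAM_MB.put("mind2-lite", 1200L);
        RAM_MB.put("mind2-nano", 400L);
    }

    /**
     * Returns the largest model whose INT4 footprint fits within the
     * available RAM (keeping ~25% headroom for the app, KV-cache, and
     * the OS), or null if no variant fits.
     */
    public static String chooseModel(long availableRamMb) {
        long budget = (long) (availableRamMb * 0.75);
        for (Map.Entry<String, Long> e : RAM_MB.entrySet()) {
            if (e.getValue() <= budget) {
                return e.getKey();
            }
        }
        return null; // Not enough memory for any variant.
    }
}
```

On Android, the available-RAM figure would typically come from `ActivityManager.getMemoryInfo()` (its `MemoryInfo.availMem` field is in bytes); falling back to a smaller variant at load time is usually a better user experience than crashing with an OOM mid-generation.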