---
license: apache-2.0
language:
- en
- de
base_model: Qwen/Qwen3-4B
tags:
- mimi
- tool-calling
- function-calling
- agent
- gguf
- fine-tuned
- wllama
- browser-inference
- on-device-ai
- local-ai
- privacy-first
model-index:
- name: MIMI Pro
results:
- task:
type: function-calling
name: Tool Calling
dataset:
type: gorilla-llm/Berkeley-Function-Calling-Leaderboard
name: BFCL V4
metrics:
- type: accuracy
value: 60.8
name: Simple Function Calling (Python)
verified: false
- type: accuracy
value: 57.5
name: Multiple Sequential Calls
verified: false
- type: accuracy
value: 90
name: Irrelevance Detection
verified: false
pipeline_tag: text-generation
---
# MIMI Pro
MIMI Pro is a 4-billion-parameter AI agent model optimized for structured tool calling and autonomous task execution – designed to run entirely on-device, in the browser, with zero cloud dependencies.
Part of the MIMI Model Family by [Mimi Tech AI](https://mimitechai.com).
> **🔬 V1 – Experimental Release.** This model is fine-tuned for the MIMI Agent's custom tool-calling format. For standard tool calling, the base [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) may perform equally well or better with native `<tool_call>` prompting. V2 with official BFCL scores and Qwen3-native format support is in development.
## Performance
### BFCL V4 Benchmark (Partial – Single-Turn; base model evaluated on 20 samples per category)
| Category | MIMI Pro V1 | Base Qwen3-4B | Notes |
|---|---|---|---|
| Simple Python | 60.8% (400 tests) | **80.0%** (20 tests) | Base outperforms |
| Simple Java | 21.0% (100 tests) | **60.0%** (20 tests) | Base outperforms |
| Multiple (Sequential) | 57.5% (200 tests) | **75.0%** (20 tests) | Base outperforms |
| Parallel | 2.0% (200 tests) | **75.0%** (20 tests) | Fine-tune degraded |
| Irrelevance | 90% (20 tests) | **100%** (20 tests) | Both strong |
| Live Simple | – | **90.0%** (20 tests) | Base only |
> ⚠️ **Important Context:** The previously reported "97.7% accuracy" was a **training validation metric** (token-level accuracy on the training/eval split), not a standardized benchmark score. The table above shows actual BFCL V4 results. We are working on a full official evaluation.
### Training Metrics (Internal)
| Metric | Value |
|---|---|
| Training Token Accuracy | 97.66% |
| Eval Token Accuracy | 97.29% |
| Training Loss | 0.084 |
| Parameters | 4.02 Billion |
| Quantized Size | 2.3 GB (Q4_K_M) |
## Architecture
MIMI Pro is built on [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using [Unsloth](https://github.com/unslothai/unsloth) on NVIDIA DGX Spark.
**Key Design Decisions:**
- Custom tool-calling format optimized for the MIMI Agent browser environment
- 19 tool types covering web search, code execution, file operations, browser automation
- Trained on NVIDIA DGX Spark (Grace Blackwell GB10, 128 GB unified memory)
**Known Limitations of V1:**
- Fine-tuning with aggressive hyperparameters (LoRA r=64, 3 epochs, LR 2e-4) caused some capability degradation vs. the base model, particularly for parallel tool calling
- The custom `{"tool": ..., "parameters": ...}` format diverges from Qwen3's native `<tool_call>` format
- V2 will address these issues with conservative fine-tuning and Qwen3-native format support
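To make the format divergence concrete, here is a minimal sketch of bridging the two conventions in Python. The `mimi_to_qwen3` helper is hypothetical (not part of any MIMI tooling), and the Qwen3 side assumes Qwen's documented `<tool_call>` convention of a JSON object with `name`/`arguments` keys:

```python
import json

def mimi_to_qwen3(call_json: str) -> str:
    """Convert a MIMI V1 tool call to Qwen3's native <tool_call> format.

    MIMI V1:  {"tool": ..., "parameters": ...}
    Qwen3:    <tool_call>{"name": ..., "arguments": ...}</tool_call>
    """
    call = json.loads(call_json)
    native = {"name": call["tool"], "arguments": call["parameters"]}
    return "<tool_call>\n" + json.dumps(native) + "\n</tool_call>"

print(mimi_to_qwen3('{"tool": "web_search", "parameters": {"query": "AI news"}}'))
```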
## Supported Tools
| Category | Tools |
|---|---|
| 🌐 Web | web_search, browse_url, browser_action |
| 💻 Code | execute_python, create_file, edit_file |
| 🔬 Research | deep_research, generate_document |
| 📂 System | read_file, list_directory, run_terminal |
| 🧠 Reasoning | Multi-step orchestration |
## Quick Start
### Browser (wllama/WebAssembly)
```javascript
import { Wllama } from '@wllama/wllama';
// wasmPaths maps wllama's wasm asset filenames to their URLs (see the @wllama/wllama docs)
const wllama = new Wllama(wasmPaths);
await wllama.loadModelFromUrl(
'https://huggingface.co/MimiTechAI/mimi-pro/resolve/main/mimi-qwen3-4b-q4km.gguf',
{ n_ctx: 4096 }
);
const response = await wllama.createChatCompletion([
{ role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
{ role: 'user', content: 'Search for the latest AI news and summarize it' }
]);
```
### llama.cpp
```bash
./llama-cli -m mimi-qwen3-4b-q4km.gguf \
-p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
-n 512 --temp 0.6
```
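The `-p` string above hand-writes Qwen's ChatML chat template. A tiny helper (the function name is illustrative, not part of any MIMI tooling) that renders a message list into the same string:

```python
def build_chatml_prompt(messages):
    """Render messages in ChatML, ending with an open assistant turn."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"},
])
```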
### Python
```python
from llama_cpp import Llama
llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
output = llm.create_chat_completion(messages=[
{"role": "system", "content": "You are MIMI, an AI agent with tool access."},
{"role": "user", "content": "Search for the latest AI news"}
])
print(output["choices"][0]["message"]["content"])
```
## Output Format
MIMI Pro V1 uses a custom format (V2 will support Qwen3-native `<tool_call>` format):
```json
{"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}
```
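A caller can consume this format with nothing but the standard-library `json` module. The sketch below uses a placeholder tool registry; the lambda handler is a stand-in for illustration, not the MIMI Agent's real implementation:

```python
import json

# Placeholder handlers standing in for real tool implementations.
TOOLS = {
    "web_search": lambda query, limit=5: f"searched {query!r} (top {limit})",
}

def dispatch(raw: str) -> str:
    """Parse a MIMI-format tool call and invoke the matching handler."""
    call = json.loads(raw)
    handler = TOOLS[call["tool"]]
    return handler(**call["parameters"])

result = dispatch(
    '{"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}'
)
```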
## The MIMI Model Family
| Model | Parameters | Size | Target Device | Status |
|---|---|---|---|---|
| MIMI Nano | 0.6B | ~400 MB | Any device, IoT | 🔜 Coming |
| MIMI Small | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
| **MIMI Pro** | **4.02B** | **2.3 GB** | **Desktop & laptop** | **✅ Available** |
| MIMI Max | 8B | ~4.5 GB | Workstations | 🔜 Coming |
All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.
## Training Details
```yaml
method: LoRA (PEFT) via Unsloth
base_model: Qwen/Qwen3-4B
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
learning_rate: 2.0e-04
epochs: 3
effective_batch_size: 8
max_seq_length: 2048
optimizer: adamw_8bit
precision: bf16
gradient_checkpointing: true
packing: true
dataset: 1,610 curated tool-calling examples (178K tokens)
hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
```
## Why MIMI?
- 🔒 **Privacy First** – Your data never leaves your device. Period.
- 💰 **Zero Cost** – No API keys, no subscriptions, no per-token billing.
- ⚡ **Fast** – Runs at native speed via WebAssembly, no server round-trips.
- 🔌 **Works Offline** – Once downloaded, no internet required.
- 🔧 **Tool Native** – Purpose-built for autonomous tool calling.
## Limitations
- V1 uses a custom tool-calling format (not Qwen3-native `<tool_call>`)
- Parallel tool calling (multiple simultaneous calls) is degraded vs. base model
- Context window: 4,096 tokens (training config). Base architecture supports 32K.
- Requires ~3 GB RAM for inference in browser.
- Q4_K_M quantization trades minimal quality for 3.5x size reduction.
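The 3.5x figure checks out with back-of-the-envelope arithmetic, assuming bf16 (16 bits per weight) as the unquantized baseline:

```python
params = 4.02e9                              # model parameters
bf16_gb = params * 2 / 1e9                   # 16-bit weights: ~8.04 GB
q4km_gb = 2.3                                # published Q4_K_M file size
bits_per_weight = q4km_gb * 8e9 / params     # ~4.6 bits/weight
reduction = bf16_gb / q4km_gb                # ~3.5x smaller than bf16
```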
## Roadmap
- [x] **V1** – Custom format, 19 tools, browser-optimized (current release)
- [ ] **V2** – Qwen3-native `<tool_call>` format, official BFCL V4 scores, conservative fine-tuning
- [ ] **Model Family** – Nano (0.6B), Small (1.7B), Max (8B) releases
- [ ] **Multi-Turn** – Agentic conversation chains with tool result feedback
## About Mimi Tech AI
[Mimi Tech AI](https://mimitechai.com) builds on-device AI – no cloud, no data leaks, full user control.
- 🌐 [mimitechai.com](https://mimitechai.com)
- 🐙 [GitHub](https://github.com/MimiTechAi)
- 💼 [LinkedIn](https://linkedin.com/company/mimitechai)
- 🏢 [NVIDIA Connect Program](https://www.nvidia.com/en-us/industries/nvidia-connect-program/) Member
## License
Apache 2.0 – free for commercial and personal use.
## Citation
```bibtex
@misc{mimitechai2026mimi,
title={MIMI Pro: On-Device AI Agent Model for Browser-Based Tool Calling},
author={Bemler, Michael and Soppa, Michael},
year={2026},
publisher={Mimi Tech AI},
url={https://huggingface.co/MimiTechAI/mimi-pro}
}
```