---
license: apache-2.0
language:
  - en
  - de
base_model: Qwen/Qwen3-4B
tags:
  - mimi
  - tool-calling
  - function-calling
  - agent
  - gguf
  - fine-tuned
  - wllama
  - browser-inference
  - on-device-ai
  - local-ai
  - privacy-first
model-index:
  - name: MIMI Pro
    results:
      - task:
          type: function-calling
          name: Tool Calling
        dataset:
          type: gorilla-llm/Berkeley-Function-Calling-Leaderboard
          name: BFCL V4
        metrics:
          - type: accuracy
            value: 60.8
            name: Simple Function Calling (Python)
            verified: false
          - type: accuracy
            value: 57.5
            name: Multiple Sequential Calls
            verified: false
          - type: accuracy
            value: 90
            name: Irrelevance Detection
            verified: false
pipeline_tag: text-generation
---

# MIMI Pro

MIMI Pro is a 4-billion-parameter AI agent model optimized for structured tool calling and autonomous task execution — designed to run entirely on-device, in the browser, with zero cloud dependencies.

Part of the MIMI Model Family by [Mimi Tech AI](https://mimitechai.com).

> **🔬 V1 — Experimental Release.** This model is fine-tuned for the MIMI Agent's custom tool-calling format. For standard tool calling, the base [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) may perform equally well or better with native `<tool_call>` prompting. V2 with official BFCL scores and Qwen3-native format support is in development.

## Performance

### BFCL V4 Benchmark (Partial — Single-Turn; base Qwen3-4B scored on 20 samples/category)

| Category | MIMI Pro V1 | Base Qwen3-4B | Notes |
|---|---|---|---|
| Simple Python | 60.8% (400 tests) | **80.0%** (20 tests) | Base outperforms |
| Simple Java | 21.0% (100 tests) | **60.0%** (20 tests) | Base outperforms |
| Multiple (Sequential) | 57.5% (200 tests) | **75.0%** (20 tests) | Base outperforms |
| Parallel | 2.0% (200 tests) | **75.0%** (20 tests) | Fine-tune degraded |
| Irrelevance | 90% (20 tests) | **100%** (20 tests) | Both strong |
| Live Simple | — | **90.0%** (20 tests) | Base only |

> ⚠️ **Important Context:** The previously reported "97.7% accuracy" was a **training validation metric** (token-level accuracy on the training/eval split), not a standardized benchmark score. The table above shows actual BFCL V4 results. We are working on a full official evaluation.

### Training Metrics (Internal)

| Metric | Value |
|---|---|
| Training Token Accuracy | 97.66% |
| Eval Token Accuracy | 97.29% |
| Training Loss | 0.084 |
| Parameters | 4.02 Billion |
| Quantized Size | 2.3 GB (Q4_K_M) |

## Architecture

MIMI Pro is built on [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using [Unsloth](https://github.com/unslothai/unsloth) on NVIDIA DGX Spark.

**Key Design Decisions:**
- Custom tool-calling format optimized for the MIMI Agent browser environment
- 19 tool types covering web search, code execution, file operations, browser automation
- Trained on NVIDIA DGX Spark (Grace Blackwell GB10, 128 GB unified memory)

**Known Limitations of V1:**
- Fine-tuning with aggressive hyperparameters (LoRA r=64, 3 epochs, LR 2e-4) caused some capability degradation vs. the base model, particularly for parallel tool calling
- The custom `{"tool": ..., "parameters": ...}` format diverges from Qwen3's native `<tool_call>` format
- V2 will address these issues with conservative fine-tuning and Qwen3-native format support

## Supported Tools

| Category | Tools |
|---|---|
| 🌐 Web | web_search, browse_url, browser_action |
| 💻 Code | execute_python, create_file, edit_file |
| 🔬 Research | deep_research, generate_document |
| 📁 System | read_file, list_directory, run_terminal |
| 🧠 Reasoning | Multi-step orchestration |
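
The tool names above map naturally onto a dispatch table keyed by tool name. A minimal sketch, assuming hypothetical handler implementations (the MIMI Agent's real backends are not published here):

```python
# Minimal dispatcher sketch for MIMI's custom tool-call format.
# The handlers below are hypothetical placeholders, not MIMI's actual backends.

def web_search(query, limit=5):
    # Placeholder handler: returns dummy results instead of real search hits.
    return [f"result {i} for {query!r}" for i in range(limit)]

def execute_python(code):
    # Placeholder handler: runs the snippet in a fresh namespace and returns
    # whatever it bound to the name `result`.
    namespace = {}
    exec(code, namespace)
    return namespace.get("result")

TOOL_REGISTRY = {
    "web_search": web_search,
    "execute_python": execute_python,
    # remaining tools from the table would be registered the same way
}

def dispatch(tool_call):
    """Route a parsed {"tool": ..., "parameters": ...} call to its handler."""
    handler = TOOL_REGISTRY[tool_call["tool"]]
    return handler(**tool_call["parameters"])

print(dispatch({"tool": "web_search",
                "parameters": {"query": "latest AI news", "limit": 2}}))
```

Because parameters arrive as a JSON object, `**tool_call["parameters"]` unpacks them directly into keyword arguments, so handler signatures double as parameter schemas.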

## Quick Start

### Browser (wllama/WebAssembly)

```javascript
import { Wllama } from '@wllama/wllama';

const wllama = new Wllama(wasmPaths);
await wllama.loadModelFromUrl(
  'https://huggingface.co/MimiTechAI/mimi-pro/resolve/main/mimi-qwen3-4b-q4km.gguf',
  { n_ctx: 4096 }
);

const response = await wllama.createChatCompletion([
  { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
  { role: 'user', content: 'Search for the latest AI news and summarize it' }
]);
```

### llama.cpp

```bash
# -e makes llama-cli process the \n escapes in the prompt string
./llama-cli -m mimi-qwen3-4b-q4km.gguf -e \
  -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.6
```

### Python

```python
from llama_cpp import Llama
llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
output = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"}
])
print(output["choices"][0]["message"]["content"])
```

## Output Format

MIMI Pro V1 uses a custom format (V2 will support Qwen3-native `<tool_call>` format):

```json
{"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}
```
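
A minimal sketch of consuming this format, assuming the call arrives as a single JSON line that may be surrounded by prose (how the agent actually separates free text from calls is not specified here):

```python
import json

def parse_tool_call(model_output):
    """Extract the first {"tool": ..., "parameters": ...} object from raw
    model output. Scanning line by line is an assumption about how prose and
    calls are separated; adjust to your decoding setup."""
    for line in model_output.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            call = json.loads(line)
        except json.JSONDecodeError:
            continue
        if "tool" in call and "parameters" in call:
            return call
    return None  # no well-formed tool call found

raw = ('I will look that up.\n'
       '{"tool": "web_search", "parameters": '
       '{"query": "latest AI news March 2026", "limit": 5}}')
call = parse_tool_call(raw)
print(call["tool"], call["parameters"]["limit"])  # web_search 5
```

Returning `None` rather than raising lets the agent fall back to treating the turn as plain text, which matters given the Irrelevance results above: the model is trained to decline tool use when no tool applies.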

## The MIMI Model Family

| Model | Parameters | Size | Target Device | Status |
|---|---|---|---|---|
| MIMI Nano | 0.6B | ~400 MB | Any device, IoT | 🔜 Coming |
| MIMI Small | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
| **MIMI Pro** | **4.02B** | **2.3 GB** | **Desktop & laptop** | **✅ Available** |
| MIMI Max | 8B | ~4.5 GB | Workstations | 🔜 Coming |

All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.

## Training Details

```yaml
method: LoRA (PEFT) via Unsloth
base_model: Qwen/Qwen3-4B
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
learning_rate: 2.0e-04
epochs: 3
effective_batch_size: 8
max_seq_length: 2048
optimizer: adamw_8bit
precision: bf16
gradient_checkpointing: true
packing: true
dataset: 1,610 curated tool-calling examples (178K tokens)
hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
```

## Why MIMI?

- 🔒 **Privacy First** — Your data never leaves your device. Period.
- 💰 **Zero Cost** — No API keys, no subscriptions, no per-token billing.
- ⚡ **Fast** — Runs at near-native speed via WebAssembly, no server round-trips.
- 🌍 **Works Offline** — Once downloaded, no internet required.
- 🔧 **Tool Native** — Purpose-built for autonomous tool calling.

## Limitations

- V1 uses a custom tool-calling format (not Qwen3-native `<tool_call>`)
- Parallel tool calling (multiple simultaneous calls) is degraded vs. base model
- Context window: 4,096 tokens (training config). Base architecture supports 32K.
- Requires ~3 GB RAM for inference in browser.
- Q4_K_M quantization trades minimal quality for 3.5x size reduction.
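
The 3.5x figure follows from the parameter count: 4.02B parameters at bf16 (2 bytes each) is about 8.04 GB of weights, versus the 2.3 GB Q4_K_M file:

```python
# Sanity-checking the quoted 3.5x size reduction from bf16 to Q4_K_M.
params = 4.02e9
bf16_gb = params * 2 / 1e9          # unquantized bf16 weights: ~8.04 GB
q4km_gb = 2.3                       # published Q4_K_M file size
print(round(bf16_gb / q4km_gb, 1))  # 3.5
```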

## Roadmap

- [x] **V1** — Custom format, 19 tools, browser-optimized (current release)
- [ ] **V2** — Qwen3-native `<tool_call>` format, official BFCL V4 scores, conservative fine-tuning
- [ ] **Model Family** — Nano (0.6B), Small (1.7B), Max (8B) releases
- [ ] **Multi-Turn** — Agentic conversation chains with tool result feedback

## About Mimi Tech AI

[Mimi Tech AI](https://mimitechai.com) builds on-device AI — no cloud, no data leaks, full user control.

- 🌐 [mimitechai.com](https://mimitechai.com)
- 🐙 [GitHub](https://github.com/MimiTechAi)
- 💼 [LinkedIn](https://linkedin.com/company/mimitechai)
- 🟢 [NVIDIA Connect Program](https://www.nvidia.com/en-us/industries/nvidia-connect-program/) Member

## License

Apache 2.0 — free for commercial and personal use.

## Citation

```bibtex
@misc{mimitechai2026mimi,
  title={MIMI Pro: On-Device AI Agent Model for Browser-Based Tool Calling},
  author={Bemler, Michael and Soppa, Michael},
  year={2026},
  publisher={Mimi Tech AI},
  url={https://huggingface.co/MimiTechAI/mimi-pro}
}
```