---
license: apache-2.0
language:
- en
- de
base_model: Qwen/Qwen3-4B
tags:
- mimi
- tool-calling
- function-calling
- agent
- gguf
- fine-tuned
- wllama
- browser-inference
- on-device-ai
- local-ai
- privacy-first
model-index:
- name: MIMI Pro
  results:
  - task:
      type: function-calling
      name: Tool Calling
    dataset:
      type: gorilla-llm/Berkeley-Function-Calling-Leaderboard
      name: BFCL V4
    metrics:
    - type: accuracy
      value: 60.8
      name: Simple Function Calling (Python)
      verified: false
    - type: accuracy
      value: 57.5
      name: Multiple Sequential Calls
      verified: false
    - type: accuracy
      value: 90.0
      name: Irrelevance Detection
      verified: false
pipeline_tag: text-generation
---
# MIMI Pro
MIMI Pro is a 4-billion-parameter AI agent model optimized for structured tool calling and autonomous task execution. It is designed to run entirely on-device, in the browser, with zero cloud dependencies.
Part of the MIMI Model Family by [Mimi Tech AI](https://mimitechai.com).
> **πŸ”¬ V1 β€” Experimental Release.** This model is fine-tuned for the MIMI Agent's custom tool-calling format. For standard tool calling, the base [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) may perform equally well or better with native `<tool_call>` prompting. V2 with official BFCL scores and Qwen3-native format support is in development.
## Performance
### BFCL V4 Benchmark (Partial, Single-Turn; sample counts per cell below)
| Category | MIMI Pro V1 | Base Qwen3-4B | Notes |
|---|---|---|---|
| Simple Python | 60.8% (400 tests) | **80.0%** (20 tests) | Base outperforms |
| Simple Java | 21.0% (100 tests) | **60.0%** (20 tests) | Base outperforms |
| Multiple (Sequential) | 57.5% (200 tests) | **75.0%** (20 tests) | Base outperforms |
| Parallel | 2.0% (200 tests) | **75.0%** (20 tests) | Fine-tune degraded |
| Irrelevance | 90.0% (20 tests) | **100%** (20 tests) | Both strong |
| Live Simple | β€” | **90.0%** (20 tests) | Base only |
> ⚠️ **Important Context:** The previously reported "97.7% accuracy" was a **training validation metric** (token-level accuracy on the training/eval split), not a standardized benchmark score. The table above shows actual BFCL V4 results. We are working on a full official evaluation.
### Training Metrics (Internal)
| Metric | Value |
|---|---|
| Training Token Accuracy | 97.66% |
| Eval Token Accuracy | 97.29% |
| Training Loss | 0.084 |
| Parameters | 4.02 Billion |
| Quantized Size | 2.3 GB (Q4_K_M) |
## Architecture
MIMI Pro is built on [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using [Unsloth](https://github.com/unslothai/unsloth) on NVIDIA DGX Spark.
**Key Design Decisions:**
- Custom tool-calling format optimized for the MIMI Agent browser environment
- 19 tool types covering web search, code execution, file operations, browser automation
- Trained on NVIDIA DGX Spark (Grace Blackwell GB10, 128 GB unified memory)
**Known Limitations of V1:**
- Fine-tuning with aggressive hyperparameters (LoRA r=64, 3 epochs, LR 2e-4) caused some capability degradation vs. the base model, particularly for parallel tool calling
- The custom `{"tool": ..., "parameters": ...}` format diverges from Qwen3's native `<tool_call>` format
- V2 will address these issues with conservative fine-tuning and Qwen3-native format support
## Supported Tools
| Category | Tools |
|---|---|
| 🌐 Web | web_search, browse_url, browser_action |
| πŸ’» Code | execute_python, create_file, edit_file |
| πŸ”¬ Research | deep_research, generate_document |
| πŸ“ System | read_file, list_directory, run_terminal |
| 🧠 Reasoning | Multi-step orchestration |
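On the host side, each tool name from the table above maps to a handler. The sketch below is illustrative only: the handlers are stubs and their signatures are assumptions, not the MIMI Agent's actual runtime API.

```python
# Hypothetical dispatch sketch: maps tool names from the table above to
# handler functions. The handlers here are stubs; the real MIMI Agent
# runtime and its signatures are not part of this model card.

def web_search(query, limit=5):
    # Stub: a real implementation would call a search backend.
    return [f"result for {query!r}"] * limit

def read_file(path):
    # Stub: a real implementation would read from a sandboxed filesystem.
    return f"<contents of {path}>"

TOOL_REGISTRY = {
    "web_search": web_search,
    "read_file": read_file,
    # ... remaining tools registered the same way
}

def dispatch(tool_call):
    """Route a parsed {"tool": ..., "parameters": ...} object to its handler."""
    handler = TOOL_REGISTRY[tool_call["tool"]]
    return handler(**tool_call["parameters"])
```

Usage: `dispatch({"tool": "web_search", "parameters": {"query": "AI news", "limit": 2}})` returns the stubbed result list.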
## Quick Start
### Browser (wllama/WebAssembly)
```javascript
// npm install @wllama/wllama
import { Wllama } from '@wllama/wllama';
const wllama = new Wllama(wasmPaths); // wasmPaths: map of wllama .wasm asset URLs
await wllama.loadModelFromUrl(
'https://huggingface.co/MimiTechAI/mimi-pro/resolve/main/mimi-qwen3-4b-q4km.gguf',
{ n_ctx: 4096 }
);
const response = await wllama.createChatCompletion([
{ role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
{ role: 'user', content: 'Search for the latest AI news and summarize it' }
]);
```
### llama.cpp
```bash
./llama-cli -m mimi-qwen3-4b-q4km.gguf \
-p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
-n 512 --temp 0.6
```
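The `-p` string above hand-writes Qwen3's ChatML-style template (`<|im_start|>role\n...<|im_end|>`). A minimal helper can build the same prompt programmatically; this is a sketch, not a full chat-template implementation:

```python
# Build the same ChatML-style prompt the llama-cli example passes via -p.
# Qwen3 delimits turns with <|im_start|>role\n...<|im_end|> markers.

def build_prompt(messages):
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # leave the assistant turn open
    return "".join(parts)

prompt = build_prompt([
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"},
])
```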
### Python
```python
from llama_cpp import Llama
llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
output = llm.create_chat_completion(messages=[
{"role": "system", "content": "You are MIMI, an AI agent with tool access."},
{"role": "user", "content": "Search for the latest AI news"}
])
```
## Output Format
MIMI Pro V1 uses a custom format (V2 will support Qwen3-native `<tool_call>` format):
```json
{"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}
```
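In practice the model may wrap this JSON in surrounding prose, so the host needs to locate and parse it. The sketch below is one tolerant way to do that, assuming a single tool call per response (the V1 format); it is not the MIMI Agent's actual parser.

```python
import json
import re

def extract_tool_call(text):
    """Find the first {"tool": ..., "parameters": ...} object in model output.

    Tolerant sketch: tries each '{' as a JSON start; assumes one call per
    response, which matches the V1 single-call format.
    """
    for match in re.finditer(r"\{", text):
        try:
            # raw_decode parses a JSON value at this offset, ignoring trailing text.
            obj, _ = json.JSONDecoder().raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "tool" in obj and "parameters" in obj:
            return obj
    return None

call = extract_tool_call(
    'Sure, searching now: {"tool": "web_search", '
    '"parameters": {"query": "latest AI news", "limit": 5}} Done.'
)
# call -> {"tool": "web_search", "parameters": {"query": "latest AI news", "limit": 5}}
```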
## The MIMI Model Family
| Model | Parameters | Size | Target Device | Status |
|---|---|---|---|---|
| MIMI Nano | 0.6B | ~400 MB | Any device, IoT | πŸ”œ Coming |
| MIMI Small | 1.7B | ~1.0 GB | Mobile & tablets | πŸ”œ Coming |
| **MIMI Pro** | **4.02B** | **2.3 GB** | **Desktop & laptop** | **βœ… Available** |
| MIMI Max | 8B | ~4.5 GB | Workstations | πŸ”œ Coming |
All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.
## Training Details
```yaml
method: LoRA (PEFT) via Unsloth
base_model: Qwen/Qwen3-4B
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
learning_rate: 2.0e-04
epochs: 3
effective_batch_size: 8
max_seq_length: 2048
optimizer: adamw_8bit
precision: bf16
gradient_checkpointing: true
packing: true
dataset: 1,610 curated tool-calling examples (178K tokens)
hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
```
## Why MIMI?
- πŸ”’ **Privacy First** β€” Your data never leaves your device. Period.
- πŸ’° **Zero Cost** β€” No API keys, no subscriptions, no per-token billing.
- ⚑ **Fast** β€” Runs at native speed via WebAssembly, no server round-trips.
- 🌍 **Works Offline** β€” Once downloaded, no internet required.
- πŸ”§ **Tool Native** β€” Purpose-built for autonomous tool calling.
## Limitations
- V1 uses a custom tool-calling format (not Qwen3-native `<tool_call>`)
- Parallel tool calling (multiple simultaneous calls) is degraded vs. base model
- Context window: 4,096 tokens (training config). Base architecture supports 32K.
- Requires ~3 GB RAM for inference in browser.
- Q4_K_M quantization trades minimal quality for 3.5x size reduction.
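The 3.5x figure can be sanity-checked with back-of-envelope arithmetic, assuming the unquantized model is stored in bf16 at 2 bytes per parameter:

```python
# Sanity-check the quoted Q4_K_M size reduction.
# Assumption: unquantized weights in bf16 (2 bytes per parameter).
params = 4.02e9                 # parameter count from the model card
bf16_gb = params * 2 / 1e9      # ~8.04 GB unquantized
q4km_gb = 2.3                   # quoted GGUF Q4_K_M file size
reduction = bf16_gb / q4km_gb   # ~3.5x
```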
## Roadmap
- [x] **V1** β€” Custom format, 19 tools, browser-optimized (current release)
- [ ] **V2** β€” Qwen3-native `<tool_call>` format, official BFCL V4 scores, conservative fine-tuning
- [ ] **Model Family** β€” Nano (0.6B), Small (1.7B), Max (8B) releases
- [ ] **Multi-Turn** β€” Agentic conversation chains with tool result feedback
## About Mimi Tech AI
[Mimi Tech AI](https://mimitechai.com) builds on-device AI β€” no cloud, no data leaks, full user control.
- 🌐 [mimitechai.com](https://mimitechai.com)
- πŸ™ [GitHub](https://github.com/MimiTechAi)
- πŸ’Ό [LinkedIn](https://linkedin.com/company/mimitechai)
- 🟒 [NVIDIA Connect Program](https://www.nvidia.com/en-us/industries/nvidia-connect-program/) Member
## License
Apache 2.0 β€” free for commercial and personal use.
## Citation
```bibtex
@misc{mimitechai2026mimi,
title={MIMI Pro: On-Device AI Agent Model for Browser-Based Tool Calling},
author={Bemler, Michael and Soppa, Michael},
year={2026},
publisher={Mimi Tech AI},
url={https://huggingface.co/MimiTechAI/mimi-pro}
}
```