---
license: apache-2.0
language:
- en
- de
base_model: Qwen/Qwen3-4B
tags:
- mimi
- tool-calling
- function-calling
- agent
- gguf
- fine-tuned
- wllama
- browser-inference
- on-device-ai
- local-ai
- privacy-first
model-index:
- name: MIMI Pro
  results:
  - task:
      type: function-calling
      name: Tool Calling
    dataset:
      type: gorilla-llm/Berkeley-Function-Calling-Leaderboard
      name: BFCL V4
    metrics:
    - type: accuracy
      value: 60.8
      name: Simple Function Calling (Python)
      verified: false
    - type: accuracy
      value: 57.5
      name: Multiple Sequential Calls
      verified: false
    - type: accuracy
      value: 90.0
      name: Irrelevance Detection
      verified: false
pipeline_tag: text-generation
---
# MIMI Pro
MIMI Pro is a 4-billion-parameter AI agent model optimized for structured tool calling and autonomous task execution. It is designed to run entirely on-device, in the browser, with zero cloud dependencies.
Part of the MIMI Model Family by [Mimi Tech AI](https://mimitechai.com).
> **πŸ”¬ V1 β€” Experimental Release.** This model is fine-tuned for the MIMI Agent's custom tool-calling format. For standard tool calling, the base [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) may perform equally well or better with native `<tool_call>` prompting. V2 with official BFCL scores and Qwen3-native format support is in development.
## Performance
### BFCL V4 Benchmark (Partial, Single-Turn; sample counts per cell below)
| Category | MIMI Pro V1 | Base Qwen3-4B | Notes |
|---|---|---|---|
| Simple Python | 60.8% (400 tests) | **80.0%** (20 tests) | Base outperforms |
| Simple Java | 21.0% (100 tests) | **60.0%** (20 tests) | Base outperforms |
| Multiple (Sequential) | 57.5% (200 tests) | **75.0%** (20 tests) | Base outperforms |
| Parallel | 2.0% (200 tests) | **75.0%** (20 tests) | Fine-tune degraded |
| Irrelevance | 90.0% (20 tests) | **100%** (20 tests) | Both strong |
| Live Simple | β€” | **90.0%** (20 tests) | Base only |
> ⚠️ **Important Context:** The previously reported "97.7% accuracy" was a **training validation metric** (token-level accuracy on the training/eval split), not a standardized benchmark score. The table above shows actual BFCL V4 results. We are working on a full official evaluation.
### Training Metrics (Internal)
| Metric | Value |
|---|---|
| Training Token Accuracy | 97.66% |
| Eval Token Accuracy | 97.29% |
| Training Loss | 0.084 |
| Parameters | 4.02 Billion |
| Quantized Size | 2.3 GB (Q4_K_M) |
## Architecture
MIMI Pro is built on [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using [Unsloth](https://github.com/unslothai/unsloth) on NVIDIA DGX Spark.
**Key Design Decisions:**
- Custom tool-calling format optimized for the MIMI Agent browser environment
- 19 tool types covering web search, code execution, file operations, browser automation
- Trained on NVIDIA DGX Spark (Grace Blackwell GB10, 128 GB unified memory)
**Known Limitations of V1:**
- Fine-tuning with aggressive hyperparameters (LoRA r=64, 3 epochs, LR 2e-4) caused some capability degradation vs. the base model, particularly for parallel tool calling
- The custom `{"tool": ..., "parameters": ...}` format diverges from Qwen3's native `<tool_call>` format
- V2 will address these issues with conservative fine-tuning and Qwen3-native format support
## Supported Tools
| Category | Tools |
|---|---|
| 🌐 Web | web_search, browse_url, browser_action |
| πŸ’» Code | execute_python, create_file, edit_file |
| πŸ”¬ Research | deep_research, generate_document |
| πŸ“ System | read_file, list_directory, run_terminal |
| 🧠 Reasoning | Multi-step orchestration |
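On the host side, each tool name from the table above maps to a handler. The sketch below is illustrative only: the handlers are stubs and their signatures are assumptions, not the MIMI Agent's actual runtime API.

```python
# Hypothetical dispatch sketch: maps tool names from the table above to
# handler functions. The handlers here are stubs; the real MIMI Agent
# runtime and its signatures are not part of this model card.

def web_search(query, limit=5):
    # Stub: a real implementation would call a search backend.
    return [f"result for {query!r}"] * limit

def read_file(path):
    # Stub: a real implementation would read from a sandboxed filesystem.
    return f"<contents of {path}>"

TOOL_REGISTRY = {
    "web_search": web_search,
    "read_file": read_file,
    # ... remaining tools registered the same way
}

def dispatch(tool_call):
    """Route a parsed {"tool": ..., "parameters": ...} object to its handler."""
    handler = TOOL_REGISTRY[tool_call["tool"]]
    return handler(**tool_call["parameters"])
```

Usage: `dispatch({"tool": "web_search", "parameters": {"query": "AI news", "limit": 2}})` returns the stubbed result list.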
## Quick Start
### Browser (wllama/WebAssembly)
```javascript
// npm install @wllama/wllama
import { Wllama } from '@wllama/wllama';
const wllama = new Wllama(wasmPaths); // wasmPaths: map of wllama .wasm asset URLs
await wllama.loadModelFromUrl(
'https://huggingface.co/MimiTechAI/mimi-pro/resolve/main/mimi-qwen3-4b-q4km.gguf',
{ n_ctx: 4096 }
);
const response = await wllama.createChatCompletion([
{ role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
{ role: 'user', content: 'Search for the latest AI news and summarize it' }
]);
```
### llama.cpp
```bash
./llama-cli -m mimi-qwen3-4b-q4km.gguf \
-p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
-n 512 --temp 0.6
```
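The `-p` string above hand-writes Qwen3's ChatML-style template (`<|im_start|>role\n...<|im_end|>`). A minimal helper can build the same prompt programmatically; this is a sketch, not a full chat-template implementation:

```python
# Build the same ChatML-style prompt the llama-cli example passes via -p.
# Qwen3 delimits turns with <|im_start|>role\n...<|im_end|> markers.

def build_prompt(messages):
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # leave the assistant turn open
    return "".join(parts)

prompt = build_prompt([
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"},
])
```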
### Python
```python
from llama_cpp import Llama
llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
output = llm.create_chat_completion(messages=[
{"role": "system", "content": "You are MIMI, an AI agent with tool access."},
{"role": "user", "content": "Search for the latest AI news"}
])
```
## Output Format
MIMI Pro V1 uses a custom format (V2 will support Qwen3-native `<tool_call>` format):
```json
{"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}
```
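In practice the model may wrap this JSON in surrounding prose, so the host needs to locate and parse it. The sketch below is one tolerant way to do that, assuming a single tool call per response (the V1 format); it is not the MIMI Agent's actual parser.

```python
import json
import re

def extract_tool_call(text):
    """Find the first {"tool": ..., "parameters": ...} object in model output.

    Tolerant sketch: tries each '{' as a JSON start; assumes one call per
    response, which matches the V1 single-call format.
    """
    for match in re.finditer(r"\{", text):
        try:
            # raw_decode parses a JSON value at this offset, ignoring trailing text.
            obj, _ = json.JSONDecoder().raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "tool" in obj and "parameters" in obj:
            return obj
    return None

call = extract_tool_call(
    'Sure, searching now: {"tool": "web_search", '
    '"parameters": {"query": "latest AI news", "limit": 5}} Done.'
)
# call -> {"tool": "web_search", "parameters": {"query": "latest AI news", "limit": 5}}
```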
## The MIMI Model Family
| Model | Parameters | Size | Target Device | Status |
|---|---|---|---|---|
| MIMI Nano | 0.6B | ~400 MB | Any device, IoT | πŸ”œ Coming |
| MIMI Small | 1.7B | ~1.0 GB | Mobile & tablets | πŸ”œ Coming |
| **MIMI Pro** | **4.02B** | **2.3 GB** | **Desktop & laptop** | **βœ… Available** |
| MIMI Max | 8B | ~4.5 GB | Workstations | πŸ”œ Coming |
All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.
## Training Details
```yaml
method: LoRA (PEFT) via Unsloth
base_model: Qwen/Qwen3-4B
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
learning_rate: 2.0e-04
epochs: 3
effective_batch_size: 8
max_seq_length: 2048
optimizer: adamw_8bit
precision: bf16
gradient_checkpointing: true
packing: true
dataset: 1,610 curated tool-calling examples (178K tokens)
hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
```
## Why MIMI?
- πŸ”’ **Privacy First** β€” Your data never leaves your device. Period.
- πŸ’° **Zero Cost** β€” No API keys, no subscriptions, no per-token billing.
- ⚑ **Fast** β€” Runs at native speed via WebAssembly, no server round-trips.
- 🌍 **Works Offline** β€” Once downloaded, no internet required.
- πŸ”§ **Tool Native** β€” Purpose-built for autonomous tool calling.
## Limitations
- V1 uses a custom tool-calling format (not Qwen3-native `<tool_call>`)
- Parallel tool calling (multiple simultaneous calls) is degraded vs. base model
- Context window: 4,096 tokens (training config). Base architecture supports 32K.
- Requires ~3 GB RAM for inference in browser.
- Q4_K_M quantization trades minimal quality for 3.5x size reduction.
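The 3.5x figure can be sanity-checked with back-of-envelope arithmetic, assuming the unquantized model is stored in bf16 at 2 bytes per parameter:

```python
# Sanity-check the quoted Q4_K_M size reduction.
# Assumption: unquantized weights in bf16 (2 bytes per parameter).
params = 4.02e9                 # parameter count from the model card
bf16_gb = params * 2 / 1e9      # ~8.04 GB unquantized
q4km_gb = 2.3                   # quoted GGUF Q4_K_M file size
reduction = bf16_gb / q4km_gb   # ~3.5x
```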
## Roadmap
- [x] **V1** β€” Custom format, 19 tools, browser-optimized (current release)
- [ ] **V2** β€” Qwen3-native `<tool_call>` format, official BFCL V4 scores, conservative fine-tuning
- [ ] **Model Family** β€” Nano (0.6B), Small (1.7B), Max (8B) releases
- [ ] **Multi-Turn** β€” Agentic conversation chains with tool result feedback
## About Mimi Tech AI
[Mimi Tech AI](https://mimitechai.com) builds on-device AI β€” no cloud, no data leaks, full user control.
- 🌐 [mimitechai.com](https://mimitechai.com)
- πŸ™ [GitHub](https://github.com/MimiTechAi)
- πŸ’Ό [LinkedIn](https://linkedin.com/company/mimitechai)
- 🟒 [NVIDIA Connect Program](https://www.nvidia.com/en-us/industries/nvidia-connect-program/) Member
## License
Apache 2.0 β€” free for commercial and personal use.
## Citation
```bibtex
@misc{mimitechai2026mimi,
title={MIMI Pro: On-Device AI Agent Model for Browser-Based Tool Calling},
author={Bemler, Michael and Soppa, Michael},
year={2026},
publisher={Mimi Tech AI},
url={https://huggingface.co/MimiTechAI/mimi-pro}
}
```