---
license: apache-2.0
language:
- en
- de
base_model: Qwen/Qwen3-4B
tags:
- mimi
- tool-calling
- function-calling
- agent
- gguf
- fine-tuned
- wllama
- browser-inference
- on-device-ai
- local-ai
- privacy-first
model-index:
- name: MIMI Pro
  results:
  - task:
      type: function-calling
      name: Tool Calling
    dataset:
      type: gorilla-llm/Berkeley-Function-Calling-Leaderboard
      name: BFCL V4
    metrics:
    - type: accuracy
      value: 60.8
      name: Simple Function Calling (Python)
      verified: false
    - type: accuracy
      value: 57.5
      name: Multiple Sequential Calls
      verified: false
    - type: accuracy
      value: 90
      name: Irrelevance Detection
      verified: false
pipeline_tag: text-generation
---

# MIMI Pro

MIMI Pro is a 4-billion-parameter AI agent model optimized for structured tool calling and autonomous task execution, designed to run entirely on-device, in the browser, with zero cloud dependencies. Part of the MIMI Model Family by [Mimi Tech AI](https://mimitechai.com).

> **🔬 V1 — Experimental Release.** This model is fine-tuned for the MIMI Agent's custom tool-calling format. For standard tool calling, the base [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) may perform equally well or better with native `<tool_call>` prompting. V2 with official BFCL scores and Qwen3-native format support is in development.
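As context for the sections below: the custom format mentioned above emits each tool call as a single JSON object of the shape `{"tool": ..., "parameters": ...}` (documented under Output Format). A minimal, illustrative Python sketch of extracting such a call from raw model text follows; the `parse_tool_call` helper is hypothetical and not part of any MIMI SDK.

```python
import json
import re

def parse_tool_call(model_output: str):
    """Extract the first MIMI V1-style tool call from raw model output.

    Expects the custom format {"tool": "...", "parameters": {...}}.
    Returns (tool_name, parameters) or None if no valid call is found.
    """
    # The model may emit prose around the JSON object, so we can't
    # json.loads() the whole string. Instead, try to decode a JSON
    # value starting at every opening brace and keep the first match.
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", model_output):
        try:
            obj, _ = decoder.raw_decode(model_output, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "tool" in obj and "parameters" in obj:
            return obj["tool"], obj["parameters"]
    return None

# Example: raw output mixing prose with a MIMI V1 tool call
raw = 'Searching now. {"tool": "web_search", "parameters": {"query": "latest AI news", "limit": 5}}'
print(parse_tool_call(raw))
# ('web_search', {'query': 'latest AI news', 'limit': 5})
```

Using `json.JSONDecoder.raw_decode` rather than a regex over the payload keeps nested-brace parameters intact.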
## Performance

### BFCL V4 Benchmark (Partial, Single-Turn; base model scored on 20 samples per category)

| Category | MIMI Pro V1 | Base Qwen3-4B | Notes |
|---|---|---|---|
| Simple Python | 60.8% (400 tests) | **80.0%** (20 tests) | Base outperforms |
| Simple Java | 21.0% (100 tests) | **60.0%** (20 tests) | Base outperforms |
| Multiple (Sequential) | 57.5% (200 tests) | **75.0%** (20 tests) | Base outperforms |
| Parallel | 2.0% (200 tests) | **75.0%** (20 tests) | Fine-tune degraded |
| Irrelevance | 90.0% (20 tests) | **100%** (20 tests) | Both strong |
| Live Simple | — | **90.0%** (20 tests) | Base only |

> ⚠️ **Important Context:** The previously reported "97.7% accuracy" was a **training validation metric** (token-level accuracy on the training/eval split), not a standardized benchmark score. The table above shows actual BFCL V4 results. We are working on a full official evaluation.

### Training Metrics (Internal)

| Metric | Value |
|---|---|
| Training Token Accuracy | 97.66% |
| Eval Token Accuracy | 97.29% |
| Training Loss | 0.084 |
| Parameters | 4.02 Billion |
| Quantized Size | 2.3 GB (Q4_K_M) |

## Architecture

MIMI Pro is built on [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using [Unsloth](https://github.com/unslothai/unsloth) on NVIDIA DGX Spark.

**Key Design Decisions:**

- Custom tool-calling format optimized for the MIMI Agent browser environment
- 19 tool types covering web search, code execution, file operations, browser automation
- Trained on NVIDIA DGX Spark (Grace Blackwell GB10, 128 GB unified memory)

**Known Limitations of V1:**

- Fine-tuning with aggressive hyperparameters (LoRA r=64, 3 epochs, LR 2e-4) caused some capability degradation vs.
the base model, particularly for parallel tool calling
- The custom `{"tool": ..., "parameters": ...}` format diverges from Qwen3's native `<tool_call>` format
- V2 will address these issues with conservative fine-tuning and Qwen3-native format support

## Supported Tools

| Category | Tools |
|---|---|
| 🌐 Web | web_search, browse_url, browser_action |
| 💻 Code | execute_python, create_file, edit_file |
| 🔬 Research | deep_research, generate_document |
| 📁 System | read_file, list_directory, run_terminal |
| 🧠 Reasoning | Multi-step orchestration |

## Quick Start

### Browser (wllama/WebAssembly)

```javascript
import { Wllama } from '@wllama/wllama';

// wasmPaths maps wllama's wasm asset names to URLs served by your app
// (see the wllama documentation for the exact shape)
const wllama = new Wllama(wasmPaths);
await wllama.loadModelFromUrl(
  'https://huggingface.co/MimiTechAI/mimi-pro/resolve/main/mimi-qwen3-4b-q4km.gguf',
  { n_ctx: 4096 }
);
const response = await wllama.createChatCompletion([
  { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
  { role: 'user', content: 'Search for the latest AI news and summarize it' }
]);
```

### llama.cpp

```bash
./llama-cli -m mimi-qwen3-4b-q4km.gguf \
  -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.6
```

### Python

```python
from llama_cpp import Llama

llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
output = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"}
])
```

## Output Format

MIMI Pro V1 uses a custom format (V2 will support the Qwen3-native `<tool_call>` format):

```json
{"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}
```

## The MIMI Model Family

| Model | Parameters | Size | Target Device | Status |
|---|---|---|---|---|
| MIMI Nano | 0.6B | ~400 MB | Any device, IoT | 🔜 Coming |
| MIMI Small | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
| **MIMI Pro** | **4.02B** | **2.3 GB** | **Desktop & laptop** | **✅ Available** |
| MIMI Max | 8B | ~4.5 GB | Workstations | 🔜 Coming |

All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.

## Training Details

```yaml
method: LoRA (PEFT) via Unsloth
base_model: Qwen/Qwen3-4B
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
learning_rate: 2.0e-04
epochs: 3
effective_batch_size: 8
max_seq_length: 2048
optimizer: adamw_8bit
precision: bf16
gradient_checkpointing: true
packing: true
dataset: 1,610 curated tool-calling examples (178K tokens)
hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
```

## Why MIMI?

- 🔒 **Privacy First** — Your data never leaves your device. Period.
- 💰 **Zero Cost** — No API keys, no subscriptions, no per-token billing.
- ⚡ **Fast** — Runs at native speed via WebAssembly, no server round-trips.
- 🌍 **Works Offline** — Once downloaded, no internet required.
- 🔧 **Tool Native** — Purpose-built for autonomous tool calling.

## Limitations

- V1 uses a custom tool-calling format (not the Qwen3-native `<tool_call>` format)
- Parallel tool calling (multiple simultaneous calls) is degraded vs. the base model
- Context window: 4,096 tokens (training config); the base architecture supports 32K
- Requires ~3 GB of RAM for in-browser inference
- Q4_K_M quantization trades minimal quality for a ~3.5x size reduction

## Roadmap

- [x] **V1** — Custom format, 19 tools, browser-optimized (current release)
- [ ] **V2** — Qwen3-native `<tool_call>` format, official BFCL V4 scores, conservative fine-tuning
- [ ] **Model Family** — Nano (0.6B), Small (1.7B), Max (8B) releases
- [ ] **Multi-Turn** — Agentic conversation chains with tool result feedback

## About Mimi Tech AI

[Mimi Tech AI](https://mimitechai.com) builds on-device AI — no cloud, no data leaks, full user control.
- 🌐 [mimitechai.com](https://mimitechai.com)
- 🐙 [GitHub](https://github.com/MimiTechAi)
- 💼 [LinkedIn](https://linkedin.com/company/mimitechai)
- 🟢 [NVIDIA Connect Program](https://www.nvidia.com/en-us/industries/nvidia-connect-program/) Member

## License

Apache 2.0 — free for commercial and personal use.

## Citation

```bibtex
@misc{mimitechai2026mimi,
  title={MIMI Pro: On-Device AI Agent Model for Browser-Based Tool Calling},
  author={Bemler, Michael and Soppa, Michael},
  year={2026},
  publisher={Mimi Tech AI},
  url={https://huggingface.co/MimiTechAI/mimi-pro}
}
```