---
license: apache-2.0
language:
- en
- de
base_model: Qwen/Qwen3-4B
tags:
- mimi
- tool-calling
- function-calling
- agent
- gguf
- fine-tuned
- wllama
- browser-inference
- on-device-ai
- local-ai
- privacy-first
model-index:
- name: MIMI Pro
  results:
  - task:
      type: function-calling
      name: Tool Calling
    dataset:
      type: gorilla-llm/Berkeley-Function-Calling-Leaderboard
      name: BFCL V4
    metrics:
    - type: accuracy
      value: 60.8
      name: Simple Function Calling (Python)
      verified: false
    - type: accuracy
      value: 57.5
      name: Multiple Sequential Calls
      verified: false
    - type: accuracy
      value: 90
      name: Irrelevance Detection
      verified: false
pipeline_tag: text-generation
---

# MIMI Pro

MIMI Pro is a 4-billion-parameter AI agent model optimized for structured tool calling and autonomous task execution – designed to run entirely on-device, in the browser, with zero cloud dependencies.

Part of the MIMI Model Family by [Mimi Tech AI](https://mimitechai.com).

> **🔬 V1 – Experimental Release.** This model is fine-tuned for the MIMI Agent's custom tool-calling format. For standard tool calling, the base [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) may perform equally well or better with native `<tool_call>` prompting. V2 with official BFCL scores and Qwen3-native format support is in development.

## Performance

### BFCL V4 Benchmark (partial, single-turn; base model evaluated on 20 samples per category)

| Category | MIMI Pro V1 | Base Qwen3-4B | Notes |
|---|---|---|---|
| Simple Python | 60.8% (400 tests) | **80.0%** (20 tests) | Base outperforms |
| Simple Java | 21.0% (100 tests) | **60.0%** (20 tests) | Base outperforms |
| Multiple (Sequential) | 57.5% (200 tests) | **75.0%** (20 tests) | Base outperforms |
| Parallel | 2.0% (200 tests) | **75.0%** (20 tests) | Fine-tune degraded |
| Irrelevance | 90.0% (20 tests) | **100%** (20 tests) | Both strong |
| Live Simple | – | **90.0%** (20 tests) | Base only |

> ⚠️ **Important Context:** The previously reported "97.7% accuracy" was a **training validation metric** (token-level accuracy on the training/eval split), not a standardized benchmark score. The table above shows actual BFCL V4 results. We are working on a full official evaluation.

### Training Metrics (Internal)

| Metric | Value |
|---|---|
| Training Token Accuracy | 97.66% |
| Eval Token Accuracy | 97.29% |
| Training Loss | 0.084 |
| Parameters | 4.02 billion |
| Quantized Size | 2.3 GB (Q4_K_M) |

## Architecture

MIMI Pro is built on [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using [Unsloth](https://github.com/unslothai/unsloth) on an NVIDIA DGX Spark.

**Key Design Decisions:**
- Custom tool-calling format optimized for the MIMI Agent browser environment
- 19 tool types covering web search, code execution, file operations, and browser automation
- Trained on NVIDIA DGX Spark (Grace Blackwell GB10, 128 GB unified memory)

**Known Limitations of V1:**
- Fine-tuning with aggressive hyperparameters (LoRA r=64, 3 epochs, LR 2e-4) caused some capability degradation vs. the base model, particularly for parallel tool calling
- The custom `{"tool": ..., "parameters": ...}` format diverges from Qwen3's native `<tool_call>` format
- V2 will address these issues with conservative fine-tuning and Qwen3-native format support

## Supported Tools

| Category | Tools |
|---|---|
| 🌐 Web | web_search, browse_url, browser_action |
| 💻 Code | execute_python, create_file, edit_file |
| 🔬 Research | deep_research, generate_document |
| 📁 System | read_file, list_directory, run_terminal |
| 🧠 Reasoning | Multi-step orchestration |

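A host application routes each model-emitted tool call to one of the handlers above. The sketch below shows a minimal dispatch pattern; the handler bodies and registry are illustrative stand-ins, not the MIMI Agent's actual implementation – only the tool names and the `{"tool": ..., "parameters": ...}` call shape come from this card:

```python
import json

# Illustrative stand-in handlers for two of the 19 tool types.
def web_search(query, limit=5):
    return f"top {limit} results for {query!r}"

def read_file(path):
    return f"<contents of {path}>"

# Registry mapping tool names (from the table above) to handlers.
TOOL_REGISTRY = {"web_search": web_search, "read_file": read_file}

def dispatch(raw_call: str) -> str:
    """Route one MIMI V1 tool call (custom JSON format) to its handler."""
    call = json.loads(raw_call)
    handler = TOOL_REGISTRY[call["tool"]]
    return handler(**call.get("parameters", {}))

print(dispatch('{"tool": "web_search", "parameters": {"query": "latest AI news", "limit": 5}}'))
# -> top 5 results for 'latest AI news'
```

Because parameters arrive as a JSON object, `**call.get("parameters", {})` maps them directly onto keyword arguments, so handlers can declare defaults (like `limit=5`) for optional fields.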
## Quick Start

### Browser (wllama/WebAssembly)

```javascript
import { Wllama } from '@wllama/wllama';

// wasmPaths: map of wllama WASM asset URLs for your bundler/CDN setup
const wllama = new Wllama(wasmPaths);
await wllama.loadModelFromUrl(
  'https://huggingface.co/MimiTechAI/mimi-pro/resolve/main/mimi-qwen3-4b-q4km.gguf',
  { n_ctx: 4096 }
);

const response = await wllama.createChatCompletion([
  { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
  { role: 'user', content: 'Search for the latest AI news and summarize it' }
]);
```

### llama.cpp

```bash
./llama-cli -m mimi-qwen3-4b-q4km.gguf \
  -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.6
```

### Python

```python
from llama_cpp import Llama

llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
output = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"}
])
print(output["choices"][0]["message"]["content"])
```

## Output Format

MIMI Pro V1 uses a custom format (V2 will support the Qwen3-native `<tool_call>` format):

```json
{"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}
```

|
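For interoperability experiments, a custom-format call can be rewrapped into Qwen3's native envelope, which (per Qwen's published chat template) wraps a `{"name": ..., "arguments": ...}` object in `<tool_call>` tags. This converter is an illustrative sketch, not part of the model or the MIMI Agent:

```python
import json

def mimi_to_qwen3(raw_call: str) -> str:
    """Rewrap a MIMI V1 tool call as a Qwen3-native <tool_call> block."""
    call = json.loads(raw_call)
    native = {"name": call["tool"], "arguments": call.get("parameters", {})}
    return "<tool_call>\n" + json.dumps(native) + "\n</tool_call>"

print(mimi_to_qwen3(
    '{"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}'
))
```

The reverse mapping (`name` → `tool`, `arguments` → `parameters`) is equally mechanical, which is why V2's switch to the native format is a formatting change rather than a capability change.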
## The MIMI Model Family

| Model | Parameters | Size | Target Device | Status |
|---|---|---|---|---|
| MIMI Nano | 0.6B | ~400 MB | Any device, IoT | 🔜 Coming |
| MIMI Small | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
| **MIMI Pro** | **4.02B** | **2.3 GB** | **Desktop & laptop** | **✅ Available** |
| MIMI Max | 8B | ~4.5 GB | Workstations | 🔜 Coming |

All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.

## Training Details

```yaml
method: LoRA (PEFT) via Unsloth
base_model: Qwen/Qwen3-4B
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
learning_rate: 2.0e-04
epochs: 3
effective_batch_size: 8
max_seq_length: 2048
optimizer: adamw_8bit
precision: bf16
gradient_checkpointing: true
packing: true
dataset: 1,610 curated tool-calling examples (178K tokens)
hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
```

|
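As a rough sanity check on adapter size, LoRA adds `r * (d_in + d_out)` trainable parameters per adapted matrix (two low-rank factors). The dimensions below – hidden size 2560, 32 query / 8 KV heads with head_dim 128, MLP intermediate size 9728, 36 layers – are assumed from the public Qwen3-4B config and are not stated in this card:

```python
# Rough LoRA trainable-parameter estimate for r=64 on the listed target modules.
# Model dimensions are ASSUMED from the public Qwen3-4B config, not from this card.
HIDDEN, HEAD_DIM, Q_HEADS, KV_HEADS, INTER, LAYERS = 2560, 128, 32, 8, 9728, 36
R = 64  # lora_rank from the config above

shapes = {  # (d_in, d_out) of each adapted weight matrix
    "q_proj":    (HIDDEN, Q_HEADS * HEAD_DIM),
    "k_proj":    (HIDDEN, KV_HEADS * HEAD_DIM),
    "v_proj":    (HIDDEN, KV_HEADS * HEAD_DIM),
    "o_proj":    (Q_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTER),
    "up_proj":   (HIDDEN, INTER),
    "down_proj": (INTER, HIDDEN),
}

# LoRA adds two factors per matrix: an r x d_in and a d_out x r matrix.
per_layer = sum(R * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * LAYERS
print(f"~{total / 1e6:.0f}M trainable LoRA parameters")  # ~132M
```

Roughly 132M trainable parameters is about 3% of the 4.02B total, which is why r=64 counts as an aggressive LoRA configuration.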
## Why MIMI?

- 🔒 **Privacy First** – Your data never leaves your device. Period.
- 💰 **Zero Cost** – No API keys, no subscriptions, no per-token billing.
- ⚡ **Fast** – Runs at native speed via WebAssembly, no server round-trips.
- 🔌 **Works Offline** – Once downloaded, no internet required.
- 🔧 **Tool Native** – Purpose-built for autonomous tool calling.

## Limitations

- V1 uses a custom tool-calling format (not Qwen3-native `<tool_call>`)
- Parallel tool calling (multiple simultaneous calls) is degraded vs. the base model
- Context window: 4,096 tokens (training config); the base architecture supports 32K
- Requires ~3 GB RAM for inference in the browser
- Q4_K_M quantization trades minimal quality for a ~3.5x size reduction

|
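The ~3.5x figure is consistent with back-of-the-envelope arithmetic, assuming the unquantized weights are stored as bf16 at 2 bytes per parameter (GB here means 10^9 bytes):

```python
params = 4.02e9          # parameter count from the card
bf16_bytes = params * 2  # 2 bytes per parameter in bf16 (assumed storage format)
q4km_bytes = 2.3e9       # quantized GGUF size from the card (2.3 GB)
print(f"bf16: {bf16_bytes / 1e9:.1f} GB -> Q4_K_M: {q4km_bytes / 1e9:.1f} GB "
      f"({bf16_bytes / q4km_bytes:.1f}x smaller)")
# bf16: 8.0 GB -> Q4_K_M: 2.3 GB (3.5x smaller)
```
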
## Roadmap

- [x] **V1** – Custom format, 19 tools, browser-optimized (current release)
- [ ] **V2** – Qwen3-native `<tool_call>` format, official BFCL V4 scores, conservative fine-tuning
- [ ] **Model Family** – Nano (0.6B), Small (1.7B), Max (8B) releases
- [ ] **Multi-Turn** – Agentic conversation chains with tool result feedback

## About Mimi Tech AI

[Mimi Tech AI](https://mimitechai.com) builds on-device AI – no cloud, no data leaks, full user control.

- 🌐 [mimitechai.com](https://mimitechai.com)
- 🐙 [GitHub](https://github.com/MimiTechAi)
- 💼 [LinkedIn](https://linkedin.com/company/mimitechai)
- 🏢 [NVIDIA Connect Program](https://www.nvidia.com/en-us/industries/nvidia-connect-program/) Member

## License

Apache 2.0 – free for commercial and personal use.

## Citation

```bibtex
@misc{mimitechai2026mimi,
  title={MIMI Pro: On-Device AI Agent Model for Browser-Based Tool Calling},
  author={Bemler, Michael and Soppa, Michael},
  year={2026},
  publisher={Mimi Tech AI},
  url={https://huggingface.co/MimiTechAI/mimi-pro}
}
```