---
license: apache-2.0
language:
- en
- de
base_model: Qwen/Qwen3-4B
tags:
- mimi
- tool-calling
- function-calling
- agent
- gguf
- fine-tuned
- wllama
- browser-inference
- on-device-ai
- local-ai
- privacy-first
model-index:
- name: MIMI Pro
  results:
  - task:
      type: text-generation
      name: Tool/Function Calling
    metrics:
    - type: accuracy
      value: 97.66
      name: Token Accuracy
    - type: accuracy
      value: 97.29
      name: Eval Accuracy
    - type: loss
      value: 0.084
      name: Training Loss
library_name: transformers
pipeline_tag: text-generation
---
# MIMI Pro

MIMI Pro is a 4-billion-parameter AI agent model optimized for structured tool calling and autonomous task execution, designed to run entirely on-device, in the browser, with zero cloud dependencies.

Part of the MIMI Model Family by Mimi Tech AI.

> MIMI Pro achieves 97.7% tool-calling accuracy while running completely locally. Your data never leaves your device.
## Performance
| Metric | Value |
|---|---|
| Token Accuracy | 97.66% |
| Eval Accuracy | 97.29% |
| Training Loss | 0.084 |
| Parameters | 4.02 Billion |
| Quantized Size | 2.3 GB (Q4_K_M) |
| Training Time | 46 minutes |
| Training Hardware | NVIDIA DGX Spark (Grace Blackwell) |
## Architecture
MIMI Pro is built on the Qwen3-4B architecture, fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using Unsloth on NVIDIA DGX Spark.
**Key Design Decisions:**

- **ChatML format** with `<think>` reasoning blocks for chain-of-thought
- **19 tool types** covering web search, code execution, file operations, browser automation, and deep research
- **Multi-step chains**: the model plans and executes sequences of tools autonomously
- **Error recovery**: trained on failure cases to self-correct
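The ChatML layout described above can be sketched in a few lines. The role tags (`<|im_start|>` / `<|im_end|>`) are standard ChatML; the message content and the helper function name are illustrative, not part of the model card:

```python
# Sketch of the ChatML prompt layout MIMI Pro is trained on.
# The tags are standard ChatML; the sample content is made up.
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML prompt ending at the assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are MIMI, an AI agent with tool access.",
    "Search for the latest AI news",
)
```

The model then continues from the open assistant turn, optionally emitting a `<think>...</think>` block before any tool calls.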
## Supported Tools

| Category | Tools |
|---|---|
| Web | web_search, browse_url, browser_action |
| Code | execute_python, create_file, edit_file |
| Research | deep_research, generate_document |
| System | read_file, list_directory, run_terminal |
| Reasoning | Multi-step orchestration, error recovery |
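On the host side, each tool name from the table above has to be routed to a concrete implementation. A minimal sketch of such a registry follows; the handler bodies are stubs and the `dispatch` helper is hypothetical, since the model card does not prescribe a host API:

```python
import json

# Hypothetical host-side registry mapping MIMI tool names (from the table
# above) to handler functions. These handlers are stubs; a real host
# application would implement them against its own search/file APIs.
def web_search(query: str, num_results: int = 5) -> str:
    return json.dumps({"query": query, "results": []})  # stub result

def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

TOOL_REGISTRY = {
    "web_search": web_search,
    "read_file": read_file,
    # remaining tools from the table would be registered the same way
}

def dispatch(tool_call: dict) -> str:
    """Route one parsed tool call to its registered handler."""
    handler = TOOL_REGISTRY[tool_call["name"]]
    return handler(**tool_call["arguments"])

result = dispatch({"name": "web_search",
                   "arguments": {"query": "latest AI news"}})
```

The handler's return value would then be fed back to the model as a tool-result message so it can continue the chain.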
## Quick Start

### Browser (wllama/WebAssembly)

```javascript
import { Wllama } from '@wllama/wllama';

// wasmPaths maps wllama's WASM asset names to their URLs
// (see the wllama documentation for the exact shape).
const wllama = new Wllama(wasmPaths);

await wllama.loadModelFromUrl(
  'https://huggingface.co/MimiTechAI/mimi-pro/resolve/main/mimi-qwen3-4b-q4km.gguf',
  { n_ctx: 4096 }
);

const response = await wllama.createChatCompletion([
  { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
  { role: 'user', content: 'Search for the latest AI news and summarize it' }
]);
```
### llama.cpp

```bash
./llama-cli -m mimi-qwen3-4b-q4km.gguf \
  -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.6
```
### Python

```python
from llama_cpp import Llama

llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
output = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"}
])
print(output["choices"][0]["message"]["content"])
```
## Output Format

MIMI Pro generates structured tool calls:

```xml
<tool_call>
{"name": "web_search", "arguments": {"query": "latest AI news March 2026", "num_results": 5}}
</tool_call>
```

Multi-tool chains for complex tasks:

```xml
<tool_call>
{"name": "web_search", "arguments": {"query": "NVIDIA DGX Spark specifications"}}
</tool_call>
<tool_call>
{"name": "browse_url", "arguments": {"url": "https://nvidia.com/dgx-spark"}}
</tool_call>
```
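A host application needs to extract these `<tool_call>` blocks from the raw completion. One minimal way to do that, assuming the blocks always contain a single JSON object as shown above, is a regex plus `json.loads`:

```python
import json
import re

# Sketch of host-side parsing for the <tool_call> format shown above.
# Assumes each block wraps exactly one JSON object, as in the examples.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> list[dict]:
    """Extract every JSON tool call from a completion, in order."""
    return [json.loads(m.group(1)) for m in TOOL_CALL_RE.finditer(text)]

completion = """<tool_call>
{"name": "web_search", "arguments": {"query": "NVIDIA DGX Spark specifications"}}
</tool_call>
<tool_call>
{"name": "browse_url", "arguments": {"url": "https://nvidia.com/dgx-spark"}}
</tool_call>"""

calls = parse_tool_calls(completion)
```

A production parser would also handle malformed JSON gracefully, since the model can occasionally emit invalid arguments.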
## The MIMI Model Family

| Model | Parameters | Size | Target Device | Status |
|---|---|---|---|---|
| MIMI Nano | 0.6B | ~400 MB | Any device, IoT | Coming |
| MIMI Small | 1.7B | ~1.0 GB | Mobile & tablets | Coming |
| MIMI Pro | 4.02B | 2.3 GB | Desktop & laptop | Available |
| MIMI Max | 8B | ~4.5 GB | Workstations | Coming |
All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.
## Training Details

```yaml
method: LoRA (PEFT) via Unsloth
base_model: Qwen/Qwen3-4B
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
learning_rate: 2.0e-04
epochs: 3
effective_batch_size: 8
max_seq_length: 2048
optimizer: adamw_8bit
precision: bf16
gradient_checkpointing: true
packing: true
dataset: 1,610 curated tool-calling examples (178K tokens)
hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
```
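A quick back-of-envelope check of the configuration above, assuming packing greedily fills each 2,048-token sequence with no padding (the card's rounded token count makes all results approximate):

```python
import math

# Approximate training-step count from the config above.
# Assumes packing fills each max-length sequence greedily (an assumption).
tokens = 178_000   # rounded dataset size from the card
examples = 1_610
seq_len = 2_048    # max_seq_length
batch = 8          # effective_batch_size
epochs = 3

avg_example_len = tokens / examples            # ~110 tokens per example
packed_seqs = math.ceil(tokens / seq_len)      # packed sequences per epoch
steps = math.ceil(packed_seqs / batch) * epochs
```

This lands at only a few dozen optimizer steps in total, which is consistent with the short 46-minute training run reported above.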
## Why MIMI?

- **Privacy First**: your data never leaves your device. Period.
- **Zero Cost**: no API keys, no subscriptions, no per-token billing.
- **Fast**: runs at near-native speed via WebAssembly, with no server round-trips.
- **Works Offline**: once downloaded, no internet required.
- **Tool Native**: purpose-built for autonomous tool calling, not retrofitted.
## Limitations

- Optimized for tool calling; for general chat, use the base model directly.
- Context window: 4,096 tokens (training config). The base architecture supports 32K.
- Requires ~3 GB of RAM for in-browser inference.
- Q4_K_M quantization trades minimal quality for a ~3.5x size reduction.
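The ~3.5x figure can be sanity-checked from the numbers in the performance table, using decimal gigabytes and ignoring GGUF metadata overhead:

```python
# Rough check of the ~3.5x size-reduction claim, from the card's own figures.
params = 4.02e9                 # parameter count from the table
bf16_bytes = params * 2         # bf16 stores 2 bytes per parameter
bf16_gb = bf16_bytes / 1e9      # ~8.04 GB unquantized, overhead ignored
q4km_gb = 2.3                   # quantized GGUF size from the table
reduction = bf16_gb / q4km_gb   # ~3.5x
```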
## About Mimi Tech AI

Mimi Tech AI builds on-device AI: no cloud, no data leaks, full user control.

- mimitechai.com
- GitHub
- LinkedIn
- NVIDIA Connect Program Member
## License

Apache 2.0: free for commercial and personal use.
## Citation

```bibtex
@misc{mimitechai2026mimi,
  title={MIMI Pro: On-Device AI Agent Model for Browser-Based Tool Calling},
  author={Bemler, Michael and Soppa, Michael},
  year={2026},
  publisher={Mimi Tech AI},
  url={https://huggingface.co/MimiTechAI/mimi-pro}
}
```