MIMI Pro


MIMI Pro is a 4-billion-parameter AI agent model optimized for structured tool calling and autonomous task execution – designed to run entirely on-device, in the browser, with zero cloud dependencies.

Part of the MIMI Model Family by Mimi Tech AI.

💡 MIMI Pro achieves 97.7% tool-calling accuracy while running completely locally. Your data never leaves your device.

Performance

| Metric | Value |
|---|---|
| Token Accuracy | 97.66% |
| Eval Accuracy | 97.29% |
| Training Loss | 0.084 |
| Parameters | 4.02 billion |
| Quantized Size | 2.3 GB (Q4_K_M) |
| Training Time | 46 minutes |
| Training Hardware | NVIDIA DGX Spark (Grace Blackwell) |

Architecture

MIMI Pro is built on the Qwen3-4B architecture, fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using Unsloth on NVIDIA DGX Spark.

Key Design Decisions:

  • ChatML format with <think> reasoning blocks for chain-of-thought
  • 19 tool types covering web search, code execution, file operations, browser automation, and deep research
  • Multi-step chains – the model plans and executes sequences of tools autonomously
  • Error recovery – trained on failure cases to self-correct
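
The ChatML layout can be sketched as a small formatting helper. This is an illustrative function (not part of any MIMI SDK); the role markers match the llama.cpp prompt in the Quick Start section:

```python
# Illustrative helper: renders a message list as ChatML, the prompt
# format MIMI Pro was fine-tuned on.
def to_chatml(messages):
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Leave the assistant turn open so generation continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"},
])
```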

Supported Tools

| Category | Tools |
|---|---|
| 🌐 Web | web_search, browse_url, browser_action |
| 💻 Code | execute_python, create_file, edit_file |
| 🔬 Research | deep_research, generate_document |
| 📁 System | read_file, list_directory, run_terminal |
| 🧠 Reasoning | Multi-step orchestration, error recovery |

Quick Start

Browser (wllama/WebAssembly)

```javascript
import { Wllama } from '@wllama/wllama';

// wasmPaths maps wllama's WASM asset names to their URLs (see the wllama docs)
const wllama = new Wllama(wasmPaths);
await wllama.loadModelFromUrl(
  'https://huggingface.co/MimiTechAI/mimi-pro/resolve/main/mimi-qwen3-4b-q4km.gguf',
  { n_ctx: 4096 }
);

const response = await wllama.createChatCompletion([
  { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
  { role: 'user', content: 'Search for the latest AI news and summarize it' }
]);
```

llama.cpp

```shell
./llama-cli -m mimi-qwen3-4b-q4km.gguf \
  -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.6
```

Python

```python
from llama_cpp import Llama

llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
output = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"}
])
```

Output Format

MIMI Pro generates structured tool calls:

```xml
<tool_call>
{"name": "web_search", "arguments": {"query": "latest AI news March 2026", "num_results": 5}}
</tool_call>
```

Multi-tool chains for complex tasks:

```xml
<tool_call>
{"name": "web_search", "arguments": {"query": "NVIDIA DGX Spark specifications"}}
</tool_call>

<tool_call>
{"name": "browse_url", "arguments": {"url": "https://nvidia.com/dgx-spark"}}
</tool_call>
```
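
A client has to extract these blocks from the raw model output before executing anything. A minimal parser sketch (the regex and function name are illustrative, not part of the model card):

```python
import json
import re

# Pull every <tool_call> JSON payload out of raw model output.
# Non-greedy match anchored on the closing tag, so nested braces
# inside "arguments" are captured correctly.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text):
    return [json.loads(m) for m in TOOL_CALL_RE.findall(text)]

calls = parse_tool_calls(
    '<tool_call>\n{"name": "web_search", "arguments": '
    '{"query": "NVIDIA DGX Spark specifications"}}\n</tool_call>\n\n'
    '<tool_call>\n{"name": "browse_url", "arguments": '
    '{"url": "https://nvidia.com/dgx-spark"}}\n</tool_call>'
)
```

Each parsed call can then be dispatched to the matching tool implementation and the result fed back into the conversation.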

The MIMI Model Family

| Model | Parameters | Size | Target Device | Status |
|---|---|---|---|---|
| MIMI Nano | 0.6B | ~400 MB | Any device, IoT | 🔜 Coming |
| MIMI Small | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
| MIMI Pro | 4.02B | 2.3 GB | Desktop & laptop | ✅ Available |
| MIMI Max | 8B | ~4.5 GB | Workstations | 🔜 Coming |

All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.

Training Details

```yaml
method: LoRA (PEFT) via Unsloth
base_model: Qwen/Qwen3-4B
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
learning_rate: 2.0e-04
epochs: 3
effective_batch_size: 8
max_seq_length: 2048
optimizer: adamw_8bit
precision: bf16
gradient_checkpointing: true
packing: true
dataset: 1,610 curated tool-calling examples (178K tokens)
hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
```
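
The adapter size implied by these hyperparameters follows the standard LoRA formula: an adapted weight matrix of shape (d_out, d_in) gains r·(d_in + d_out) trainable parameters. The Qwen3-4B layer dimensions below are our assumptions for illustration, not figures from this card:

```python
# Standard LoRA count: each adapted matrix W (d_out x d_in) gains two
# low-rank factors, A (r x d_in) and B (d_out x r).
def lora_params(r, d_in, d_out):
    return r * (d_in + d_out)

# ASSUMED Qwen3-4B dimensions (hidden 2560, 36 layers, head_dim 128,
# 32 attention heads, 8 KV heads, MLP width 9728) -- illustrative only.
HIDDEN, LAYERS, HEAD_DIM, HEADS, KV_HEADS, MLP = 2560, 36, 128, 32, 8, 9728
R = 64  # lora_rank from the config above

per_layer = {
    "q_proj":    (HIDDEN, HEADS * HEAD_DIM),
    "k_proj":    (HIDDEN, KV_HEADS * HEAD_DIM),
    "v_proj":    (HIDDEN, KV_HEADS * HEAD_DIM),
    "o_proj":    (HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj":   (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}
total = LAYERS * sum(lora_params(R, i, o) for i, o in per_layer.values())
print(f"approx. trainable adapter parameters: {total / 1e6:.0f}M")
```

Under these assumed dimensions the adapter comes to roughly 132M trainable parameters, about 3% of the 4.02B base model.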

Why MIMI?

  • 🔒 Privacy First – Your data never leaves your device. Period.
  • 💰 Zero Cost – No API keys, no subscriptions, no per-token billing.
  • ⚡ Fast – Runs at native speed via WebAssembly, no server round-trips.
  • 🌐 Works Offline – Once downloaded, no internet required.
  • 🔧 Tool Native – Purpose-built for autonomous tool calling, not retrofitted.

Limitations

  • Optimized for tool calling – for general chat, use the base model directly.
  • Context window: 4,096 tokens (training config). Base architecture supports 32K.
  • Requires ~3 GB RAM for inference in browser.
  • Q4_K_M quantization trades minimal quality for 3.5x size reduction.

About Mimi Tech AI

Mimi Tech AI builds on-device AI: no cloud, no data leaks, full user control.

License

Apache 2.0 – free for commercial and personal use.

Citation

```bibtex
@misc{mimitechai2026mimi,
  title={MIMI Pro: On-Device AI Agent Model for Browser-Based Tool Calling},
  author={Bemler, Michael and Soppa, Michael},
  year={2026},
  publisher={Mimi Tech AI},
  url={https://huggingface.co/MimiTechAI/mimi-pro}
}
```