---
license: apache-2.0
language:
- en
- de
base_model: Qwen/Qwen3-4B
tags:
- mimi
- tool-calling
- function-calling
- agent
- gguf
- fine-tuned
- wllama
- browser-inference
- on-device-ai
- local-ai
- privacy-first
model-index:
- name: MIMI Pro
  results:
  - task:
      type: text-generation
      name: Tool/Function Calling
    metrics:
    - type: accuracy
      value: 97.66
      name: Token Accuracy
    - type: accuracy
      value: 97.29
      name: Eval Accuracy
    - type: loss
      value: 0.084
      name: Training Loss
library_name: transformers
pipeline_tag: text-generation
---
# MIMI Pro

MIMI Pro is a 4-billion-parameter AI agent model optimized for structured tool calling and autonomous task execution, designed to run entirely on-device, in the browser, with zero cloud dependencies.

Part of the MIMI Model Family by Mimi Tech AI.

> MIMI Pro achieves 97.7% tool-calling accuracy while running completely locally. Your data never leaves your device.
## Performance
| Metric | Value |
|---|---|
| Token Accuracy | 97.66% |
| Eval Accuracy | 97.29% |
| Training Loss | 0.084 |
| Parameters | 4.02 Billion |
| Quantized Size | 2.3 GB (Q4_K_M) |
| Training Time | 46 minutes |
| Training Hardware | NVIDIA DGX Spark (Grace Blackwell) |
## Architecture
MIMI Pro is built on the Qwen3-4B architecture, fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using Unsloth on NVIDIA DGX Spark.
**Key Design Decisions:**

- **ChatML format** with `<think>` reasoning blocks for chain-of-thought
- **19 tool types** covering web search, code execution, file operations, browser automation, and deep research
- **Multi-step chains**: the model plans and executes sequences of tools autonomously
- **Error recovery**: trained on failure cases to self-correct
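The ChatML layout described above can be sketched in a few lines. The role tags (`<|im_start|>` / `<|im_end|>`) are standard ChatML; the message content and the helper function name are illustrative, not part of the model card:

```python
# Sketch of the ChatML prompt layout MIMI Pro is trained on.
# The tags are standard ChatML; the sample content is made up.
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML prompt ending at the assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are MIMI, an AI agent with tool access.",
    "Search for the latest AI news",
)
```

The model then continues from the open assistant turn, optionally emitting a `<think>...</think>` block before any tool calls.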
## Supported Tools

| Category | Tools |
|---|---|
| Web | web_search, browse_url, browser_action |
| Code | execute_python, create_file, edit_file |
| Research | deep_research, generate_document |
| System | read_file, list_directory, run_terminal |
| Reasoning | Multi-step orchestration, error recovery |
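On the host side, each tool name from the table above has to be routed to a concrete implementation. A minimal sketch of such a registry follows; the handler bodies are stubs and the `dispatch` helper is hypothetical, since the model card does not prescribe a host API:

```python
import json

# Hypothetical host-side registry mapping MIMI tool names (from the table
# above) to handler functions. These handlers are stubs; a real host
# application would implement them against its own search/file APIs.
def web_search(query: str, num_results: int = 5) -> str:
    return json.dumps({"query": query, "results": []})  # stub result

def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

TOOL_REGISTRY = {
    "web_search": web_search,
    "read_file": read_file,
    # remaining tools from the table would be registered the same way
}

def dispatch(tool_call: dict) -> str:
    """Route one parsed tool call to its registered handler."""
    handler = TOOL_REGISTRY[tool_call["name"]]
    return handler(**tool_call["arguments"])

result = dispatch({"name": "web_search",
                   "arguments": {"query": "latest AI news"}})
```

The handler's return value would then be fed back to the model as a tool-result message so it can continue the chain.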
## Quick Start

### Browser (wllama/WebAssembly)

```javascript
import { Wllama } from '@wllama/wllama';

// wasmPaths maps wllama's WASM asset names to their URLs
// (see the wllama documentation for the exact shape).
const wllama = new Wllama(wasmPaths);

await wllama.loadModelFromUrl(
  'https://huggingface.co/MimiTechAI/mimi-pro/resolve/main/mimi-qwen3-4b-q4km.gguf',
  { n_ctx: 4096 }
);

const response = await wllama.createChatCompletion([
  { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
  { role: 'user', content: 'Search for the latest AI news and summarize it' }
]);
```
### llama.cpp

```bash
./llama-cli -m mimi-qwen3-4b-q4km.gguf \
  -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.6
```
### Python

```python
from llama_cpp import Llama

llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
output = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"}
])
print(output["choices"][0]["message"]["content"])
```
## Output Format

MIMI Pro generates structured tool calls:

```xml
<tool_call>
{"name": "web_search", "arguments": {"query": "latest AI news March 2026", "num_results": 5}}
</tool_call>
```

Multi-tool chains for complex tasks:

```xml
<tool_call>
{"name": "web_search", "arguments": {"query": "NVIDIA DGX Spark specifications"}}
</tool_call>
<tool_call>
{"name": "browse_url", "arguments": {"url": "https://nvidia.com/dgx-spark"}}
</tool_call>
```
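A host application needs to extract these `<tool_call>` blocks from the raw completion. One minimal way to do that, assuming the blocks always contain a single JSON object as shown above, is a regex plus `json.loads`:

```python
import json
import re

# Sketch of host-side parsing for the <tool_call> format shown above.
# Assumes each block wraps exactly one JSON object, as in the examples.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> list[dict]:
    """Extract every JSON tool call from a completion, in order."""
    return [json.loads(m.group(1)) for m in TOOL_CALL_RE.finditer(text)]

completion = """<tool_call>
{"name": "web_search", "arguments": {"query": "NVIDIA DGX Spark specifications"}}
</tool_call>
<tool_call>
{"name": "browse_url", "arguments": {"url": "https://nvidia.com/dgx-spark"}}
</tool_call>"""

calls = parse_tool_calls(completion)
```

A production parser would also handle malformed JSON gracefully, since the model can occasionally emit invalid arguments.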
## The MIMI Model Family

| Model | Parameters | Size | Target Device | Status |
|---|---|---|---|---|
| MIMI Nano | 0.6B | ~400 MB | Any device, IoT | Coming |
| MIMI Small | 1.7B | ~1.0 GB | Mobile & tablets | Coming |
| MIMI Pro | 4.02B | 2.3 GB | Desktop & laptop | Available |
| MIMI Max | 8B | ~4.5 GB | Workstations | Coming |
All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.
## Training Details

```yaml
method: LoRA (PEFT) via Unsloth
base_model: Qwen/Qwen3-4B
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
learning_rate: 2.0e-04
epochs: 3
effective_batch_size: 8
max_seq_length: 2048
optimizer: adamw_8bit
precision: bf16
gradient_checkpointing: true
packing: true
dataset: 1,610 curated tool-calling examples (178K tokens)
hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
```
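A quick back-of-envelope check of the configuration above, assuming packing greedily fills each 2,048-token sequence with no padding (the card's rounded token count makes all results approximate):

```python
import math

# Approximate training-step count from the config above.
# Assumes packing fills each max-length sequence greedily (an assumption).
tokens = 178_000   # rounded dataset size from the card
examples = 1_610
seq_len = 2_048    # max_seq_length
batch = 8          # effective_batch_size
epochs = 3

avg_example_len = tokens / examples            # ~110 tokens per example
packed_seqs = math.ceil(tokens / seq_len)      # packed sequences per epoch
steps = math.ceil(packed_seqs / batch) * epochs
```

This lands at only a few dozen optimizer steps in total, which is consistent with the short 46-minute training run reported above.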
## Why MIMI?

- **Privacy First**: your data never leaves your device. Period.
- **Zero Cost**: no API keys, no subscriptions, no per-token billing.
- **Fast**: runs at near-native speed via WebAssembly, with no server round-trips.
- **Works Offline**: once downloaded, no internet required.
- **Tool Native**: purpose-built for autonomous tool calling, not retrofitted.
## Limitations

- Optimized for tool calling; for general chat, use the base model directly.
- Context window: 4,096 tokens (training config). The base architecture supports 32K.
- Requires ~3 GB of RAM for in-browser inference.
- Q4_K_M quantization trades minimal quality for a ~3.5x size reduction.
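The ~3.5x figure can be sanity-checked from the numbers in the performance table, using decimal gigabytes and ignoring GGUF metadata overhead:

```python
# Rough check of the ~3.5x size-reduction claim, from the card's own figures.
params = 4.02e9                 # parameter count from the table
bf16_bytes = params * 2         # bf16 stores 2 bytes per parameter
bf16_gb = bf16_bytes / 1e9      # ~8.04 GB unquantized, overhead ignored
q4km_gb = 2.3                   # quantized GGUF size from the table
reduction = bf16_gb / q4km_gb   # ~3.5x
```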
## About Mimi Tech AI

Mimi Tech AI builds on-device AI: no cloud, no data leaks, full user control.

- mimitechai.com
- GitHub
- LinkedIn
- NVIDIA Connect Program Member
## License

Apache 2.0: free for commercial and personal use.
## Citation

```bibtex
@misc{mimitechai2026mimi,
  title={MIMI Pro: On-Device AI Agent Model for Browser-Based Tool Calling},
  author={Bemler, Michael and Soppa, Michael},
  year={2026},
  publisher={Mimi Tech AI},
  url={https://huggingface.co/MimiTechAI/mimi-pro}
}
```