MimiTechAI committed
Commit 29c5291 · verified · 1 Parent(s): 8d272fa

Update Model Card: Add BFCL V4 scores, transparent benchmarking, V2 roadmap

Files changed (1)
  1. README.md +77 -69
README.md CHANGED
@@ -20,75 +20,89 @@ model-index:
20
  - name: MIMI Pro
21
  results:
22
  - task:
23
- type: text-generation
24
- name: Tool/Function Calling
25
  metrics:
26
  - type: accuracy
27
- value: 97.66
28
- name: Token Accuracy
 
29
  - type: accuracy
30
- value: 97.29
31
- name: Eval Accuracy
32
- - type: loss
33
- value: 0.084
34
- name: Training Loss
35
- library_name: transformers
 
36
  pipeline_tag: text-generation
37
  ---
38
 
39
  # MIMI Pro
40
 
41
- <p align="center">
42
- <img src="https://img.shields.io/badge/MIMI-Pro-black?style=for-the-badge&labelColor=000000" alt="MIMI Pro"/>
43
- <img src="https://img.shields.io/badge/Accuracy-97.7%25-brightgreen?style=for-the-badge" alt="Accuracy"/>
44
- <img src="https://img.shields.io/badge/Size-2.3GB-orange?style=for-the-badge" alt="Size"/>
45
- <img src="https://img.shields.io/badge/Runs_In-Browser-purple?style=for-the-badge" alt="Browser"/>
46
- <img src="https://img.shields.io/badge/Cloud-Zero-red?style=for-the-badge" alt="Zero Cloud"/>
47
- </p>
48
-
49
- **MIMI Pro** is a 4-billion-parameter AI agent model optimized for **structured tool calling and autonomous task execution** — designed to run entirely on-device, in the browser, with zero cloud dependencies.
50
 
51
- Part of the **MIMI Model Family** by [Mimi Tech AI](https://mimitechai.com).
52
 
53
- > 💡 MIMI Pro achieves **97.7% tool-calling accuracy** while running completely locally. Your data never leaves your device.
54
 
55
  ## Performance
56
 
57
  | Metric | Value |
58
- |--------|-------|
59
- | **Token Accuracy** | 97.66% |
60
- | **Eval Accuracy** | 97.29% |
61
- | **Training Loss** | 0.084 |
62
- | **Parameters** | 4.02 Billion |
63
- | **Quantized Size** | 2.3 GB (Q4_K_M) |
64
- | **Training Time** | 46 minutes |
65
- | **Training Hardware** | NVIDIA DGX Spark (Grace Blackwell) |
66
 
67
  ## Architecture
68
 
69
- MIMI Pro is built on the [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) architecture, fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using [Unsloth](https://github.com/unslothai/unsloth) on NVIDIA DGX Spark.
70
 
71
  **Key Design Decisions:**
72
- - **ChatML format** with `<think>` reasoning blocks for chain-of-thought
73
- - **19 tool types** covering web search, code execution, file operations, browser automation, and deep research
74
- - **Multi-step chains** — the model plans and executes sequences of tools autonomously
75
- - **Error recovery** — trained on failure cases to self-correct
76
 
77
  ## Supported Tools
78
 
79
  | Category | Tools |
80
- |----------|-------|
81
- | 🌐 **Web** | `web_search`, `browse_url`, `browser_action` |
82
- | 💻 **Code** | `execute_python`, `create_file`, `edit_file` |
83
- | 🔬 **Research** | `deep_research`, `generate_document` |
84
- | 📁 **System** | `read_file`, `list_directory`, `run_terminal` |
85
- | 🧠 **Reasoning** | Multi-step orchestration, error recovery |
86
 
87
  ## Quick Start
88
 
89
  ### Browser (wllama/WebAssembly)
90
 
91
- ```typescript
92
import { Wllama } from '@wllama/wllama';
93
 
94
  const wllama = new Wllama(wasmPaths);
@@ -124,34 +138,20 @@ output = llm.create_chat_completion(messages=[
124
 
125
  ## Output Format
126
 
127
- MIMI Pro generates structured tool calls:
128
 
129
- ```xml
130
- <tool_call>
131
- {"name": "web_search", "arguments": {"query": "latest AI news March 2026", "num_results": 5}}
132
- </tool_call>
133
- ```
134
-
135
- Multi-tool chains for complex tasks:
136
-
137
- ```xml
138
- <tool_call>
139
- {"name": "web_search", "arguments": {"query": "NVIDIA DGX Spark specifications"}}
140
- </tool_call>
141
-
142
- <tool_call>
143
- {"name": "browse_url", "arguments": {"url": "https://nvidia.com/dgx-spark"}}
144
- </tool_call>
145
  ```
146
 
147
  ## The MIMI Model Family
148
 
149
  | Model | Parameters | Size | Target Device | Status |
150
- |-------|-----------|------|---------------|--------|
151
- | **MIMI Nano** | 0.6B | ~400 MB | Any device, IoT | 🔜 Coming |
152
- | **MIMI Small** | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
153
- | **MIMI Pro** | 4.02B | 2.3 GB | Desktop & laptop | ✅ **Available** |
154
- | **MIMI Max** | 8B | ~4.5 GB | Workstations | 🔜 Coming |
155
 
156
  All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.
157
 
@@ -178,19 +178,27 @@ hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
178
 
179
  ## Why MIMI?
180
 
181
- - **🔒 Privacy First** — Your data never leaves your device. Period.
182
- - **💰 Zero Cost** — No API keys, no subscriptions, no per-token billing.
183
- - **⚡ Fast** — Runs at native speed via WebAssembly, no server round-trips.
184
- - **🌍 Works Offline** — Once downloaded, no internet required.
185
- - **🔧 Tool Native** — Purpose-built for autonomous tool calling, not retrofitted.
186
 
187
  ## Limitations
188
 
189
- - Optimized for tool calling — for general chat, use the base model directly.
 
190
  - Context window: 4,096 tokens (training config). Base architecture supports 32K.
191
  - Requires ~3 GB RAM for inference in browser.
192
  - Q4_K_M quantization trades minimal quality for 3.5x size reduction.
193
 
194
  ## About Mimi Tech AI
195
 
196
[Mimi Tech AI](https://mimitechai.com) builds on-device AI — no cloud, no data leaks, full user control.
 
20
  - name: MIMI Pro
21
  results:
22
  - task:
23
+ type: function-calling
24
+ name: Tool Calling
25
+ dataset:
26
+ type: gorilla-llm/Berkeley-Function-Calling-Leaderboard
27
+ name: BFCL V4
28
  metrics:
29
  - type: accuracy
30
+ value: 60.8
31
+ name: Simple Function Calling (Python)
32
+ verified: false
33
  - type: accuracy
34
+ value: 57.5
35
+ name: Multiple Sequential Calls
36
+ verified: false
37
+ - type: accuracy
38
+ value: 90.0
39
+ name: Irrelevance Detection
40
+ verified: false
41
  pipeline_tag: text-generation
42
  ---
43
 
44
  # MIMI Pro
45
 
46
+ MIMI Pro is a 4-billion-parameter AI agent model optimized for structured tool calling and autonomous task execution — designed to run entirely on-device, in the browser, with zero cloud dependencies.
47
 
48
+ Part of the MIMI Model Family by [Mimi Tech AI](https://mimitechai.com).
49
 
50
+ > **🔬 V1 — Experimental Release.** This model is fine-tuned for the MIMI Agent's custom tool-calling format. For standard tool calling, the base [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) may perform equally well or better with native `<tool_call>` prompting. V2 with official BFCL scores and Qwen3-native format support is in development.
51
 
52
  ## Performance
53
 
54
+ ### BFCL V4 Benchmark (Partial — Single-Turn; base Qwen3-4B sampled at 20 examples/category)
55
+
56
+ | Category | MIMI Pro V1 | Base Qwen3-4B | Notes |
57
+ |---|---|---|---|
58
+ | Simple Python | 60.8% (full 400) | **80.0%** | Base outperforms |
59
+ | Simple Java | 21.0% (full 100) | **60.0%** | Base outperforms |
60
+ | Multiple (Sequential) | 57.5% (full 200) | **75.0%** | Base outperforms |
61
+ | Parallel | 2.0% (full 200) | **75.0%** | Fine-tune degraded |
62
+ | Irrelevance | ~90% | **100%** | Both strong |
63
+ | Live Simple | — | **90.0%** | Base only |
64
+
65
+ > ⚠️ **Important Context:** The previously reported "97.7% accuracy" was a **training validation metric** (token-level accuracy on the training/eval split), not a standardized benchmark score. The table above shows actual BFCL V4 results. We are working on a full official evaluation.
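For rough context, the five category scores reported for MIMI Pro V1 average out as follows (an illustrative unweighted mean only; BFCL's official overall score weights categories differently, and the irrelevance figure is approximate):

```python
# Unweighted mean of the BFCL V4 category scores reported for MIMI Pro V1.
scores = {
    "simple_python": 60.8,
    "simple_java": 21.0,
    "multiple_sequential": 57.5,
    "parallel": 2.0,
    "irrelevance": 90.0,  # reported as "~90%", so treat as approximate
}
mean = sum(scores.values()) / len(scores)
print(f"unweighted mean: {mean:.2f}%")  # unweighted mean: 46.26%
```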
66
+
67
+ ### Training Metrics (Internal)
68
+
69
  | Metric | Value |
70
+ |---|---|
71
+ | Training Token Accuracy | 97.66% |
72
+ | Eval Token Accuracy | 97.29% |
73
+ | Training Loss | 0.084 |
74
+ | Parameters | 4.02 Billion |
75
+ | Quantized Size | 2.3 GB (Q4_K_M) |
76
 
77
  ## Architecture
78
 
79
+ MIMI Pro is built on [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using [Unsloth](https://github.com/unslothai/unsloth) on NVIDIA DGX Spark.
80
 
81
  **Key Design Decisions:**
82
+ - Custom tool-calling format optimized for the MIMI Agent browser environment
83
+ - 19 tool types covering web search, code execution, file operations, browser automation
84
+ - Trained on NVIDIA DGX Spark (Grace Blackwell GB10, 128 GB unified memory)
85
+
86
+ **Known Limitations of V1:**
87
+ - Fine-tuning with aggressive hyperparameters (LoRA r=64, 3 epochs, LR 2e-4) caused some capability degradation vs. the base model, particularly for parallel tool calling
88
+ - The custom `{"tool": ..., "parameters": ...}` format diverges from Qwen3's native `<tool_call>` format
89
+ - V2 will address these issues with conservative fine-tuning and Qwen3-native format support
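As an illustration of the format gap described above, a thin adapter can rewrap a V1 call into the Qwen3-style envelope (a sketch only; `v1_to_qwen3` is a hypothetical helper, not part of any MIMI release, and both key layouts are taken from this card's own examples):

```python
import json

def v1_to_qwen3(raw: str) -> str:
    """Rewrap a MIMI V1 call, {"tool": ..., "parameters": ...},
    as a Qwen3-style <tool_call> block using name/arguments keys."""
    call = json.loads(raw)
    native = {"name": call["tool"], "arguments": call["parameters"]}
    return f"<tool_call>\n{json.dumps(native)}\n</tool_call>"

print(v1_to_qwen3('{"tool": "web_search", "parameters": {"query": "NVIDIA DGX Spark specifications"}}'))
```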
90
 
91
  ## Supported Tools
92
 
93
  | Category | Tools |
94
+ |---|---|
95
+ | 🌐 Web | web_search, browse_url, browser_action |
96
+ | 💻 Code | execute_python, create_file, edit_file |
97
+ | 🔬 Research | deep_research, generate_document |
98
+ | 📁 System | read_file, list_directory, run_terminal |
99
+ | 🧠 Reasoning | Multi-step orchestration |
100
 
101
  ## Quick Start
102
 
103
  ### Browser (wllama/WebAssembly)
104
 
105
+ ```javascript
106
import { Wllama } from '@wllama/wllama';
107
 
108
  const wllama = new Wllama(wasmPaths);
 
138
 
139
  ## Output Format
140
 
141
+ MIMI Pro V1 uses a custom format (V2 will support Qwen3-native `<tool_call>` format):
142
 
143
+ ```json
144
+ {"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}
145
  ```
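A host application consuming this output might parse and route each call like so (a minimal sketch; the handler stub and `TOOL_REGISTRY` are hypothetical illustrations, not MIMI Agent internals):

```python
import json

# Hypothetical stub for one of the documented tools.
def web_search(query: str, limit: int = 5) -> str:
    return f"top {limit} results for {query!r}"

TOOL_REGISTRY = {"web_search": web_search}

def dispatch(raw: str) -> str:
    """Parse one V1 tool call and invoke the matching handler."""
    call = json.loads(raw)
    return TOOL_REGISTRY[call["tool"]](**call["parameters"])

print(dispatch('{"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}'))
```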
146
 
147
  ## The MIMI Model Family
148
 
149
  | Model | Parameters | Size | Target Device | Status |
150
+ |---|---|---|---|---|
151
+ | MIMI Nano | 0.6B | ~400 MB | Any device, IoT | 🔜 Coming |
152
+ | MIMI Small | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
153
+ | **MIMI Pro** | **4.02B** | **2.3 GB** | **Desktop & laptop** | **✅ Available** |
154
+ | MIMI Max | 8B | ~4.5 GB | Workstations | 🔜 Coming |
155
 
156
  All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.
157
 
 
178
 
179
  ## Why MIMI?
180
 
181
+ - 🔒 **Privacy First** — Your data never leaves your device. Period.
182
+ - 💰 **Zero Cost** — No API keys, no subscriptions, no per-token billing.
183
+ - ⚡ **Fast** — Runs at native speed via WebAssembly, no server round-trips.
184
+ - 🌍 **Works Offline** — Once downloaded, no internet required.
185
+ - 🔧 **Tool Native** — Purpose-built for autonomous tool calling.
186
 
187
  ## Limitations
188
 
189
+ - V1 uses a custom tool-calling format (not Qwen3-native `<tool_call>`)
190
+ - Parallel tool calling (multiple simultaneous calls) is degraded vs. base model
191
  - Context window: 4,096 tokens (training config). Base architecture supports 32K.
192
  - Requires ~3 GB RAM for inference in browser.
193
  - Q4_K_M quantization trades minimal quality for 3.5x size reduction.
194
 
195
+ ## Roadmap
196
+
197
+ - [x] **V1** — Custom format, 19 tools, browser-optimized (current release)
198
+ - [ ] **V2** — Qwen3-native `<tool_call>` format, official BFCL V4 scores, conservative fine-tuning
199
+ - [ ] **Model Family** — Nano (0.6B), Small (1.7B), Max (8B) releases
200
+ - [ ] **Multi-Turn** — Agentic conversation chains with tool result feedback
201
+
202
  ## About Mimi Tech AI
203
 
204
[Mimi Tech AI](https://mimitechai.com) builds on-device AI — no cloud, no data leaks, full user control.