MimiTechAI committed · commit b352940 · verified · 1 parent: 18f74f8

Upload README.md with huggingface_hub

Files changed (1): README.md (+221 −0, added)
---
license: apache-2.0
language:
- en
- de
base_model: Qwen/Qwen3-4B
tags:
- tool-calling
- function-calling
- agent
- qwen3
- gguf
- fine-tuned
- wllama
- browser-inference
- on-device-ai
- mimi-agent
model-index:
- name: mimi-qwen3-4b-tool-calling
  results:
  - task:
      type: text-generation
      name: Tool/Function Calling
    metrics:
    - type: accuracy
      value: 97.66
      name: Token Accuracy
    - type: accuracy
      value: 97.29
      name: Eval Accuracy
    - type: loss
      value: 0.084
      name: Training Loss
datasets:
- MimiTechAI/mimi-tool-calling-v3
library_name: transformers
pipeline_tag: text-generation
---

# MIMI Qwen3-4B Tool Calling

<p align="center">
  <img src="https://img.shields.io/badge/Accuracy-97.7%25-brightgreen?style=for-the-badge" alt="Accuracy"/>
  <img src="https://img.shields.io/badge/Quantization-Q4__K__M-blue?style=for-the-badge" alt="Quantization"/>
  <img src="https://img.shields.io/badge/Size-2.3GB-orange?style=for-the-badge" alt="Size"/>
  <img src="https://img.shields.io/badge/Inference-Browser%20(WASM)-purple?style=for-the-badge" alt="Browser"/>
</p>

A fine-tuned [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) optimized for **structured tool calling and function invocation**, designed to run entirely in the browser via WebAssembly (wllama/llama.cpp).

Built by [Mimi Tech AI](https://mimitechai.com) for the [MIMI Agent](https://github.com/MimiTechAi/mimi-website): a fully local, privacy-first AI agent that runs on-device with zero cloud dependencies.

## Key Results

| Metric | Value |
|--------|-------|
| **Token Accuracy** | 97.66% |
| **Eval Accuracy** | 97.29% |
| **Training Loss** | 0.084 |
| **Training Time** | 46 minutes |
| **Hardware** | NVIDIA DGX Spark (GB10, Grace Blackwell) |

## Model Details

- **Base Model:** [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) (4.02B parameters)
- **Fine-Tuning Method:** LoRA (PEFT) via [Unsloth](https://github.com/unslothai/unsloth)
- **LoRA Config:** rank=64, alpha=128, dropout=0.05
- **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- **Quantization:** GGUF Q4_K_M (4.95 bits per weight)
- **Format:** ChatML with `<think>` reasoning blocks
- **Languages:** English (primary), German
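
ChatML here is the standard Qwen chat layout. As a minimal sketch, a raw prompt for this model can be assembled like so (the system prompt text is just the one used in the usage examples; `build_chatml_prompt` is an illustrative helper, not part of any library):

```python
# Minimal sketch of the ChatML layout the model expects. When the model
# emits a <think> block, it appears at the start of the assistant turn.
def build_chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are MIMI, an AI agent with tool access.",
    "Search for the latest AI news",
)
```

Chat-aware runtimes (wllama, llama-cpp-python) apply this template automatically from message lists; building it by hand is only needed for raw-prompt interfaces such as the llama.cpp CLI.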

## Training Data

1,610 high-quality examples covering 19 tool types:

| Category | Tools | Examples |
|----------|-------|----------|
| **Web** | `web_search`, `browse_url`, `browser_action` | Search queries, URL extraction, DOM interaction |
| **Code** | `execute_python`, `create_file`, `edit_file` | Code generation, file manipulation |
| **Research** | `deep_research`, `generate_document` | Multi-source analysis, report generation |
| **System** | `read_file`, `list_directory`, `run_terminal` | File I/O, system commands |
| **Reasoning** | Multi-step chains | Tool orchestration, error recovery |

Each example includes structured tool calls in JSON format with parameter validation and multi-turn conversations.
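
For illustration only, a record of this general shape (field names and content here are hypothetical, not the actual `MimiTechAI/mimi-tool-calling-v3` schema):

```python
import json

# Hypothetical illustration of one multi-turn tool-calling record;
# the real dataset schema may differ.
record = {
    "messages": [
        {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
        {"role": "user", "content": "Find the weather in Berlin."},
        {
            "role": "assistant",
            "content": '<tool_call>\n{"name": "web_search", '
                       '"arguments": {"query": "weather Berlin"}}\n</tool_call>',
        },
    ]
}

# The payload inside <tool_call> stays valid JSON, so its parameters
# can be validated before the tool is executed.
inner = record["messages"][-1]["content"]
payload = json.loads(inner.removeprefix("<tool_call>\n").removesuffix("\n</tool_call>"))
```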
## Usage

### Browser (wllama, recommended)

```typescript
import { Wllama } from '@wllama/wllama';

const wllama = new Wllama({
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
});

await wllama.loadModelFromUrl(
  'https://huggingface.co/MimiTechAI/mimi-qwen3-4b-tool-calling/resolve/main/mimi-qwen3-4b-q4km.gguf',
  { n_ctx: 4096, n_threads: 4 }
);

const response = await wllama.createChatCompletion([
  { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
  { role: 'user', content: 'Search for the latest AI news' }
]);
```

### llama.cpp (CLI)

```bash
# -e makes llama-cli process the \n escapes inside the prompt string
./llama-cli -m mimi-qwen3-4b-q4km.gguf \
  -e -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.6 --top-p 0.95
```

### Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
output = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"}
])
```

## Expected Output Format

The model generates structured tool calls:

```text
<tool_call>
{"name": "web_search", "arguments": {"query": "latest AI news March 2026", "num_results": 5}}
</tool_call>
```

Multi-tool chains are supported:

```text
<tool_call>
{"name": "web_search", "arguments": {"query": "NVIDIA DGX Spark specs"}}
</tool_call>

<tool_call>
{"name": "browse_url", "arguments": {"url": "https://nvidia.com/dgx-spark"}}
</tool_call>
```
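
The `<tool_call>` wrapper makes extraction straightforward on the host side. A minimal parsing sketch (the tag format shown above is the only assumption; `parse_tool_calls` is an illustrative helper):

```python
import json
import re

def parse_tool_calls(text: str) -> list[dict]:
    """Extract every <tool_call> block and decode its JSON payload."""
    blocks = re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL)
    return [json.loads(block) for block in blocks]

output = """<tool_call>
{"name": "web_search", "arguments": {"query": "NVIDIA DGX Spark specs"}}
</tool_call>

<tool_call>
{"name": "browse_url", "arguments": {"url": "https://nvidia.com/dgx-spark"}}
</tool_call>"""

calls = parse_tool_calls(output)
# Two calls: web_search, then browse_url, in generation order.
```

A host agent would dispatch each parsed call to the matching tool, append the result to the conversation, and let the model continue the chain.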

## LoRA Hyperparameters

```yaml
base_model: Qwen/Qwen3-4B
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
learning_rate: 2.0e-04
lr_scheduler: linear
warmup_steps: 5
epochs: 3
batch_size: 2
gradient_accumulation_steps: 4
effective_batch_size: 8
max_seq_length: 2048
optimizer: adamw_8bit
weight_decay: 0.01
bf16: true
gradient_checkpointing: true
packing: true
```

## MIMI Agent Model Family

| Model | Parameters | Size (GGUF Q4_K_M) | Use Case | Status |
|-------|-----------|---------------------|----------|--------|
| mimi-qwen3-0.6b-tool-calling | 0.6B | ~400 MB | Ultra-lightweight, any device | 🔜 Coming |
| mimi-qwen3-1.7b-tool-calling | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
| **mimi-qwen3-4b-tool-calling** | **4.02B** | **2.3 GB** | **Desktop & laptop** | **✅ Released** |
| mimi-qwen3-8b-tool-calling | 8B | ~4.5 GB | Power users | 🔜 Coming |
## Limitations

- **Optimized for tool calling:** not a general-purpose chat model. For open-ended conversations, use the base Qwen3-4B.
- **Context window:** the examples above use 4,096 tokens (training used a 2,048-token max sequence length). The base model supports up to 32K.
- **Quantization trade-offs:** Q4_K_M reduces quality slightly vs F16. For maximum accuracy, use the full-precision LoRA adapter.
- **Browser memory:** requires ~3 GB RAM for inference. Devices with <4 GB available memory may experience issues.

## About Mimi Tech AI

[Mimi Tech AI](https://mimitechai.com) builds on-device AI solutions: no cloud, no data leaks, full user control.

- 🌐 [Website](https://mimitechai.com)
- 🐙 [GitHub](https://github.com/MimiTechAi)
- 💼 [LinkedIn](https://linkedin.com/company/mimitechai)
- 🟢 Member of the [NVIDIA Connect Program](https://www.nvidia.com/en-us/industries/nvidia-connect-program/)

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0), consistent with the base Qwen3-4B license.

## Citation

```bibtex
@misc{mimitechai2026mimi,
  title={MIMI Qwen3-4B Tool Calling: Fine-Tuned Small Language Model for Browser-Based Agent Tool Invocation},
  author={Bemler, Michael and Soppa, Michael},
  year={2026},
  publisher={Mimi Tech AI},
  url={https://huggingface.co/MimiTechAI/mimi-qwen3-4b-tool-calling}
}
```