MimiTechAI committed on Mar 3

Commit b352940 (verified) · 1 parent: 18f74f8

Upload README.md with huggingface_hub

Files changed (1): README.md (added, +221 -0)

	@@ -0,0 +1,221 @@

+---
+license: apache-2.0
+language:
+  - en
+  - de
+base_model: Qwen/Qwen3-4B
+tags:
+  - tool-calling
+  - function-calling
+  - agent
+  - qwen3
+  - gguf
+  - fine-tuned
+  - wllama
+  - browser-inference
+  - on-device-ai
+  - mimi-agent
+model-index:
+  - name: mimi-qwen3-4b-tool-calling
+    results:
+      - task:
+          type: text-generation
+          name: Tool/Function Calling
+        metrics:
+          - type: accuracy
+            value: 97.66
+            name: Token Accuracy
+          - type: accuracy
+            value: 97.29
+            name: Eval Accuracy
+          - type: loss
+            value: 0.084
+            name: Training Loss
+datasets:
+  - MimiTechAI/mimi-tool-calling-v3
+library_name: transformers
+pipeline_tag: text-generation
+---
+# MIMI Qwen3-4B Tool Calling
+<p align="center">
+  <img src="https://img.shields.io/badge/Accuracy-97.7%25-brightgreen?style=for-the-badge" alt="Accuracy"/>
+  <img src="https://img.shields.io/badge/Quantization-Q4__K__M-blue?style=for-the-badge" alt="Quantization"/>
+  <img src="https://img.shields.io/badge/Size-2.3GB-orange?style=for-the-badge" alt="Size"/>
+  <img src="https://img.shields.io/badge/Inference-Browser%20(WASM)-purple?style=for-the-badge" alt="Browser"/>
+</p>
+A fine-tuned [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) optimized for **structured tool calling and function invocation** — designed to run entirely in the browser via WebAssembly (wllama/llama.cpp).
+Built by [Mimi Tech AI](https://mimitechai.com) for the [MIMI Agent](https://github.com/MimiTechAi/mimi-website) — a fully local, privacy-first AI agent that runs on-device with zero cloud dependencies.
+## Key Results
+| Metric | Value |
+|--------|-------|
+| **Token Accuracy** | 97.66% |
+| **Eval Accuracy** | 97.29% |
+| **Training Loss** | 0.084 |
+| **Training Time** | 46 minutes |
+| **Hardware** | NVIDIA DGX Spark (GB10, Grace Blackwell) |
+## Model Details
+- **Base Model:** [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) (4.02B parameters)
+- **Fine-Tuning Method:** LoRA (PEFT) via [Unsloth](https://github.com/unslothai/unsloth)
+- **LoRA Config:** rank=64, alpha=128, dropout=0.05
+- **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
+- **Quantization:** GGUF Q4_K_M (4.95 bits per weight)
+- **Format:** ChatML with `<think>` reasoning blocks
+- **Languages:** English (primary), German
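As a rough sanity check on the figures above, the size of the quantized weights can be estimated from the parameter count and the bits-per-weight value. This is a minimal sketch that assumes every parameter is stored at the stated average of 4.95 bits and ignores GGUF metadata; the helper name `estimate_gguf_size_bytes` is illustrative and not part of any library.

```python
def estimate_gguf_size_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage: parameters x average bits per weight / 8."""
    return n_params * bits_per_weight / 8


# 4.02B parameters at an average of 4.95 bits per weight (values from the list above).
size_bytes = estimate_gguf_size_bytes(4.02e9, 4.95)
size_gib = size_bytes / 2**30
print(f"{size_bytes / 1e9:.2f} GB ({size_gib:.2f} GiB)")  # 2.49 GB (2.32 GiB)
```

The result of roughly 2.3 GiB is in line with the 2.3 GB size quoted for the Q4_K_M file.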
+## Training Data
+1,610 high-quality examples covering 19 tool types:
+| Category | Tools | Example tasks |
+|----------|-------|----------|
+| **Web** | `web_search`, `browse_url`, `browser_action` | Search queries, URL extraction, DOM interaction |
+| **Code** | `execute_python`, `create_file`, `edit_file` | Code generation, file manipulation |
+| **Research** | `deep_research`, `generate_document` | Multi-source analysis, report generation |
+| **System** | `read_file`, `list_directory`, `run_terminal` | File I/O, system commands |
+| **Reasoning** | Multi-step chains | Tool orchestration, error recovery |
+Each example includes structured tool calls in JSON format with parameter validation and multi-turn conversations.
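Parameter validation of this kind can also be applied on the consumer side before a tool is executed. The sketch below is one possible approach, not the validation used to build the dataset: `TOOL_SCHEMAS` and `validate_tool_call` are hypothetical names, and the required arguments listed for each tool are assumptions made for illustration.

```python
from typing import Any

# Hypothetical schemas: required argument names and types per tool.
# The actual parameter lists used in the training data are not given here.
TOOL_SCHEMAS: dict[str, dict[str, type]] = {
    "web_search": {"query": str},
    "browse_url": {"url": str},
}


def validate_tool_call(call: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOL_SCHEMAS:
        return [f"unknown tool: {name!r}"]
    args = call.get("arguments")
    if not isinstance(args, dict):
        return ["'arguments' must be a JSON object"]
    problems = []
    for param, expected in TOOL_SCHEMAS[name].items():
        if param not in args:
            problems.append(f"missing required argument: {param}")
        elif not isinstance(args[param], expected):
            problems.append(f"{param} should be of type {expected.__name__}")
    return problems


print(validate_tool_call({"name": "web_search", "arguments": {"query": "AI news"}}))
print(validate_tool_call({"name": "web_search", "arguments": {}}))
```

Returning a list of problems instead of raising makes it straightforward to feed the errors back to the model in a follow-up turn.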
+## Usage
+### Browser (wllama — recommended)
+```typescript
+import { Wllama } from '@wllama/wllama';
+const wllama = new Wllama({
+  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
+  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
+});
+await wllama.loadModelFromUrl(
+  'https://huggingface.co/MimiTechAI/mimi-qwen3-4b-tool-calling/resolve/main/mimi-qwen3-4b-q4km.gguf',
+  { n_ctx: 4096, n_threads: 4 }
+);
+const response = await wllama.createChatCompletion([
+  { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
+  { role: 'user', content: 'Search for the latest AI news' }
+]);
+```
+### llama.cpp (CLI)
+```bash
+./llama-cli -m mimi-qwen3-4b-q4km.gguf \
+  -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
+  -e -n 512 --temp 0.6 --top-p 0.95
+```
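The prompt string passed to `-p` above follows the ChatML layout: each turn is wrapped in `<|im_start|>` and `<|im_end|>` markers, and the assistant turn is left open for the model to complete. A minimal sketch of building that string from a message list is shown below; `build_chatml_prompt` is an illustrative helper, not part of llama.cpp.

```python
def build_chatml_prompt(messages: list[dict[str, str]]) -> str:
    """Render messages as ChatML and leave the assistant turn open."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"},
])
print(prompt)
```

When using `create_chat_completion` or another chat API, the chat template is normally applied for you, so manual formatting like this is only needed for raw-prompt interfaces.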
+### Python (llama-cpp-python)
+```python
+from llama_cpp import Llama
+llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
+output = llm.create_chat_completion(messages=[
+    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
+    {"role": "user", "content": "Search for the latest AI news"}
+])
+```
+## Expected Output Format
+The model generates structured tool calls:
+```text
+<tool_call>
+{"name": "web_search", "arguments": {"query": "latest AI news March 2026", "num_results": 5}}
+</tool_call>
+```
+Multi-tool chains are supported:
+```text
+<tool_call>
+{"name": "web_search", "arguments": {"query": "NVIDIA DGX Spark specs"}}
+</tool_call>
+<tool_call>
+{"name": "browse_url", "arguments": {"url": "https://nvidia.com/dgx-spark"}}
+</tool_call>
+```
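Output in this format has to be parsed before any tool can be dispatched. The following is a minimal sketch of one way to do that with the Python standard library. It assumes that any `<think>` block appears as plain text alongside the tool calls and can be discarded, and that malformed payloads should be skipped instead of raising; `extract_tool_calls` is an illustrative name.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return each well-formed <tool_call> JSON payload, in order of appearance."""
    text = THINK_RE.sub("", text)  # drop reasoning blocks before parsing
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed payloads rather than raising
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


sample = (
    "<think>Look up the specs first.</think>\n"
    '<tool_call>\n{"name": "web_search", "arguments": {"query": "NVIDIA DGX Spark specs"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "browse_url", "arguments": {"url": "https://nvidia.com/dgx-spark"}}\n</tool_call>'
)
for call in extract_tool_calls(sample):
    print(call["name"], call["arguments"])
```

Each returned dictionary carries the `name` and `arguments` keys shown in the examples above, so it can be passed to whatever dispatch logic the host application uses.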
+## LoRA Hyperparameters
+```yaml
+base_model: Qwen/Qwen3-4B
+lora_rank: 64
+lora_alpha: 128
+lora_dropout: 0.05
+target_modules:
+  - q_proj
+  - k_proj
+  - v_proj
+  - o_proj
+  - gate_proj
+  - up_proj
+  - down_proj
+learning_rate: 2.0e-04
+lr_scheduler: linear
+warmup_steps: 5
+epochs: 3
+batch_size: 2
+gradient_accumulation_steps: 4
+effective_batch_size: 8
+max_seq_length: 2048
+optimizer: adamw_8bit
+weight_decay: 0.01
+bf16: true
+gradient_checkpointing: true
+packing: true
+```
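Two derived quantities follow directly from this configuration. The sketch below works them out under two assumptions that the config does not state explicitly: that the standard LoRA scaling of alpha / rank is used (not the rank-stabilized variant), and that training ran on a single device.

```python
lora_rank = 64
lora_alpha = 128
batch_size = 2
gradient_accumulation_steps = 4
max_seq_length = 2048

# Standard LoRA scales the low-rank update by alpha / rank.
lora_scaling = lora_alpha / lora_rank

# Sequences contributing to each optimizer step on a single device.
effective_batch_size = batch_size * gradient_accumulation_steps

# With packing enabled, each sequence is filled up to max_seq_length,
# so this is an upper bound on tokens seen per optimizer step.
max_tokens_per_step = effective_batch_size * max_seq_length

print(lora_scaling, effective_batch_size, max_tokens_per_step)  # 2.0 8 16384
```

The computed effective batch size of 8 matches the `effective_batch_size` entry in the config.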
+## MIMI Agent Model Family
+| Model | Parameters | Size (GGUF Q4_K_M) | Use Case | Status |
+|-------|-----------|---------------------|----------|--------|
+| mimi-qwen3-0.6b-tool-calling | 0.6B | ~400 MB | Ultra-lightweight, any device | 🔜 Coming |
+| mimi-qwen3-1.7b-tool-calling | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
+| **mimi-qwen3-4b-tool-calling** | **4.02B** | **2.3 GB** | **Desktop & laptop** | **✅ Released** |
+| mimi-qwen3-8b-tool-calling | 8B | ~4.5 GB | Power users | 🔜 Coming |
+## Limitations
+- **Optimized for tool calling** — not a general-purpose chat model. For open-ended conversations, use the base Qwen3-4B.
+- **Context window:** 4,096 tokens (inherited from training config). Base model supports up to 32K.
+- **Quantization trade-offs:** Q4_K_M reduces quality slightly vs F16. For maximum accuracy, use the full-precision LoRA adapter.
+- **Browser memory:** Requires ~3 GB RAM for inference. Devices with <4 GB available memory may experience issues.
+## About Mimi Tech AI
+[Mimi Tech AI](https://mimitechai.com) builds on-device AI solutions — no cloud, no data leaks, full user control.
+- 🌐 [Website](https://mimitechai.com)
+- 🐙 [GitHub](https://github.com/MimiTechAi)
+- 💼 [LinkedIn](https://linkedin.com/company/mimitechai)
+- 🟢 Member of the [NVIDIA Connect Program](https://www.nvidia.com/en-us/industries/nvidia-connect-program/)
+## License
+This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0), consistent with the base Qwen3-4B license.
+## Citation
+```bibtex
+@misc{mimitechai2026mimi,
+  title={MIMI Qwen3-4B Tool Calling: Fine-Tuned Small Language Model for Browser-Based Agent Tool Invocation},
+  author={Bemler, Michael and Soppa, Michael},
+  year={2026},
+  publisher={Mimi Tech AI},
+  url={https://huggingface.co/MimiTechAI/mimi-qwen3-4b-tool-calling}
+}
+```