---
language:
- en
license: mit
tags:
- mlx
- qwen3
- agent
- tool-calling
- code
- 4-bit
- quantized
base_model: LocoreMind/LocoOperator-4B
pipeline_tag: text-generation
library_name: mlx
---
# LocoOperator-4B — MLX 4-bit Quantized

This is a 4-bit quantized MLX version of `LocoreMind/LocoOperator-4B`, converted for efficient inference on Apple Silicon using MLX.
## Model Overview
| Attribute | Value |
|---|---|
| Original Model | LocoreMind/LocoOperator-4B |
| Architecture | Qwen3 (4B parameters) |
| Quantization | 4-bit (MLX) |
| Base Model | Qwen3-4B-Instruct-2507 |
| Teacher Model | Qwen3-Coder-Next |
| Training Method | Full-parameter SFT (distillation from 170K samples) |
| Max Sequence Length | 16,384 tokens |
| License | MIT |
## About LocoOperator-4B
LocoOperator-4B is a 4B-parameter tool-calling agent model trained via knowledge distillation from Qwen3-Coder-Next inference traces. It specializes in multi-turn codebase exploration — reading files, searching code, and navigating project structures within a Claude Code-style agent loop.
### Key Features
- **Tool-Calling Agent**: Generates structured `<tool_call>` JSON for Read, Grep, Glob, Bash, Write, Edit, and Task (subagent delegation)
- **100% JSON Validity**: Every tool call is valid JSON with all required arguments, outperforming the teacher model (87.6%)
- **Multi-Turn**: Handles conversation depths of 3–33 messages with consistent tool-calling behavior
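The structured tool calls above can be consumed with a few lines of standard-library Python. A minimal sketch, assuming the Qwen3-style convention of wrapping a JSON object in `<tool_call>` tags; the model output and file path below are hypothetical:

```python
import json
import re

# Hypothetical model output containing one structured tool call
# (Qwen3-style <tool_call> tags wrapping a JSON object).
model_output = """I'll start by reading the entry point.
<tool_call>
{"name": "Read", "arguments": {"file_path": "/workspace/myproject/main.py"}}
</tool_call>"""

# Extract every <tool_call> payload and parse it as JSON.
calls = [
    json.loads(payload)
    for payload in re.findall(
        r"<tool_call>\s*(.*?)\s*</tool_call>", model_output, re.DOTALL
    )
]

print(calls[0]["name"])        # -> Read
print(calls[0]["arguments"])   # -> {'file_path': '/workspace/myproject/main.py'}
```

An agent loop would dispatch each parsed call to the corresponding tool implementation and append the result to the conversation before the next generation step.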
## Performance
| Metric | Score |
|---|---|
| Tool Call Presence Alignment | 100% (65/65) |
| First Tool Type Match | 65.6% (40/61) |
| JSON Validity | 100% (76/76) |
| Argument Syntax Correctness | 100% (76/76) |
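The evaluation harness behind these metrics is not published with the model card, but the JSON-validity and argument checks can be approximated in a short sketch. The required-argument schema below is a hypothetical illustration, not the actual harness:

```python
import json

# Hypothetical required-argument schema per tool; the actual
# evaluation harness for these metrics is not published.
REQUIRED_ARGS = {
    "Read": {"file_path"},
    "Grep": {"pattern"},
    "Glob": {"pattern"},
    "Bash": {"command"},
}

def check_tool_call(payload: str) -> bool:
    """Return True if payload is valid JSON naming a tool with
    all of that tool's required arguments present."""
    try:
        call = json.loads(payload)
    except json.JSONDecodeError:
        return False
    required = REQUIRED_ARGS.get(call.get("name"), set())
    return required <= set(call.get("arguments", {}))

print(check_tool_call('{"name": "Read", "arguments": {"file_path": "a.py"}}'))  # True
print(check_tool_call('{"name": "Read", "arguments": {}}'))                     # False
```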
## Usage with MLX

```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("DJLougen/LocoOperator-4B-MLX-4bit")

messages = [
    {
        "role": "system",
        "content": "You are a read-only codebase search specialist."
    },
    {
        "role": "user",
        "content": "Analyze the project structure at /workspace/myproject and explain the architecture."
    }
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```
## Other Quantizations
| Variant | Link |
|---|---|
| MLX 4-bit | This repo |
| MLX 6-bit | DJLougen/LocoOperator-4B-MLX-6bit |
| MLX 8-bit | DJLougen/LocoOperator-4B-MLX-8bit |
| GGUF | LocoreMind/LocoOperator-4B-GGUF |
| Full Weights | LocoreMind/LocoOperator-4B |
## Acknowledgments
- **LocoreMind** for the original LocoOperator-4B model
- **Qwen Team** for the Qwen3-4B-Instruct-2507 base model
- **Apple MLX Team** for the MLX framework