---
language:
- en
license: mit
tags:
- mlx
- qwen3
- agent
- tool-calling
- code
- 8-bit
- quantized
base_model: LocoreMind/LocoOperator-4B
pipeline_tag: text-generation
library_name: mlx
---

# LocoOperator-4B — MLX 8-bit Quantized

This is an **8-bit quantized MLX** version of [LocoreMind/LocoOperator-4B](https://huggingface.co/LocoreMind/LocoOperator-4B), converted for efficient inference on Apple Silicon using [MLX](https://github.com/ml-explore/mlx).

## Model Overview

| Attribute | Value |
|---|---|
| **Original Model** | [LocoreMind/LocoOperator-4B](https://huggingface.co/LocoreMind/LocoOperator-4B) |
| **Architecture** | Qwen3 (4B parameters) |
| **Quantization** | 8-bit (MLX) |
| **Base Model** | Qwen3-4B-Instruct-2507 |
| **Teacher Model** | Qwen3-Coder-Next |
| **Training Method** | Full-parameter SFT (distillation from 170K samples) |
| **Max Sequence Length** | 16,384 tokens |
| **License** | MIT |

## About LocoOperator-4B

LocoOperator-4B is a 4B-parameter tool-calling agent model trained via knowledge distillation from Qwen3-Coder-Next inference traces. It specializes in multi-turn codebase exploration — reading files, searching code, and navigating project structures within a Claude Code-style agent loop.

### Key Features

- **Tool-Calling Agent**: Generates structured `<tool_call>` JSON for Read, Grep, Glob, Bash, Write, Edit, and Task (subagent delegation)
- **100% JSON Validity**: Every tool call is valid JSON with all required arguments — outperforming the teacher model (87.6%)
- **Multi-Turn**: Handles conversation depths of 3–33 messages with consistent tool-calling behavior
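The `<tool_call>` wrapper follows the Qwen3 convention: a JSON object with `name` and `arguments` fields enclosed in `<tool_call>…</tool_call>` tags. A minimal sketch of extracting such calls from a model response, assuming that output format (the helper name and sample response below are illustrative, not part of the model's API):

```python
import json
import re

# Qwen3-style tool calls: a JSON object wrapped in <tool_call> ... </tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in a model response."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON rather than crash the agent loop
        if "name" in call and "arguments" in call:
            calls.append(call)
    return calls

# Hypothetical response for illustration.
response = (
    "I will read the file first.\n"
    "<tool_call>\n"
    '{"name": "Read", "arguments": {"file_path": "/workspace/myproject/main.py"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(response))
```

A real harness would feed each extracted call to a tool executor and append the result to the conversation as a tool message before the next generation turn.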

### Performance

| Metric | Score |
|---|---|
| Tool Call Presence Alignment | **100%** (65/65) |
| First Tool Type Match | **65.6%** (40/61) |
| JSON Validity | **100%** (76/76) |
| Argument Syntax Correctness | **100%** (76/76) |

## Usage with MLX

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("DJLougen/LocoOperator-4B-MLX-8bit")

messages = [
    {
        "role": "system",
        "content": "You are a read-only codebase search specialist."
    },
    {
        "role": "user",
        "content": "Analyze the project structure at /workspace/myproject and explain the architecture."
    }
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```
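To close the agent loop, parsed tool calls need to be routed to real implementations of Read, Bash, and the other tools. A minimal dispatch sketch, assuming the `name`/`arguments` call shape above (the handler functions are hypothetical stand-ins; a production Claude Code-style harness adds sandboxing, output truncation, and permission checks):

```python
import pathlib
import subprocess

# Hypothetical handlers for two of the tools the model emits.
def run_read(arguments: dict) -> str:
    """Read tool: return the contents of the requested file."""
    return pathlib.Path(arguments["file_path"]).read_text()

def run_bash(arguments: dict) -> str:
    """Bash tool: run a shell command and return its combined output."""
    result = subprocess.run(
        arguments["command"], shell=True, capture_output=True, text=True, timeout=60
    )
    return result.stdout + result.stderr

TOOL_HANDLERS = {"Read": run_read, "Bash": run_bash}

def dispatch(tool_call: dict) -> str:
    """Route one parsed tool call to its handler; report unknown tools."""
    handler = TOOL_HANDLERS.get(tool_call["name"])
    if handler is None:
        return f"Error: unknown tool {tool_call['name']!r}"
    return handler(tool_call["arguments"])

print(dispatch({"name": "Bash", "arguments": {"command": "echo hi"}}))
```

The dispatch result would then be appended to `messages` as the tool's response before calling `generate` again for the next turn.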

## Other Quantizations

| Variant | Link |
|---|---|
| MLX 4-bit | [DJLougen/LocoOperator-4B-MLX-4bit](https://huggingface.co/DJLougen/LocoOperator-4B-MLX-4bit) |
| MLX 6-bit | [DJLougen/LocoOperator-4B-MLX-6bit](https://huggingface.co/DJLougen/LocoOperator-4B-MLX-6bit) |
| MLX 8-bit | **This repo** |
| GGUF | [LocoreMind/LocoOperator-4B-GGUF](https://huggingface.co/LocoreMind/LocoOperator-4B-GGUF) |
| Full Weights | [LocoreMind/LocoOperator-4B](https://huggingface.co/LocoreMind/LocoOperator-4B) |

## Acknowledgments

- [LocoreMind](https://huggingface.co/LocoreMind) for the original LocoOperator-4B model
- [Qwen Team](https://huggingface.co/Qwen) for the Qwen3-4B-Instruct-2507 base model
- [Apple MLX Team](https://github.com/ml-explore/mlx) for the MLX framework