---
language:
- en
license: mit
tags:
- mlx
- qwen3
- agent
- tool-calling
- code
- 8-bit
- quantized
base_model: LocoreMind/LocoOperator-4B
pipeline_tag: text-generation
library_name: mlx
---

# LocoOperator-4B — MLX 8-bit Quantized

This is an **8-bit quantized MLX** version of [LocoreMind/LocoOperator-4B](https://huggingface.co/LocoreMind/LocoOperator-4B), converted for efficient inference on Apple Silicon using [MLX](https://github.com/ml-explore/mlx).

## Model Overview

| Attribute | Value |
|---|---|
| **Original Model** | [LocoreMind/LocoOperator-4B](https://huggingface.co/LocoreMind/LocoOperator-4B) |
| **Architecture** | Qwen3 (4B parameters) |
| **Quantization** | 8-bit (MLX) |
| **Base Model** | Qwen3-4B-Instruct-2507 |
| **Teacher Model** | Qwen3-Coder-Next |
| **Training Method** | Full-parameter SFT (distillation from 170K samples) |
| **Max Sequence Length** | 16,384 tokens |
| **License** | MIT |


## About LocoOperator-4B

LocoOperator-4B is a 4B-parameter tool-calling agent model trained via knowledge distillation from Qwen3-Coder-Next inference traces. It specializes in multi-turn codebase exploration — reading files, searching code, and navigating project structures within a Claude Code-style agent loop.

### Key Features

- **Tool-Calling Agent**: Generates structured `<tool_call>` JSON for Read, Grep, Glob, Bash, Write, Edit, and Task (subagent delegation)
- **100% JSON Validity**: Every tool call is valid JSON with all required arguments, versus 87.6% for the teacher model
- **Multi-Turn**: Handles conversation depths of 3–33 messages with consistent tool-calling behavior

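The tool-call format above can be made concrete. Below is a minimal sketch of how a harness might extract those payloads, assuming the Qwen3-style `<tool_call>…</tool_call>` tag convention; `extract_tool_calls` is an illustrative helper, not part of the model or of mlx-lm:

```python
import json
import re

# A sample assistant turn in the Qwen3-style tool-call format
# (the exact tag layout is assumed from the Qwen3 chat template).
raw = (
    "I'll start from the entry point.\n"
    "<tool_call>\n"
    '{"name": "Read", "arguments": {"file_path": "/workspace/myproject/main.py"}}\n'
    "</tool_call>"
)

def extract_tool_calls(text: str) -> list[dict]:
    """Pull every <tool_call> JSON payload out of a model response."""
    pattern = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
    return [json.loads(m) for m in pattern.findall(text)]

calls = extract_tool_calls(raw)
print(calls[0]["name"], calls[0]["arguments"]["file_path"])
```

Because the payloads are plain JSON, a `json.loads` failure in this helper is also a convenient place to detect a malformed call.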
### Performance

| Metric | Score |
|---|---|
| Tool Call Presence Alignment | **100%** (65/65) |
| First Tool Type Match | **65.6%** (40/61) |
| JSON Validity | **100%** (76/76) |
| Argument Syntax Correctness | **100%** (76/76) |

## Usage with MLX

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("DJLougen/LocoOperator-4B-MLX-8bit")

messages = [
    {
        "role": "system",
        "content": "You are a read-only codebase search specialist."
    },
    {
        "role": "user",
        "content": "Analyze the project structure at /workspace/myproject and explain the architecture."
    }
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```

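To show where such generations fit in the surrounding agent loop, here is a minimal sketch of dispatching one parsed tool call. `tool_read` and `tool_glob` are hypothetical stand-ins for the Read and Glob tools named earlier, not the actual Claude Code implementations:

```python
import json
from pathlib import Path

# Hypothetical stand-in implementations for two of the tools listed above;
# the real tools are richer (offsets, truncation, ignore rules, etc.).
def tool_read(file_path: str) -> str:
    return Path(file_path).read_text()

def tool_glob(pattern: str) -> str:
    return "\n".join(str(p) for p in sorted(Path(".").glob(pattern)))

TOOLS = {"Read": tool_read, "Glob": tool_glob}

def run_tool_call(payload: str) -> str:
    """Execute one <tool_call> JSON payload and return its result text."""
    call = json.loads(payload)
    return TOOLS[call["name"]](**call["arguments"])

# Simulate one loop step: the model asks to read a file, the harness runs
# the tool, and the result is fed back as the next conversation turn.
Path("demo.txt").write_text("hello from the workspace")
result = run_tool_call('{"name": "Read", "arguments": {"file_path": "demo.txt"}}')
print(result)
```

In a real loop, the returned text would be appended to `messages` as a new turn and the model called again until it stops emitting tool calls.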
## Other Quantizations

| Variant | Link |
|---|---|
| MLX 4-bit | [DJLougen/LocoOperator-4B-MLX-4bit](https://huggingface.co/DJLougen/LocoOperator-4B-MLX-4bit) |
| MLX 6-bit | [DJLougen/LocoOperator-4B-MLX-6bit](https://huggingface.co/DJLougen/LocoOperator-4B-MLX-6bit) |
| MLX 8-bit | **This repo** |
| GGUF | [LocoreMind/LocoOperator-4B-GGUF](https://huggingface.co/LocoreMind/LocoOperator-4B-GGUF) |
| Full Weights | [LocoreMind/LocoOperator-4B](https://huggingface.co/LocoreMind/LocoOperator-4B) |

## Acknowledgments

- [LocoreMind](https://huggingface.co/LocoreMind) for the original LocoOperator-4B model
- [Qwen Team](https://huggingface.co/Qwen) for the Qwen3-4B-Instruct-2507 base model
- [Apple MLX Team](https://github.com/ml-explore/mlx) for the MLX framework