---
language:
- en
license: mit
tags:
- mlx
- qwen3
- agent
- tool-calling
- code
- 4-bit
- quantized
base_model: LocoreMind/LocoOperator-4B
pipeline_tag: text-generation
library_name: mlx
---
# LocoOperator-4B — MLX 4-bit Quantized

This is a 4-bit quantized MLX version of `LocoreMind/LocoOperator-4B`, converted for efficient inference on Apple Silicon using MLX.
## Model Overview
| Attribute | Value |
|---|---|
| Original Model | LocoreMind/LocoOperator-4B |
| Architecture | Qwen3 (4B parameters) |
| Quantization | 4-bit (MLX) |
| Base Model | Qwen3-4B-Instruct-2507 |
| Teacher Model | Qwen3-Coder-Next |
| Training Method | Full-parameter SFT (distillation from 170K samples) |
| Max Sequence Length | 16,384 tokens |
| License | MIT |
## About LocoOperator-4B
LocoOperator-4B is a 4B-parameter tool-calling agent model trained via knowledge distillation from Qwen3-Coder-Next inference traces. It specializes in multi-turn codebase exploration — reading files, searching code, and navigating project structures within a Claude Code-style agent loop.
### Key Features
- **Tool-Calling Agent**: Generates structured `<tool_call>` JSON for Read, Grep, Glob, Bash, Write, Edit, and Task (subagent delegation)
- **100% JSON Validity**: Every tool call is valid JSON with all required arguments, outperforming the teacher model (87.6%)
- **Multi-Turn**: Handles conversation depths of 3–33 messages with consistent tool-calling behavior
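The structured tool calls above can be consumed with a few lines of standard-library Python. A minimal sketch, assuming the Qwen3-style convention of wrapping a JSON object in `<tool_call>` tags; the model output and file path below are hypothetical:

```python
import json
import re

# Hypothetical model output containing one structured tool call
# (Qwen3-style <tool_call> tags wrapping a JSON object).
model_output = """I'll start by reading the entry point.
<tool_call>
{"name": "Read", "arguments": {"file_path": "/workspace/myproject/main.py"}}
</tool_call>"""

# Extract every <tool_call> payload and parse it as JSON.
calls = [
    json.loads(payload)
    for payload in re.findall(
        r"<tool_call>\s*(.*?)\s*</tool_call>", model_output, re.DOTALL
    )
]

print(calls[0]["name"])        # -> Read
print(calls[0]["arguments"])   # -> {'file_path': '/workspace/myproject/main.py'}
```

An agent loop would dispatch each parsed call to the corresponding tool implementation and append the result to the conversation before the next generation step.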
## Performance
| Metric | Score |
|---|---|
| Tool Call Presence Alignment | 100% (65/65) |
| First Tool Type Match | 65.6% (40/61) |
| JSON Validity | 100% (76/76) |
| Argument Syntax Correctness | 100% (76/76) |
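The evaluation harness behind these metrics is not published with the model card, but the JSON-validity and argument checks can be approximated in a short sketch. The required-argument schema below is a hypothetical illustration, not the actual harness:

```python
import json

# Hypothetical required-argument schema per tool; the actual
# evaluation harness for these metrics is not published.
REQUIRED_ARGS = {
    "Read": {"file_path"},
    "Grep": {"pattern"},
    "Glob": {"pattern"},
    "Bash": {"command"},
}

def check_tool_call(payload: str) -> bool:
    """Return True if payload is valid JSON naming a tool with
    all of that tool's required arguments present."""
    try:
        call = json.loads(payload)
    except json.JSONDecodeError:
        return False
    required = REQUIRED_ARGS.get(call.get("name"), set())
    return required <= set(call.get("arguments", {}))

print(check_tool_call('{"name": "Read", "arguments": {"file_path": "a.py"}}'))  # True
print(check_tool_call('{"name": "Read", "arguments": {}}'))                     # False
```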
## Usage with MLX

```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("DJLougen/LocoOperator-4B-MLX-4bit")

messages = [
    {
        "role": "system",
        "content": "You are a read-only codebase search specialist."
    },
    {
        "role": "user",
        "content": "Analyze the project structure at /workspace/myproject and explain the architecture."
    }
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```
## Other Quantizations
| Variant | Link |
|---|---|
| MLX 4-bit | This repo |
| MLX 6-bit | DJLougen/LocoOperator-4B-MLX-6bit |
| MLX 8-bit | DJLougen/LocoOperator-4B-MLX-8bit |
| GGUF | LocoreMind/LocoOperator-4B-GGUF |
| Full Weights | LocoreMind/LocoOperator-4B |
## Acknowledgments
- **LocoreMind** for the original LocoOperator-4B model
- **Qwen Team** for the Qwen3-4B-Instruct-2507 base model
- **Apple MLX Team** for the MLX framework