	---
	library_name: transformers
	license: mit
	base_model:
	- LocoreMind/LocoOperator-4B
	tags:
	- code
	- agent
	- tool-calling
	- distillation
	- qwen3
	- gguf
	- llama-cpp
	language:
	- en
	pipeline_tag: text-generation
	---
	# This is a static quantization of [LocoreMind/LocoOperator-4B](https://huggingface.co/LocoreMind/LocoOperator-4B), made by [SimplySara](https://huggingface.co/SimplySara)

	| Model | Size_GB | BPW | PPL_Q | KLD_Mean | KLD_Max | Top_P_Match |
	|:----------------------------------|----------:|------:|----------:|-----------:|----------:|:--------------|
	| LocoOperator-4B-BF16.gguf | 7.498 | 16.01 | 9.24309 | -1.2e-05 | 4e-06 | 100.000% |
	| LocoOperator-4B-MXFP4_MOE.gguf | 3.986 | 8.51 | 9.24606 | 0.001835 | 2.98238 | 97.518% |
	| LocoOperator-4B-i1-MXFP4_MOE.gguf | 3.986 | 8.51 | 9.24606 | 0.001835 | 2.98238 | 97.518% |
	| LocoOperator-4B-Q8_0.gguf | 3.986 | 8.51 | 9.24606 | 0.001835 | 2.98238 | 97.518% |
	| LocoOperator-4B-i1-Q8_0.gguf | 3.986 | 8.51 | 9.24606 | 0.001835 | 2.98238 | 97.518% |
	| LocoOperator-4B-Q6_K.gguf | 3.079 | 6.58 | 9.27926 | 0.0068 | 10.5686 | 95.526% |
	| LocoOperator-4B-i1-Q6_K.gguf | 3.079 | 6.58 | 9.295 | 0.006075 | 15.9945 | 95.857% |
	| LocoOperator-4B-i1-Q5_1.gguf | 2.841 | 6.07 | 9.28859 | 0.01364 | 2.98838 | 94.135% |
	| LocoOperator-4B-Q5_1.gguf | 2.841 | 6.07 | 9.43222 | 0.022675 | 16.3454 | 93.161% |
	| LocoOperator-4B-Q5_K_M.gguf | 2.691 | 5.75 | 9.35457 | 0.017023 | 12.3947 | 93.635% |
	| LocoOperator-4B-i1-Q5_K_M.gguf | 2.691 | 5.75 | 9.2965 | 0.013153 | 7.78613 | 94.257% |
	| LocoOperator-4B-i1-Q5_0.gguf | 2.636 | 5.63 | 9.42255 | 0.019663 | 17.94 | 93.208% |
	| LocoOperator-4B-Q5_0.gguf | 2.63 | 5.62 | 9.41521 | 0.023403 | 31.4019 | 92.839% |
	| LocoOperator-4B-Q5_K_S.gguf | 2.63 | 5.62 | 9.44087 | 0.022119 | 13.6483 | 92.800% |
	| LocoOperator-4B-i1-Q5_K_S.gguf | 2.63 | 5.62 | 9.28767 | 0.014865 | 7.65169 | 93.702% |
	| LocoOperator-4B-Q4_1.gguf | 2.418 | 5.16 | 9.66722 | 0.074718 | 15.0861 | 87.757% |
	| LocoOperator-4B-i1-Q4_1.gguf | 2.418 | 5.16 | 9.45293 | 0.038707 | 13.8444 | 90.574% |
	| LocoOperator-4B-Q4_K_M.gguf | 2.326 | 4.97 | 9.48239 | 0.048236 | 15.3105 | 90.300% |
	| LocoOperator-4B-i1-Q4_K_M.gguf | 2.326 | 4.97 | 9.48582 | 0.03368 | 13.551 | 91.233% |
	| LocoOperator-4B-IQ4_NL.gguf | 2.229 | 4.76 | 9.60891 | 0.050173 | 11.4324 | 89.708% |
	| LocoOperator-4B-i1-Q4_K_S.gguf | 2.22 | 4.74 | 9.47603 | 0.039843 | 10.0551 | 90.557% |
	| LocoOperator-4B-Q4_K_S.gguf | 2.22 | 4.74 | 9.80236 | 0.068821 | 15.209 | 88.513% |
	| LocoOperator-4B-i1-IQ4_NL.gguf | 2.218 | 4.74 | 9.50223 | 0.039414 | 8.18964 | 90.573% |
	| LocoOperator-4B-i1-Q4_0.gguf | 2.213 | 4.73 | 9.79026 | 0.063915 | 12.6928 | 88.737% |
	| LocoOperator-4B-Q4_0.gguf | 2.207 | 4.71 | 9.86629 | 0.074527 | 13.2501 | 87.758% |
	| LocoOperator-4B-IQ4_XS.gguf | 2.129 | 4.55 | 9.62193 | 0.051911 | 11.0682 | 89.705% |
	| LocoOperator-4B-i1-IQ4_XS.gguf | 2.115 | 4.52 | 9.49687 | 0.040098 | 7.03875 | 90.402% |
	| LocoOperator-4B-Q3_K_L.gguf | 2.086 | 4.45 | 10.2476 | 0.121944 | 27.0257 | 84.146% |
	| LocoOperator-4B-i1-Q3_K_L.gguf | 2.086 | 4.45 | 9.90811 | 0.090874 | 15.8122 | 86.154% |
	| LocoOperator-4B-Q3_K_M.gguf | 1.933 | 4.13 | 10.7021 | 0.15788 | 20.2044 | 82.662% |
	| LocoOperator-4B-i1-Q3_K_M.gguf | 1.933 | 4.13 | 9.98057 | 0.102708 | 16.8243 | 85.354% |
	| LocoOperator-4B-i1-IQ3_M.gguf | 1.828 | 3.9 | 10.1634 | 0.137347 | 14.6883 | 83.180% |
	| LocoOperator-4B-IQ3_M.gguf | 1.828 | 3.9 | 14.2539 | 0.557713 | 19.4397 | 67.631% |
	| LocoOperator-4B-IQ3_S.gguf | 1.769 | 3.78 | 15.0624 | 0.619131 | 20.122 | 65.931% |
	| LocoOperator-4B-i1-IQ3_S.gguf | 1.769 | 3.78 | 10.1755 | 0.142066 | 17.0028 | 83.139% |
	| LocoOperator-4B-i1-Q3_K_S.gguf | 1.757 | 3.75 | 10.8886 | 0.171224 | 28.3373 | 82.133% |
	| LocoOperator-4B-Q3_K_S.gguf | 1.757 | 3.75 | 11.5475 | 0.237895 | 30.6868 | 79.412% |
	| LocoOperator-4B-i1-IQ3_XS.gguf | 1.69 | 3.61 | 10.3629 | 0.168783 | 14.3358 | 81.928% |
	| LocoOperator-4B-i1-Q2_K.gguf | 1.555 | 3.32 | 12.1574 | 0.328652 | 18.6622 | 75.570% |
	| LocoOperator-4B-i1-IQ3_XXS.gguf | 1.555 | 3.32 | 11.2795 | 0.263448 | 25.251 | 77.569% |
	| LocoOperator-4B-Q2_K.gguf | 1.555 | 3.32 | 17.153 | 0.713596 | 16.3946 | 64.880% |
	| LocoOperator-4B-i1-Q2_K_S.gguf | 1.456 | 3.11 | 13.1709 | 0.450125 | 18.3826 | 71.231% |
	| LocoOperator-4B-i1-IQ2_M.gguf | 1.409 | 3.01 | 14.0857 | 0.544764 | 18.5618 | 67.933% |
	| LocoOperator-4B-i1-IQ2_S.gguf | 1.32 | 2.82 | 15.0717 | 0.621189 | 24.0981 | 65.722% |
	| LocoOperator-4B-i1-IQ2_XS.gguf | 1.261 | 2.69 | 16.8277 | 0.750336 | 19.2128 | 63.162% |
	| LocoOperator-4B-i1-IQ2_XXS.gguf | 1.161 | 2.48 | 27.5988 | 1.32144 | 14.6807 | 52.522% |
	| LocoOperator-4B-i1-IQ1_M.gguf | 1.05 | 2.24 | 49.0978 | 1.9323 | 16.5947 | 44.067% |
	| LocoOperator-4B-i1-IQ1_S.gguf | 0.983 | 2.1 | 139.951 | 3.03274 | 16.0947 | 28.387% |
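
One way to read the table is to fix a file-size budget and take the quantization with the lowest mean KL divergence (`KLD_Mean`, lower is closer to BF16) that fits. The sketch below does that for a handful of rows copied from the table; the helper name `pick_quant` is illustrative, and the budget covers the file only, not the KV cache or runtime overhead.

```python
# A few rows copied from the table above: (file name, size in GB, mean KLD vs. BF16).
QUANTS = [
    ("LocoOperator-4B-Q6_K.gguf", 3.079, 0.0068),
    ("LocoOperator-4B-i1-Q5_K_M.gguf", 2.691, 0.013153),
    ("LocoOperator-4B-i1-Q4_K_M.gguf", 2.326, 0.03368),
    ("LocoOperator-4B-i1-IQ4_XS.gguf", 2.115, 0.040098),
    ("LocoOperator-4B-i1-Q3_K_M.gguf", 1.933, 0.102708),
    ("LocoOperator-4B-i1-IQ2_M.gguf", 1.409, 0.544764),
]


def pick_quant(rows, max_size_gb):
    """Return the row with the lowest mean KLD that fits the size budget, or None."""
    fits = [row for row in rows if row[1] <= max_size_gb]
    return min(fits, key=lambda row: row[2]) if fits else None


# With a 2.5 GB budget, the i1-Q4_K_M file is the closest to BF16 among these rows.
print(pick_quant(QUANTS, max_size_gb=2.5))
```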


	------

	<div align="center">
	<img src="assets/loco_operator.png" width="55%" alt="LocoOperator" />
	</div>

	<br>

	<div align="center">

	[![MODEL](https://img.shields.io/badge/Model-FFB300?style=for-the-badge&logo=huggingface&logoColor=white)](https://huggingface.co/LocoreMind/LocoOperator-4B)
	[![GGUF](https://img.shields.io/badge/GGUF-FF6F00?style=for-the-badge&logo=huggingface&logoColor=white)](https://huggingface.co/LocoreMind/LocoOperator-4B-GGUF)
	[![Blog](https://img.shields.io/badge/Blog-4285F4?style=for-the-badge&logo=google-chrome&logoColor=white)](https://locoremind.com/blog/loco-operator)
	[![GitHub](https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/LocoreMind/LocoOperator)
	[![Colab](https://img.shields.io/badge/Colab-F9AB00?style=for-the-badge&logo=googlecolab&logoColor=white)](https://colab.research.google.com/github/LocoreMind/LocoOperator/blob/main/LocoOperator_4B.ipynb)

	</div>

	## Introduction

	LocoOperator-4B is a 4B-parameter tool-calling agent model trained via knowledge distillation from Qwen3-Coder-Next inference traces. It specializes in multi-turn codebase exploration (reading files, searching code, and navigating project structures) within a Claude Code-style agent loop. Designed as a local subagent, it runs via llama.cpp at zero API cost.

	| | LocoOperator-4B |
	|:--|:--|
	| Base Model | [Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) |
	| Teacher Model | Qwen3-Coder-Next |
	| Training Method | Full-parameter SFT (distillation) |
	| Training Data | 170,356 multi-turn conversation samples |
	| Max Sequence Length | 16,384 tokens |
	| Training Hardware | 4x NVIDIA H200 141GB SXM5 |
	| Training Time | ~25 hours |
	| Framework | MS-SWIFT |

	## Key Features

	- Tool-Calling Agent: Generates structured `<tool_call>` JSON for Read, Grep, Glob, Bash, Write, Edit, and Task (subagent delegation)
	- 100% JSON Validity: On the evaluation set, every tool call is valid JSON with all required arguments. The teacher model also emits valid JSON, but only 87.6% of its calls include all required arguments
	- Local Deployment: GGUF quantized, runs on Mac Studio via llama.cpp at zero API cost
	- Lightweight Explorer: 4B parameters, optimized for fast codebase search and navigation
	- Multi-Turn: Handles conversation depths of 3–33 messages with consistent tool-calling behavior
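
As a rough illustration of consuming the structured `<tool_call>` output mentioned above, the sketch below extracts tool calls from generated text. It assumes the Qwen-style layout of one JSON object with `name` and `arguments` keys per `<tool_call>` block; the argument names in the sample (`pattern`, `file_path`) are hypothetical and not taken from this card.

```python
import json
import re

# Assumed layout: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return the parsed tool calls found in a model response, skipping malformed ones."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not valid JSON, ignore this block
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


# Hypothetical response containing two parallel tool calls.
sample = (
    "I will search first.\n"
    '<tool_call>\n{"name": "Grep", "arguments": {"pattern": "pyproject.toml"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "Read", "arguments": {"file_path": "/tmp/example.py"}}\n</tool_call>'
)
print([call["name"] for call in extract_tool_calls(sample)])
```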

	## Performance

	Evaluated on 65 multi-turn conversation samples from diverse open-source projects (scipy, fastapi, arrow, attrs, gevent, gunicorn, etc.), with labels generated by Qwen3-Coder-Next.

	### Core Metrics

	| Metric | Score |
	|:-------|:-----:|
	| Tool Call Presence Alignment | 100% (65/65) |
	| First Tool Type Match | 65.6% (40/61) |
	| JSON Validity | 100% (76/76) |
	| Argument Syntax Correctness | 100% (76/76) |

	On this evaluation set, the model agrees with the teacher in every sample on whether to call a tool or respond with text (100% presence alignment). Tool type mismatches are between semantically similar tools (e.g. Grep vs. Read), which are different but often valid strategies.
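
The card does not spell out how these two metrics are computed. The sketch below shows one plausible reading, in which each sample is reduced to the ordered list of tool names a model called, and first-tool match is measured only on samples where both models call a tool. The function names and the toy data are illustrative assumptions.

```python
def presence_alignment(student, teacher):
    """Fraction of samples where both models agree on whether to call any tool."""
    agree = sum(bool(s) == bool(t) for s, t in zip(student, teacher))
    return agree / len(student)


def first_tool_match(student, teacher):
    """Fraction of samples, among those where both call a tool, with the same first tool."""
    pairs = [(s[0], t[0]) for s, t in zip(student, teacher) if s and t]
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)


# Toy data: one list of tool names per sample (an empty list means a text-only reply).
student = [["Grep", "Read"], ["Bash"], [], ["Read"]]
teacher = [["Grep"], ["Read"], [], ["Read", "Glob"]]

print(presence_alignment(student, teacher))
print(first_tool_match(student, teacher))
```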

	### Tool Distribution Comparison

	<div align="center">
	<img src="assets/tool_distribution.png" width="80%" alt="Tool Distribution Comparison" />
	</div>

	### JSON & Argument Syntax Correctness

	| Model | JSON Valid | Argument Syntax Valid |
	|:------|:---------:|:--------------------:|
	| LocoOperator-4B | 76/76 (100%) | 76/76 (100%) |
	| Qwen3-Coder-Next (teacher) | 89/89 (100%) | 78/89 (87.6%) |

	> On this evaluation set, every LocoOperator-4B tool call is valid JSON with all required arguments. The teacher model has 11 tool calls with missing required arguments (empty `arguments: {}`).
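
The two columns measure different things: a call can be valid JSON and still lack a required argument. A minimal sketch of such a two-level check follows. The `REQUIRED_ARGS` schema is a hypothetical stand-in, since the real tool definitions are not given in this card.

```python
import json

# Hypothetical required-argument schema, for illustration only.
REQUIRED_ARGS = {"Read": {"file_path"}, "Grep": {"pattern"}, "Bash": {"command"}}


def check_tool_call(raw):
    """Return (json_valid, args_valid) for one raw tool-call payload."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False, False
    if not isinstance(call, dict) or not isinstance(call.get("name"), str):
        return True, False
    args = call.get("arguments")
    required = REQUIRED_ARGS.get(call["name"], set())
    # Arguments must be an object that contains every required key.
    args_valid = isinstance(args, dict) and required.issubset(args)
    return True, args_valid


print(check_tool_call('{"name": "Read", "arguments": {"file_path": "/tmp/a.py"}}'))
print(check_tool_call('{"name": "Read", "arguments": {}}'))
```

The second call mirrors the teacher's failure mode described above: it parses as JSON but carries an empty `arguments` object.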

	## Quick Start

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "LocoreMind/LocoOperator-4B"

	# load the tokenizer and the model
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto"
	)

	# prepare the messages
	messages = [
	    {
	        "role": "system",
	        "content": "You are a read-only codebase search specialist.\n\nCRITICAL CONSTRAINTS:\n1. STRICTLY READ-ONLY: You cannot create, edit, delete, move files, or run any state-changing commands. Use tools/bash ONLY for reading (e.g., ls, find, cat, grep).\n2. EFFICIENCY: Spawn multiple parallel tool calls for faster searching.\n3. OUTPUT RULES: \n - ALWAYS use absolute file paths.\n - STRICTLY NO EMOJIS in your response.\n - Output your final report directly. Do not use colons before tool calls.\n\nENV: Working directory is /Users/developer/workspace/code-analyzer (macOS, zsh)."
	    },
	    {
	        "role": "user",
	        "content": "Analyze the Black codebase at `/Users/developer/workspace/code-analyzer/projects/black`.\nFind and explain:\n1. How Black discovers config files.\n2. The exact search order for config files.\n3. Supported config file formats.\n4. Where this configuration discovery logic lives in the codebase.\n\nReturn a comprehensive answer with relevant code snippets and absolute file paths."
	    }
	]

	# prepare the model input
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	# conduct text completion
	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=512,
	)
	output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

	content = tokenizer.decode(output_ids, skip_special_tokens=True)
	print(content)
	```
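
The Quick Start performs a single generation. In an agent loop, the response is scanned for tool calls, the tools are executed, and their results are appended to the conversation before generating again. The sketch below illustrates that control flow under several assumptions: `generate` is any callable that maps a message list to response text (for example, a wrapper around the code above), tool results are fed back with a `tool` role as Qwen-style chat templates typically accept, and the `Read` tool and its `file_path` argument are hypothetical stand-ins.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def run_agent_loop(generate, tools, messages, max_turns=8):
    """Alternate model turns and tool execution until the model answers in plain text."""
    messages = list(messages)
    for _ in range(max_turns):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        payloads = TOOL_CALL_RE.findall(reply)
        if not payloads:
            return reply, messages  # no tool call: treat as the final answer
        for payload in payloads:
            try:
                call = json.loads(payload)
                result = tools[call["name"]](**call.get("arguments", {}))
            except Exception as exc:  # malformed call, unknown tool, or tool failure
                result = f"Tool error: {exc}"
            messages.append({"role": "tool", "content": str(result)})
    return None, messages  # turn budget exhausted without a final answer


# Stand-in for the model: request one file read, then answer from the tool result.
def fake_generate(messages):
    if messages[-1]["role"] == "tool":
        return "The file contains: " + messages[-1]["content"]
    return '<tool_call>\n{"name": "Read", "arguments": {"file_path": "demo.txt"}}\n</tool_call>'


tools = {"Read": lambda file_path: f"<contents of {file_path}>"}
answer, history = run_agent_loop(
    fake_generate, tools, [{"role": "user", "content": "What is in demo.txt?"}]
)
print(answer)
```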

	## Local Deployment

	For GGUF quantized deployment with llama.cpp, hybrid proxy routing, and batch analysis pipelines, refer to our [GitHub repository](https://github.com/LocoreMind/LocoOperator).
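
For a quick local test without the full pipeline, one option is llama.cpp's `llama-server`, which exposes an OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below only builds and sends such a request using the standard library. It assumes a server is already running on `http://localhost:8080` with one of the GGUF files loaded (for example via `llama-server -m <file>.gguf --port 8080`); the helper names are illustrative.

```python
import json
import urllib.request


def build_chat_request(messages, base_url="http://localhost:8080", max_tokens=512):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    body = json.dumps({"messages": messages, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, **kwargs):
    """Send the request and return the assistant text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(messages, **kwargs)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# Building the request does not contact the server; calling chat(...) does.
request = build_chat_request([{"role": "user", "content": "List the Python files in /tmp."}])
print(request.full_url)
```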

	## Training Details

	| Parameter | Value |
	|:----------|:------|
	| Base model | Qwen3-4B-Instruct-2507 |
	| Teacher model | Qwen3-Coder-Next |
	| Method | Full-parameter SFT |
	| Training data | 170,356 samples |
	| Hardware | 4x NVIDIA H200 141GB SXM5 |
	| Parallelism | DDP (no DeepSpeed) |
	| Precision | BF16 |
	| Epochs | 1 |
	| Batch size | 2/GPU, gradient accumulation 4 (effective batch 32) |
	| Learning rate | 2e-5, warmup ratio 0.03 |
	| Max sequence length | 16,384 tokens |
	| Template | qwen3_nothinking |
	| Framework | MS-SWIFT |
	| Training time | ~25 hours |
	| Checkpoint | Step 2524 |
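
The effective batch size in the table follows from the per-GPU batch, the GPU count, and the gradient accumulation steps. The short calculation below reproduces it and gives a rough upper estimate of optimizer steps per epoch, assuming one sample per sequence (no packing) and no dropped final batch. The card states neither of these, so the step estimate need not match the reported checkpoint step.

```python
import math

per_gpu_batch = 2      # from the table: 2/GPU
num_gpus = 4           # 4x NVIDIA H200
grad_accum = 4         # gradient accumulation steps
num_samples = 170_356  # training samples

# Samples consumed per optimizer step under data parallelism.
effective_batch = per_gpu_batch * num_gpus * grad_accum

# Rough steps per epoch, assuming no sequence packing (an assumption, not stated in the card).
steps_per_epoch = math.ceil(num_samples / effective_batch)

print(effective_batch, steps_per_epoch)
```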

	## Known Limitations

	- First-tool-type match is 65.6% — the model sometimes picks a different (but not necessarily wrong) tool than the teacher
	- Tends to under-generate parallel tool calls compared to the teacher (76 vs 89 total calls across 65 samples)
	- Preference for Bash over Read may indicate the model defaults to shell commands where file reads would be more appropriate
	- Evaluated on 65 samples only; larger-scale evaluation needed

	## License

	MIT

	## Acknowledgments

	- [Qwen Team](https://huggingface.co/Qwen) for the Qwen3-4B-Instruct-2507 base model
	- [MS-SWIFT](https://github.com/modelscope/ms-swift) for the training framework
	- [llama.cpp](https://github.com/ggerganov/llama.cpp) for efficient local inference
	- [Anthropic](https://www.anthropic.com/) for the Claude Code agent loop design that inspired this work