---
library_name: transformers
license: mit
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
- code
- agent
- tool-calling
- distillation
- qwen3
- ms-swift
- codebase-analysis
language:
- en
pipeline_tag: text-generation
---
<div align="center">
<img src="assets/locotrainer.png" width="55%" alt="LocoTrainer" />
</div>
<br>
<div align="center">
[PyPI](https://pypi.org/project/locotrainer/) · [Model](https://huggingface.co/LocoreMind/LocoTrainer-4B) · [GGUF](https://huggingface.co/LocoreMind/LocoTrainer-4B-GGUF) · [Colab](https://colab.research.google.com/github/LocoreMind/LocoTrainer/blob/main/LocoTrainer_4B.ipynb) · [GitHub](https://github.com/LocoreMind/LocoTrainer)
</div>
## Introduction
**LocoTrainer-4B** is a 4B-parameter MS-SWIFT domain expert agent trained via knowledge distillation from **Qwen3-Coder-Next**. Unlike general-purpose code agents, it combines multi-turn tool-calling with deep MS-SWIFT framework knowledge — enabling it to analyze codebases and generate comprehensive markdown reports without a separate reasoning model.
## Demo
<div align="center">
<img src="assets/demo.gif" width="90%" alt="LocoTrainer Demo" />
</div>
*LocoTrainer analyzing the MS-SWIFT codebase with the LocoTrainer-4B model served via vLLM*
| | LocoTrainer-4B |
|:--|:--|
| **Base Model** | [Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) |
| **Teacher Model** | Qwen3-Coder-Next |
| **Training Method** | Full-parameter SFT (distillation) |
| **Training Data** | 361,830 samples (agent trajectory + MS-SWIFT knowledge + project paths) |
| **Max Sequence Length** | 32,768 tokens |
| **Training Hardware** | 8x NVIDIA H100 80GB |
| **Training Time** | ~25 hours |
| **Framework** | MS-SWIFT |
## Key Features
- **MS-SWIFT Domain Expert**: Trained on MS-SWIFT documentation, CLI parameters, and project structure paths — answers framework questions accurately
- **Tool-Calling Agent**: Generates structured `<tool_call>` JSON for Read, Grep, Glob, Bash, and Write tools
- **End-to-End Reports**: From a single question to a complete, well-structured markdown analysis report
- **Long Context**: Trained at a 32,768-token sequence length, covering roughly 90% of long-context analysis scenarios
- **Local Deployment**: GGUF-quantized builds enable local inference at zero API cost
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "LocoreMind/LocoTrainer-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map="auto"
)
messages = [
{
"role": "system",
"content": "You are Claude Code, Anthropic's official CLI for Claude.\n\nYou are an interactive agent that helps users with software engineering tasks.\n\nCRITICAL CONSTRAINTS:\n1. ALWAYS use absolute file paths in tool calls.\n2. EFFICIENCY: Use multiple tool calls to explore the codebase.\n3. OUTPUT: Save your findings as a well-structured markdown document.\n\nENV: Working directory is /Users/developer/workspace (macOS, zsh)."
},
{
"role": "user",
"content": "What are the default LoRA settings in ms-swift?\n\nAnalyze the codebase at /Users/developer/workspace/ms-swift and save your findings as a well-structured markdown document to /Users/developer/workspace/output/output.md."
}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
**model_inputs,
max_new_tokens=1024,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)
```
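When the model decides to use a tool, it emits `<tool_call>` blocks containing JSON (see Key Features). A minimal sketch of extracting those calls from the decoded output, assuming the common one-JSON-object-per-block convention; the example string and its argument keys are hypothetical:

```python
import json
import re

# Hypothetical model output; the exact payload schema ("name"/"arguments"
# keys) follows the common Qwen/Hermes convention and is an assumption here.
content = (
    "I will inspect the LoRA defaults.\n"
    "<tool_call>\n"
    '{"name": "Grep", "arguments": {"pattern": "lora_rank", '
    '"path": "/Users/developer/workspace/ms-swift"}}\n'
    "</tool_call>"
)

def extract_tool_calls(text: str) -> list[dict]:
    """Parse every <tool_call>...</tool_call> block into a dict."""
    blocks = re.findall(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL)
    return [json.loads(b) for b in blocks]

calls = extract_tool_calls(content)
print(calls[0]["name"])  # → Grep
```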
## LocoTrainer Framework
LocoTrainer-4B is designed to run inside the **LocoTrainer agent framework**, which handles the full agent loop — tool execution, multi-turn conversation, and report generation.
```bash
pip install locotrainer
locotrainer run -q "What are the default LoRA settings in ms-swift?"
# → output/output.md
```
For full setup and usage, refer to the [GitHub repository](https://github.com/LocoreMind/LocoTrainer).
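The tool half of that loop can be sketched as a dispatcher over the Read, Grep, Glob, Bash, and Write tool names the model was trained on. The function name, argument keys, and return conventions below are assumptions for illustration, not the framework's actual API:

```python
import subprocess
from pathlib import Path

def run_tool(name: str, args: dict) -> str:
    """Hypothetical tool dispatcher; argument keys are assumptions."""
    if name == "Read":
        return Path(args["path"]).read_text()
    if name == "Write":
        Path(args["path"]).write_text(args["content"])
        return f"wrote {args['path']}"
    if name == "Glob":
        root = Path(args["path"])
        return "\n".join(sorted(str(p) for p in root.glob(args["pattern"])))
    if name == "Grep":
        hits = []
        for p in sorted(Path(args["path"]).rglob("*")):
            if not p.is_file():
                continue
            try:
                text = p.read_text()
            except (UnicodeDecodeError, OSError):
                continue  # skip binary/unreadable files
            hits += [f"{p}:{line}" for line in text.splitlines()
                     if args["pattern"] in line]
        return "\n".join(hits)
    if name == "Bash":
        done = subprocess.run(args["command"], shell=True,
                              capture_output=True, text=True, timeout=60)
        return done.stdout + done.stderr
    raise ValueError(f"unknown tool: {name}")
```

Each tool result is then typically appended to the conversation as a tool-role turn before the next `generate` call, which is exactly the loop the framework automates.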
## Training Details
| Parameter | Value |
|:----------|:------|
| Base model | Qwen3-4B-Instruct-2507 |
| Teacher model | Qwen3-Coder-Next |
| Method | Full-parameter SFT |
| Training data | 361,830 samples |
| Data composition | Agent trajectory + MS-SWIFT knowledge + project structure paths |
| Hardware | 8x NVIDIA H100 80GB |
| DeepSpeed | ZeRO-2 |
| Precision | BF16 |
| Epochs | 1 |
| Max sequence length | 32,768 tokens |
| Attention | Flash Attention 2 |
| Kernel optimization | Liger Kernel |
| Learning rate | 1e-5, warmup ratio 0.05 |
| Batch size | 1/GPU, gradient accumulation 4 (effective batch 32) |
| Template | qwen3_nothinking |
| Framework | MS-SWIFT |
| Training time | ~25 hours |
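Under these settings, an MS-SWIFT launch would look roughly like the following. This is a reconstruction from the table above, not the exact command used for this model; the dataset path is a placeholder and some flag spellings are assumptions:

```shell
NPROC_PER_NODE=8 swift sft \
    --model Qwen/Qwen3-4B-Instruct-2507 \
    --train_type full \
    --dataset /path/to/distilled_data.jsonl \
    --template qwen3_nothinking \
    --torch_dtype bfloat16 \
    --max_length 32768 \
    --num_train_epochs 1 \
    --learning_rate 1e-5 \
    --warmup_ratio 0.05 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --deepspeed zero2 \
    --attn_impl flash_attn \
    --use_liger_kernel true
```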
## Known Limitations
- Specialized for MS-SWIFT; performance on unrelated codebases is untested
- 4B parameters — complex multi-hop reasoning may require a larger model
- MS-SWIFT project structure knowledge reflects the training data snapshot; may drift as the framework evolves
## License
MIT
## Acknowledgments
- [Qwen Team](https://huggingface.co/Qwen) for the Qwen3-4B-Instruct-2507 base model
- [MS-SWIFT](https://github.com/modelscope/ms-swift) for the training framework and the codebase this model specializes in
- [llama.cpp](https://github.com/ggerganov/llama.cpp) for efficient local inference
- [Anthropic](https://www.anthropic.com/) for the Claude Code agent loop design that inspired this work