---
library_name: transformers
license: mit
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
  - code
  - agent
  - tool-calling
  - distillation
  - qwen3
  - ms-swift
  - codebase-analysis
language:
  - en
pipeline_tag: text-generation
---

<div align="center">
  <img src="assets/locotrainer.png" width="55%" alt="LocoTrainer" />
</div>

<br>

<div align="center">

[![PyPI](https://img.shields.io/badge/PyPI-3775A9?style=for-the-badge&logo=pypi&logoColor=white)](https://pypi.org/project/locotrainer/)
[![MODEL](https://img.shields.io/badge/Model-FFB300?style=for-the-badge&logo=huggingface&logoColor=white)](https://huggingface.co/LocoreMind/LocoTrainer-4B)
[![GGUF](https://img.shields.io/badge/GGUF-FF6F00?style=for-the-badge&logo=huggingface&logoColor=white)](https://huggingface.co/LocoreMind/LocoTrainer-4B-GGUF)
[![Colab](https://img.shields.io/badge/Colab-F9AB00?style=for-the-badge&logo=googlecolab&logoColor=white)](https://colab.research.google.com/github/LocoreMind/LocoTrainer/blob/main/LocoTrainer_4B.ipynb)
[![GitHub](https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/LocoreMind/LocoTrainer)

</div>

## Introduction

**LocoTrainer-4B** is a 4B-parameter MS-SWIFT domain expert agent trained via knowledge distillation from **Qwen3-Coder-Next**. Unlike general-purpose code agents, it combines multi-turn tool-calling with deep MS-SWIFT framework knowledge — enabling it to analyze codebases and generate comprehensive markdown reports without a separate reasoning model.

## Demo

<div align="center">
  <img src="assets/demo.gif" width="90%" alt="LocoTrainer Demo" />
</div>

*LocoTrainer analyzing the MS-SWIFT codebase with the LocoTrainer-4B model served via vLLM*

|  | LocoTrainer-4B |
|:--|:--|
| **Base Model** | [Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) |
| **Teacher Model** | Qwen3-Coder-Next |
| **Training Method** | Full-parameter SFT (distillation) |
| **Training Data** | 361,830 samples (agent trajectory + MS-SWIFT knowledge + project paths) |
| **Max Sequence Length** | 32,768 tokens |
| **Training Hardware** | 8x NVIDIA H100 80GB |
| **Training Time** | ~25 hours |
| **Framework** | MS-SWIFT |

## Key Features

- **MS-SWIFT Domain Expert**: Trained on MS-SWIFT documentation, CLI parameters, and project structure paths — answers framework questions accurately
- **Tool-Calling Agent**: Generates structured `<tool_call>` JSON for Read, Grep, Glob, Bash, and Write tools
- **End-to-End Reports**: From a single question to a complete, well-structured markdown analysis report
- **Long Context**: Trained at a 32,768-token sequence length, covering roughly 90% of long-context analysis scenarios
- **Local Deployment**: GGUF-quantized version available for local inference with zero API cost
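
To make the tool-calling feature concrete, here is a hypothetical example of the kind of `<tool_call>` payload such an agent emits. The `name`/`arguments` schema shown is an assumption for illustration, not a documented spec for this model:

```python
import json

# Hypothetical tool-call payload; the "name"/"arguments" schema is assumed
# for illustration and may differ from the model's actual output format.
call = {
    "name": "Grep",
    "arguments": {
        "pattern": "lora_rank",
        "path": "/Users/developer/workspace/ms-swift",
    },
}

# Wrap the JSON payload in the <tool_call> tags the model card describes.
wrapped = f"<tool_call>\n{json.dumps(call)}\n</tool_call>"
print(wrapped)
```

An agent framework watches the generated text for these tagged blocks, executes the named tool, and feeds the result back into the conversation.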

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LocoreMind/LocoTrainer-4B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

messages = [
    {
        "role": "system",
        "content": "You are Claude Code, Anthropic's official CLI for Claude.\n\nYou are an interactive agent that helps users with software engineering tasks.\n\nCRITICAL CONSTRAINTS:\n1. ALWAYS use absolute file paths in tool calls.\n2. EFFICIENCY: Use multiple tool calls to explore the codebase.\n3. OUTPUT: Save your findings as a well-structured markdown document.\n\nENV: Working directory is /Users/developer/workspace (macOS, zsh)."
    },
    {
        "role": "user",
        "content": "What are the default LoRA settings in ms-swift?\n\nAnalyze the codebase at /Users/developer/workspace/ms-swift and save your findings as a well-structured markdown document to /Users/developer/workspace/output/output.md."
    }
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)
```

## LocoTrainer Framework

LocoTrainer-4B is designed to run inside the **LocoTrainer agent framework**, which handles the full agent loop — tool execution, multi-turn conversation, and report generation.

```bash
pip install locotrainer

locotrainer run -q "What are the default LoRA settings in ms-swift?"
# → output/output.md
```
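
As a rough sketch of the loop such a framework runs — generate, detect a `<tool_call>`, execute the tool, feed the result back — the following is illustrative only; the function names, tool dispatch, and tag format are assumptions, not the actual LocoTrainer implementation:

```python
import json
import re

# Matches a JSON payload wrapped in <tool_call> tags in the model output.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def run_tool(name, arguments):
    # Stub: a real framework would dispatch to Read/Grep/Glob/Bash/Write here.
    return f"[{name} executed with {arguments}]"

def agent_loop(generate, messages, max_turns=8):
    """Drive the model until it stops requesting tools or turns run out."""
    for _ in range(max_turns):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        match = TOOL_CALL_RE.search(reply)
        if not match:
            return reply  # no tool call: treat as the final answer
        call = json.loads(match.group(1))
        result = run_tool(call["name"], call.get("arguments", {}))
        messages.append({"role": "tool", "content": result})
    return None  # turn budget exhausted
```

Here `generate` would wrap the model call from the Quick Start section; capping `max_turns` keeps a misbehaving trajectory from looping forever.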

For full setup and usage, refer to the [GitHub repository](https://github.com/LocoreMind/LocoTrainer).

## Training Details

| Parameter | Value |
|:----------|:------|
| Base model | Qwen3-4B-Instruct-2507 |
| Teacher model | Qwen3-Coder-Next |
| Method | Full-parameter SFT |
| Training data | 361,830 samples |
| Data composition | Agent trajectory + MS-SWIFT knowledge + project structure paths |
| Hardware | 8x NVIDIA H100 80GB |
| DeepSpeed | ZeRO-2 |
| Precision | BF16 |
| Epochs | 1 |
| Max sequence length | 32,768 tokens |
| Attention | Flash Attention 2 |
| Kernel optimization | Liger Kernel |
| Learning rate | 1e-5 |
| Warmup ratio | 0.05 |
| Per-GPU batch size | 1 |
| Gradient accumulation | 4 (effective batch size 32) |
| Template | qwen3_nothinking |
| Framework | MS-SWIFT |
| Training time | ~25 hours |
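
The effective batch size in the table follows directly from the per-GPU batch, the GPU count, and the gradient accumulation steps:

```python
# Effective batch = per-GPU batch x number of GPUs x gradient accumulation steps
per_gpu_batch = 1
num_gpus = 8
grad_accum_steps = 4

effective_batch = per_gpu_batch * num_gpus * grad_accum_steps
print(effective_batch)  # 32
```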

## Known Limitations

- Specialized for MS-SWIFT; performance on unrelated codebases is untested
- 4B parameters — complex multi-hop reasoning may require a larger model
- MS-SWIFT project structure knowledge reflects the training data snapshot; may drift as the framework evolves

## License

MIT

## Acknowledgments

- [Qwen Team](https://huggingface.co/Qwen) for the Qwen3-4B-Instruct-2507 base model
- [MS-SWIFT](https://github.com/modelscope/ms-swift) for the training framework and the codebase this model specializes in
- [llama.cpp](https://github.com/ggerganov/llama.cpp) for efficient local inference
- [Anthropic](https://www.anthropic.com/) for the Claude Code agent loop design that inspired this work