Instructions for using LocoreMind/LocoTrainer-4B with libraries, inference servers, and local apps.

Libraries

How to use LocoreMind/LocoTrainer-4B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LocoreMind/LocoTrainer-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LocoreMind/LocoTrainer-4B")
# AutoModelForCausalLM matches the Quick Start below; the model is a text-only causal LM.
model = AutoModelForCausalLM.from_pretrained("LocoreMind/LocoTrainer-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Apply the chat template and move the tokenized inputs to the model's device.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate up to 40 new tokens and decode only the newly generated part.
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use LocoreMind/LocoTrainer-4B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "LocoreMind/LocoTrainer-4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LocoreMind/LocoTrainer-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
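
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000` and returns the usual OpenAI-compatible response shape. The helper names `build_chat_payload` and `chat`, and the `max_tokens` default, are illustrative choices rather than part of the model card.

```python
import json
import urllib.request

# Assumed defaults taken from the curl example; adjust if the server runs elsewhere.
BASE_URL = "http://localhost:8000/v1"
MODEL = "LocoreMind/LocoTrainer-4B"


def build_chat_payload(prompt, model=MODEL, max_tokens=256):
    """Build the JSON body for POST /v1/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt, base_url=BASE_URL, timeout=60):
    """Send one user message and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers place the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` should print the model's answer. Passing `base_url="http://localhost:30000/v1"` would target a server on port 30000 instead.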

Use Docker

docker model run hf.co/LocoreMind/LocoTrainer-4B

SGLang

How to use LocoreMind/LocoTrainer-4B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LocoreMind/LocoTrainer-4B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LocoreMind/LocoTrainer-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LocoreMind/LocoTrainer-4B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LocoreMind/LocoTrainer-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use LocoreMind/LocoTrainer-4B with Docker Model Runner:
```
docker model run hf.co/LocoreMind/LocoTrainer-4B
```
Quantized versions of this model can be used with llama.cpp, Ollama, LM Studio, or any compatible app.

Introduction

LocoTrainer-4B is a 4B-parameter MS-SWIFT domain expert agent trained via knowledge distillation from Qwen3-Coder-Next. Unlike general-purpose code agents, it combines multi-turn tool-calling with deep MS-SWIFT framework knowledge — enabling it to analyze codebases and generate comprehensive markdown reports without a separate reasoning model.

Demo

[Demo image: LocoTrainer analyzing the MS-SWIFT codebase with the LocoTrainer-4B model via vLLM]

Property	LocoTrainer-4B
Base Model	Qwen3-4B-Instruct-2507
Teacher Model	Qwen3-Coder-Next
Training Method	Full-parameter SFT (distillation)
Training Data	361,830 samples (agent trajectory + MS-SWIFT knowledge + project paths)
Max Sequence Length	32,768 tokens
Training Hardware	8x NVIDIA H100 80GB
Training Time	~25 hours
Framework	MS-SWIFT

Key Features

MS-SWIFT Domain Expert: Trained on MS-SWIFT documentation, CLI parameters, and project structure paths — answers framework questions accurately
Tool-Calling Agent: Generates structured <tool_call> JSON for Read, Grep, Glob, Bash, and Write tools
End-to-End Reports: From a single question to a complete, well-structured markdown analysis report
Long Context: Training at a 32K-token sequence length covers 90% of long-context analysis scenarios
Local Deployment: GGUF quantized version available for zero-API-cost inference
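
To make the tool-calling feature concrete, here is a hedged sketch of how a caller might extract tool calls from the model's raw text output. The model card only states that the model emits structured `<tool_call>` JSON. The exact wire format assumed here (a JSON object with `name` and `arguments` keys between `<tool_call>` and `</tool_call>` tags, as in Qwen-style chat templates) is an assumption, and `parse_tool_calls` is a hypothetical helper.

```python
import json
import re

# Tool names listed in the model card.
KNOWN_TOOLS = {"Read", "Grep", "Glob", "Bash", "Write"}

# Assumed wire format: one JSON object between <tool_call> and </tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Return (name, arguments) tuples for every well-formed, known tool call."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of raising
        name = payload.get("name")
        if name in KNOWN_TOOLS:
            calls.append((name, payload.get("arguments", {})))
    return calls


sample = (
    "I will inspect the config first.\n"
    '<tool_call>\n{"name": "Read", "arguments": {"file_path": "/tmp/a.py"}}\n</tool_call>'
)
print(parse_tool_calls(sample))
```

Unknown tool names and malformed JSON are dropped silently in this sketch. A real agent loop would more likely report them back to the model as errors.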

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LocoreMind/LocoTrainer-4B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

messages = [
    {
        "role": "system",
        "content": "You are Claude Code, Anthropic's official CLI for Claude.\n\nYou are an interactive agent that helps users with software engineering tasks.\n\nCRITICAL CONSTRAINTS:\n1. ALWAYS use absolute file paths in tool calls.\n2. EFFICIENCY: Use multiple tool calls to explore the codebase.\n3. OUTPUT: Save your findings as a well-structured markdown document.\n\nENV: Working directory is /Users/developer/workspace (macOS, zsh)."
    },
    {
        "role": "user",
        "content": "What are the default LoRA settings in ms-swift?\n\nAnalyze the codebase at /Users/developer/workspace/ms-swift and save your findings as a well-structured markdown document to /Users/developer/workspace/output/output.md."
    }
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)
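
The system and user messages above follow a fixed pattern, so they can be generated for other questions or directories. The sketch below reproduces the Quick Start prompt text verbatim and parameterizes only the paths and the question. `build_messages` and its parameter names are hypothetical, and whether the model depends on this exact wording is not stated in the model card.

```python
SYSTEM_PROMPT_TEMPLATE = (
    "You are Claude Code, Anthropic's official CLI for Claude.\n\n"
    "You are an interactive agent that helps users with software engineering tasks.\n\n"
    "CRITICAL CONSTRAINTS:\n"
    "1. ALWAYS use absolute file paths in tool calls.\n"
    "2. EFFICIENCY: Use multiple tool calls to explore the codebase.\n"
    "3. OUTPUT: Save your findings as a well-structured markdown document.\n\n"
    "ENV: Working directory is {workdir} (macOS, zsh)."
)


def build_messages(question, workdir, repo_name="ms-swift", report_path="output/output.md"):
    """Build the system and user messages in the shape used by the Quick Start."""
    codebase = f"{workdir}/{repo_name}"
    output = f"{workdir}/{report_path}"
    user_prompt = (
        f"{question}\n\n"
        f"Analyze the codebase at {codebase} and save your findings as a "
        f"well-structured markdown document to {output}."
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT_TEMPLATE.format(workdir=workdir)},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages(
    "What are the default LoRA settings in ms-swift?",
    workdir="/Users/developer/workspace",
)
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` exactly as in the Quick Start.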

LocoTrainer Framework

LocoTrainer-4B is designed to run inside the LocoTrainer agent framework, which handles the full agent loop — tool execution, multi-turn conversation, and report generation.

pip install locotrainer

locotrainer run -q "What are the default LoRA settings in ms-swift?"
# → output/output.md

For full setup and usage, refer to the GitHub repository.
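
For readers curious what the tool-execution part of an agent loop involves, here is a rough standard-library sketch of a dispatcher for the read-only tools. It is not the LocoTrainer implementation. The argument names (`file_path`, `pattern`, `path`) and all function names are assumptions made for illustration, and Bash and Write are deliberately left out because they have side effects.

```python
import re
from pathlib import Path


def tool_read(file_path):
    """Return the text of one file."""
    return Path(file_path).read_text(encoding="utf-8")


def tool_glob(pattern, path):
    """List paths under `path` matching a glob pattern, sorted for stable output."""
    return "\n".join(sorted(str(p) for p in Path(path).glob(pattern)))


def tool_grep(pattern, path):
    """Return `file:line:text` for each line under `path` that matches a regex."""
    regex = re.compile(pattern)
    hits = []
    for file in sorted(Path(path).rglob("*")):
        if not file.is_file():
            continue
        try:
            lines = file.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append(f"{file}:{number}:{line}")
    return "\n".join(hits)


# Hypothetical dispatch table covering only the read-only tools.
TOOLS = {"Read": tool_read, "Glob": tool_glob, "Grep": tool_grep}


def execute_tool(name, arguments):
    """Run one tool call and return its result as text for the next model turn."""
    handler = TOOLS.get(name)
    if handler is None:
        return f"Error: unsupported tool {name!r}"
    try:
        return handler(**arguments)
    except (TypeError, OSError, re.error) as exc:
        # Return errors as text so the model can see them and retry.
        return f"Error: {exc}"
```

In a full loop, each result string would be appended to the conversation as a tool message before the next call to the model.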

Training Details

Parameter	Value
Base model	Qwen3-4B-Instruct-2507
Teacher model	Qwen3-Coder-Next
Method	Full-parameter SFT
Training data	361,830 samples
Data composition	Agent trajectory + MS-SWIFT knowledge + project structure paths
Hardware	8x NVIDIA H100 80GB
DeepSpeed	ZeRO-2
Precision	BF16
Epochs	1
Max sequence length	32,768 tokens
Attention	Flash Attention 2
Kernel optimization	Liger Kernel
Learning rate	1e-5, warmup ratio 0.05
Batch size	1/GPU, gradient accumulation 4 (effective batch 32)
Template	qwen3_nothinking
Framework	MS-SWIFT
Training time	~25 hours
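
The effective batch size in the table follows directly from the hardware and accumulation settings. The short calculation below reproduces it and also estimates the number of optimizer steps. The step and warmup counts are rough estimates that assume one sample per sequence (no packing) and a single epoch; the model card does not report them.

```python
import math

num_gpus = 8            # 8x NVIDIA H100
per_gpu_batch = 1       # batch size per GPU
grad_accum = 4          # gradient accumulation steps
num_samples = 361_830   # training samples
warmup_ratio = 0.05

effective_batch = num_gpus * per_gpu_batch * grad_accum
# Estimates only: assume no sequence packing and exactly one epoch.
optimizer_steps = math.ceil(num_samples / effective_batch)
warmup_steps = math.ceil(optimizer_steps * warmup_ratio)

print(effective_batch)  # 32
print(optimizer_steps)  # 11308
print(warmup_steps)     # 566
```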

Known Limitations

Specialized for MS-SWIFT; performance on unrelated codebases is untested
4B parameters — complex multi-hop reasoning may require a larger model
MS-SWIFT project structure knowledge reflects the training data snapshot; may drift as the framework evolves

License

MIT

Acknowledgments

Qwen Team for the Qwen3-4B-Instruct-2507 base model
MS-SWIFT for the training framework and the codebase this model specializes in
llama.cpp for efficient local inference
Anthropic for the Claude Code agent loop design that inspired this work
