---
library_name: transformers
license: mit
base_model: LocoreMind/LocoTrainer-4B
tags:
- code
- agent
- tool-calling
- distillation
- qwen3
- ms-swift
- gguf
- quantization
language:
- en
pipeline_tag: text-generation
---

# LocoTrainer-4B GGUF

GGUF-quantized version of the LocoTrainer-4B model for local inference.

## Model Information

- **Base Model**: [Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
- **Distilled from**: Qwen3-Coder-Next
- **Training Method**: Knowledge Distillation (SFT)
- **Training Data**: 361,830 samples
- **Max Context**: 32,768 tokens
- **Framework**: MS-SWIFT

## Available Versions

| Version | Size | Speed | Quality | Recommended For |
|---------|------|-------|---------|-----------------|
| F16 | 8.3GB | Fast | Highest | Baseline/reference |
| Q8_0 | 4.4GB | Fast | Very High | High-quality inference |
| Q5_K_M | 3.0GB | Medium | High | Balanced quality/size |
| Q4_K_M | 2.6GB | Fast | Medium | **Recommended** |
| Q3_K_M | 2.1GB | Very Fast | Medium | Resource-constrained systems |

## Quick Start

### Using llama.cpp

```bash
# Download model
wget https://huggingface.co/LocoreMind/LocoTrainer-4B-GGUF/resolve/main/LocoTrainer-4B-Q4_K_M.gguf

# Start server
./llama-server -m LocoTrainer-4B-Q4_K_M.gguf --port 8080 --ctx-size 32768
```

### Using the LocoTrainer Framework

```bash
# Configure .env
export LOCOTRAINER_BASE_URL=http://localhost:8080/v1
export LOCOTRAINER_MODEL=LocoTrainer-4B

# Run
locotrainer run -q "What are the default LoRA settings in ms-swift?"
```

### Using llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama(
    model_path="LocoTrainer-4B-Q4_K_M.gguf",
    n_gpu_layers=99,
    n_ctx=32768,
)

response = llm(
    "What is MS-SWIFT?",
    max_tokens=512,
)
print(response["choices"][0]["text"])
```

## Performance Metrics

Tested on an NVIDIA H100:

- **First Token Latency**: ~200-300 ms
- **Subsequent Token Speed**: 50-100 tokens/sec
- **Memory Usage** (Q4_K_M): ~10-12 GB

## Features

- 🎯 **MS-SWIFT Domain Expert**: Trained on the MS-SWIFT documentation and codebase
- 🔧 **Tool Calling**: Supports Read, Grep, Glob, Bash, and Write tools
- 📊 **End-to-End Reports**: From question to complete Markdown analysis report
- 🏠 **Local Deployment**: Fully offline, zero API cost
- 📏 **Long Context**: Supports 32K tokens

## Use Cases

- Codebase analysis and documentation generation
- MS-SWIFT framework Q&A
- Local AI agent deployment
- Offline inference applications

## License

MIT

## Acknowledgments

- [Qwen Team](https://huggingface.co/Qwen) - Base model
- [MS-SWIFT](https://github.com/modelscope/ms-swift) - Training framework
- [llama.cpp](https://github.com/ggml-org/llama.cpp) - GGUF quantization and inference
- [Anthropic](https://www.anthropic.com/) - Claude Code design inspiration

## Related Resources

- [Original Model](https://huggingface.co/LocoreMind/LocoTrainer-4B)
- [LocoTrainer Framework](https://github.com/LocoreMind/LocoTrainer)
- [llama.cpp Documentation](https://github.com/ggml-org/llama.cpp)
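## Example: Querying the Server Directly

Recent llama.cpp builds of `llama-server` expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the server started in Quick Start can be queried without any extra client library. Below is a minimal sketch using only the Python standard library; the base URL and model name mirror the `.env` values from the LocoTrainer section above, and the `build_chat_request`/`ask` helper names are illustrative, not part of any shipped API:

```python
import json
import urllib.request

# Endpoint and model name follow the Quick Start / .env values above.
BASE_URL = "http://localhost:8080/v1"
MODEL = "LocoTrainer-4B"


def build_chat_request(question: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request for llama-server."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def ask(question: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(question)) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]


# With the server from Quick Start running:
#   print(ask("What are the default LoRA settings in ms-swift?"))
```

Any OpenAI-compatible SDK pointed at `http://localhost:8080/v1` should work the same way.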