---
library_name: transformers
license: mit
base_model: LocoreMind/LocoTrainer-4B
tags:
- code
- agent
- tool-calling
- distillation
- qwen3
- ms-swift
- gguf
- quantization
language:
- en
pipeline_tag: text-generation
---
# LocoTrainer-4B GGUF
GGUF-quantized builds of the LocoTrainer-4B model for local inference.
## Model Information
- **Base Model**: [Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
- **Distilled from**: Qwen3-Coder-Next
- **Training Method**: Knowledge Distillation (SFT)
- **Training Data**: 361,830 samples
- **Max Context**: 32,768 tokens
- **Framework**: MS-SWIFT
## Available Versions
| Version | Size | Speed | Quality | Recommended For |
|---------|------|-------|---------|-----------------|
| F16 | 8.3GB | Fast | Highest | Baseline/Reference |
| Q8_0 | 4.4GB | Fast | Very High | High-quality inference |
| Q5_K_M | 3.0GB | Medium | High | Balanced approach |
| Q4_K_M | 2.6GB | Fast | Medium | **Recommended** |
| Q3_K_M | 2.1GB | Very Fast | Medium | Resource-constrained |
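As a rough sanity check, the file sizes above track the bits-per-weight of each quantization scheme. A minimal sketch of that arithmetic (the bits-per-weight figures are approximate assumptions for llama.cpp k-quants, not exact numbers for this model):

```python
# Rough GGUF file-size estimate: parameters * bits-per-weight / 8.
# The bits-per-weight values below are approximate assumptions for
# llama.cpp quant schemes, not exact figures for this model.
PARAMS = 4.0e9  # LocoTrainer-4B

BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.85,
    "Q3_K_M": 3.9,
}

def estimated_size_gb(quant: str, params: float = PARAMS) -> float:
    """Estimated model file size in GB for a given quantization."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"{quant:8s} ~{estimated_size_gb(quant):.1f} GB")
```

The estimates land close to the table (e.g. ~2.4 GB for Q4_K_M versus the listed 2.6GB); the gap is metadata, embeddings, and tensors kept at higher precision.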
## Quick Start
### Using llama.cpp
```bash
# Download model
wget https://huggingface.co/LocoreMind/LocoTrainer-4B-GGUF/resolve/main/LocoTrainer-4B-Q4_K_M.gguf
# Start server
./llama-server -m LocoTrainer-4B-Q4_K_M.gguf --port 8080 --ctx-size 32768
```
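Once running, llama-server exposes an OpenAI-compatible API. A sketch of the JSON a client would POST to `http://localhost:8080/v1/chat/completions` (payload construction only; actually sending it requires the running server from the step above):

```python
import json

# OpenAI-compatible chat request for the llama-server started above.
# POST this body to http://localhost:8080/v1/chat/completions.
payload = {
    "model": "LocoTrainer-4B",
    "messages": [
        {"role": "user", "content": "What is MS-SWIFT?"},
    ],
    "max_tokens": 512,
    "temperature": 0.7,
}

body = json.dumps(payload)
print(body)
```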
### Using LocoTrainer Framework
```bash
# Configure environment variables (or place these in .env)
export LOCOTRAINER_BASE_URL=http://localhost:8080/v1
export LOCOTRAINER_MODEL=LocoTrainer-4B
# Run
locotrainer run -q "What are the default LoRA settings in ms-swift?"
```
### Using llama-cpp-python
```python
from llama_cpp import Llama

# Load the quantized model: offload all layers to GPU, full 32K context
llm = Llama(
    model_path="LocoTrainer-4B-Q4_K_M.gguf",
    n_gpu_layers=99,
    n_ctx=32768,
)

response = llm(
    "What is MS-SWIFT?",
    max_tokens=512,
)
print(response["choices"][0]["text"])
```
## Performance Metrics
Tested on an NVIDIA H100:
- **First Token Latency**: ~200-300ms
- **Subsequent Token Speed**: 50-100 tokens/sec
- **Memory Usage** (Q4_K_M): ~10-12GB
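These figures translate into end-to-end response times as follows (a back-of-the-envelope calculation using the midpoints of the ranges above):

```python
# Back-of-the-envelope response time from the metrics above.
first_token_s = 0.25   # midpoint of ~200-300 ms first-token latency
tokens_per_s = 75.0    # midpoint of 50-100 tokens/sec
n_tokens = 512         # a typical full response

total_s = first_token_s + n_tokens / tokens_per_s
print(f"~{total_s:.1f} s for {n_tokens} tokens")  # → ~7.1 s for 512 tokens
```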
## Features
- 🎯 **MS-SWIFT Domain Expert**: Trained on MS-SWIFT documentation and codebase
- 🔧 **Tool Calling**: Supports Read, Grep, Glob, Bash, and Write tools
- 📊 **End-to-End Reports**: From question to a complete Markdown analysis report
- 🏠 **Local Deployment**: Fully offline, zero API cost
- 📏 **Long Context**: 32K-token context window
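The tool-calling support maps onto OpenAI-style function schemas. A hypothetical sketch of how one of the listed tools (`Read`) might be described to the model — the exact schemas the LocoTrainer framework uses are not shown here, so treat this purely as an illustration:

```python
import json

# Illustrative OpenAI-style tool schema for the "Read" tool listed above.
# The actual schemas used by the LocoTrainer framework may differ.
read_tool = {
    "type": "function",
    "function": {
        "name": "Read",
        "description": "Read a file from the local filesystem.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Absolute path of the file to read.",
                },
            },
            "required": ["path"],
        },
    },
}

# Tool definitions are passed alongside the messages in a chat request.
print(json.dumps(read_tool, indent=2))
```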
## Use Cases
- Codebase analysis and documentation generation
- MS-SWIFT framework Q&A
- Local AI agent deployment
- Offline inference applications
## License
MIT
## Acknowledgments
- [Qwen Team](https://huggingface.co/Qwen) - Base model
- [MS-SWIFT](https://github.com/modelscope/ms-swift) - Training framework
- [llama.cpp](https://github.com/ggml-org/llama.cpp) - GGUF quantization and inference
- [Anthropic](https://www.anthropic.com/) - Claude Code design inspiration
## Related Resources
- [Original Model](https://huggingface.co/LocoreMind/LocoTrainer-4B)
- [LocoTrainer Framework](https://github.com/LocoreMind/LocoTrainer)
- [llama.cpp Documentation](https://github.com/ggml-org/llama.cpp)