---
library_name: transformers
license: mit
base_model: LocoreMind/LocoTrainer-4B
tags:
- code
- agent
- tool-calling
- distillation
- qwen3
- ms-swift
- gguf
- quantization
language:
- en
pipeline_tag: text-generation
---

# LocoTrainer-4B GGUF

GGUF-quantized version of the LocoTrainer-4B model for local inference.

## Model Information

- **Base Model**: [Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
- **Distilled from**: Qwen3-Coder-Next
- **Training Method**: Knowledge Distillation (SFT)
- **Training Data**: 361,830 samples
- **Max Context**: 32,768 tokens
- **Framework**: MS-SWIFT

## Available Versions

| Version | Size | Speed | Quality | Recommended For |
|---------|------|-------|---------|-----------------|
| F16 | 8.3GB | Fast | Highest | Baseline/reference |
| Q8_0 | 4.4GB | Fast | Very High | High-quality inference |
| Q5_K_M | 3.0GB | Medium | High | Balanced quality/size |
| Q4_K_M | 2.6GB | Fast | Medium | **Recommended** |
| Q3_K_M | 2.1GB | Very Fast | Medium | Resource-constrained systems |

## Quick Start

### Using llama.cpp

```bash
# Download model
wget https://huggingface.co/LocoreMind/LocoTrainer-4B-GGUF/resolve/main/LocoTrainer-4B-Q4_K_M.gguf

# Start server
./llama-server -m LocoTrainer-4B-Q4_K_M.gguf --port 8080 --ctx-size 32768
```

### Using the LocoTrainer Framework

```bash
# Configure .env
export LOCOTRAINER_BASE_URL=http://localhost:8080/v1
export LOCOTRAINER_MODEL=LocoTrainer-4B

# Run
locotrainer run -q "What are the default LoRA settings in ms-swift?"
```

### Using llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama(
    model_path="LocoTrainer-4B-Q4_K_M.gguf",
    n_gpu_layers=99,
    n_ctx=32768,
)

response = llm(
    "What is MS-SWIFT?",
    max_tokens=512,
)
print(response["choices"][0]["text"])
```

## Performance Metrics

Tested on an NVIDIA H100:

- **First Token Latency**: ~200-300 ms
- **Subsequent Token Speed**: 50-100 tokens/sec
- **Memory Usage** (Q4_K_M): ~10-12 GB

## Features

- 🎯 **MS-SWIFT Domain Expert**: Trained on the MS-SWIFT documentation and codebase
- 🔧 **Tool Calling**: Supports Read, Grep, Glob, Bash, and Write tools
- 📊 **End-to-End Reports**: From question to complete Markdown analysis report
- 🏠 **Local Deployment**: Fully offline, zero API cost
- 📏 **Long Context**: Supports 32K tokens

## Use Cases

- Codebase analysis and documentation generation
- MS-SWIFT framework Q&A
- Local AI agent deployment
- Offline inference applications

## License

MIT

## Acknowledgments

- [Qwen Team](https://huggingface.co/Qwen) - Base model
- [MS-SWIFT](https://github.com/modelscope/ms-swift) - Training framework
- [llama.cpp](https://github.com/ggml-org/llama.cpp) - GGUF quantization and inference
- [Anthropic](https://www.anthropic.com/) - Claude Code design inspiration

## Related Resources

- [Original Model](https://huggingface.co/LocoreMind/LocoTrainer-4B)
- [LocoTrainer Framework](https://github.com/LocoreMind/LocoTrainer)
- [llama.cpp Documentation](https://github.com/ggml-org/llama.cpp)
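## Example: Querying the Server Directly

Recent llama.cpp builds of `llama-server` expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the server started in Quick Start can be queried without any extra client library. Below is a minimal sketch using only the Python standard library; the base URL and model name mirror the `.env` values from the LocoTrainer section above, and the `build_chat_request`/`ask` helper names are illustrative, not part of any shipped API:

```python
import json
import urllib.request

# Endpoint and model name follow the Quick Start / .env values above.
BASE_URL = "http://localhost:8080/v1"
MODEL = "LocoTrainer-4B"


def build_chat_request(question: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request for llama-server."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def ask(question: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(question)) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]


# With the server from Quick Start running:
#   print(ask("What are the default LoRA settings in ms-swift?"))
```

Any OpenAI-compatible SDK pointed at `http://localhost:8080/v1` should work the same way.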