LocoTrainer-4B GGUF

A GGUF-quantized version of the LocoTrainer-4B model for local inference.

Model Information

  • Base Model: Qwen3-4B-Instruct-2507
  • Distilled from: Qwen3-Coder-Next
  • Training Method: Knowledge Distillation (SFT)
  • Training Data: 361,830 samples
  • Max Context: 32,768 tokens
  • Framework: MS-SWIFT

Available Versions

Version   Size    Speed      Quality    Recommended For
F16       8.3GB   Fast       Highest    Baseline/reference
Q8_0      4.4GB   Fast       Very High  High-quality inference
Q5_K_M    3.0GB   Medium     High       Balanced approach
Q4_K_M    2.6GB   Fast       Medium     Recommended
Q3_K_M    2.1GB   Very Fast  Medium     Resource-constrained
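To gauge whether a given quant fits in memory, a rough sketch: total footprint ≈ weight file size plus the KV cache, which llama.cpp allocates per context token. The model dimensions below (36 layers, 8 KV heads, head dim 128) are assumptions taken from the published Qwen3-4B configuration, and real usage adds compute buffers on top:

```python
def kv_cache_bytes(n_ctx: int, n_layers: int = 36, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    # one K and one V tensor per layer, f16 (2 bytes/element) by default
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

def estimated_footprint_gib(weight_file_gib: float, n_ctx: int) -> float:
    # weights on disk load roughly 1:1 into memory, plus the KV cache
    return weight_file_gib + kv_cache_bytes(n_ctx) / 2**30

# Q4_K_M (2.6GB file) at the full 32K context
print(f"~{estimated_footprint_gib(2.6, 32768):.1f} GiB + compute buffers")
```

Under these assumptions, Q4_K_M at full context lands around 7 GiB before llama.cpp's compute buffers and runtime overhead, which presumably account for the gap up to the measured ~10-12GB figure below.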

Quick Start

Using llama.cpp

# Download model
wget https://huggingface.co/LocoreMind/LocoTrainer-4B-GGUF/resolve/main/LocoTrainer-4B-Q4_K_M.gguf

# Start server
./llama-server -m LocoTrainer-4B-Q4_K_M.gguf --port 8080 --ctx-size 32768
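Once running, llama-server exposes an OpenAI-compatible API. A minimal Python client sketch, assuming the endpoint and model name from the setup above (the actual network call is left commented out so the snippet stands alone without a running server):

```python
import json
from urllib import request

def build_chat_request(prompt: str, max_tokens: int = 256) -> request.Request:
    """Build a POST request for llama-server's OpenAI-compatible chat endpoint."""
    payload = {
        "model": "LocoTrainer-4B",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("What is MS-SWIFT?")
# with request.urlopen(req) as resp:  # requires the server to be running
#     print(json.load(resp)["choices"][0]["message"]["content"])
```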

Using LocoTrainer Framework

# Configure environment (or place these in your .env)
export LOCOTRAINER_BASE_URL=http://localhost:8080/v1
export LOCOTRAINER_MODEL=LocoTrainer-4B

# Run
locotrainer run -q "What are the default LoRA settings in ms-swift?"

Using llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path="LocoTrainer-4B-Q4_K_M.gguf",
    n_gpu_layers=99,  # offload all layers to the GPU
    n_ctx=32768,      # full 32K context window
)

response = llm(
    "What is MS-SWIFT?",
    max_tokens=512,
)
print(response["choices"][0]["text"])

Performance Metrics

Tested on NVIDIA H100:

  • First Token Latency: ~200-300ms
  • Subsequent Token Speed: 50-100 tokens/sec
  • Memory Usage (Q4_K_M): ~10-12GB
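As a back-of-envelope check on response time, total wall clock is roughly first-token latency plus decode time; the midpoints below (~250ms, ~75 tokens/sec) are assumptions taken from the measured ranges above, not additional measurements:

```python
def estimated_response_seconds(n_tokens: int,
                               first_token_s: float = 0.25,
                               tokens_per_s: float = 75.0) -> float:
    """Rough estimate: first-token latency + steady-state decode time."""
    return first_token_s + n_tokens / tokens_per_s

# a 512-token reply at the midpoint figures
print(f"~{estimated_response_seconds(512):.1f} s")  # ~7.1 s
```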

Features

  • MS-SWIFT Domain Expert: trained on the MS-SWIFT documentation and codebase
  • Tool Calling: supports Read, Grep, Glob, Bash, and Write tools
  • End-to-End Reports: from question to a complete Markdown analysis report
  • Local Deployment: fully offline, zero API cost
  • Long Context: supports up to 32K tokens

Use Cases

  • Codebase analysis and documentation generation
  • MS-SWIFT framework Q&A
  • Local AI agent deployment
  • Offline inference applications

License

MIT

