# Raymond GGUF
Qwen3-4B-Instruct-2507 fine-tuned with LoRA on synthetic chat data distilled via Claude Sonnet. Quantized to Q4_K_M for local inference via Ollama.
## Usage

```shell
# Download the quantized model
huggingface-cli download RuimengLiu/raymond-gguf raymond-q4_k_m.gguf --local-dir .

# Create the Ollama model (requires the Modelfile from the main repo)
ollama create raymond -f Modelfile
ollama run raymond
```
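The Modelfile itself ships with the main repo. For orientation only, a minimal Modelfile for a Qwen-family GGUF usually looks something like the sketch below; the template and parameter values here are illustrative assumptions (Qwen's ChatML-style format), not the repo's actual file:

```
# FROM points Ollama at the local GGUF weights
FROM ./raymond-q4_k_m.gguf

# Sampling parameter — an example value, not the repo's setting
PARAMETER temperature 0.7

# Qwen models typically use a ChatML-style prompt template
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
```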
## Model Details
| Item | Value |
|---|---|
| Base Model | Qwen/Qwen3-4B-Instruct-2507 |
| Fine-tuning | LoRA (rank=64, alpha=128) |
| Training Data | 1495 samples (Claude Sonnet distilled) |
| Quantization | Q4_K_M |
| Size | 2.33 GB |
| Language | Chinese (primary), English |
See the main project for full pipeline details.
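Beyond the interactive `ollama run raymond`, the model can also be queried programmatically over Ollama's local HTTP API (default port 11434). A minimal standard-library sketch; the endpoint and payload shape follow Ollama's `/api/generate` API, and the prompt is just an example:

```python
import json
import urllib.request

# Default Ollama REST endpoint for single-turn generation
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt, model="raymond"):
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt):
    """Send a prompt to the locally served model and return its reply text."""
    data = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires the Ollama server to be running):
# print(ask("你好，请介绍一下你自己"))
```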