# Raymond GGUF

Qwen3-4B-Instruct-2507 fine-tuned with LoRA on synthetic chat data distilled from Claude Sonnet, then quantized to Q4_K_M for local inference with Ollama.

## Usage

```bash
# Download the quantized weights
huggingface-cli download RuimengLiu/raymond-gguf raymond-q4_k_m.gguf --local-dir .

# Create the Ollama model (requires the Modelfile from the main repo)
ollama create raymond -f Modelfile
ollama run raymond
```
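The `ollama create` step above needs the Modelfile from the main repo. For reference, a minimal Modelfile for a local GGUF typically looks like the sketch below; the parameter values here are illustrative assumptions, not the repo's actual file:

```
FROM ./raymond-q4_k_m.gguf
# Sampling defaults (assumed values, tune to taste)
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```

The `FROM` path points at the downloaded GGUF file; the repo's real Modelfile may also set a `TEMPLATE` and `SYSTEM` prompt matching the fine-tuning chat format.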

## Model Details

| Item | Value |
| --- | --- |
| Base model | Qwen3-4B-Instruct-2507 |
| Fine-tuning | LoRA (rank=64, alpha=128) |
| Training data | 1,495 samples (Claude Sonnet distilled) |
| Quantization | Q4_K_M |
| Size | 2.33 GB |
| Language | Chinese (primary), English |

See the main project for full pipeline details.
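Once the model is created, it can also be queried programmatically through Ollama's local HTTP API (default port 11434) rather than the interactive `ollama run` shell. A minimal Python sketch that builds a non-streaming request for the `/api/chat` endpoint; POST the payload with any HTTP client while `ollama serve` is running:

```python
import json

# Default Ollama chat endpoint on a local install
OLLAMA_URL = "http://localhost:11434/api/chat"

def build_chat_request(prompt: str, model: str = "raymond") -> dict:
    """Build a non-streaming chat payload for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one complete JSON response instead of a stream
    }

payload = build_chat_request("ไฝ ๅฅฝ๏ผŒ่ฏทไป‹็ปไธ€ไธ‹ไฝ ่‡ชๅทฑใ€‚")
print(json.dumps(payload, ensure_ascii=False))
```

Since the card lists Chinese as the primary language, the example prompt is in Chinese; the model name `raymond` assumes the `ollama create raymond` step above.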
