Base model: https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
Tool: llama.cpp

Mixed quantization: attention weights (Q/K/V, QKV, output) and embeddings are kept in BF16 for maximum reasoning fidelity, while Q4_K_M is applied to the FFN and SSM layers for efficient compression. The result is a 24 GB file at 7.60 BPW.

A good fit for an RTX 5090 with 32 GB of VRAM.
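As a rough sanity check of the figures above, the file size follows directly from the parameter count and the average bits per weight. A minimal sketch (the `gguf_size_gib` helper is illustrative, not part of llama.cpp; "27B" is the nominal count, so the exact size will differ slightly):

```python
# Approximate GGUF file size from parameter count and bits-per-weight (BPW).
# 27e9 params and 7.60 BPW are the figures quoted on this model card.

def gguf_size_gib(n_params: float, bpw: float) -> float:
    """Approximate model file size in GiB for a given bits-per-weight."""
    total_bits = n_params * bpw
    return total_bits / 8 / 2**30  # bits -> bytes -> GiB

size = gguf_size_gib(27e9, 7.60)
print(f"{size:.1f} GiB")  # roughly 24 GiB, matching the stated size
```

At ~24 GiB of weights, the model leaves several GiB of a 32 GB card free for the KV cache and compute buffers, which is why it suits a 5090.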

Model details:
- Format: GGUF
- Model size: 27B params
- Architecture: qwen35
- Tensor type: 16-bit
- Downloads last month: 713