Base model: https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
Tool: llama.cpp

Mixed quantization: attention weights (Q/K/V, QKV, output) and embeddings are kept in BF16 for maximum reasoning fidelity, while Q4_K_M is applied to the FFN and SSM layers for efficient compression. The result is a 24 GB file at 7.60 BPW.

A good fit for an RTX 5090 with 32 GB of VRAM.
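As a rough sanity check of the figures above, the file size follows directly from the parameter count and the average bits per weight. A minimal sketch (the `gguf_size_gib` helper is illustrative, not part of llama.cpp; "27B" is the nominal count, so the exact size will differ slightly):

```python
# Approximate GGUF file size from parameter count and bits-per-weight (BPW).
# 27e9 params and 7.60 BPW are the figures quoted on this model card.

def gguf_size_gib(n_params: float, bpw: float) -> float:
    """Approximate model file size in GiB for a given bits-per-weight."""
    total_bits = n_params * bpw
    return total_bits / 8 / 2**30  # bits -> bytes -> GiB

size = gguf_size_gib(27e9, 7.60)
print(f"{size:.1f} GiB")  # roughly 24 GiB, matching the stated size
```

At ~24 GiB of weights, the model leaves several GiB of a 32 GB card free for the KV cache and compute buffers, which is why it suits a 5090.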

Model details:
- Format: GGUF
- Model size: 27B params
- Architecture: qwen35
- Tensor type: 16-bit
- Downloads last month: 713