Qwen3-4B — GGUF K-Quant refined with GSQ

GGUF K-Quant checkpoints of Qwen/Qwen3-4B whose discrete grid assignments have been refined with GSQ (Gumbel-Softmax Quantization). Refinement starts from a public GGUF initialization, and the result is projected back into the same K-Quant format, so the optimized files run unchanged on llama.cpp and Ollama.
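A Gumbel-Softmax relaxation can be illustrated for a single weight. Each weight keeps a vector of logits over the grid levels of its block. A relaxed one-hot sample weights those levels during optimization, and an argmax recovers a hard code at the end. The sketch below is a minimal pure-Python illustration under stated assumptions, not the GSQ authors' implementation. The function names, the temperature `tau`, and the example 4-level grid (the size a 2-bit block would use) are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1);
    # the bounds keep both logarithms finite.
    noisy = [
        (l - math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12)))) / tau
        for l in logits
    ]
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_dequant(logits, grid, tau, rng):
    """Differentiable stand-in for one weight: grid levels weighted by the relaxed sample."""
    probs = gumbel_softmax(logits, tau, rng)
    return sum(p * g for p, g in zip(probs, grid))


def hard_assign(logits):
    """Projection back to a discrete code: index of the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


rng = random.Random(0)
grid = [-0.75, -0.25, 0.25, 0.75]  # hypothetical dequantized levels of one block
logits = [0.0, 2.0, 0.5, -1.0]     # hypothetical per-weight scores
print(soft_dequant(logits, grid, tau=0.5, rng=rng), hard_assign(logits))
```

In a real optimizer the logits would be trainable tensors updated by backpropagation through `soft_dequant`, usually under a temperature schedule. Both are omitted here, and GSQ's exact choices are not assumed.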

Quantization details

  • Base model: Qwen/Qwen3-4B
  • Format: GGUF K-Quant (drop-in replacement for standard K-Quant files)
  • Pipeline: GGUF K-Quant init → GSQ Gumbel-Softmax refinement → re-pack into K-Quant
  • Runtime: llama.cpp, Ollama, LM Studio, and any other runtime that consumes GGUF
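The three pipeline stages can be sketched for one run of 2-bit codes. First, initialize logits so that their argmax reproduces the codes from the public GGUF file. Second, refine them (omitted here). Third, project back to discrete codes and pack them at a fixed 2 bits per code. This is a hypothetical illustration: `init_logits`, its `confidence` value, and the packing order are assumptions. The packing shown is a plain four-codes-per-byte layout, not ggml's interleaved Q2_K ordering.

```python
def init_logits(codes, n_levels, confidence=5.0):
    """Hypothetical init: one logit vector per weight, peaked on its initial GGUF code."""
    return [[confidence if q == c else 0.0 for q in range(n_levels)] for c in codes]


def project(logits):
    """Project each weight's logits back to a discrete code via argmax."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def pack_2bit(codes):
    """Pack 2-bit codes four per byte, lowest bits first (simplified, NOT ggml's Q2_K order)."""
    assert len(codes) % 4 == 0 and all(0 <= c < 4 for c in codes)
    return bytes(
        codes[i] | (codes[i + 1] << 2) | (codes[i + 2] << 4) | (codes[i + 3] << 6)
        for i in range(0, len(codes), 4)
    )


initial_codes = [0, 3, 1, 2, 2, 2, 0, 1]  # hypothetical codes read from the init file
logits = init_logits(initial_codes, n_levels=4)
# ... Gumbel-Softmax refinement of `logits` would happen here ...
refined_codes = project(logits)
print(refined_codes, pack_2bit(refined_codes).hex())
```

Every projected code stays in the range 0..3, so the packed size depends only on the number of weights. That is why the re-pack step cannot change the block layout.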

Storage layout

These files are standard GGUF K-Quant files. GSQ changes only the values of the quantized weights (it relearns the discrete grid assignments inside each K-Quant block). The block structure, scales, and super-block layout are unchanged. As a result:

  • The file size matches the corresponding upstream K-Quant for the same quant tier.
  • Any llama.cpp or Ollama build that loads the regular Qwen3-4B-Q2_K.gguf loads this file without modification.
  • The Hugging Face UI reports GGUF block types (e.g., Q2_K, Q4_K, Q6_K) rather than per-tensor dtypes. These names describe the on-disk K-Quant encoding, not the precision of any optimizer state.
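The matching file size follows from block arithmetic. Each K-Quant super-block covers a fixed number of weights and occupies a fixed number of bytes, regardless of which codes it holds. The sketch below sums the field sizes of ggml's `block_q2_K`, `block_q3_K`, `block_q4_K`, and `block_q6_K` structs, assuming the default super-block size `QK_K = 256`. The helper names are hypothetical.

```python
QK_K = 256  # weights per K-Quant super-block (ggml default)

# Bytes per super-block, summed from the fields of ggml's block structs.
BLOCK_BYTES = {
    "Q2_K": QK_K // 16 + QK_K // 4 + 2 + 2,          # scales + qs + d + dmin (fp16)
    "Q3_K": QK_K // 8 + QK_K // 4 + 12 + 2,          # hmask + qs + scales + d
    "Q4_K": 2 + 2 + 12 + QK_K // 2,                  # d + dmin + scales + qs
    "Q6_K": QK_K // 2 + QK_K // 4 + QK_K // 16 + 2,  # ql + qh + scales + d
}


def bits_per_weight(block_type):
    """Storage cost per weight for one block type, in bits."""
    return BLOCK_BYTES[block_type] * 8 / QK_K


def tensor_bytes(n_weights, block_type):
    """On-disk size of one tensor; it depends on the block type, never on the code values."""
    assert n_weights % QK_K == 0, "K-Quant tensors are stored in whole super-blocks"
    return (n_weights // QK_K) * BLOCK_BYTES[block_type]


for t in BLOCK_BYTES:
    print(t, BLOCK_BYTES[t], bits_per_weight(t))
```

For example, `bits_per_weight("Q2_K")` gives 2.625 and `tensor_bytes(4096 * 4096, "Q2_K")` gives 5505024. llama.cpp's Q2_K preset keeps some tensors in higher-precision block types, so a whole Q2_K file averages more than 2.625 bits per weight.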

Usage with llama.cpp

huggingface-cli download ISTA-DASLab/Qwen3-4B-GGUF-GSQ \
  Qwen3-4B-Q2_K.gguf --local-dir .

./llama-cli -m Qwen3-4B-Q2_K.gguf -p "Hello"
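Because the files are ordinary GGUF, a downloaded file can be sanity-checked by reading its fixed-size header before loading it. The sketch assumes the little-endian GGUF layout used by llama.cpp: a 4-byte magic `GGUF`, a `uint32` version, then `uint64` tensor and metadata key-value counts. `read_gguf_header` is a hypothetical helper, not part of llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path):
    """Read the fixed GGUF header: magic, version, tensor count, metadata KV count."""
    with open(path, "rb") as f:
        raw = f.read(24)
    if len(raw) < 24 or raw[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} is not a GGUF file")
    # Little-endian: uint32 version, uint64 tensor_count, uint64 metadata_kv_count
    version, n_tensors, n_kv = struct.unpack("<IQQ", raw[4:24])
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}


if __name__ == "__main__":
    print(read_gguf_header("Qwen3-4B-Q2_K.gguf"))
```

Run on the file downloaded above, the call should succeed without raising. The reported counts depend on the file and are not reproduced here.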

Usage with Ollama

ollama run hf.co/ISTA-DASLab/Qwen3-4B-GGUF-GSQ:Q2_K

Citation

@article{gsq2026,
  title  = {GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling},
  author = {Dadgarnia, Alireza and Tabesh, Soroush and Nikdan, Mahdi and Helcig, Michael and Kurti{\'c}, Eldar and Kleinegger, Max and Alistarh, Dan},
  journal= {arXiv preprint arXiv:2604.18556},
  year   = {2026},
  url    = {https://arxiv.org/abs/2604.18556}
}

Available bit widths in this repository: 2-bit, 3-bit.
