Llamacpp imatrix Quantizations of Qwen3-4B by Qwen with added metadata

Original model: https://huggingface.co/Qwen/Qwen3-4B

All quants are originally from https://huggingface.co/bartowski/Qwen_Qwen3-4B-GGUF

For a quick start, download Qwen_Qwen3-4B-Q4_K_M.gguf directly from https://huggingface.co/NobodyWho/Qwen_Qwen3-4B-GGUF/resolve/main/Qwen_Qwen3-4B-Q4_K_M.gguf — it offers a good balance between size and performance.

These GGUF files contain additional metadata. For the Qwen3 model family specifically, we have added the begin and end thinking tags and the recommended sampler configuration. The sampler configuration is a JSON array of sampling steps (such as temperature or top_k), and it must always end with a step that actually samples a token, such as dist or greedy.
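As a rough illustration of the "must end with a terminal step" rule, here is a minimal validation sketch. The step field names (`type`, `value`) and the example values are assumptions for illustration, not the exact key names stored in the GGUF metadata:

```python
import json

def ends_with_terminal_step(sampler_config: str) -> bool:
    """Check that a JSON sampler chain ends with a step that samples a token."""
    steps = json.loads(sampler_config)  # a JSON array of sampling steps
    return bool(steps) and steps[-1].get("type") in ("dist", "greedy")

# Hypothetical sampler chain in the format described above:
chain = json.dumps([
    {"type": "temperature", "value": 0.6},
    {"type": "top_k", "value": 20},
    {"type": "dist"},  # terminal step: actually samples a token
])
print(ends_with_terminal_step(chain))  # True
```

A chain that stops after a filtering step like top_k never produces a token, which is why the terminal step is required.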

Prompt format

<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
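If you are building prompts by hand rather than through a chat-template-aware runtime, the template above can be filled in with a small helper (a minimal sketch; the function name is ours):

```python
def format_chatml(system_prompt: str, prompt: str) -> str:
    """Fill in the ChatML prompt template shown above."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(format_chatml("You are a helpful assistant.", "Hello!"))
```

The string ends right after the assistant header, so generation continues as the assistant's reply.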

Downloading using huggingface-cli

Click to view download instructions

First, make sure you have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Then, you can target the specific file you want:

huggingface-cli download NobodyWho/Qwen_Qwen3-4B-GGUF --include "Qwen_Qwen3-4B-Q4_K_M.gguf" --local-dir ./
huggingface-cli download NobodyWho/Qwen_Qwen3-4B-GGUF --include "Qwen_Qwen3-4B-Q8_0/*" --local-dir ./

You can either specify a new local-dir (e.g. Qwen_Qwen3-4B-Q8_0) or download everything in place (./).

Which file should I choose?

Coming soon!

Credits

Thank you to bartowski for providing the original quants.

Format: GGUF · Model size: 4B params · Architecture: qwen3
