Llamacpp imatrix Quantizations of Qwen3-4B by Qwen with added metadata
Original model: https://huggingface.co/Qwen/Qwen3-4B
All quants are originally from https://huggingface.co/bartowski/Qwen_Qwen3-4B-GGUF
For a quick start, download https://huggingface.co/NobodyWho/Qwen_Qwen-4B-GGUF/resolve/main/Qwen_Qwen3-4B-Q4_K_M.gguf, a quant with a good balance between size and performance.
These GGUF files include some additional metadata. For the Qwen3 model family specifically, we have added metadata for the begin and end thinking tags and for the recommended sampler configuration.
The sampler configuration is a JSON array of sampling steps (such as temperature or top_k). The array must always end with a step, such as dist or greedy, that actually samples a token.
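As an illustration, such a configuration might look like the sketch below. The values match Qwen3's recommended thinking-mode sampling settings, but the key names here are illustrative assumptions, not the exact schema stored in the files:

```json
[
  { "type": "top_k", "k": 20 },
  { "type": "top_p", "p": 0.95 },
  { "type": "temperature", "value": 0.6 },
  { "type": "dist" }
]
```

Each entry filters or reweights the candidate tokens in order; the final dist step then draws one token from the resulting distribution.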
Prompt format
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
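If you are building prompts by hand rather than relying on the bundled chat template, the format above can be assembled like this (a minimal Python sketch; the function name is our own):

```python
def format_chatml(system_prompt: str, prompt: str) -> str:
    """Assemble a ChatML-style prompt as used by the Qwen3 models.

    The string ends with the assistant header so the model
    continues generating the assistant's reply.
    """
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
```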
Downloading using huggingface-cli
First, make sure you have huggingface-cli installed:
pip install -U "huggingface_hub[cli]"
Then, you can target the specific file you want:
huggingface-cli download NobodyWho/Qwen_Qwen3-4B-GGUF --include "Qwen_Qwen3-4B-Q4_K_M.gguf" --local-dir ./
huggingface-cli download NobodyWho/Qwen_Qwen3-4B-GGUF --include "Qwen_Qwen3-4B-Q8_0/*" --local-dir ./
You can either specify a new local-dir (Qwen_Qwen3-4B-Q8_0) or download all of the files in place (./).
Which file should I choose?
Coming soon!
Credits
Thank you to bartowski for providing the models