Qwen3.5-0.8B GGUF Quantizations

Converted from: Qwen/Qwen3.5-0.8B-Base

Quantizations

  • Q2_K → smallest, fastest
  • Q3_K_M → balanced
  • Q4_K_M → recommended
  • Q5_K_M → highest quality

Recommended: Q4_K_M (best balance of speed and quality)

Q2_K → ~200 MB

Q4_K_M → ~500 MB

Q5_K_M → ~650 MB
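
As a rough illustration of how the approximate sizes above could guide a choice, here is a minimal Python sketch that picks the largest listed quantization fitting a given size budget. The helper name `pick_quant` and the budget values are hypothetical, and Q3_K_M is left out because no size is listed for it.

```python
# Approximate file sizes in MB, copied from the list above.
# Q3_K_M is omitted because no size is given for it.
APPROX_SIZE_MB = {
    "Q2_K": 200,
    "Q4_K_M": 500,
    "Q5_K_M": 650,
}


def pick_quant(budget_mb):
    """Return the largest listed quantization whose file fits in budget_mb, or None."""
    fitting = {name: size for name, size in APPROX_SIZE_MB.items() if size <= budget_mb}
    if not fitting:
        return None
    # Larger files correspond to higher-quality quantizations in this list.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (150, 300, 600, 1000):
        print(budget, pick_quant(budget))
```

File size is only a proxy here; actual memory use at runtime also depends on context length and the runtime used.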

Tested on

  • LM Studio ✔
  • llama.cpp ✔
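
For the llama.cpp case, a run might look roughly like the sketch below, which only builds and prints a command line. It assumes a llama.cpp build that provides the `llama-cli` binary with its `-m`, `-p`, and `-n` options; the GGUF filename is hypothetical, so substitute the file you actually downloaded. Because this is a base model, the prompt is written as text to be continued rather than as a chat message.

```python
import shlex


def llama_cli_command(model_path, prompt, max_tokens=64):
    """Build an argument list for llama.cpp's llama-cli (assumed to be on PATH)."""
    return [
        "llama-cli",
        "-m", str(model_path),   # path to the GGUF file
        "-p", prompt,            # text for the base model to continue
        "-n", str(max_tokens),   # number of tokens to generate
    ]


if __name__ == "__main__":
    # Hypothetical filename; use the name of the file you downloaded.
    cmd = llama_cli_command("Qwen3.5-0.8B-Q4_K_M.gguf", "The capital of France is")
    # Print the command for inspection; pass cmd to subprocess.run to execute it.
    print(shlex.join(cmd))
```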

Notes

  • Converted using llama.cpp
  • No LoRA / base model only
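
The exact conversion commands used for these files are not recorded here. A typical llama.cpp workflow, sketched under that caveat, converts the Hugging Face checkpoint to an f16 GGUF with `convert_hf_to_gguf.py` and then quantizes it with `llama-quantize`. The directory and output names below are hypothetical, and the script only builds the two command lines.

```python
def conversion_commands(hf_model_dir, quant="Q4_K_M", stem="model"):
    """Build the two command lines of a typical llama.cpp convert-then-quantize flow."""
    f16_path = f"{stem}-f16.gguf"          # intermediate full-precision GGUF
    quant_path = f"{stem}-{quant}.gguf"    # final quantized GGUF
    convert = [
        "python", "convert_hf_to_gguf.py", hf_model_dir,
        "--outfile", f16_path,
        "--outtype", "f16",
    ]
    quantize = ["llama-quantize", f16_path, quant_path, quant]
    return convert, quantize


if __name__ == "__main__":
    # Hypothetical local path to the downloaded base checkpoint.
    for cmd in conversion_commands("./Qwen3.5-0.8B-Base", quant="Q4_K_M", stem="qwen"):
        print(" ".join(cmd))
```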

License: Apache 2.0

Model details

  • Format: GGUF
  • Model size: 0.8B params
  • Architecture: qwen35