Qwen3-8B-Instruct — GGUF Quants

Quantized GGUF versions of Qwen/Qwen3-8B-Instruct, Alibaba's instruction-tuned 8B model in the Qwen3 family. The model is fine-tuned for chat and instruction following, with improved reasoning and tool use over Qwen2.5-7B-Instruct.

Available Files

File	Quant	Size	Use Case
`Qwen3-8B-Instruct-Q8_0.gguf`	Q8_0	~8.6GB	Maximum quality
`Qwen3-8B-Instruct-Q6_K.gguf`	Q6_K	~6.6GB	Near-lossless
`Qwen3-8B-Instruct-Q5_K_M.gguf`	Q5_K_M	~5.7GB	High quality
`Qwen3-8B-Instruct-Q4_K_M.gguf`	Q4_K_M	~4.9GB	Recommended default
`Qwen3-8B-Instruct-Q3_K_M.gguf`	Q3_K_M	~3.9GB	Low VRAM
`Qwen3-8B-Instruct-IQ4_XS.gguf`	IQ4_XS	~4.3GB	Imatrix 4-bit
`Qwen3-8B-Instruct-IQ3_XXS.gguf`	IQ3_XXS	~3.2GB	Imatrix 3-bit
`Qwen3-8B-Instruct-IQ2_M.gguf`	IQ2_M	~2.8GB	Imatrix 2-bit
`Qwen3-8B-Instruct-IQ1_S.gguf`	IQ1_S	~2.0GB	Extreme compression
`Qwen3-8B-Instruct-fp16.gguf`	FP16	~16.0GB	Full precision
`imatrix.dat`	—	—	Importance matrix
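
As a rough way to choose among the files above, the sketch below prints the largest file whose size plus some headroom fits a given memory budget. The function name `pick_quant` is hypothetical, and the 1.5 GB default headroom for KV cache and runtime buffers is an assumption rather than a measurement. The sizes are the approximate figures from the table, and "largest file that fits" is only a proxy for quality.

```shell
# Hypothetical helper: print the largest GGUF file from the table whose
# approximate size plus headroom fits within a memory budget (values in GB).
# Usage: pick_quant BUDGET_GB [HEADROOM_GB]
# The 1.5 GB default headroom is an assumed allowance, not a measured one.
pick_quant() {
  budget_gb=$1
  headroom_gb=${2:-1.5}
  awk -v budget="$budget_gb" -v headroom="$headroom_gb" '
    BEGIN {
      # quant:size pairs copied from the table, ordered largest to smallest
      n = split("fp16:16.0 Q8_0:8.6 Q6_K:6.6 Q5_K_M:5.7 Q4_K_M:4.9 IQ4_XS:4.3 Q3_K_M:3.9 IQ3_XXS:3.2 IQ2_M:2.8 IQ1_S:2.0", entries, " ")
      for (i = 1; i <= n; i++) {
        split(entries[i], kv, ":")
        if (kv[2] + headroom <= budget) {
          printf "Qwen3-8B-Instruct-%s.gguf\n", kv[1]
          exit 0
        }
      }
      exit 1   # no file fits the budget
    }'
}
```

For example, `pick_quant 8` selects the Q5_K_M file under these assumptions. Longer contexts need more headroom, so pass a larger second argument when raising `--ctx-size`.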

Usage

./llama-cli -m Qwen3-8B-Instruct-Q4_K_M.gguf --ctx-size 8192 -n 512 \
  -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n"

# Ollama
ollama run hf.co/DuoNeural/Qwen3-8B-Instruct-GGUF:Q4_K_M
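
The prompt in the llama-cli example is a hand-written single-turn ChatML string. A minimal sketch of building the same layout from arbitrary system and user messages follows; the function name `chatml_prompt` and the file name `prompt.txt` are hypothetical.

```shell
# Hypothetical helper: build a single-turn ChatML prompt with the same
# layout as the llama-cli example (system turn, user turn, open assistant turn).
# Usage: chatml_prompt SYSTEM_MESSAGE USER_MESSAGE
chatml_prompt() {
  system_msg=$1
  user_msg=$2
  # printf expands \n in the format string; the messages are passed as
  # arguments, so any % or backslash they contain is printed literally.
  printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
    "$system_msg" "$user_msg"
}

# Possible usage (commented out so that the snippet only defines the function):
#   chatml_prompt "You are a helpful assistant." "Hello!" > prompt.txt
#   ./llama-cli -m Qwen3-8B-Instruct-Q4_K_M.gguf --ctx-size 8192 -n 512 -f prompt.txt
```

Writing the prompt to a file and passing it with `-f` keeps the trailing newline after `assistant`, which shell command substitution (`"$(chatml_prompt ...)"`) would strip.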

Parameters: 8B | License: Apache 2.0 | Context: 32K tokens

Quantized by DuoNeural using llama.cpp on an RTX 5090.

DuoNeural

DuoNeural is an open AI research lab — human + AI in collaboration.

Platform	Link
HuggingFace	huggingface.co/DuoNeural
Website	duoneural.com
GitHub	github.com/DuoNeural
X / Twitter	@DuoNeural
Email	duoneural@proton.me
Newsletter	duoneural.beehiiv.com
Support	buymeacoffee.com/duoneural

DuoNeural Research Publications

Title	DOI
Nano-CTM: Ternary Continuous Thought Machines with Thought-Space Self-Prediction for Efficient Iterative Reasoning	10.5281/zenodo.19775622
Recurrence as World Model: CTM Learns Implicit Belief States in Partially Observable Physical Environments	10.5281/zenodo.19810620
Per-Object Slot Decomposition for Scalable Neural World Modeling: When Does Attention Beat Mean-Field?	10.5281/zenodo.19846804
The Dynamical Horizon Principle: CTM Gates Converge to the Predictability Limit of Dynamical Systems	10.5281/zenodo.19952612

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura — DuoNeural.
