Use from the llama-cpp-python library
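```python
# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download one GGUF file from the Hub and load it.
# `filename` accepts a glob pattern; pick any quant from the Model Files table.
llm = Llama.from_pretrained(
    repo_id="Nodmix/Nodmix-Q3",
    filename="*Q4_K_M.gguf",
)

# Run a single-turn chat; the result is an OpenAI-style completion dict.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
print(response["choices"][0]["message"]["content"])
```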
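According to the llama-cpp-python documentation, `Llama.from_pretrained` accepts a glob pattern for `filename`. As far as we can tell from the library source, the pattern must match exactly one file in the repository. The sketch below reproduces that check offline with Python's `fnmatch`, using the file names from the Model Files table, so you can confirm a pattern is unambiguous before downloading several gigabytes. The helper `resolve_gguf` is hypothetical and not part of llama-cpp-python.

```python
import fnmatch
from typing import Sequence

# File names copied from the Model Files table in this card.
REPO_FILES = (
    "Nodmix_Q4.F32.gguf",
    "Nodmix_4B.BF16.gguf",
    "Nodmix_4B.F16.gguf",
    "Nodmix_4B.Q3_K_M.gguf",
    "Nodmix_4B.Q3_K_S.gguf",
    "Nodmix_4B.Q4_K_M.gguf",
    "Nodmix_4B.Q4_K_S.gguf",
    "Nodmix_4B.Q5_K_M.gguf",
    "Nodmix_4B.Q8_0.gguf",
    ".gitattributes",
    "config.json",
    "README.md",
)


def resolve_gguf(pattern: str, files: Sequence[str] = REPO_FILES) -> str:
    """Hypothetical helper: return the single file matching `pattern`.

    Approximates the check assumed to happen inside Llama.from_pretrained:
    zero matches or more than one match is treated as an error.
    """
    matches = [f for f in files if fnmatch.fnmatch(f, pattern)]
    if not matches:
        raise ValueError(f"no file matches {pattern!r}")
    if len(matches) > 1:
        raise ValueError(f"{pattern!r} is ambiguous: {matches}")
    return matches[0]


print(resolve_gguf("*Q4_K_M.gguf"))  # Nodmix_4B.Q4_K_M.gguf
```

With these names, `*Q8_0.gguf` resolves to a single file, while `*F16.gguf` is rejected as ambiguous because it also matches `Nodmix_4B.BF16.gguf`.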

Nodmix-Q4

Nodmix is the latest generation of large language models in the Nodmix IQ series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Nodmix delivers advances in reasoning, instruction following, agent capabilities, and multilingual support.

Model Files

| File name | Size | Quantization | Format | Description |
|---|---|---|---|---|
| Nodmix_Q4.F32.gguf | 16.1 GB | FP32 | GGUF | Full precision (float32) |
| Nodmix_4B.BF16.gguf | 8.05 GB | BF16 | GGUF | bfloat16 precision |
| Nodmix_4B.F16.gguf | 8.05 GB | FP16 | GGUF | float16 precision |
| Nodmix_4B.Q3_K_M.gguf | 2.08 GB | Q3_K_M | GGUF | 3-bit k-quant, medium variant |
| Nodmix_4B.Q3_K_S.gguf | 1.89 GB | Q3_K_S | GGUF | 3-bit k-quant, small variant |
| Nodmix_4B.Q4_K_M.gguf | 2.5 GB | Q4_K_M | GGUF | 4-bit k-quant, medium variant |
| Nodmix_4B.Q4_K_S.gguf | 2.38 GB | Q4_K_S | GGUF | 4-bit k-quant, small variant |
| Nodmix_4B.Q5_K_M.gguf | 2.89 GB | Q5_K_M | GGUF | 5-bit k-quant, medium variant |
| Nodmix_4B.Q8_0.gguf | 4.28 GB | Q8_0 | GGUF | 8-bit quantized |
| .gitattributes | 2.02 kB | - | - | Git LFS tracking file |
| config.json | 31 B | - | - | Configuration placeholder |
| README.md | 3.6 kB | - | - | Model documentation |
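As a rough sanity check on the file sizes above, dividing each size by the parameter count gives the average number of bits stored per weight. The sketch below assumes exactly 4e9 parameters (the card only says "4B") and decimal gigabytes (1 GB = 1e9 bytes), so the results are approximate. They also fold in per-block scales, file metadata, and any tensors a quant type keeps at higher precision, which is why, for example, Q4_K_M comes out above 4 bits.

```python
# Approximate bits per weight for each GGUF file in the Model Files table.
# Assumptions: 4e9 parameters (the card says "4B") and 1 GB = 1e9 bytes.
PARAMS = 4e9

SIZES_GB = {
    "F32": 16.1,
    "BF16": 8.05,
    "F16": 8.05,
    "Q8_0": 4.28,
    "Q5_K_M": 2.89,
    "Q4_K_M": 2.5,
    "Q4_K_S": 2.38,
    "Q3_K_M": 2.08,
    "Q3_K_S": 1.89,
}


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Average stored bits per parameter, including all overhead in the file."""
    return size_gb * 1e9 * 8 / params


for quant, size_gb in SIZES_GB.items():
    print(f"{quant:7s} {size_gb:6.2f} GB  ~{bits_per_weight(size_gb):5.2f} bits/weight")
```

The F32 row works out to about 32.2 rather than 32, which suggests the true parameter count is slightly above 4e9 or that the listed sizes are rounded; treat every figure here as an estimate.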

Quants Usage

(Sorted by size, not necessarily by quality. IQ quants are often preferable to similarly sized non-IQ quants.)

Graph by ikawrakow comparing some lower-quality quant types (lower is better).
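Because the quants are ordered by size rather than quality, one simple heuristic is to take the largest file that still fits your memory budget while leaving headroom for the KV cache and runtime buffers. The sketch below applies that heuristic to the quantized files from the Model Files table. The `pick_quant` helper and its 1.5 GB default headroom are illustrative assumptions, not recommendations from this card.

```python
from typing import Optional

# Hypothetical helper: pick the largest quantized file that fits a memory budget.
# Sizes (decimal GB) come from the Model Files table; the headroom for the
# KV cache and runtime buffers is an assumed value, not a measured one.
QUANTS_GB = {
    "Nodmix_4B.Q8_0.gguf": 4.28,
    "Nodmix_4B.Q5_K_M.gguf": 2.89,
    "Nodmix_4B.Q4_K_M.gguf": 2.5,
    "Nodmix_4B.Q4_K_S.gguf": 2.38,
    "Nodmix_4B.Q3_K_M.gguf": 2.08,
    "Nodmix_4B.Q3_K_S.gguf": 1.89,
}


def pick_quant(budget_gb: float, headroom_gb: float = 1.5) -> Optional[str]:
    """Return the largest file whose size plus headroom fits in budget_gb, or None."""
    fitting = [
        (size, name)
        for name, size in QUANTS_GB.items()
        if size + headroom_gb <= budget_gb
    ]
    if not fitting:
        return None
    # Tuples compare by size first, so max() yields the largest fitting file.
    return max(fitting)[1]


print(pick_quant(6.0))  # Nodmix_4B.Q8_0.gguf
print(pick_quant(4.2))  # Nodmix_4B.Q4_K_M.gguf
print(pick_quant(3.0))  # None
```

The headroom is a single flat number for simplicity; in practice the KV cache grows with context length, so a longer `n_ctx` calls for more headroom.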

- Format: GGUF
- Model size: 4B params
- Architecture: qwen3

