# Ministral-8B-Instruct – GGUF Quants

Quantized GGUF versions of mistralai/Ministral-8B-Instruct-2410, Mistral AI's Ministral 8B instruct model, optimized for edge and on-device deployment. It features sliding window attention for efficient long-context processing.

## Available Files

| File | Quant | Size | Use Case |
|------|-------|------|----------|
| Ministral-8B-Instruct-Q8_0.gguf | Q8_0 | ~8.5 GB | Maximum quality |
| Ministral-8B-Instruct-Q6_K.gguf | Q6_K | ~6.6 GB | Near-lossless |
| Ministral-8B-Instruct-Q5_K_M.gguf | Q5_K_M | ~5.7 GB | High quality |
| Ministral-8B-Instruct-Q4_K_M.gguf | Q4_K_M | ~4.9 GB | Recommended default |
| Ministral-8B-Instruct-Q3_K_M.gguf | Q3_K_M | ~3.9 GB | Low VRAM |
| Ministral-8B-Instruct-IQ4_XS.gguf | IQ4_XS | ~4.3 GB | Imatrix 4-bit |
| Ministral-8B-Instruct-IQ3_XXS.gguf | IQ3_XXS | ~3.2 GB | Imatrix 3-bit |
| Ministral-8B-Instruct-IQ2_M.gguf | IQ2_M | ~2.8 GB | Imatrix 2-bit |
| Ministral-8B-Instruct-IQ1_S.gguf | IQ1_S | ~2.0 GB | Extreme compression |
| Ministral-8B-Instruct-fp16.gguf | FP16 | ~16.0 GB | Full precision |
| imatrix.dat | – | – | Importance matrix |
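
If you only need one file, the Hugging Face CLI can fetch it directly (a minimal sketch; the repo id matches the Ollama command below, with Q4_K_M as the example):

```bash
# Install the CLI, then download a single quant into the current directory.
pip install -U "huggingface_hub[cli]"
huggingface-cli download DuoNeural/Ministral-8B-Instruct-GGUF \
  Ministral-8B-Instruct-Q4_K_M.gguf --local-dir .
```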

## Usage

With llama.cpp's llama-cli:

```bash
./llama-cli -m Ministral-8B-Instruct-Q4_K_M.gguf \
  --ctx-size 8192 -n 512 \
  -p "[INST] Hello! [/INST]"
```

With Ollama:

```bash
ollama run hf.co/DuoNeural/Ministral-8B-Instruct-GGUF:Q4_K_M
```
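
Once pulled, the model can also be queried through Ollama's local REST API (a sketch; Ollama listens on port 11434 by default):

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "hf.co/DuoNeural/Ministral-8B-Instruct-GGUF:Q4_K_M",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}'
```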
Parameters: 8B | License: Apache 2.0 | Context: 32K (sliding window attention)

Quantized by DuoNeural using llama.cpp on an RTX 5090.
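
For reference, imatrix quants like the IQ files above are produced from the full-precision GGUF plus the shipped importance matrix using llama.cpp's llama-quantize. A sketch of the general workflow, not necessarily the exact commands used for this repo:

```bash
# Re-quantize the fp16 GGUF to IQ4_XS, guided by the importance matrix.
./llama-quantize --imatrix imatrix.dat \
  Ministral-8B-Instruct-fp16.gguf Ministral-8B-Instruct-IQ4_XS.gguf IQ4_XS
```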


## DuoNeural

DuoNeural is an open AI research lab: human and AI in collaboration.

### DuoNeural Research Publications

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, and Aura (DuoNeural).
