OmniCoder-9B-GGUF

GGUF quantizations of OmniCoder-9B


Available Quantizations

Quantization Size Use Case
Q2_K ~3.8 GB Extreme compression, lowest quality
Q3_K_S ~4.3 GB Small footprint
Q3_K_M ~4.6 GB Small footprint, balanced
Q3_K_L ~4.9 GB Small footprint, higher quality
Q4_0 ~5.3 GB Good balance
Q4_K_S ~5.4 GB Good balance
Q4_K_M ~5.7 GB Recommended for most users
Q5_0 ~6.3 GB High quality
Q5_K_S ~6.3 GB High quality
Q5_K_M ~6.5 GB High quality, balanced
Q6_K ~7.4 GB Near-lossless
Q8_0 ~9.5 GB Highest quality quantization
BF16 ~17.9 GB Unquantized original weights
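
As a rough rule of thumb, pick the largest quantization whose file fits comfortably in your available RAM or VRAM, leaving headroom for the KV cache and the rest of the system. An illustrative sketch (the 75% headroom factor is an assumption, not official guidance; file sizes are taken from the table above):

```python
# Illustrative helper: pick the largest quantization from the table above
# that fits within roughly 75% of available memory. The 0.75 headroom
# factor is an assumption to leave room for the KV cache and OS.
QUANT_SIZES_GB = {
    "Q2_K": 3.8, "Q3_K_S": 4.3, "Q3_K_M": 4.6, "Q3_K_L": 4.9,
    "Q4_0": 5.3, "Q4_K_S": 5.4, "Q4_K_M": 5.7,
    "Q5_0": 6.3, "Q5_K_S": 6.3, "Q5_K_M": 6.5,
    "Q6_K": 7.4, "Q8_0": 9.5, "BF16": 17.9,
}

def pick_quant(ram_gb, headroom=0.75):
    """Return the largest quant whose file fits in headroom * ram_gb, or None."""
    budget = ram_gb * headroom
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget]
    return max(fitting)[1] if fitting else None

print(pick_quant(8))   # -> Q4_K_M (matches the "recommended" row)
print(pick_quant(16))  # -> Q8_0
```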

Usage

# Install llama.cpp
brew install llama.cpp  # macOS
# or build from source: https://github.com/ggml-org/llama.cpp

# Interactive chat
llama-cli --hf-repo Tesslate/OmniCoder-9B-GGUF --hf-file omnicoder-9b-q4_k_m.gguf -p "Your prompt" -c 8192

# Server mode (OpenAI-compatible API)
llama-server --hf-repo Tesslate/OmniCoder-9B-GGUF --hf-file omnicoder-9b-q4_k_m.gguf -c 8192
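
Once llama-server is running, its OpenAI-compatible API can be queried over plain HTTP. A minimal stdlib-only sketch, assuming the server's default listen address of http://localhost:8080 (configurable via --host/--port):

```python
import json
import urllib.request

def build_payload(prompt, max_tokens=256):
    """Build an OpenAI-style chat completion request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt, base_url="http://localhost:8080"):
    """POST a chat completion to a running llama-server instance."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Write a Python function that reverses a string."))
```

The same endpoint also works with any OpenAI client library by pointing its base URL at the server.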

Built by Tesslate | See full model card: OmniCoder-9B

Model details

Format: GGUF
Parameters: 9B
Architecture: qwen35

Model tree for Tesslate/OmniCoder-9B-GGUF

Base model: Qwen/Qwen3.5-9B
This repo contains GGUF quantizations of the finetuned OmniCoder-9B.