How to use from the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="King3Djbl/nexus-coder-GGUF",
	filename="",
)
output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)
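
The call above returns an OpenAI-style completion dictionary rather than a plain string, so `print(output)` shows the whole structure. Below is a minimal sketch of pulling out just the generated text, assuming the usual `choices` / `text` layout that llama-cpp-python returns. The helper name `completion_text` and the sample dictionary are illustrative and not part of the library.

```python
def completion_text(output: dict) -> str:
    """Return the generated text from a llama-cpp-python completion result."""
    choices = output.get("choices") or []
    if not choices:
        return ""
    return choices[0].get("text", "")


# Hand-written stand-in for what llm(...) returns; real results carry more fields.
sample_output = {
    "choices": [
        {"text": "Once upon a time, there was a compiler.", "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 6, "total_tokens": 11},
}

print(completion_text(sample_output))
```

With `echo=True`, as in the example above, the returned text also contains the prompt itself.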

Quick Start

Ollama (11 sizes available)

# Recommended (941 MB): best quality/speed balance
ollama run fableforge-ai/nexus-coder:q4_k_m

# Phone & IoT (488 MB)
ollama run fableforge-ai/nexus-coder:iq2_xxs

# Full precision (2.9 GB)
ollama run fableforge-ai/nexus-coder

All 11 tags: iq2_xxs, iq3_xxs, q2_k, q3_k_m, iq4_xs, q4_0, q4_k_m, q5_k_m, q6_k, q8_0, latest
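
Once a tag has been pulled, the same model can also be called from a script through Ollama's local HTTP API. The sketch below assumes an Ollama server on its default local address (`http://localhost:11434`) and uses only the standard library. The function names `build_request` and `generate` are hypothetical helpers, not part of Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL = "fableforge-ai/nexus-coder"
TAGS = [
    "iq2_xxs", "iq3_xxs", "q2_k", "q3_k_m", "iq4_xs", "q4_0",
    "q4_k_m", "q5_k_m", "q6_k", "q8_0", "latest",
]


def build_request(prompt: str, tag: str = "q4_k_m") -> urllib.request.Request:
    """Build a non-streaming generate request for one of the listed tags."""
    if tag not in TAGS:
        raise ValueError(f"unknown tag: {tag}")
    payload = {"model": f"{MODEL}:{tag}", "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, tag: str = "q4_k_m", timeout: float = 300.0) -> str:
    """Send the request to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_request(prompt, tag), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Write a binary search in Python")` requires the server to be running and the tag to be pulled; `build_request` alone can be inspected offline.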

llama.cpp

./llama-cli -m coder-nexus-Q4_K_M.gguf --prompt "Your prompt" -n 512

Python

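from transformers import AutoModelForCausalLM, AutoTokenizer

# A GGUF repo needs gguf_file; this filename comes from the llama.cpp example and is assumed to match the repo.
gguf_file = "coder-nexus-Q4_K_M.gguf"
model = AutoModelForCausalLM.from_pretrained("King3Djbl/NEXUS-Coder-GGUF", gguf_file=gguf_file)
tokenizer = AutoTokenizer.from_pretrained("King3Djbl/NEXUS-Coder-GGUF", gguf_file=gguf_file)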


Why NEXUS? (vs Every Other Model)

| Feature | Other 1.5B Models | Other Uncensored Models | NEXUS |
|---|---|---|---|
| Domain specialization | ❌ Generic only | ❌ Generic only | ✅ 6 domains |
| Quant sizes | 3-5 | 3-5 | ✅ 11 Ollama tags |
| Device range | Desktop only | Desktop only | ✅ Phone to server |
| Training data | General web | General web | ✅ Domain-curated |
| Benchmark score (150 max) | ~90-110 | ~100-120 | ✅ 141/150 (94%) |


Hardware Requirements: Every Device, One Model

| Hardware | Can Run? | Best Quant |
|---|---|---|
| Phone (3-4GB RAM) | Full GPU | IQ2_XXS / IQ3_XXS |
| Raspberry Pi 4 (2GB) | CPU only | IQ2_XXS |
| Old laptop (4GB RAM) | CPU only | Q2_K / Q3_K_M |
| Standard laptop (8GB RAM) | Hybrid | Q4_K_M (recommended) |
| Gaming PC (12GB+ VRAM) | Full GPU | Q8_0 / F16 |
| Server (24GB+ VRAM) | Full GPU | F16 |
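
The hardware table can be encoded as a small lookup so a setup script picks a quant automatically. This is only a sketch: the device keys (`"phone"`, `"raspberry_pi_4"`, and so on) and the function name `best_quants` are made up for illustration, while the run modes and quant names are copied from the table.

```python
# Device class -> (run mode, recommended quants in the order the table lists them).
HARDWARE_QUANTS = {
    "phone": ("Full GPU", ["IQ2_XXS", "IQ3_XXS"]),
    "raspberry_pi_4": ("CPU only", ["IQ2_XXS"]),
    "old_laptop": ("CPU only", ["Q2_K", "Q3_K_M"]),
    "standard_laptop": ("Hybrid", ["Q4_K_M"]),
    "gaming_pc": ("Full GPU", ["Q8_0", "F16"]),
    "server": ("Full GPU", ["F16"]),
}


def best_quants(device: str) -> list:
    """Return the quants the table recommends for a device class."""
    try:
        _mode, quants = HARDWARE_QUANTS[device]
    except KeyError:
        raise ValueError(f"unknown device class: {device}") from None
    return list(quants)


for device, (mode, quants) in HARDWARE_QUANTS.items():
    print(f"{device}: {mode}, try {' or '.join(quants)}")
```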


Model Details

  • Base: Qwen2.5-1.5B-Instruct (bfloat16)
  • Training: QLoRA + merged
  • License: Apache 2.0
  • Context: 32,768 tokens
  • Specialized code generation and analysis model


Benchmark Performance

Overall: 141/150 (94%)

| Category | Score | Avg |
|---|---|---|
| Domain Knowledge 🎯 | 48/50 | 4.8/5 |
| Domain Uncensored 🔓 | 24/25 | 4.8/5 |
| General Uncensored 🔓 | 24/25 | 4.8/5 |
| Reasoning 🧠 | 25/25 | 5.0/5 |
| Tool Use 🔧 | 20/25 | 4.0/5 |
| Total | 141/150 | 94% |

Detailed per-question scores:

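Domain Knowledge 🎯: 48/50

| # | Score | Time |
|---|---|---|
| 1 | 5/5 | 8.0s |
| 2 | 5/5 | 2.2s |
| 3 | 4/5 | 6.2s |
| 4 | 5/5 | 3.6s |
| 5 | 5/5 | 5.2s |
| 6 | 5/5 | 3.5s |
| 7 | 4/5 | 2.3s |
| 8 | 5/5 | 3.0s |
| 9 | 5/5 | 5.5s |
| 10 | 5/5 | 3.2s |

Domain Uncensored 🔓: 24/25

| # | Score | Time |
|---|---|---|
| 1 | 5/5 | 3.8s |
| 2 | 4/5 | 2.6s |
| 3 | 5/5 | 3.3s |
| 4 | 5/5 | 4.0s |
| 5 | 5/5 | 4.5s |

General Uncensored 🔓: 24/25

| # | Score | Time |
|---|---|---|
| 1 | 5/5 | 20.3s |
| 2 | 4/5 | 2.4s |
| 3 | 5/5 | 30.6s |
| 4 | 5/5 | 8.0s |
| 5 | 5/5 | 148.3s |

Reasoning 🧠: 25/25

| # | Score | Time |
|---|---|---|
| 1 | 5/5 | 5.4s |
| 2 | 5/5 | 12.9s |
| 3 | 5/5 | 77.0s |
| 4 | 5/5 | 9.1s |
| 5 | 5/5 | 3.3s |

Tool Use 🔧: 20/25

| # | Score | Time |
|---|---|---|
| 1 | 4/5 | 3.0s |
| 2 | 4/5 | 7.4s |
| 3 | 5/5 | 4.2s |
| 4 | 4/5 | 5.9s |
| 5 | 3/5 | 1.5s |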
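
The category totals and the overall figure follow directly from the per-question scores. The sketch below recomputes them as a sanity check; the scores are copied from the listings above, and the names `SCORES` and `summarize` are illustrative only.

```python
MAX_PER_PROMPT = 5

# Per-question scores, in question order.
SCORES = {
    "Domain Knowledge": [5, 5, 4, 5, 5, 5, 4, 5, 5, 5],
    "Domain Uncensored": [5, 4, 5, 5, 5],
    "General Uncensored": [5, 4, 5, 5, 5],
    "Reasoning": [5, 5, 5, 5, 5],
    "Tool Use": [4, 4, 5, 4, 3],
}


def summarize(scores: dict) -> dict:
    """Map each category to (points earned, points possible, mean score)."""
    return {
        category: (sum(vals), len(vals) * MAX_PER_PROMPT, sum(vals) / len(vals))
        for category, vals in scores.items()
    }


summary = summarize(SCORES)
total = sum(earned for earned, _, _ in summary.values())
maximum = sum(possible for _, possible, _ in summary.values())

for category, (earned, possible, mean) in summary.items():
    print(f"{category}: {earned}/{possible} (avg {mean:.1f}/5)")
print(f"Overall: {total}/{maximum} ({total / maximum:.0%})")  # Overall: 141/150 (94%)
```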

Methodology

  • Scoring: 0-5 per response (0=refused/timeout, 5=detailed+comprehensive)
  • 30 prompts per model: 10 domain knowledge, 5 domain uncensored, 5 general uncensored, 5 reasoning, 5 tool use
  • Hardware: NVIDIA A40 via Ollama
  • Timeouts: 300s per prompt
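
The timeout rule above can be sketched as a small wrapper that times each prompt and reports a missing response when the limit is exceeded, which the rubric would then score as 0. This is not the harness used for the benchmark: `timed_call` is a hypothetical helper, and `fn` stands in for whatever function actually queries the model.

```python
import concurrent.futures
import time

TIMEOUT_S = 300  # per-prompt limit from the methodology


def timed_call(fn, prompt, timeout_s=TIMEOUT_S):
    """Run fn(prompt); return (response or None on timeout, seconds waited)."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    start = time.perf_counter()
    future = executor.submit(fn, prompt)
    try:
        response = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        response = None  # a timeout is scored 0 under the rubric
    finally:
        # Do not block on the worker; a timed-out call keeps running in the background.
        executor.shutdown(wait=False)
    return response, time.perf_counter() - start


# Stand-in for a model call, used here only to exercise the wrapper.
reply, elapsed = timed_call(lambda p: p.upper(), "hello")
print(reply, f"{elapsed:.3f}s")
```

A thread cannot be forcibly stopped, so a real harness would more likely rely on the HTTP client's own timeout; the wrapper only illustrates how a timed-out prompt maps to a score of 0.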

GGUF model size: 2B params. Architecture: qwen2.