Use from the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the Q4_K_M GGUF from the Hub and load it
llm = Llama.from_pretrained(
	repo_id="dcostenco/prism-coder-14b",
	filename="prism-aac-14b-q4km.gguf",
)

# Run a chat completion
llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

prism-coder:14b (v26-polish) — 98% routing, ties Claude Opus 4.7

LoRA fine-tune of Qwen3-14B for offline MCP tool routing. Ties Claude Opus 4.7 at 98.0% ± 0.0% on the 100-case Prism eval (3-seed verified, zero variance). 3x faster (1.1s vs 3.0s), fully offline, zero cost per request.

Routing accuracy — 100-case Prism eval (May 15 2026, 3-seed mean)

Model             Accuracy       Cost/req   Latency
Claude Sonnet 4   99%            ~$0.01     3.2s
prism-coder:14b   98.0% ± 0.0%   $0         1.1s
Claude Opus 4.7   98%            ~$0.05     3.0s
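To make the cost column concrete, a back-of-envelope sketch (the monthly request volume is an assumption for illustration, not a figure from the eval):

```python
# Monthly routing cost at the approximate per-request prices in the table.
requests_per_month = 100_000  # assumed volume, not from the eval
opus_cost = requests_per_month * 0.05    # Claude Opus 4.7 at ~$0.05/req
sonnet_cost = requests_per_month * 0.01  # Claude Sonnet 4 at ~$0.01/req
local_cost = requests_per_month * 0.00   # prism-coder:14b runs offline, $0/req
print(opus_cost, sonnet_cost, local_cost)  # 5000.0 1000.0 0.0
```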

Per-category (3-seed mean, zero variance):

Category                 Score
Overall                  98.0%
session_load_context     100%
session_save_ledger      100%
session_search_memory    100%
session_save_handoff     87%
session_compact_ledger   100%
brave_web_search         100%
knowledge_search         100%
AAC plain-text           100%
translate plain-text     100%
plain text               100%
no-tool refusal          100%
info / lookup            100%
edge (multi-step)        82%

Avg latency: 1.1s · Invented tools: 0

How it got to 98% — prompt engineering, zero retraining

The 14B went from 87% to 98% with zero retraining, zero GPU cost — purely system prompt changes:

  1. v26 (+7 pts): the rule ending "-> plain text" was changed to "-> respond directly (no tool)". At Q4_K_M precision the model misread "plain text" as a tool name.
  2. v27 (+7 pts): Labeled category headers added to routing rules:
    CONVERSATION RECALL: what did we discuss / previously talked about -> session_search_memory
    SAVED KNOWLEDGE: what do I know / stored notes / on file about -> knowledge_search
    
    The labels act as semantic anchors, stronger than plain keyword matching at Q4_K_M precision; knowledge_search jumped from 43% to 100%.
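As a toy illustration of the labeled-rule idea, here is a naive string-matcher over the two rules quoted above. The cue phrases and tool names come from the v27 rules; the matching code itself is a hypothetical sketch, since the real model routes via the LLM, not substring matching:

```python
# Category label -> cue phrases -> tool, mirroring the v27 routing rules.
RULES = {
    "session_search_memory": ["what did we discuss", "previously talked about"],
    "knowledge_search": ["what do i know", "stored notes", "on file about"],
}

def naive_route(query):
    """Return the matching tool name, or None to respond directly (no tool)."""
    q = query.lower()
    for tool, cues in RULES.items():
        if any(cue in q for cue in cues):
            return tool
    return None

print(naive_route("What did we discuss about travel plans?"))   # session_search_memory
print(naive_route("What do I know about gluten-free recipes?")) # knowledge_search
print(naive_route("Hello there"))                               # None -> respond directly
```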

Offline cascade architecture

In the Prism AAC app, the 14B is the primary offline router:

Synalux cloud (Claude, 99%) → prism-coder:14b (98%, 1.1s) → prism-coder:1.7b (88%, iPhone fallback)
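The cascade above can be sketched as a simple fallback function. The function and its parameters are hypothetical stand-ins for the real Prism AAC plumbing; the 16 GB threshold is an assumption based on the 14B's listed iPad Pro 16GB requirement:

```python
def route(prompt, cloud_available=False, device_ram_gb=16):
    """Pick the best available router, falling down the cascade."""
    if cloud_available:
        return "synalux-cloud (Claude)"  # 99% accuracy, needs network
    if device_ram_gb >= 16:
        return "prism-coder:14b"         # 98%, ~1.1s, fully offline
    return "prism-coder:1.7b"            # 88%, fits on iPhone

print(route("save this note"))                        # prism-coder:14b
print(route("save this note", cloud_available=True))  # synalux-cloud (Claude)
print(route("save this note", device_ram_gb=8))       # prism-coder:1.7b
```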

Training recipe (v26-polish)

  • Base: Qwen/Qwen3-14B (bf16)
  • LoRA: r=8, alpha=16, dropout 0.05, QKVO only
  • Corpus: 576 rows, 56% plain-text + 44% tool
  • Schedule: 50 iters, LR 1e-6, Mac M4 Max (MLX-LM), ~5 min
  • Note: the 87% to 98% improvement is from prompt engineering (v25→v27), not weight changes
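The recipe above, expressed as a plain config dict. Key names follow common LoRA conventions; the actual MLX-LM config format may differ, so treat this as a sketch of the hyperparameters, not a drop-in training config:

```python
lora_config = {
    "base_model": "Qwen/Qwen3-14B",  # bf16 base
    "rank": 8,
    "alpha": 16,
    "dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],  # QKVO only
    "iters": 50,
    "learning_rate": 1e-6,
}
# Standard LoRA scaling applied to the adapter update: alpha / rank
print(lora_config["alpha"] / lora_config["rank"])  # 2.0
```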

Usage

ollama pull dcostenco/prism-coder:14b

Use the v27 system prompt with the nothink template. The 98% score requires both.
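A sketch of a chat request to a local Ollama server with the system prompt supplied in the `system` role. The `<v27 system prompt>` placeholder stands in for the real prompt text, which ships in the dcostenco/prism-coder repo; this code only builds the payload and does not contact a server:

```python
import json

payload = {
    "model": "dcostenco/prism-coder:14b",
    "messages": [
        {"role": "system", "content": "<v27 system prompt>"},
        {"role": "user", "content": "what did we discuss yesterday?"},
    ],
    "stream": False,
}
body = json.dumps(payload)
# POST `body` to http://localhost:11434/api/chat with any HTTP client.
print(payload["model"])  # dcostenco/prism-coder:14b
```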

Hardware

  • Mac: M2 Pro+ / 24GB+ unified memory
  • Linux: RTX 3090/4090 (24GB)
  • VRAM: ~10 GB loaded

All Prism Coder models

Model             Accuracy  Size    Device                HuggingFace
prism-coder:14b   98%       8.4 GB  Mac / iPad Pro 16GB   dcostenco/prism-coder-14b
prism-coder:8b    96%       4.7 GB  iPhone / iPad 8GB     dcostenco/prism-coder-8b
prism-coder:32b   97.3%     19 GB   Mac M2 Ultra+         dcostenco/prism-coder-32b
prism-coder:1.7b  88%       2.2 GB  Any device / iPhone   dcostenco/prism-coder-1.7b

GitHub: dcostenco/prism-coder · AAC app: dcostenco/prism-aac · Portal: synalux.ai

Get the full stack

The model routes tool calls — but needs a backend to route TO:

# Install the memory server (free, local, no API keys)
npm install -g prism-mcp-server

# Pull the model
ollama pull dcostenco/prism-coder:14b

# Done — your AI agent now has persistent memory + 98% tool routing

Free tier: local SQLite, no cloud, no account needed. Synalux portal: cloud sync, HIPAA dashboard, team access, Claude fallback → synalux.ai


Prism Routing Benchmark

This model is evaluated on the Prism Routing Benchmark — a 100-case, 13-category eval for MCP tool routing. Run it yourself:

git clone https://github.com/dcostenco/prism-coder
cd prism-coder
python3 tests/benchmarks/prism-routing-100/benchmark.py --models 14b --seed 2027

This is not a general function-calling benchmark like BFCL. It measures routing precision on 7 specific MCP tools, the task these models were built for. The value is offline reliability at zero cost, not competing with frontier models on arbitrary APIs.
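How a "98.0% ± 0.0%" headline can be derived from three seeded runs. Seed 2027 appears in the command above; the other two seeds and all three per-seed scores are assumptions consistent with the reported zero variance:

```python
from statistics import mean, pstdev

seed_scores = {2025: 98.0, 2026: 98.0, 2027: 98.0}  # hypothetical per-seed results
vals = list(seed_scores.values())
print(f"{mean(vals):.1f}% ± {pstdev(vals):.1f}%")  # 98.0% ± 0.0%
```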

License

Apache-2.0.
