Use with the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

# Load the recommended quantized GGUF (see Files below)
llm = Llama.from_pretrained(
	repo_id="dcostenco/prism-coder-1.7b",
	filename="prism-coder-1b7-v36-q4km.gguf",
)
response = llm.create_chat_completion(
	messages=[
		{"role": "user", "content": "Jot down that the demo went well"},
	]
)
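Assuming llama-cpp-python's OpenAI-compatible response shape, pulling the routed reply out of the completion might look like this sketch (the sample dict is illustrative, not real model output):

```python
# Sketch: extract the assistant text from a create_chat_completion response.
# llama-cpp-python returns an OpenAI-style dict with a "choices" list.
def extract_reply(response: dict) -> str:
    return response["choices"][0]["message"]["content"]

# Illustrative response shape only -- not captured from the model.
sample = {"choices": [{"message": {"role": "assistant", "content": "session_save_ledger"}}]}
print(extract_reply(sample))  # session_save_ledger
```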

prism-coder:1b7 - AAC Tool Router (1.7B)

Fine-tuned from Qwen3-1.7B for deterministic tool routing in the Prism AAC system.

BFCL accuracy: 100% on the 100-case × 3-seed routing benchmark (v36 corpus).

What it does

Routes user messages to one of 6 tools or plain text with zero hallucination:

Tool                     Trigger
session_load_context     Load/fetch context for project X
session_save_ledger      Note / jot down / log / remember
session_save_handoff     Handoff to next agent / pass on
session_compact_ledger   Compact/archive/trim the ledger
session_search_memory    What did we discuss / recall session
knowledge_search         What do I know / stored notes
(plain text)             AAC phrases, math, facts, translation, time
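On the application side, one way to consume a routing decision like those above is to validate it against the known tool names before dispatching. This is a hypothetical post-processing sketch, not part of the model card; only the six tool names come from the table:

```python
# Known tool names from the routing table; anything else is treated as plain text.
TOOLS = {
    "session_load_context",
    "session_save_ledger",
    "session_save_handoff",
    "session_compact_ledger",
    "session_search_memory",
    "knowledge_search",
}

def route(model_output: str) -> tuple[str, str]:
    """Return ("tool", name) if the output names a known tool, else ("text", raw output)."""
    candidate = model_output.strip()
    if candidate in TOOLS:
        return ("tool", candidate)
    return ("text", model_output)

print(route("session_save_ledger"))  # ('tool', 'session_save_ledger')
print(route("Bonjour!"))             # ('text', 'Bonjour!')
```

Guarding the dispatch this way keeps a malformed or unexpected model output from invoking a nonexistent tool.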

Deployment

iOS / edge: runs on-device via llama.cpp (1.0 GB, Q4_K_M):

ollama run dcostenco/prism-coder:1b7

Files

File                           Size     Format
prism-coder-1b7-v36-q4km.gguf  1.0 GB   Q4_K_M GGUF (recommended)
prism-aac-1b7-q4km.gguf        1.0 GB   Q4_K_M GGUF (legacy name)

Training

  • Base: Qwen3-1.7B
  • Method: MLX LoRA fine-tuning (mlx_lm.lora)
  • Dataset: v36_1b7 routing corpus (414 examples, 6-tool system prompt)
  • Hardware: Apple Silicon (M-series), ~4GB RAM
  • Eval: BFCL 100-case benchmark × 3 seeds → 100%
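A training invocation matching the bullets above might look like the following sketch. The flag values (iteration count, data path) are assumptions for illustration; only the base model and the use of mlx_lm.lora come from the card:

```shell
# Hypothetical MLX LoRA fine-tuning command; paths and hyperparameters are illustrative.
python -m mlx_lm.lora \
  --model Qwen/Qwen3-1.7B \
  --train \
  --data ./v36_1b7 \
  --iters 1000
```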

System prompt

Uses the 13-rule routing system prompt. See Prism AAC for the canonical prompt used in training and inference.
