How to use from
Docker Model Runner
docker model run hf.co/dcostenco/prism-coder-1.7b
Quick Links

prism-coder:1.7b โ€” Tool Routing Model (Always-Fits Tier)

Fine-tuned Qwen3-1.7B for 6-tool routing in the Prism AAC system. Primary deployment: any iOS device via llama.cpp GGUF โ€” the guaranteed fallback for all device tiers.

BFCL Routing Benchmark โ€” v42 (Current)

Mean: 100.0% (3-seed average, seeds 2027/2028/2029, 102 cases each)

Category Count Description Accuracy
aac 12 AAC phrase requests โ†’ plain text 100%
cmpct 6 Ledger compaction 100%
edge 6 Multi-step / compound requests 100%
hand 8 Agent handoff / relay 100%
info 5 General facts โ†’ plain text 100%
irrel 10 Irrelevant / live queries โ†’ plain text 100%
know 7 Knowledge base search 100%
load 9 Session context loading 100%
pred 8 Factual / knowledge queries โ†’ plain text 100%
save 13 Session ledger save 100%
smem 12 Session memory search 100%
tran 6 Translation requests โ†’ plain text 100%

Eval: MLX inference + thinking, temperature=0, 3-seed mean. Gate: โ‰ฅ90% = deploy.

Version History

Version BFCL Notes
v42 100.0% Fixed 4 deterministic failures: cmpct tool name, compound edge, write-code irrel, pull-context load
v41 96.1% Proper safetensors merge โ€” fixes mlx_lm.fuse LoRA loss
v36 94.1% LoRA rank=16, all 28 layers, mask-prompt
v19 ~88% Baseline 1.7B routing

Tools

The model routes to exactly 6 tools:

Tool Trigger
session_load_context Load/resume/pull project context
session_save_ledger Note/log/record/remember something
session_save_handoff Pass state to next agent/session
session_compact_ledger Compact/shrink/prune ledger
session_search_memory Recall prior session discussions
knowledge_search Search stored knowledge base ("what do I know")

Plain text (no tool) for: AAC phrases, translations, weather, general facts, code/regex/functions, math.

Model Details

  • Base: Qwen/Qwen3-1.7B
  • Format: GGUF Q4_K_M (~1.2 GB)
  • Context: 32,768 tokens
  • Training: MLX LoRA, rank=16, all 28 layers, 800 iters, LR=5e-5, v42 corpus (1028 train / 79 valid)
  • Merge: direct safetensors merge (scale/rank ร— B.T @ A.T) โ†’ llama.cpp convert โ†’ Q4_K_M quantization

Usage

ollama pull dcostenco/prism-coder:1b7
ollama run prism-coder:1b7

Or in Prism AAC โ€” the app downloads and loads this model automatically on devices with <8 GB RAM.

Downloads last month
1,330
GGUF
Model size
2B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for dcostenco/prism-coder-1.7b

Finetuned
Qwen/Qwen3-1.7B
Quantized
(268)
this model