---
language: en
license: apache-2.0
tags:
  - tool-routing
  - function-calling
  - prism-aac
  - qwen3
  - gguf
base_model: Qwen/Qwen3-1.7B
---

prism-coder:1.7b - Tool Routing Model (Always-Fits Tier)

Qwen3-1.7B fine-tuned for 6-tool routing in the Prism AAC system. Primary deployment: any iOS device via llama.cpp GGUF, where it serves as the guaranteed fallback for all device tiers.

BFCL Routing Benchmark: v42 (Current)

Mean: 100.0% (3-seed average, seeds 2027/2028/2029, 102 cases each)

| Category | Count | Description | Accuracy |
|---|---|---|---|
| aac | 12 | AAC phrase requests → plain text | 100% |
| cmpct | 6 | Ledger compaction | 100% |
| edge | 6 | Multi-step / compound requests | 100% |
| hand | 8 | Agent handoff / relay | 100% |
| info | 5 | General facts → plain text | 100% |
| irrel | 10 | Irrelevant / live queries → plain text | 100% |
| know | 7 | Knowledge base search | 100% |
| load | 9 | Session context loading | 100% |
| pred | 8 | Factual / knowledge queries → plain text | 100% |
| save | 13 | Session ledger save | 100% |
| smem | 12 | Session memory search | 100% |
| tran | 6 | Translation requests → plain text | 100% |

Eval: MLX inference with thinking enabled, temperature=0, 3-seed mean. Gate: a mean of ≥90% is required to deploy.
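The gate check described above can be sketched as follows. This is a minimal illustration, not the project's evaluation harness: the names `seed_accuracy` and `passes_gate` are hypothetical, and the per-seed scores are derived from the reported v42 result (a 100.0% mean implies every case passed on each of the three seeds).

```python
from statistics import mean

GATE = 90.0  # deploy threshold in percent, as stated in this README

# Per-category case counts, copied from the benchmark table.
CATEGORY_COUNTS = {
    "aac": 12, "cmpct": 6, "edge": 6, "hand": 8, "info": 5, "irrel": 10,
    "know": 7, "load": 9, "pred": 8, "save": 13, "smem": 12, "tran": 6,
}


def seed_accuracy(correct: int, total: int) -> float:
    """Accuracy of one seed's run, in percent."""
    return 100.0 * correct / total


def passes_gate(per_seed_accuracy: list[float], gate: float = GATE) -> bool:
    """True when the mean accuracy across seeds meets the deploy gate."""
    return mean(per_seed_accuracy) >= gate


total_cases = sum(CATEGORY_COUNTS.values())  # should match "102 cases each"

# Hypothetical reconstruction of the v42 run: all cases correct on each seed.
v42_scores = [seed_accuracy(total_cases, total_cases) for _ in (2027, 2028, 2029)]
print(total_cases, mean(v42_scores), passes_gate(v42_scores))
```

Averaging per-seed accuracies rather than pooling cases is an assumption here; with equal case counts per seed the two give the same number.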

Version History

| Version | BFCL | Notes |
|---|---|---|
| v42 | 100.0% | Fixed 4 deterministic failures: cmpct tool name, compound edge, write-code irrel, pull-context load |
| v41 | 96.1% | Proper safetensors merge; fixes mlx_lm.fuse LoRA loss |
| v36 | 94.1% | LoRA rank=16, all 28 layers, mask-prompt |
| v19 | ~88% | Baseline 1.7B routing |

Tools

The model routes to exactly 6 tools:

| Tool | Trigger |
|---|---|
| session_load_context | Load/resume/pull project context |
| session_save_ledger | Note/log/record/remember something |
| session_save_handoff | Pass state to next agent/session |
| session_compact_ledger | Compact/shrink/prune ledger |
| session_search_memory | Recall prior session discussions |
| knowledge_search | Search stored knowledge base ("what do I know") |

Plain text (no tool) for: AAC phrases, translations, weather, general facts, code/regex/functions, math.
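A caller might guard against out-of-set tool names along these lines. This is a sketch under assumptions: this README does not specify the model's output format, so the example assumes the response has already been parsed into a tool name or `None`; `resolve_route` and the `"plain_text"` label are hypothetical.

```python
from typing import Optional

# The six tools the model is trained to route to, from the table above.
TOOLS = frozenset({
    "session_load_context",
    "session_save_ledger",
    "session_save_handoff",
    "session_compact_ledger",
    "session_search_memory",
    "knowledge_search",
})


def resolve_route(tool_name: Optional[str]) -> str:
    """Return the tool to invoke, or "plain_text" when no tool was selected.

    Raises ValueError for any name outside the six known tools, so a
    malformed or hallucinated tool name is never dispatched.
    """
    if tool_name is None:
        return "plain_text"
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool: {tool_name!r}")
    return tool_name


print(resolve_route("knowledge_search"))
print(resolve_route(None))
```

Rejecting unknown names instead of silently falling back to plain text keeps routing errors visible during evaluation.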

Model Details

  • Base: Qwen/Qwen3-1.7B
  • Format: GGUF Q4_K_M (~1.2 GB)
  • Context: 32,768 tokens
  • Training: MLX LoRA, rank=16, all 28 layers, 800 iters, LR=5e-5, v42 corpus (1028 train / 79 valid)
  • Merge: direct safetensors merge (scale/rank × B.T @ A.T) → llama.cpp convert → Q4_K_M quantization
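The merge formula in the last bullet can be illustrated on a toy weight matrix. This is a sketch, not the project's merge script: it assumes `A` has shape (in, rank), `B` has shape (rank, out), and the base weight `W` has shape (out, in), so that `B.T @ A.T` matches `W`. Plain Python lists are used so the example runs without dependencies; the helper names and the toy values, including `scale=2.0`, are hypothetical.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


def merge_lora(w, a, b, scale, rank):
    """Return W + (scale / rank) * (B.T @ A.T), the formula quoted above."""
    delta = matmul(transpose(b), transpose(a))  # shape (out, in), same as W
    factor = scale / rank
    return [
        [w_ij + factor * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy shapes: in=2, out=3, rank=1.
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # (out=3, in=2)
A = [[1.0], [2.0]]                        # (in=2, rank=1)
B = [[3.0, 4.0, 5.0]]                     # (rank=1, out=3)

merged = merge_lora(W, A, B, scale=2.0, rank=1)
print(merged)
```

A real merge would apply the same update to each adapted tensor in the safetensors checkpoint before conversion and quantization.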

Usage

ollama pull dcostenco/prism-coder:1b7
ollama run dcostenco/prism-coder:1b7

Or use it in Prism AAC: the app downloads and loads this model automatically on devices with <8 GB RAM.