---
language: en
license: apache-2.0
tags:
- tool-routing
- function-calling
- prism-aac
- qwen3
- gguf
base_model: Qwen/Qwen3-1.7B
---

# prism-coder:1.7b — Tool Routing Model (Always-Fits Tier)

Fine-tuned Qwen3-1.7B for 6-tool routing in the [Prism AAC](https://github.com/dcostenco/prism-aac) system. Primary deployment: **any iOS device** via llama.cpp GGUF, as the guaranteed fallback for all device tiers.

## BFCL Routing Benchmark — v42 (Current)

**Mean: 100.0%** (3-seed average, seeds 2027/2028/2029, 102 cases each)

| Category | Count | Description | Accuracy |
|----------|------:|-------------|:--------:|
| aac | 12 | AAC phrase requests → plain text | 100% |
| cmpct | 6 | Ledger compaction | 100% |
| edge | 6 | Multi-step / compound requests | 100% |
| hand | 8 | Agent handoff / relay | 100% |
| info | 5 | General facts → plain text | 100% |
| irrel | 10 | Irrelevant / live queries → plain text | 100% |
| know | 7 | Knowledge base search | 100% |
| load | 9 | Session context loading | 100% |
| pred | 8 | Factual / knowledge queries → plain text | 100% |
| save | 13 | Session ledger save | 100% |
| smem | 12 | Session memory search | 100% |
| tran | 6 | Translation requests → plain text | 100% |

Eval: MLX inference + thinking, temperature=0, 3-seed mean. Gate: ≥90% = deploy.
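The scoring rule above (per-seed accuracy, averaged over three seeds, then compared against the 90% gate) can be sketched in a few lines. This is a minimal illustration, not the project's evaluation harness: the function names and the pass counts in the example are hypothetical, and only the 102-case size and the 90% threshold come from this card.

```python
from statistics import mean

CASES_PER_SEED = 102      # from the benchmark description
DEPLOY_GATE_PCT = 90.0    # from the "Gate" line


def seed_accuracy(passed: int, total: int = CASES_PER_SEED) -> float:
    """Return the accuracy for one seed as a percentage."""
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return 100.0 * passed / total


def mean_accuracy(passed_per_seed: list[int]) -> float:
    """Unweighted mean of per-seed accuracies (every seed has the same size)."""
    return mean(seed_accuracy(p) for p in passed_per_seed)


def passes_gate(passed_per_seed: list[int]) -> bool:
    """True when the multi-seed mean meets or exceeds the deploy gate."""
    return mean_accuracy(passed_per_seed) >= DEPLOY_GATE_PCT


# Hypothetical pass counts for three seeds (illustrative only).
example = [102, 102, 102]
print(f"{mean_accuracy(example):.1f}%", passes_gate(example))
```

Because every seed runs the same number of cases, the unweighted mean of percentages equals the pooled accuracy, so either can be compared against the gate.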
## Version History

| Version | BFCL | Notes |
|---------|------|-------|
| v42 | **100.0%** | Fixed 4 deterministic failures: cmpct tool name, compound edge, write-code irrel, pull-context load |
| v41 | 96.1% | Proper safetensors merge; fixes mlx_lm.fuse LoRA loss |
| v36 | 94.1% | LoRA rank=16, all 28 layers, mask-prompt |
| v19 | ~88% | Baseline 1.7B routing |

## Tools

The model routes to exactly 6 tools:

| Tool | Trigger |
|------|---------|
| `session_load_context` | Load/resume/pull project context |
| `session_save_ledger` | Note/log/record/remember something |
| `session_save_handoff` | Pass state to next agent/session |
| `session_compact_ledger` | Compact/shrink/prune ledger |
| `session_search_memory` | Recall prior session discussions |
| `knowledge_search` | Search stored knowledge base ("what do I know") |

Plain text (no tool) for: AAC phrases, translations, weather, general facts, code/regex/functions, math.

## Model Details

- **Base**: Qwen/Qwen3-1.7B
- **Format**: GGUF Q4_K_M (~1.2 GB)
- **Context**: 32,768 tokens
- **Training**: MLX LoRA, rank=16, all 28 layers, 800 iters, LR=5e-5, v42 corpus (1028 train / 79 valid)
- **Merge**: direct safetensors merge (scale/rank × B.T @ A.T) → llama.cpp convert → Q4_K_M quantization

## Usage

```bash
ollama pull dcostenco/prism-coder:1b7
ollama run dcostenco/prism-coder:1b7
```

Or use it through [Prism AAC](https://github.com/dcostenco/prism-aac): the app downloads and loads this model automatically on devices with <8 GB RAM.
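On the application side, each reply has to be classified as either one of the six tools from the Tools table or plain text. The sketch below shows one way to do that. It assumes the model emits Qwen-style `<tool_call>{...}</tool_call>` JSON and may prefix a `<think>...</think>` block; this card does not specify the output format, so treat both as assumptions. `RoutedReply`, `parse_reply`, and the `summary` argument key are hypothetical names used for illustration.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Optional

# The six tool names listed in the Tools table.
KNOWN_TOOLS = frozenset({
    "session_load_context",
    "session_save_ledger",
    "session_save_handoff",
    "session_compact_ledger",
    "session_search_memory",
    "knowledge_search",
})

# Assumed output conventions (not confirmed by this model card).
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


@dataclass
class RoutedReply:
    tool: Optional[str]                         # None means plain text
    arguments: dict = field(default_factory=dict)
    text: str = ""


def parse_reply(raw: str) -> RoutedReply:
    """Classify a raw model reply as a known tool call or plain text."""
    visible = THINK_RE.sub("", raw).strip()     # drop the reasoning block
    match = TOOL_CALL_RE.search(visible)
    if match:
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            call = None
        if isinstance(call, dict):
            name = call.get("name")
            if isinstance(name, str) and name in KNOWN_TOOLS:
                args = call.get("arguments")
                return RoutedReply(
                    tool=name,
                    arguments=args if isinstance(args, dict) else {},
                )
    # No tool call, malformed JSON, or an unknown tool name: treat as plain text.
    return RoutedReply(tool=None, text=visible)


# Example with a hypothetical reply; the "summary" key is illustrative only.
reply = (
    "<think>user wants to note something</think>"
    '<tool_call>{"name": "session_save_ledger", '
    '"arguments": {"summary": "fixed login bug"}}</tool_call>'
)
routed = parse_reply(reply)
print(routed.tool, routed.arguments)
```

Validating the tool name against a fixed allow-list means an unexpected or hallucinated tool name degrades to plain text rather than reaching the dispatcher.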