---
language: en
license: apache-2.0
tags:
- tool-routing
- function-calling
- prism-aac
- qwen3
- gguf
base_model: Qwen/Qwen3-8B
---

# prism-coder:8b — Tool Routing Model (iOS Tier)

Fine-tuned Qwen3-8B for 6-tool routing in the Prism AAC system. Primary deployment: **iOS/edge** via llama.cpp GGUF.

## Versions

| Version | File | BFCL | Notes |
|---------|------|------|-------|
| v31 | `qwen3-8b-v31-q4km.gguf` | 95.1% | Surgical smem/know boundary + save fixes |
| v30 | `qwen3-8b-v30-q4km.gguf` | 95.0% | Routing corpus v36_1b7 |

## BFCL Routing Benchmark (v31)

- **95.1%** — 3-seed mean (seeds 2027/2028/2029), 100 cases each
- Eval: MLX inference, greedy decoding (temp=0), Qwen3 thinking suppressed
- Deploy gate: ≥90%

## Tools

1. `session_load_context` — load/fetch/resume project context
2. `session_save_ledger` — note/log/remember/record
3. `session_save_handoff` — handoff/relay/next-agent transition
4. `session_compact_ledger` — compact/archive the ledger
5. `session_search_memory` — recall past sessions/conversations
6. `knowledge_search` — search stored notes/knowledge base

## Cascade Role

iOS fallback tier. The desktop cascade uses 14B → 32B → cloud Claude; the 8B model handles edge/offline scenarios where RAM < 6 GB.

## Usage (Ollama)

```bash
ollama pull dcostenco/prism-coder:8b-v30
ollama run dcostenco/prism-coder:8b-v30
```

## Training

- Base: `Qwen3-8B` (MLX 4-bit)
- Framework: MLX-LM LoRA (8 layers, batch 2, gradient checkpointing)
- v31 data: 361 train / 41 valid (targeted smem/know boundary augmentations)
- v31 learning rate: 3e-6 (surgical, 200 iters)
- Peak memory: 7.0 GB
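## Programmatic Routing (sketch)

For programmatic use, a routing request can be sent to a locally running Ollama server via its `/api/chat` endpoint with the six tools declared as function schemas. This is a minimal sketch under stated assumptions: the parameter shapes, tool descriptions, and the `route` helper below are illustrative, not the exact schemas used in training.

```python
import json
import urllib.request

# Ollama's local chat API (default port 11434).
OLLAMA_URL = "http://localhost:11434/api/chat"

# Schemas for the six routing targets. The single "query" parameter is an
# illustrative assumption, not the schema used during fine-tuning.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": name,
            "description": desc,
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }
    for name, desc in [
        ("session_load_context", "Load/fetch/resume project context"),
        ("session_save_ledger", "Note/log/remember/record"),
        ("session_save_handoff", "Handoff/relay/next-agent transition"),
        ("session_compact_ledger", "Compact/archive the ledger"),
        ("session_search_memory", "Recall past sessions/conversations"),
        ("knowledge_search", "Search stored notes/knowledge base"),
    ]
]


def build_request(user_text: str) -> dict:
    """Build an /api/chat payload asking the model to pick one tool."""
    return {
        "model": "dcostenco/prism-coder:8b-v30",
        "messages": [{"role": "user", "content": user_text}],
        "tools": TOOLS,
        "stream": False,
        "options": {"temperature": 0},  # greedy, matching the eval setup
    }


def route(user_text: str) -> dict:
    """POST the request to the local Ollama server and return its reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(user_text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The routed tool call, if any, appears under `message.tool_calls` in the response.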