---
language: en
license: apache-2.0
tags:
- tool-routing
- function-calling
- prism-coder
- qwen3.5
- synalux
- prompt-engineering
- gguf
base_model: Qwen/Qwen3.5-4B
pipeline_tag: text-generation
---

# prism-coder:4b — Prism Memory Tool Router

Prompt-engineered [Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) for MCP tool routing in the [Prism Coder](https://ollama.com/dcostenco/prism-coder) system. No fine-tuning — the system prompt IS the specialization.

## Downloads

| File | Quantization | Size | BFCL Accuracy | Use when |
|------|-------------|------|---------------|----------|
| `Qwen3.5-4B-Q3_K_M.gguf` | Q3_K_M | **2.3 GB** | **99.1%** × 3 seeds | **iPhone / mobile first gate** |
| *(stock via Ollama)* | Q4_K_M | 3.4 GB | **100%** × 3 seeds | Mac / 8 GB+ devices |

## Quick Start

```bash
# iPhone-optimized (2.3 GB, 99.1%)
ollama pull dcostenco/prism-coder:2b

# Full quality (3.4 GB, 100%)
ollama pull dcostenco/prism-coder:4b
```

## BFCL Benchmark

### Q3_K_M (prism-coder:2b) — 99.1% × 3 seeds

**114/115 × 3 shuffled runs = 99.1%, 1 flaky case**

| Category | Count | Accuracy |
|----------|------:|:--------:|
| save | 17 | 100% |
| smem | 17 | 100% |
| aac | 12 | 100% |
| hand | 12 | 100% |
| irrel | 10 | 90% |
| load | 9 | 100% |
| pred | 8 | 100% |
| know | 7 | 100% |
| cmpct | 6 | 100% |
| edge | 6 | 100% |
| tran | 6 | 100% |
| info | 5 | 100% |

Single failure: "Write a regex to match email addresses" → knowledge_search instead of plain.

### Q4_K_M (prism-coder:4b) — 100% × 3 seeds

**115/115 × 3 shuffled runs = 100.0%, 0 flaky**

## Architecture

Qwen3.5-4B uses a hybrid attention architecture:

- **24 linear attention layers** (Gated DeltaNet) — O(n) inference
- **8 full attention layers** (standard softmax) — precise retrieval

This hybrid design is why prompt-only routing works at 4B scale but not smaller. The 8 full-attention layers are sufficient to hold the routing rules when combined with the DeltaNet layers' pattern matching.

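As a sanity check on the Q3_K_M figures in the BFCL Benchmark section, the headline accuracy can be re-derived from the per-category counts. The sketch below is illustrative only: the counts are copied from the table, the helper name `summarize` is hypothetical, and attributing the single miss to `irrel` is an inference from that category being the only one reported below 100%.

```python
# Per-category case counts copied from the Q3_K_M benchmark table.
CATEGORY_COUNTS = {
    "save": 17, "smem": 17, "aac": 12, "hand": 12, "irrel": 10, "load": 9,
    "pred": 8, "know": 7, "cmpct": 6, "edge": 6, "tran": 6, "info": 5,
}

# Assumption: the one reported failure belongs to "irrel", the only
# category listed below 100% (9 of 10 correct = 90%).
FAILURES = {"irrel": 1}


def summarize(counts, failures):
    """Return (total, passed, overall %, per-category %) for one run."""
    total = sum(counts.values())
    passed = total - sum(failures.values())
    per_category = {
        name: 100.0 * (n - failures.get(name, 0)) / n
        for name, n in counts.items()
    }
    return total, passed, 100.0 * passed / total, per_category


total, passed, overall, per_category = summarize(CATEGORY_COUNTS, FAILURES)
print(f"{passed}/{total} = {overall:.1f}%")        # 114/115 = 99.1%
print(f"irrel: {per_category['irrel']:.0f}%")      # irrel: 90%
```

The category counts sum to 115, so one miss per run gives 114/115, which rounds to the reported 99.1%.
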
## Fleet Position

| Model | Ollama tag | Size | BFCL | Role |
|---|---|---|---|---|
| **Qwen3.5-4B Q3_K_M** | **`dcostenco/prism-coder:2b`** | **2.3 GB** | **99.1%** | **iPhone / mobile** |
| Qwen3.5-4B Q4_K_M | `dcostenco/prism-coder:4b` | 3.4 GB | 100% | Verifier / 8 GB+ |
| Qwen3.5-9B Q4_K_M | `dcostenco/prism-coder:9b` | 5.8 GB | 100% | Default router |
| prism-coder:32b | `dcostenco/prism-coder:32b` | 19 GB | 100% | Complex tasks |

## Links

- [Ollama model page](https://ollama.com/dcostenco/prism-coder) — pull and run
- [Prism MCP Server](https://github.com/dcostenco/prism-coder) — the MCP server
- [Qwen3.5-4B base](https://huggingface.co/Qwen/Qwen3.5-4B) — upstream model