---
language: en
license: apache-2.0
tags:
  - tool-routing
  - function-calling
  - prism-coder
  - qwen3.5
  - synalux
  - prompt-engineering
  - gguf
base_model: Qwen/Qwen3.5-4B
pipeline_tag: text-generation
---

# prism-coder:4b — Prism Memory Tool Router

A prompt-engineered build of [Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) for MCP tool routing in the [Prism Coder](https://ollama.com/dcostenco/prism-coder) system. The weights are not fine-tuned: the system prompt is the specialization.

## Downloads

| File | Quantization | Size | BFCL Accuracy | Use when |
|------|-------------|------|---------------|----------|
| `Qwen3.5-4B-Q3_K_M.gguf` | Q3_K_M | **2.3 GB** | **99.1%** × 3 seeds | **iPhone / mobile first gate** |
| *(stock via Ollama)* | Q4_K_M | 3.4 GB | **100%** × 3 seeds | Mac / 8 GB+ devices |

## Quick Start

```bash
# iPhone-optimized: the :2b tag is the Q3_K_M quantization of Qwen3.5-4B (2.3 GB, 99.1% BFCL)
ollama pull dcostenco/prism-coder:2b

# Full quality: the :4b tag is the Q4_K_M quantization of Qwen3.5-4B (3.4 GB, 100% BFCL)
ollama pull dcostenco/prism-coder:4b
```
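Once a tag is pulled, it can be queried through Ollama's local REST API. The sketch below is a minimal example, not the project's own client: it assumes an Ollama server on the default port 11434, uses the `/api/generate` endpoint, and introduces a hypothetical helper `build_request` that escapes only backslashes and double quotes in the prompt. The sample prompt is illustrative.

```bash
#!/usr/bin/env bash

# Hypothetical helper: build a JSON body for Ollama's /api/generate endpoint.
# Escaping is minimal (backslashes and double quotes only), which is enough
# for single-line prompts but not for arbitrary text.
build_request() {
  local model="$1" prompt="$2"
  prompt=${prompt//\\/\\\\}   # \  ->  \\
  prompt=${prompt//\"/\\\"}   # "  ->  \"
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$model" "$prompt"
}

# Usage (assumes `ollama serve` is running locally on port 11434):
#   build_request dcostenco/prism-coder:4b "Save this session to memory" |
#     curl -s http://localhost:11434/api/generate -d @-
```

Setting `"stream": false` returns a single JSON object instead of a stream of partial responses, which is easier to parse when the reply is a short routing decision.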

## BFCL Benchmark

### Q3_K_M (prism-coder:2b) — 99.1% × 3 seeds

**114/115 × 3 shuffled runs = 99.1%, 1 flaky case**

| Category | Count | Accuracy |
|----------|------:|:--------:|
| save | 17 | 100% |
| smem | 17 | 100% |
| aac | 12 | 100% |
| hand | 12 | 100% |
| irrel | 10 | 90% |
| load | 9 | 100% |
| pred | 8 | 100% |
| know | 7 | 100% |
| cmpct | 6 | 100% |
| edge | 6 | 100% |
| tran | 6 | 100% |
| info | 5 | 100% |

Single failure (in the `irrel` category): "Write a regex to match email addresses" was routed to `knowledge_search` instead of being answered as plain text.
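The headline figure can be checked from the table itself. The snippet below is a small arithmetic sketch: it sums the per-category counts and recomputes the overall and `irrel` accuracies. The helper name `accuracy` is illustrative and not part of any benchmark tooling.

```bash
#!/usr/bin/env bash

# Percentage of passed cases, rounded to one decimal place.
accuracy() {
  awk -v pass="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * pass / total }'
}

# Per-category case counts copied from the table above.
counts=(17 17 12 12 10 9 8 7 6 6 6 5)
total=0
for c in "${counts[@]}"; do
  total=$((total + c))
done

echo "total cases: $total"
echo "overall: $(accuracy 114 "$total")%"
echo "irrel: $(accuracy 9 10)%"
```

This prints `total cases: 115`, `overall: 99.1%`, and `irrel: 90.0%`, matching the reported 114/115 and the single failing `irrel` case.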

### Q4_K_M (prism-coder:4b) — 100% × 3 seeds

**115/115 × 3 shuffled runs = 100.0%, 0 flaky**

## Architecture

Qwen3.5-4B uses a hybrid attention architecture:
- **24 linear attention layers** (Gated DeltaNet) — O(n) inference
- **8 full attention layers** (standard softmax) — precise retrieval

This hybrid design is why prompt-only routing works at 4B scale but not smaller. The 8 full-attention layers are sufficient to hold the routing rules when combined with the DeltaNet layers' pattern matching.

## Fleet Position

| Model | Ollama tag | Size | BFCL | Role |
|---|---|---|---|---|
| **Qwen3.5-4B Q3_K_M** | **`dcostenco/prism-coder:2b`** | **2.3 GB** | **99.1%** | **iPhone / mobile** |
| Qwen3.5-4B Q4_K_M | `dcostenco/prism-coder:4b` | 3.4 GB | 100% | Verifier / 8 GB+ |
| Qwen3.5-9B Q4_K_M | `dcostenco/prism-coder:9b` | 5.8 GB | 100% | Default router |
| prism-coder:32b | `dcostenco/prism-coder:32b` | 19 GB | 100% | Complex tasks |
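
As a rough illustration of how the table might be used, the sketch below picks the largest tag whose file size fits a given budget. This is a hypothetical selection rule, not the project's routing policy: the function name `pick_model` and the MB-based budget are assumptions, and file size is only a lower bound on the memory a model needs at runtime.

```bash
#!/usr/bin/env bash

# Fleet entries as "size_in_MB tag", sorted by size ascending.
# Sizes come from the table above, taking 1 GB = 1000 MB.
fleet=(
  "2300 dcostenco/prism-coder:2b"
  "3400 dcostenco/prism-coder:4b"
  "5800 dcostenco/prism-coder:9b"
  "19000 dcostenco/prism-coder:32b"
)

# Hypothetical helper: print the largest tag whose file size is within
# the budget (in MB). Prints nothing and returns 1 if no model fits.
pick_model() {
  local budget="$1" best="" entry size tag
  for entry in "${fleet[@]}"; do
    read -r size tag <<< "$entry"
    if [ "$size" -le "$budget" ]; then
      best="$tag"
    fi
  done
  [ -n "$best" ] && echo "$best"
}

# Usage: pick_model 4000   # budget of 4000 MB
```

A real deployment would also weigh the Role column, since the `:4b` and `:9b` tags serve different purposes even when both fit in memory.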

## Links

- [Ollama model page](https://ollama.com/dcostenco/prism-coder) — pull and run
- [Prism MCP Server](https://github.com/dcostenco/prism-coder) — the MCP server
- [Qwen3.5-4B base](https://huggingface.co/Qwen/Qwen3.5-4B) — upstream model