---
language: en
license: apache-2.0
tags:
- tool-routing
- function-calling
- prism-coder
- qwen3.5
- synalux
- prompt-engineering
- gguf
base_model: Qwen/Qwen3.5-4B
pipeline_tag: text-generation
---
# prism-coder:4b - Prism Memory Tool Router
Prompt-engineered [Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) for MCP tool routing in the [Prism Coder](https://ollama.com/dcostenco/prism-coder) system. No fine-tuning: the system prompt IS the specialization.
## Downloads
| File | Quantization | Size | BFCL Accuracy | Use when |
|------|-------------|------|---------------|----------|
| `Qwen3.5-4B-Q3_K_M.gguf` | Q3_K_M | **2.3 GB** | **99.1%** × 3 seeds | **iPhone / mobile first gate** |
| *(stock via Ollama)* | Q4_K_M | 3.4 GB | **100%** × 3 seeds | Mac / 8 GB+ devices |
## Quick Start
```bash
# iPhone-optimized (2.3 GB, 99.1%)
ollama pull dcostenco/prism-coder:2b
# Full quality (3.4 GB, 100%)
ollama pull dcostenco/prism-coder:4b
```
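Once a tag is pulled, it can be queried through Ollama's local REST API. The following is a minimal sketch, not part of the Prism tooling: it assumes an Ollama server is running on the default port (11434) and that the routing system prompt ships with the tag. The helper names `build_request` and `route` are hypothetical, and the format of the model's reply is not documented here, so the raw text is returned as-is.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumption: unchanged port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "dcostenco/prism-coder:2b") -> urllib.request.Request:
    """Build a non-streaming generate request for the given model tag."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def route(prompt: str, model: str = "dcostenco/prism-coder:2b") -> str:
    """Send the prompt to Ollama and return the model's raw text reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=60) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `print(route("Write a regex to match email addresses"))` would show how the model routes that prompt. Pass `model="dcostenco/prism-coder:4b"` to compare against the Q4_K_M build.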
## BFCL Benchmark
### Q3_K_M (prism-coder:2b): 99.1% × 3 seeds
**114/115 × 3 shuffled runs = 99.1%, 1 flaky case**
| Category | Count | Accuracy |
|----------|------:|:--------:|
| save | 17 | 100% |
| smem | 17 | 100% |
| aac | 12 | 100% |
| hand | 12 | 100% |
| irrel | 10 | 90% |
| load | 9 | 100% |
| pred | 8 | 100% |
| know | 7 | 100% |
| cmpct | 6 | 100% |
| edge | 6 | 100% |
| tran | 6 | 100% |
| info | 5 | 100% |
Single failure: "Write a regex to match email addresses" → knowledge_search instead of plain.
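As a quick consistency check, the headline figure can be recomputed from the per-category table. This sketch only re-adds the counts and percentages listed above; it does not rerun the benchmark, and the variable names are illustrative.

```python
# (case count, accuracy in percent) per category, copied from the table above.
categories = {
    "save": (17, 100), "smem": (17, 100), "aac": (12, 100), "hand": (12, 100),
    "irrel": (10, 90), "load": (9, 100), "pred": (8, 100), "know": (7, 100),
    "cmpct": (6, 100), "edge": (6, 100), "tran": (6, 100), "info": (5, 100),
}

total = sum(count for count, _ in categories.values())
# Integer arithmetic: every count * percent here is an exact multiple of 100.
passed = sum(count * pct // 100 for count, pct in categories.values())
accuracy = 100 * passed / total

print(f"{passed}/{total} = {accuracy:.1f}%")  # prints "114/115 = 99.1%"
```

The only category below 100% is `irrel` (9 of 10), which accounts for the single failing case.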
### Q4_K_M (prism-coder:4b): 100% × 3 seeds
**115/115 × 3 shuffled runs = 100.0%, 0 flaky**
## Architecture
Qwen3.5-4B uses a hybrid attention architecture:
- **24 linear attention layers** (Gated DeltaNet): O(n) inference
- **8 full attention layers** (standard softmax): precise retrieval
This hybrid design is why prompt-only routing works at 4B scale but not smaller. The 8 full-attention layers are sufficient to hold the routing rules when combined with the DeltaNet layers' pattern matching.
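To give a feel for what the 24/8 split means for sequence-length scaling, here is a toy cost model. It is a rough sketch under loud assumptions: each linear layer is charged `n` units and each full-attention layer `n**2` units, ignoring hidden size, constants, and everything outside attention. The function names are hypothetical and the numbers are not measurements of Qwen3.5-4B.

```python
LINEAR_LAYERS = 24  # Gated DeltaNet layers, per the list above
FULL_LAYERS = 8     # softmax attention layers, per the list above


def attention_cost(n_tokens: int, linear: int = LINEAR_LAYERS, full: int = FULL_LAYERS) -> int:
    """Toy cost in arbitrary units: n per linear layer, n**2 per full layer."""
    return linear * n_tokens + full * n_tokens ** 2


def hybrid_vs_dense(n_tokens: int) -> float:
    """Ratio of the hybrid stack's toy cost to an all-softmax stack of equal depth."""
    dense = attention_cost(n_tokens, linear=0, full=LINEAR_LAYERS + FULL_LAYERS)
    return attention_cost(n_tokens) / dense


for n in (256, 1024, 4096):
    print(n, round(hybrid_vs_dense(n), 4))
```

Under these assumptions the ratio works out to `0.25 + 0.75 / n`, so for long prompts the hybrid stack costs roughly a quarter of an all-softmax stack of the same depth.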
## Fleet Position
| Model | Ollama tag | Size | BFCL | Role |
|---|---|---|---|---|
| **Qwen3.5-4B Q3_K_M** | **`dcostenco/prism-coder:2b`** | **2.3 GB** | **99.1%** | **iPhone / mobile** |
| Qwen3.5-4B Q4_K_M | `dcostenco/prism-coder:4b` | 3.4 GB | 100% | Verifier / 8 GB+ |
| Qwen3.5-9B Q4_K_M | `dcostenco/prism-coder:9b` | 5.8 GB | 100% | Default router |
| prism-coder:32b | `dcostenco/prism-coder:32b` | 19 GB | 100% | Complex tasks |
## Links
- [Ollama model page](https://ollama.com/dcostenco/prism-coder): pull and run
- [Prism MCP Server](https://github.com/dcostenco/prism-coder): the MCP server
- [Qwen3.5-4B base](https://huggingface.co/Qwen/Qwen3.5-4B): upstream model