prism-coder-8b / README.md
dcostenco's picture
Upload README.md with huggingface_hub
9ae524d verified
---
language: en
license: apache-2.0
tags:
- tool-routing
- function-calling
- prism-aac
- qwen3
- gguf
base_model: Qwen/Qwen3-8B
---
# prism-coder:8b β€” Tool Routing Model (iOS / Edge Tier)
Fine-tuned Qwen3-8B for 6-tool routing in the [Prism AAC](https://github.com/dcostenco/prism-aac) system.
Primary deployment: **iOS and edge devices** via llama.cpp GGUF.
## BFCL Routing Benchmark β€” v36 (Current)
**Mean: 100.0%** (3-seed average, seeds 2027/2028/2029, 102 cases each)
| Category | Count | Description | Accuracy |
|----------|------:|-------------|:--------:|
| aac | 12 | AAC phrase requests β†’ plain text | 100% |
| cmpct | 6 | Ledger compaction | 100% |
| edge | 6 | Multi-step / compound requests | 100% |
| hand | 8 | Agent handoff / relay | 100% |
| info | 5 | General facts β†’ plain text | 100% |
| irrel | 10 | Irrelevant / live queries β†’ plain text | 100% |
| know | 7 | Knowledge base search | 100% |
| load | 9 | Session context loading | 100% |
| pred | 8 | Factual / knowledge queries β†’ plain text | 100% |
| save | 13 | Session ledger save | 100% |
| smem | 12 | Session memory search | 100% |
| tran | 6 | Translation requests β†’ plain text | 100% |
Eval: MLX inference + thinking, temperature=0, 3-seed mean.
Gate: β‰₯90% = deploy.
## Cascade Benchmark (May 2026)
Full desktop cascade: **14b β†’ 32b β†’ Claude Opus** (102 cases Γ— 3 seeds)
| Metric | Result |
|--------|--------|
| Cascade accuracy | **100.0%** (mean, 3 seeds) |
| Opus-solo etalon | 98.3% |
| Ξ” vs Opus | **+1.7%** |
| Traffic served by 14b | **99%** (101/102 cases avg) |
| Traffic escalated to 32b | 1% (1/102 avg) |
| Traffic reaching Opus API | **0%** |
Fine-tuned cascade outperforms Claude Opus on `edge` (+16.7%) and `know` (+14.3%).
## Version History
| Version | BFCL | Notes |
|---------|------|-------|
| v36 | **100.0%** | Fixed: smem "BFCL v4 notes" and "training loss" β†’ session_search_memory |
| v35 | 98.0% | Proper safetensors merge β€” fixes mlx_lm.fuse LoRA loss |
| v32 | 98.0% | Routing corpus v32_8b, direct safetensors merge |
| v31 | 95.1% | Surgical smem/know boundary fix |
| v30 | ~93% | Baseline 8B routing |
## Tools
The model routes to exactly 6 tools:
| Tool | Trigger |
|------|---------|
| `session_load_context` | Load/resume project context |
| `session_save_ledger` | Note/log/record/remember something |
| `session_save_handoff` | Pass state to next agent/session |
| `session_compact_ledger` | Shrink/prune ledger (no relay) |
| `session_search_memory` | Recall prior session discussions |
| `knowledge_search` | Search stored knowledge base |
Plain text (no tool) for: AAC phrases, translations, weather, general facts, code, math.
## Model Details
- **Base**: Qwen/Qwen3-8B
- **Format**: GGUF Q4_K_M (~4.9 GB)
- **Context**: 32,768 tokens
- **Training**: MLX LoRA, rank=16, 16 layers, 1000 iters, LR=2e-6, v36 corpus (806 examples)
- **Merge**: mlx_lm.fuse β†’ llama.cpp convert β†’ Q4_K_M quantization
## Usage
```bash
ollama pull dcostenco/prism-coder-8b
ollama run prism-coder:8b
```
Or in the [Prism Coder IDE](https://github.com/dcostenco/prism-aac) β€” set model to `prism-coder:8b` in Settings.