prism-coder-1.7b / README.md
dcostenco's picture
Upload README.md with huggingface_hub
8e12d5c verified
---
language: en
license: apache-2.0
tags:
- tool-routing
- function-calling
- prism-aac
- qwen3
- gguf
base_model: Qwen/Qwen3-1.7B
---
# prism-coder:1.7b β€” Tool Routing Model (Always-Fits Tier)
Fine-tuned Qwen3-1.7B for 6-tool routing in the [Prism AAC](https://github.com/dcostenco/prism-aac) system.
Primary deployment: **any iOS device** via llama.cpp GGUF β€” the guaranteed fallback for all device tiers.
## BFCL Routing Benchmark β€” v42 (Current)
**Mean: 100.0%** (3-seed average, seeds 2027/2028/2029, 102 cases each)
| Category | Count | Description | Accuracy |
|----------|------:|-------------|:--------:|
| aac | 12 | AAC phrase requests β†’ plain text | 100% |
| cmpct | 6 | Ledger compaction | 100% |
| edge | 6 | Multi-step / compound requests | 100% |
| hand | 8 | Agent handoff / relay | 100% |
| info | 5 | General facts β†’ plain text | 100% |
| irrel | 10 | Irrelevant / live queries β†’ plain text | 100% |
| know | 7 | Knowledge base search | 100% |
| load | 9 | Session context loading | 100% |
| pred | 8 | Factual / knowledge queries β†’ plain text | 100% |
| save | 13 | Session ledger save | 100% |
| smem | 12 | Session memory search | 100% |
| tran | 6 | Translation requests β†’ plain text | 100% |
Eval: MLX inference + thinking, temperature=0, 3-seed mean.
Gate: β‰₯90% = deploy.
## Version History
| Version | BFCL | Notes |
|---------|------|-------|
| v42 | **100.0%** | Fixed 4 deterministic failures: cmpct tool name, compound edge, write-code irrel, pull-context load |
| v41 | 96.1% | Proper safetensors merge β€” fixes mlx_lm.fuse LoRA loss |
| v36 | 94.1% | LoRA rank=16, all 28 layers, mask-prompt |
| v19 | ~88% | Baseline 1.7B routing |
## Tools
The model routes to exactly 6 tools:
| Tool | Trigger |
|------|---------|
| `session_load_context` | Load/resume/pull project context |
| `session_save_ledger` | Note/log/record/remember something |
| `session_save_handoff` | Pass state to next agent/session |
| `session_compact_ledger` | Compact/shrink/prune ledger |
| `session_search_memory` | Recall prior session discussions |
| `knowledge_search` | Search stored knowledge base ("what do I know") |
Plain text (no tool) for: AAC phrases, translations, weather, general facts, code/regex/functions, math.
## Model Details
- **Base**: Qwen/Qwen3-1.7B
- **Format**: GGUF Q4_K_M (~1.2 GB)
- **Context**: 32,768 tokens
- **Training**: MLX LoRA, rank=16, all 28 layers, 800 iters, LR=5e-5, v42 corpus (1028 train / 79 valid)
- **Merge**: direct safetensors merge (scale/rank Γ— B.T @ A.T) β†’ llama.cpp convert β†’ Q4_K_M quantization
## Usage
```bash
ollama pull dcostenco/prism-coder:1b7
ollama run prism-coder:1b7
```
Or in [Prism AAC](https://github.com/dcostenco/prism-aac) β€” the app downloads and loads this model automatically on devices with <8 GB RAM.