---
language: en
license: apache-2.0
tags:
  - tool-routing
  - function-calling
  - prism-aac
  - qwen3
  - gguf
base_model: Qwen/Qwen3-1.7B
---

# prism-coder:1.7b — Tool Routing Model (Always-Fits Tier)

Fine-tuned Qwen3-1.7B for 6-tool routing in the [Prism AAC](https://github.com/dcostenco/prism-aac) system.
Primary deployment: **any iOS device** via llama.cpp GGUF — the guaranteed fallback for all device tiers.

## BFCL Routing Benchmark — v42 (Current)

**Mean: 100.0%** (3-seed average, seeds 2027/2028/2029, 102 cases each)

| Category | Count | Description | Accuracy |
|----------|------:|-------------|:--------:|
| aac | 12 | AAC phrase requests → plain text | 100% |
| cmpct | 6 | Ledger compaction | 100% |
| edge | 6 | Multi-step / compound requests | 100% |
| hand | 8 | Agent handoff / relay | 100% |
| info | 5 | General facts → plain text | 100% |
| irrel | 10 | Irrelevant / live queries → plain text | 100% |
| know | 7 | Knowledge base search | 100% |
| load | 9 | Session context loading | 100% |
| pred | 8 | Factual / knowledge queries → plain text | 100% |
| save | 13 | Session ledger save | 100% |
| smem | 12 | Session memory search | 100% |
| tran | 6 | Translation requests → plain text | 100% |

Eval: MLX inference + thinking, temperature=0, 3-seed mean.
Gate: ≥90% = deploy.

## Version History

| Version | BFCL | Notes |
|---------|------|-------|
| v42 | **100.0%** | Fixed 4 deterministic failures: cmpct tool name, compound edge, write-code irrel, pull-context load |
| v41 | 96.1% | Proper safetensors merge — fixes mlx_lm.fuse LoRA loss |
| v36 | 94.1% | LoRA rank=16, all 28 layers, mask-prompt |
| v19 | ~88% | Baseline 1.7B routing |

## Tools

The model routes to exactly 6 tools:

| Tool | Trigger |
|------|---------|
| `session_load_context` | Load/resume/pull project context |
| `session_save_ledger` | Note/log/record/remember something |
| `session_save_handoff` | Pass state to next agent/session |
| `session_compact_ledger` | Compact/shrink/prune ledger |
| `session_search_memory` | Recall prior session discussions |
| `knowledge_search` | Search stored knowledge base ("what do I know") |

Plain text (no tool) for: AAC phrases, translations, weather, general facts, code/regex/functions, math.

## Model Details

- **Base**: Qwen/Qwen3-1.7B
- **Format**: GGUF Q4_K_M (~1.2 GB)
- **Context**: 32,768 tokens
- **Training**: MLX LoRA, rank=16, all 28 layers, 800 iters, LR=5e-5, v42 corpus (1028 train / 79 valid)
- **Merge**: direct safetensors merge (scale/rank × B.T @ A.T) → llama.cpp convert → Q4_K_M quantization

## Usage

```bash
ollama pull dcostenco/prism-coder:1b7
ollama run prism-coder:1b7
```

Or in [Prism AAC](https://github.com/dcostenco/prism-aac) — the app downloads and loads this model automatically on devices with <8 GB RAM.