dcostenco commited on
Commit
6ee010d
Β·
verified Β·
1 Parent(s): cb6eebe

Update README: v41 (96.1% BFCL) with per-category table

Browse files
Files changed (1) hide show
  1. README.md +61 -35
README.md CHANGED
@@ -3,56 +3,82 @@ language: en
3
  license: apache-2.0
4
  base_model: Qwen/Qwen3-1.7B
5
  tags:
6
- - tool-calling
7
- - routing
8
- - aac
 
9
  - gguf
10
- - mlx
11
  ---
12
 
13
- # prism-coder:1b7 β€” AAC Tool Router (1.7B)
14
 
15
- Fine-tuned from **Qwen3-1.7B** for deterministic tool routing in the [Prism AAC](https://github.com/dcostenco/prism-aac) system.
 
16
 
17
- **BFCL accuracy: 100%** on 100-case Γ— 3 seeds routing benchmark (v36 corpus).
18
 
19
- ## What it does
20
 
21
- Routes user messages to one of 6 tools or plain text with zero hallucination:
 
 
 
 
 
 
 
 
 
 
 
 
 
22
 
23
- | Tool | Trigger |
24
- |------|---------|
25
- | `session_load_context` | Load/fetch context for project X |
26
- | `session_save_ledger` | Note / jot down / log / remember |
27
- | `session_save_handoff` | Handoff to next agent / pass on |
28
- | `session_compact_ledger` | Compact/archive/trim the ledger |
29
- | `session_search_memory` | What did we discuss / recall session |
30
- | `knowledge_search` | What do I know / stored notes |
31
- | *(plain text)* | AAC phrases, math, facts, translation, time |
32
 
33
- ## Deployment
34
 
35
- **iOS / edge** β€” runs on-device via llama.cpp (1.0 GB, Q4_K_M):
 
 
 
 
 
36
 
37
- ```bash
38
- ollama run dcostenco/prism-coder:1b7
39
- ```
 
 
 
 
 
 
 
40
 
41
  ## Files
42
 
43
- | File | Size | Format |
44
- |------|------|--------|
45
- | `prism-coder-1b7-v36-q4km.gguf` | 1.0 GB | Q4_K_M GGUF (recommended) |
46
- | `prism-aac-1b7-q4km.gguf` | 1.0 GB | Q4_K_M GGUF (legacy name) |
47
 
48
- ## Training
49
 
50
- - **Base**: Qwen3-1.7B
51
- - **Method**: MLX LoRA fine-tuning (mlx_lm.lora)
52
- - **Dataset**: v36_1b7 routing corpus (414 examples, 6-tool system prompt)
53
- - **Hardware**: Apple Silicon (M-series), ~4GB RAM
54
- - **Eval**: BFCL 100-case benchmark Γ— 3 seeds β†’ **100%**
55
 
56
- ## System prompt
 
 
 
 
 
 
57
 
58
- Uses the 13-rule routing system prompt. See [Prism AAC](https://github.com/dcostenco/prism-aac) for the canonical prompt used in training and inference.
 
 
 
 
 
3
  license: apache-2.0
4
  base_model: Qwen/Qwen3-1.7B
5
  tags:
6
+ - tool-routing
7
+ - function-calling
8
+ - prism-aac
9
+ - qwen3
10
  - gguf
 
11
  ---
12
 
13
+ # prism-coder:1b7 β€” Tool Routing Model (Ultra-Compact / iOS Tier)
14
 
15
+ Fine-tuned Qwen3-1.7B for 6-tool routing in the [Prism AAC](https://github.com/dcostenco/prism-aac) system.
16
+ Primary deployment: **on-device iOS inference** via llama.cpp (1.1 GB GGUF, Q4_K_M).
17
 
18
+ ## BFCL Routing Benchmark β€” v41 (Current)
19
 
20
+ **Mean: 96.1%** (3-seed average, seeds 2027/2028/2029, 102 cases each)
21
 
22
+ | Category | Description | Accuracy |
23
+ |----------|-------------|:--------:|
24
+ | aac | AAC phrase requests β†’ plain text | 100% |
25
+ | cmpct | Ledger compaction | 83% |
26
+ | edge | Multi-step / compound requests | 83% |
27
+ | hand | Agent handoff / relay | 100% |
28
+ | info | General facts β†’ plain text | 100% |
29
+ | irrel | Irrelevant / live queries β†’ plain text | 90% |
30
+ | know | Knowledge base search | 100% |
31
+ | load | Session context loading | 89% |
32
+ | pred | Factual / knowledge queries β†’ plain text | 100% |
33
+ | save | Session ledger save | 100% |
34
+ | smem | Session memory search | 100% |
35
+ | tran | Translation requests β†’ plain text | 100% |
36
 
37
+ Eval: Ollama inference, temperature=0, Qwen3 thinking suppressed (`<think>\n\n</think>`), num_predict=160.
38
+ Gate: β‰₯90% = deploy.
 
 
 
 
 
 
 
39
 
40
+ ## Version History
41
 
42
+ | Version | BFCL | Notes |
43
+ |---------|------|-------|
44
+ | v41 | 96.1% | Current β€” routing corpus v41 |
45
+ | v40 | ~95% | Routing corpus v40 |
46
+ | v39 | ~94% | Routing corpus v39 |
47
+ | v36 | 100% | Previous β€” routing corpus v36 (small eval set) |
48
 
49
+ ## Tools
50
+
51
+ The model routes between exactly 6 tools:
52
+
53
+ 1. `session_load_context` β€” load/fetch/resume project context
54
+ 2. `session_save_ledger` β€” note/log/remember/record progress
55
+ 3. `session_save_handoff` β€” handoff/relay to next agent/session
56
+ 4. `session_compact_ledger` β€” compact/archive/shrink ledger
57
+ 5. `session_search_memory` β€” recall past sessions/conversations
58
+ 6. `knowledge_search` β€” search stored notes/knowledge base
59
 
60
  ## Files
61
 
62
+ | File | Size | Use |
63
+ |------|------|-----|
64
+ | `prism-coder-1b7-v41-q4km.gguf` | 1.1 GB | Ollama / desktop |
65
+ | `prism-aac-1b7-q4km.gguf` | 1.1 GB | iOS app download |
66
 
67
+ ## Cascade Role
68
 
69
+ Ultra-compact iOS/edge tier. Desktop cascade: **1.7B β†’ 8B β†’ 14B β†’ 32B β†’ cloud Claude**.
70
+ 1.7B handles offline on-device routing where memory is tightly constrained (< 2 GB available).
 
 
 
71
 
72
+ ## Usage (Ollama)
73
+
74
+ ```bash
75
+ ollama run dcostenco/prism-coder:1b7
76
+ ```
77
+
78
+ ## Training
79
 
80
+ - **Base**: `Qwen/Qwen3-1.7B` (fp16, 1.7B params)
81
+ - **Framework**: MLX-LM LoRA (rank=8, scale=20, 4 layers)
82
+ - **Data**: v41 routing corpus
83
+ - **Merge**: Direct safetensors manipulation (delta = scale/rank Γ— B^T A^T)
84
+ - **Peak memory**: ~4 GB (M-series Mac)