Commit 303254e by dcostenco · verified · 1 parent: 0b12e36

Update model card: v34 98.0% — hand/save/smem fixed

Files changed (1): README.md (+44 -44)
@@ -15,68 +15,68 @@ tags:
  Fine-tuned Qwen3-14B for 6-tool routing in the [Prism AAC](https://github.com/dcostenco/prism-aac) system.
  First tier in the desktop cascade: **14B → 32B → cloud Claude**.
 
- ## BFCL Routing Benchmark — v33 (Current)
-
- **Mean: 97.1%** (3-seed average, seeds 2027/2028/2029, 102 cases each)
-
- | Category | Description | Accuracy |
- |----------|-------------|:--------:|
- | aac | AAC phrase requests → plain text | 100% |
- | cmpct | Ledger compaction | 100% |
- | edge | Multi-step / compound requests | 100% |
- | hand | Agent handoff / relay | 88% |
- | info | General facts → plain text | 100% |
- | irrel | Irrelevant / live queries → plain text | 100% |
- | know | Knowledge base search | 100% |
- | load | Session context loading | 100% |
- | pred | Factual / knowledge queries → plain text | 100% |
- | save | Session ledger save | 92% |
- | smem | Session memory search | 92% |
- | tran | Translation requests → plain text | 100% |
-
- Eval: Ollama inference, temperature=0, Qwen3 thinking suppressed (`<think>\n\n</think>`), num_predict=160.
+ ## BFCL Routing Benchmark — v34 (Current)
+
+ **Mean: 98.0%** (3-seed average, seeds 2027/2028/2029, 102 cases each)
+
+ | Category | Count | Description | Accuracy |
+ |----------|------:|-------------|:--------:|
+ | aac | 12 | AAC phrase requests → plain text | 100% |
+ | cmpct | 6 | Ledger compaction | 100% |
+ | edge | 6 | Multi-step / compound requests | 67% |
+ | hand | 8 | Agent handoff / relay | 100% |
+ | info | 5 | General facts → plain text | 100% |
+ | irrel | 10 | Irrelevant / live queries → plain text | 100% |
+ | know | 7 | Knowledge base search | 100% |
+ | load | 9 | Session context loading | 100% |
+ | pred | 8 | Factual / knowledge queries → plain text | 100% |
+ | save | 13 | Session ledger save | 100% |
+ | smem | 12 | Session memory search | 100% |
+ | tran | 6 | Translation requests → plain text | 100% |
+
+ Remaining failures (2/102, both edge): multi-intent compound prompts that mix two tool actions in one sentence.
+
+ Eval: MLX inference + thinking, temperature=0, 3-seed mean.
  Gate: ≥90% = deploy.
 
  ## Version History
 
  | Version | BFCL | Notes |
  |---------|------|-------|
- | v33 | 97.1% | Routing corpus v33, improved hand/save/smem |
+ | v34 | **98.0%** | Fixed: hand "live state" → handoff, save "Remember:" → ledger, smem BFCL v4 → search_memory |
+ | v33 | 97.1% | Fixed irrel/weather/tran hallucinations, smem/hand corpus v33 |
  | v32 | 97.1% | Routing corpus v32 |
  | v31 | ~96% | Routing corpus v31 |
  | v30 | ~95% | Baseline 14B routing |
 
  ## Tools
 
- The model routes between exactly 6 tools:
+ The model routes to exactly 6 tools:
 
- 1. `session_load_context` load/fetch/resume project context
- 2. `session_save_ledger` — note/log/remember/record progress
- 3. `session_save_handoff` handoff/relay to next agent/session
- 4. `session_compact_ledger` compact/archive/shrink ledger
- 5. `session_search_memory` recall past sessions/conversations
- 6. `knowledge_search` search stored notes/knowledge base
+ | Tool | Trigger |
+ |------|---------|
+ | `session_load_context` | Load/resume project context |
+ | `session_save_ledger` | Note/log/record/remember something |
+ | `session_save_handoff` | Pass state to next agent/session |
+ | `session_compact_ledger` | Shrink/prune ledger (no relay) |
+ | `session_search_memory` | Recall prior session discussions |
+ | `knowledge_search` | Search stored knowledge base |
 
- ## Files
+ Plain text (no tool) for: AAC phrases, translations, weather, general facts, code, math.
 
- | File | Size | Use |
- |------|------|-----|
- | `prism-aac-14b-q4km.gguf` | 9.3 GB | Recommended for Ollama |
+ ## Model Details
 
- ## Cascade Role
+ - **Base**: Qwen/Qwen3-14B
+ - **Format**: GGUF Q4_K_M (~8.4 GB)
+ - **Context**: 32,768 tokens
+ - **Training**: MLX LoRA, rank=16, 16 layers, 1000 iters, LR=2e-6, v34 corpus (322 examples)
+ - **Merge**: mlx_lm.fuse → llama.cpp convert → Q4_K_M quantization
 
- Primary desktop tier. Handles ~97% of routing decisions locally.
- Escalates to 32B for edge cases and multi-step compound requests.
-
- ## Usage (Ollama)
+ ## Usage
 
  ```bash
- ollama run dcostenco/prism-coder:14b
+ ollama pull dcostenco/prism-coder-14b
+ ollama run prism-coder:14b
  ```
 
- ## Training
-
- - **Base**: `Qwen/Qwen3-14B` (fp16, 14.8B params)
- - **Framework**: MLX-LM LoRA (rank=8, scale=20, 4 layers)
- - **Merge**: Direct safetensors manipulation (delta = scale/rank × B^T A^T)
- - **Hardware**: Apple Silicon (M-series, 64 GB RAM)
+ Or in the [Prism Coder IDE](https://github.com/dcostenco/prism-aac) — set model to `prism-coder:14b` in Settings.
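
The v34 headline figure can be cross-checked against the per-category counts in the new benchmark table: 2 failures out of 102 cases, both in `edge`, gives 100/102, which rounds to 98.0%. The sketch below redoes that arithmetic and applies the ≥90% deploy gate. It is not part of the repository. The pass counts are inferred from the listed percentages (in particular, `edge` at 67% of 6 is assumed to mean 4 passes), and the `cases` array, the `score` function, and the `gate` variable are illustrative names.

```shell
#!/usr/bin/env bash
# Cross-check the v34 mean from per-category counts.
# Each entry is category:passed:total. Pass counts are inferred from the
# table's percentages; edge 4/6 is an assumption based on "67%".
set -euo pipefail

cases=(
  aac:12:12 cmpct:6:6 edge:4:6 hand:8:8 info:5:5 irrel:10:10
  know:7:7 load:9:9 pred:8:8 save:13:13 smem:12:12 tran:6:6
)

score() {
  # Sum passed and total cases, then print "passed/total pct verdict".
  local gate=90   # deploy threshold in percent, from "Gate: >=90% = deploy"
  printf '%s\n' "${cases[@]}" | LC_ALL=C awk -F: -v gate="$gate" '
    { passed += $2; total += $3 }
    END {
      pct = 100 * passed / total
      verdict = (pct >= gate) ? "deploy" : "hold"
      printf "%d/%d %.1f%% %s\n", passed, total, pct, verdict
    }'
}

score
```

Summing raw counts rather than averaging the per-category percentages weights each category by its size, which is how a 67% result on 6 `edge` cases still leaves the overall figure at 98.0%.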