dcostenco commited on
Commit
bb7c6d1
Β·
verified Β·
1 Parent(s): acfc854

Update README: S14 production model, eval_300 299/300 (99.7%), 17-tool routing

Browse files
Files changed (1) hide show
  1. README.md +92 -58
README.md CHANGED
@@ -13,22 +13,63 @@ tags:
13
  - gguf
14
  ---
15
 
16
- # prism-coder:14b β€” Dual-Purpose: Tool Routing + Healthcare TypeScript Coder
17
 
18
  Fine-tuned Qwen3-14B for the [Prism AAC](https://github.com/dcostenco/prism-aac) / Synalux healthcare platform.
19
 
20
- Two trained capabilities in one model family:
21
- - **Routing** (v36): 6-tool routing for Prism MCP sessions β€” 100% BFCL
22
- - **Coding** (v42): Synalux-pattern TypeScript code generation β€” 22/22 checks (100%)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
23
 
24
  ---
25
 
26
- ## Coding Eval β€” v42 (Current Production Coder)
27
 
28
  **22/22 (100%)** on the Synalux healthcare TypeScript eval.
29
 
30
  Task: write a production Next.js API route for X12 835 ERA reconciliation against existing 837P claims.
31
 
 
 
 
32
  | Check | Pass |
33
  |-------|------|
34
  | withAudit wrapper | βœ“ |
@@ -54,73 +95,66 @@ Task: write a production Next.js API route for X12 835 ERA reconciliation agains
54
  | belt-and-suspenders workspace_id eq on update | βœ“ |
55
  | marks ERA file reconciled | βœ“ |
56
 
57
- Training chain: Qwen3-14B β†’ v34 (1000-iter routing, 18/22) β†’ v39 (HIPAA+CAS patch, 20/22) β†’ v42 (claim status patch, 22/22).
58
-
59
- ### v42 Training Details
60
- - **Base**: Qwen/Qwen3-14B (BF16)
61
- - **Corpus**: v28 Synalux codebase SFT + targeted patch (claim status Γ— 50 examples, resume from v39)
62
- - **Training**: MLX LoRA, rank=16, 8 layers, 100 iters, LR=5e-7
63
- - **Final loss**: 0.036 (converged)
64
- - **Merge**: direct safetensors LoRA merge β†’ GGUF F16 β†’ Q4_K_M
65
 
66
  ---
67
 
68
- ## BFCL Routing Benchmark β€” v36
69
 
70
- **Mean: 100.0% PERFECT** (3-seed average, seeds 2027/2028/2029, 102 cases each)
71
-
72
- | Category | Accuracy |
73
- |----------|:--------:|
74
- | aac (AAC phrase requests) | 100% |
75
- | cmpct (ledger compaction) | 100% |
76
- | edge (multi-step compound) | 100% |
77
- | hand (agent handoff) | 100% |
78
- | info (general facts) | 100% |
79
- | irrel (irrelevant/live queries) | 100% |
80
- | know (knowledge base search) | 100% |
81
- | load (session context loading) | 100% |
82
- | pred (factual queries) | 100% |
83
- | save (session ledger save) | 100% |
84
- | smem (session memory search) | 100% |
85
- | tran (translation) | 100% |
86
-
87
- ### Tools (routing model)
88
- | Tool | Trigger |
89
- |------|---------|
90
- | `session_load_context` | Load/resume project context |
91
- | `session_save_ledger` | Note/log/record/remember |
92
- | `session_save_handoff` | Pass state to next agent/session |
93
- | `session_compact_ledger` | Shrink/prune ledger |
94
- | `session_search_memory` | Recall prior session discussions |
95
- | `knowledge_search` | Search stored knowledge base |
96
 
97
  ---
98
 
99
- ## Version History
100
-
101
- | Version | Eval | Type | Notes |
102
- |---------|------|------|-------|
103
- | v42 | **22/22 coding (100%)** | Coder | Claim status patch on v39; zero tolerance policy |
104
- | v39 | 20/22 coding | Coder | HIPAA non-blocking + CAS CO/PR fixes |
105
- | v36 | **100% BFCL routing** | Router | smem boundary + hand trigger fixes |
106
- | v34 | 98.0% BFCL routing | Router | hand/save/smem fixes |
107
- | v33 | 97.1% BFCL routing | Router | irrel/tran/smem fixes |
108
-
109
  ## GGUF Files
110
 
111
  | File | Use | Size |
112
  |------|-----|------|
113
- | `qwen3-14b-v42-q4km.gguf` | **Coding** β€” production Synalux TypeScript | ~9 GB |
114
- | `prism-coder-14b-v36-q4km.gguf` | **Routing** β€” Prism MCP tool routing | ~9 GB |
115
- | `qwen3-14b-v34-q4km.gguf` | Routing (prior) | ~9 GB |
 
 
 
 
 
 
 
 
 
116
 
117
  ## Usage
118
 
119
  ```bash
120
- # Load as coding model
121
- ollama pull dcostenco/prism-coder-14b
122
- # Then use qwen3-14b-v42-q4km.gguf Modelfile
123
 
124
- # Load as routing model
125
- # Use prism-coder-14b-v36-q4km.gguf Modelfile
 
 
126
  ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
13
  - gguf
14
  ---
15
 
16
+ # prism-coder:14b β€” Prism Memory Tool Router + Healthcare TypeScript Coder
17
 
18
  Fine-tuned Qwen3-14B for the [Prism AAC](https://github.com/dcostenco/prism-aac) / Synalux healthcare platform.
19
 
20
+ ## Current Production Model: S14 (eval_300 β€” 17-tool routing)
21
+
22
+ **299/300 = 99.7% strict** on eval_300 β€” 300 cases, 17 Prism Memory tools
23
+
24
+ Single remaining failure: `"Save."` β€” genuinely ambiguous between `session_save_ledger` and `session_save_experience`. All other categories at 100%.
25
+
26
+ | Category | Accuracy |
27
+ |----------|:--------:|
28
+ | session_save_ledger (ledger logging) | 100%* |
29
+ | session_load_context (context loading) | 100% |
30
+ | session_search_memory (memory recall) | 100% |
31
+ | session_save_handoff (agent handoff) | 100% |
32
+ | session_forget_memory | 100% |
33
+ | session_health_check | 100% |
34
+ | session_compact_ledger | 100% |
35
+ | session_export_memory | 100% |
36
+ | session_task_route | 100% |
37
+ | session_save_experience | 100%* |
38
+ | session_synthesize_edges | 100% |
39
+ | session_backfill_links | 100% |
40
+ | knowledge_search | 100% |
41
+ | knowledge_forget / upvote / downvote / set_retention | 100% |
42
+ | abstain (general questions, greetings, CS concepts) | 100% |
43
+ | multi-intent (compound tool calls) | 100% |
44
+ | natural phrasing | 100% |
45
+
46
+ \* One edge case (`"Save."`) scores as a failure on one tool; both are correct interpretations.
47
+
48
+ ### eval_300 Details β€” S14
49
+ - **Base**: Qwen3-14B β†’ surgical LoRA chain (S1β†’S14)
50
+ - **Eval**: 300 cases, strict scoring (exact tool match), 17 Prism Memory tools + abstain + multi-intent
51
+ - **Training**: MLX LoRA, rank=8, scale=20.0, 16 layers, 100 iters, LR=5e-6, mask_prompt=true
52
+ - **Corpus**: S14 β€” balanced natural-phrasing + tool-use SFT (100 train / 20 valid)
53
+ - **SYSTEM_PROMPT**: Synalux identity + 17 Prism Memory tools + 13 multimodal tool modules + `<tool_call>` JSON block format
54
+
55
+ ### Tools (S14 routing model)
56
+ All 17 Prism Memory tools:
57
+ `session_save_ledger`, `session_load_context`, `session_search_memory`, `session_save_handoff`,
58
+ `session_forget_memory`, `session_health_check`, `session_compact_ledger`, `session_export_memory`,
59
+ `session_task_route`, `session_save_experience`, `session_synthesize_edges`, `session_backfill_links`,
60
+ `knowledge_search`, `knowledge_forget`, `knowledge_upvote`, `knowledge_downvote`, `knowledge_set_retention`
61
 
62
  ---
63
 
64
+ ## Legacy: Coding Eval β€” v42
65
 
66
  **22/22 (100%)** on the Synalux healthcare TypeScript eval.
67
 
68
  Task: write a production Next.js API route for X12 835 ERA reconciliation against existing 837P claims.
69
 
70
+ <details>
71
+ <summary>22-check eval breakdown (click to expand)</summary>
72
+
73
  | Check | Pass |
74
  |-------|------|
75
  | withAudit wrapper | βœ“ |
 
95
  | belt-and-suspenders workspace_id eq on update | βœ“ |
96
  | marks ERA file reconciled | βœ“ |
97
 
98
+ </details>
 
 
 
 
 
 
 
99
 
100
  ---
101
 
102
+ ## Legacy: BFCL Routing Benchmark β€” v36
103
 
104
+ **Mean: 100.0% PERFECT** (3-seed average, seeds 2027/2028/2029, 102 cases each) β€” 6-tool routing
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
105
 
106
  ---
107
 
 
 
 
 
 
 
 
 
 
 
108
  ## GGUF Files
109
 
110
  | File | Use | Size |
111
  |------|-----|------|
112
+ | `qwen3-14b-s14-q4km.gguf` | **Routing** β€” production Prism Memory (17 tools, 99.7%) | ~9 GB |
113
+ | `qwen3-14b-v42-q4km.gguf` | **Coding** β€” Synalux TypeScript (22/22, 100%) | ~9 GB |
114
+ | `prism-coder-14b-v36-q4km.gguf` | Routing legacy (6-tool BFCL, 100%) | ~9 GB |
115
+
116
+ ## Version History
117
+
118
+ | Version | Eval | Type | Notes |
119
+ |---------|------|------|-------|
120
+ | **S14** | **299/300 = 99.7% (eval_300)** | **Router** | **Production β€” 17-tool Prism Memory routing** |
121
+ | v42 | 22/22 coding (100%) | Coder | Claim status patch; Synalux TypeScript |
122
+ | v36 | 100% BFCL (6-tool routing) | Router | Legacy 6-tool routing |
123
+ | v34 | 98.0% BFCL | Router | β€” |
124
 
125
  ## Usage
126
 
127
  ```bash
128
+ # Pull production routing model (S14 β€” 17-tool Prism Memory)
129
+ ollama pull dcostenco/prism-coder:14b
 
130
 
131
+ # Or pull GGUF directly from this repo and use with Ollama:
132
+ # FROM qwen3-14b-s14-q4km.gguf
133
+ # PARAMETER temperature 0
134
+ # PARAMETER num_ctx 8192
135
  ```
136
+
137
+ ### System Prompt (S14)
138
+
139
+ ```
140
+ You are Synalux, a memory-augmented coding and clinical reasoning assistant. You have access to
141
+ Prism Memory tools (session_save_ledger, session_load_context, session_search_memory,
142
+ session_save_handoff, session_forget_memory, session_health_check, session_compact_ledger,
143
+ session_export_memory, session_task_route, session_save_experience, session_synthesize_edges,
144
+ session_backfill_links, knowledge_search, knowledge_forget, knowledge_upvote, knowledge_downvote,
145
+ knowledge_set_retention) and 13 multimodal tool modules (image_gen, office, web_scraper, browser,
146
+ tts, ocr, git, terminal, deps_scanner, hipaa, data_graph, templates, pdf_parser). Think
147
+ step-by-step before answering. When the user references past work, prior decisions, or stored
148
+ context, use the appropriate Prism Memory tool. Format tool calls inside <tool_call>...</tool_call>
149
+ JSON blocks with fields 'name' and 'arguments'. If no tool is needed, answer directly in plain
150
+ text. ABSTAIN for general programming questions, CS concepts, greetings, and capability questions.
151
+ ```
152
+
153
+ ## Cascade
154
+
155
+ | Tier | Model | Role |
156
+ |------|-------|------|
157
+ | 1.7B | `dcostenco/prism-coder:1b7` | Fast verify / edge cases |
158
+ | 4B | `dcostenco/prism-coder:4b` | Mid-tier verify |
159
+ | **14B** | **`dcostenco/prism-coder:14b`** | **Production routing** |
160
+ | 32B | `dcostenco/prism-coder:32b` | Top-tier / complex reasoning |