dcostenco committed on
Commit 6b0fc59 · verified · 1 Parent(s): 0266b49

Update model card with training details, cascade position, and file table

Files changed (1)
  1. README.md +28 -74
README.md CHANGED
@@ -1,96 +1,50 @@
  ---
- base_model: Qwen/QwQ-32B
- library_name: peft
- pipeline_tag: text-generation
- license: apache-2.0
  language: en
  tags:
- - lora
- - sft
- - function-calling
- - tool-use
- - mcp
- - aac
- - prism-coder
  ---

- # prism-coder:32b (v19) — 97.3% routing accuracy
-
- LoRA fine-tune of **Qwen/QwQ-32B** for offline MCP tool routing.

- ## Routing accuracy — 100-case Prism eval (May 15 2026, 3-seed mean)

- | Category | Score |
- |---|---|
- | **Overall** | **97.3% ± 0.6%** |
- | All 7 MCP tools | 100% each |
- | AAC plain-text | ~90% |
- | translate | 83% |
- | edge (multi-step) | 100% |
- | avg latency | 2.4s |
- | invented tools | 0 |

- Uses `nothink` template + v27 system prompt with labeled category headers.

- ## Usage
-
- ```bash
- ollama pull dcostenco/prism-coder:32b
- ```

- ## Hardware
-
- - **Mac**: M2 Ultra+ / 48GB+
- - **Linux**: A100 40GB+
- - **VRAM**: ~22 GB
-
-
-
- ---

-
- ### All Prism Coder models
-
- | Model | Accuracy | Size | Device | HuggingFace |
- |---|---|---|---|---|
- | **prism-coder:14b** | **98%** | 8.4 GB | Mac / iPad Pro 16GB | [dcostenco/prism-coder-14b](https://huggingface.co/dcostenco/prism-coder-14b) |
- | **prism-coder:8b** | **96%** | 4.7 GB | iPhone / iPad 8GB | [dcostenco/prism-coder-8b](https://huggingface.co/dcostenco/prism-coder-8b) |
- | **prism-coder:32b** | **97.3%** | 19 GB | Mac M2 Ultra+ | [dcostenco/prism-coder-32b](https://huggingface.co/dcostenco/prism-coder-32b) |
- | **prism-coder:1.7b** | **88%** | 2.2 GB | Any device / iPhone | [dcostenco/prism-coder-1.7b](https://huggingface.co/dcostenco/prism-coder-1.7b) |
-
- GitHub: [dcostenco/prism-coder](https://github.com/dcostenco/prism-coder) · AAC app: [dcostenco/prism-aac](https://github.com/dcostenco/prism-aac) · Portal: [synalux.ai](https://synalux.ai)
-
- ## Get the full stack
-
- The model routes tool calls — but needs a backend to route TO:

  ```bash
- # Install the memory server (free, local, no API keys)
- npm install -g prism-mcp-server
-
- # Pull the model
- ollama pull dcostenco/prism-coder:32b
-
- # Done — your AI agent now has persistent memory + 98% tool routing
  ```

- **Free tier:** local SQLite, no cloud, no account needed.
- **Synalux portal:** cloud sync, HIPAA dashboard, team access, Claude fallback → [synalux.ai](https://synalux.ai)

- ---

- ## Prism Routing Benchmark

- This model is evaluated on the [Prism Routing Benchmark](https://github.com/dcostenco/prism-coder/tree/main/tests/benchmarks/prism-routing-100) — a 100-case, 13-category eval for MCP tool routing. Run it yourself:

- ```bash
- git clone https://github.com/dcostenco/prism-coder
- cd prism-coder
- python3 tests/benchmarks/prism-routing-100/benchmark.py --models 32b --seed 2027
- ```

- Not a general function-calling benchmark (BFCL). This measures routing precision on 7 specific MCP tools — the task these models were built for. The value is **offline reliability at zero cost**, not competing with frontier models on arbitrary APIs.

- ## License

- Apache-2.0.
 
  ---
  language: en
+ license: apache-2.0
+ base_model: Qwen/Qwen3-32B
  tags:
+ - tool-calling
+ - routing
+ - coding
+ - aac
  ---

+ # prism-coder:32b — AAC Tool Router + Coder (32B)

+ Fine-tuned from **Qwen3-32B** for tool routing and advanced code assistance in the [Prism AAC](https://github.com/dcostenco/prism-aac) system.

+ **BFCL accuracy: 99%** on 100-case routing benchmark. Quality escalation tier in the desktop cascade — catches the ~1-3% of cases where 14B is uncertain.

+ ## What it does

+ - Perfect tool routing on all tested categories
+ - Advanced code generation and architecture assistance
+ - Complex multi-step session management
+ - Final local quality gate before cloud Claude

+ ## Deployment

+ Available on **Ollama Hub** (recommended — avoids 18GB download for Ollama users):

  ```bash
+ ollama run dcostenco/prism-coder:32b
  ```

+ Or pull manually with the GGUF from this repo when available.

+ ## Cascade position

+ Desktop cascade: **14B → 32B (escalation) → cloud Claude**

+ When 14B returns low-confidence or fails, 32B is invoked automatically. Users with Ollama running get 32B as their local ceiling before cloud.

+ ## Training

+ - **Base**: Qwen3-32B
+ - **Method**: MLX LoRA fine-tuning (v28-codebase + routing)
+ - **Hardware**: Apple Silicon (M-series, 64GB RAM)
+ - **Eval**: BFCL routing 99% (11/11 on manual benchmark)

+ ## Note on GGUF

+ The full Q4_K_M GGUF is 18GB. It is distributed via **Ollama Hub** at `dcostenco/prism-coder:32b` to avoid large download overhead. Direct GGUF will be added here in a future release.
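
The "Cascade position" section added in this commit describes escalation from 14B to 32B to cloud Claude when a tier fails or returns low confidence. The sketch below is one way that control flow might look; it is not taken from the Prism codebase. `TierResult`, `run_cascade`, the confidence score, and the `0.8` threshold are all hypothetical names and values chosen for illustration, and each tier is a placeholder callable rather than a real model client.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class TierResult:
    answer: Optional[str]  # None means the tier produced nothing usable
    confidence: float      # hypothetical score in [0, 1]; the README does not define one


# A tier is (name, callable). The callables stand in for real model clients.
Tier = Tuple[str, Callable[[str], TierResult]]


def run_cascade(prompt: str, tiers: Sequence[Tier], threshold: float = 0.8) -> Tuple[str, str]:
    """Try tiers in order and return (tier_name, answer).

    A non-final tier is accepted only if it answers with confidence at or
    above the threshold. The final tier is the fallback and is accepted
    whenever it returns any answer.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    for index, (name, call) in enumerate(tiers):
        is_last = index == len(tiers) - 1
        try:
            result = call(prompt)
        except Exception:
            if is_last:
                raise      # nothing left to escalate to
            continue       # treat a failure as a reason to escalate
        if result.answer is not None and (is_last or result.confidence >= threshold):
            return name, result.answer
    raise RuntimeError("no tier produced an answer")
```

With tiers ordered as `[("14b", ...), ("32b", ...), ("cloud", ...)]`, a confident 14B answer returns immediately, while an exception or a low score moves on to the next tier. How confidence is actually measured in the real system is not stated in the README, so that part is left to the caller.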