dcostenco committed 4eb22d2 · verified · 1 parent: 58778b6

Upload folder using huggingface_hub

Files changed (3):
  1. .gitattributes +1 -0
  2. README.md +62 -86
  3. prism-coder-32b-q4km.gguf +3 -0
.gitattributes CHANGED
@@ -43,3 +43,4 @@ qwen3-30b-a3b-v3-iq4nl.gguf filter=lfs diff=lfs merge=lfs -text
  qwen3-30b-a3b-v4-iq4nl.gguf filter=lfs diff=lfs merge=lfs -text
  qwen3-30b-a3b-v5-iq4nl.gguf filter=lfs diff=lfs merge=lfs -text
  qwen3-30b-a3b-v7-iq4nl.gguf filter=lfs diff=lfs merge=lfs -text
+ prism-coder-32b-q4km.gguf filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,113 +1,89 @@
  ---
- language: en
  license: apache-2.0
- base_model: Qwen/Qwen3-30B-A3B
  tags:
  - tool-calling
- - routing
- - aac
- - qwen3
- - moe
- - gguf
  ---

- # prism-coder:32b - Tool Routing Model (Desktop Quality Tier)

- Fine-tuned Qwen3-30B-A3B (MoE) for 6-tool routing in the [Prism AAC](https://github.com/dcostenco/prism-aac) system.
- Quality escalation tier in the desktop cascade: **14B → 32B → cloud Claude**.

- > **v5 (May 2026)**: Switched base from dense Qwen3-32B to Qwen3-30B-A3B (MoE).
- > Same accuracy, 9 GB smaller, ~4× faster inference (only ~3B params active per token).

- ## BFCL Routing Benchmark - v7 (Current)

- **Mean: 100.0% PERFECT** (3-seed average, seeds 2027/2028/2029, 102 cases each)

- | Category | Count | Description | Accuracy |
- |----------|------:|-------------|:--------:|
- | aac | 12 | AAC phrase requests → plain text | 100% |
- | cmpct | 6 | Ledger compaction | 100% |
- | edge | 6 | Multi-step / compound requests | 100% |
- | hand | 8 | Agent handoff / relay | 100% |
- | info | 5 | General facts → plain text | 100% |
- | irrel | 10 | Irrelevant / live queries → plain text | 100% |
- | know | 7 | Knowledge base search | 100% |
- | load | 9 | Session context loading | 100% |
- | pred | 8 | Factual / knowledge queries → plain text | 100% |
- | save | 13 | Session ledger save | 100% |
- | smem | 12 | Session memory search | 100% |
- | tran | 6 | Translation requests → plain text | 100% |

- All 12 categories at 100%. No remaining failures.

- Eval: MLX inference + thinking, temperature=0, 3-seed mean.
- Gate: ≥90% = deploy.
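The evaluation gate above (deploy when the 3-seed mean is at least 90%) reduces to a single comparison. A minimal sketch in shell, assuming per-seed accuracies are available as percentages; `bfcl_gate` is a hypothetical helper, not part of the Prism tooling:

```bash
# Hypothetical helper: average per-seed accuracies (percent) and apply the
# ">= 90% = deploy" gate. awk does the floating-point math because bash
# arithmetic is integer-only. Exit status: 0 = deploy, 1 = hold, 2 = usage.
bfcl_gate() {
  if [ "$#" -eq 0 ]; then
    echo "usage: bfcl_gate SCORE [SCORE...]" >&2
    return 2
  fi
  # BEGIN-only awk program: the scores arrive via ARGV, no input is read.
  awk -v threshold=90 'BEGIN {
    for (i = 1; i < ARGC; i++) sum += ARGV[i]
    mean = sum / (ARGC - 1)
    printf "mean=%.1f%% ", mean
    if (mean >= threshold) { print "DEPLOY"; exit 0 }
    print "HOLD"
    exit 1
  }' "$@"
}

# bfcl_gate 100.0 100.0 100.0   -> mean=100.0% DEPLOY (exit 0)
# bfcl_gate 88 91 89            -> mean=89.3% HOLD   (exit 1)
```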

- ## Full Cascade Benchmark (May 2026)

- Individual BFCL scores (MLX, 3 seeds):
-
- | Model | BFCL | Size | Tier |
- |-------|------|------|------|
- | prism-coder:8b v36 | **100.0% PERFECT** | 4.7 GB | Desktop / Mobile tier |
- | prism-coder:14b v36 | **100.0% PERFECT** | 8.4 GB | Desktop primary tier |
- | prism-coder:32b v7 | **100.0% PERFECT** | 16 GB | Desktop quality tier |
-
- Cascade eval: **14b → 32b → Claude Opus** (102 cases × 3 seeds)
-
- | Metric | Result |
- |--------|--------|
- | Cascade accuracy | **100.0%** (mean, 3 seeds) |
- | Opus-solo etalon | 98.3% |
- | Δ vs Opus | **+1.7%** |
- | Traffic served by 14b | **99%** (101/102 cases avg) |
- | Traffic escalated to 32b | 1% (1/102 avg) - catches `save live state` → handoff edge case |
- | Traffic reaching Opus API | **0%** |

- Fine-tuned cascade outperforms Claude Opus on `edge` (+16.7%) and `know` (+14.3%).
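The cascade above serves most traffic from the 14b tier and escalates to 32b, then Claude Opus, only when needed. The README does not say what triggers an escalation, so this sketch assumes a tier "fails" when its command exits non-zero or prints nothing. `route_cascade` and the tier commands are hypothetical stand-ins, not Prism APIs:

```bash
# Hypothetical cascade runner: try each tier command in order (cheapest
# first) and return the first non-empty answer, prefixed by the tier name.
# The escalation rule (non-zero exit or empty output) is an assumption.
route_cascade() {
  local prompt="$1"; shift
  local tier out
  for tier in "$@"; do
    # Each tier is a command that takes the prompt as its first argument.
    if out="$("$tier" "$prompt")" && [ -n "$out" ]; then
      printf '%s\t%s\n' "$tier" "$out"
      return 0
    fi
  done
  echo "no tier produced an answer" >&2
  return 1
}

# Example with stub tiers (real tiers would wrap ollama or an API client):
#   tier_14b() { return 1; }                    # pretend 14b declines
#   tier_32b() { echo session_save_handoff; }
#   route_cascade "save live state" tier_14b tier_32b
#   -> tier_32b<TAB>session_save_handoff
```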

- ## Version History

- | Version | Base | BFCL | Notes |
- |---------|------|------|-------|
- | v7 (current) | Qwen3-30B-A3B MoE | **100.0% PERFECT** | Fixed: "what do I know + search memory" compound → knowledge_search |
- | v6 | Qwen3-30B-A3B MoE | 99.0% | Fixed MoE merge (BF16 safetensors + correct MLX→HF key mapping) |
- | v5 | Qwen3-30B-A3B MoE | 97.1% | 18× density fix; 9GB smaller, 4× faster vs dense |
- | v4 | Qwen3-30B-A3B MoE | 92.2% | rank=32 experiment - regressed vs v3 |
- | v3 | Qwen3-30B-A3B MoE | 92.5% | 20× reps + LR=1e-5 - hit rank bottleneck |
- | v2 | Qwen3-30B-A3B MoE | 92.5% | v34 corpus + 1400 iters |
- | v33 (dense) | Qwen3-32B dense | 99.0% | Prior generation - larger/slower |

- ## Tools

- The model routes between exactly 6 tools:

- 1. `session_load_context` - load/fetch/resume project context
- 2. `session_save_ledger` - note/log/remember/record progress
- 3. `session_save_handoff` - handoff/relay to next agent/session
- 4. `session_compact_ledger` - compact/archive/shrink ledger
- 5. `session_search_memory` - recall past sessions/conversations
- 6. `knowledge_search` - search stored notes/knowledge base

- ## Files

- | File | Size | Use |
- |------|------|-----|
- | `qwen3-30b-a3b-v7-iq4nl.gguf` | 16 GB | **Current - recommended** |
- | `qwen3-30b-a3b-v6-iq4nl.gguf` | 17 GB | Previous (99.0%) |
- | `qwen3-30b-a3b-v5-iq4nl.gguf` | 17 GB | Previous (97.1%) |
- | `qwen3-32b-v33-q6k.gguf` | 25 GB | Dense predecessor (99.0%, legacy) |

- ## Usage (Ollama)

- ```bash
- ollama run dcostenco/prism-coder:32b
- ```

- ## Training

- - **Base**: Qwen/Qwen3-30B-A3B (HF BF16, ~57 GB)
- - **Adapters**: v6 LoRA (rank=8, scale=10, 8 layers, LR=1e-5)
- - **Merge**: Direct safetensors merge on HF BF16 base; delta = (scale/rank) × B^T A^T for attn/gate; delta[i] = (scale/rank) × B[i] A[i] for MoE experts (128 experts stacked)
- - **Key fix**: v5 merge used wrong base (MLX 4-bit, can't apply float LoRA delta) and uppercase regex `lora_[AB]` vs actual lowercase `lora_a`/`lora_b` adapter keys
- - **Hardware**: Apple Silicon (M-series, 64 GB RAM)
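The merge rule in the Training notes above can be written with explicit shapes. This is a sketch under assumptions the README does not state: adapters follow MLX conventions, so a dense projection's adapter has $A \in \mathbb{R}^{d_{in}\times r}$ and $B \in \mathbb{R}^{r\times d_{out}}$, each of the 128 stacked experts has $A_i \in \mathbb{R}^{r\times d_{in}}$ and $B_i \in \mathbb{R}^{d_{out}\times r}$, and HF weights are stored as $d_{out}\times d_{in}$. With $s$ the LoRA scale and $r$ the rank:

```latex
\begin{aligned}
W' &= W + \frac{s}{r}\, B^{\top} A^{\top},
  && W \in \mathbb{R}^{d_{out}\times d_{in}},\;
     A \in \mathbb{R}^{d_{in}\times r},\;
     B \in \mathbb{R}^{r\times d_{out}} \\
W'_i &= W_i + \frac{s}{r}\, B_i A_i,
  && i = 1,\dots,128,\;
     A_i \in \mathbb{R}^{r\times d_{in}},\;
     B_i \in \mathbb{R}^{d_{out}\times r}
\end{aligned}
```

Both lines yield a $d_{out}\times d_{in}$ update; the transposes in the first line only reconcile the adapter layout with the HF weight layout. For the v6 adapters (scale = 10, rank = 8) the factor $s/r$ is 1.25.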
 
  ---
  license: apache-2.0
+ language:
+ - en
  tags:
  - tool-calling
+ - function-calling
+ - prism
+ - synalux
+ - memory-augmented
+ - LoRA
+ - Q4_K_M
+ base_model: Qwen/Qwen3-32B
+ pipeline_tag: text-generation
  ---

+ # Prism Coder 32B - Tool-Routing Model

+ **100% strict accuracy** on eval_300 (300 cases, 3-seed validated, zero failures).

+ Prism Coder 32B is a fine-tuned Qwen3-32B model specialized for routing user requests to the correct Prism Memory tool. It handles 17 distinct tools plus NO_TOOL abstention across natural phrasing, adversarial traps, disambiguation, edge cases, multi-intent, cascades, parameter extraction, and verification categories.

+ ## Performance

+ | Metric | Score |
+ |--------|-------|
+ | **eval_300 strict** | **300/300 (100%)** |
+ | 3-seed validation | 300/300 × 3 |
+ | avg latency | 1.4s (M5 Max) |
+ | hallucinations | 0 |

+ ### Per-Category Breakdown

+ | Category | Score |
+ |----------|-------|
+ | abstention | 20/20 |
+ | adversarial_trap | 70/70 |
+ | cascade | 25/25 |
+ | disambiguation | 40/40 |
+ | edge_case | 25/25 |
+ | multi_intent | 20/20 |
+ | natural_phrasing | 50/50 |
+ | param_extraction | 25/25 |
+ | verifier | 25/25 |

+ ## Tools Supported

+ 17 Prism Memory tools: `session_load_context`, `session_save_ledger`, `session_save_handoff`, `session_search_memory`, `session_forget_memory`, `session_health_check`, `session_compact_ledger`, `session_export_memory`, `session_task_route`, `session_save_experience`, `session_synthesize_edges`, `session_backfill_links`, `knowledge_search`, `knowledge_forget`, `knowledge_upvote`, `knowledge_downvote`, `knowledge_set_retention`.
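The 17 tool names above, plus the `NO_TOOL` abstention label, form a closed set, so a caller can reject any other route before dispatching it. A minimal sketch, assuming the model's reply has already been reduced to a bare tool name (the response format is not documented here); `is_valid_route` is a hypothetical helper:

```bash
# Hypothetical guard: accept only the 17 Prism Memory tool names listed in
# the README, or the NO_TOOL abstention label. Returns 0 if valid, 1 if not.
is_valid_route() {
  case "$1" in
    session_load_context|session_save_ledger|session_save_handoff|\
    session_search_memory|session_forget_memory|session_health_check|\
    session_compact_ledger|session_export_memory|session_task_route|\
    session_save_experience|session_synthesize_edges|session_backfill_links|\
    knowledge_search|knowledge_forget|knowledge_upvote|knowledge_downvote|\
    knowledge_set_retention|NO_TOOL)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# is_valid_route knowledge_search && echo dispatch
# is_valid_route session_delete_all || echo rejected
```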

+ ## Training

+ - **Base model**: Qwen/Qwen3-32B (4-bit quantized for training)
+ - **Method**: MLX LoRA SFT (rank=16, 8 layers, scale=20.0) × 14 iterative rounds
+ - **Training data**: 300 eval-aligned prompts + targeted failure remediation per round
+ - **Quantization**: Q4_K_M via llama.cpp (18 GB)
+ - **Hardware**: Apple M5 Max 48 GB unified memory
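The adapter settings above map onto an `mlx_lm.lora` YAML config. A sketch, assuming a recent `mlx-lm` release in which `num_layers`, `fine_tune_type`, and the `lora_parameters` block (`rank`, `scale`, `dropout`) are config-file keys; the model path, data path, iteration count, and file name are placeholders, not values from this README, and the training command itself is left commented out:

```bash
# Write a LoRA config mirroring the README's settings (rank=16, 8 layers,
# scale=20.0). Everything marked "placeholder" is an assumption.
cat > prism_lora_config.yaml <<'EOF'
model: "./qwen3-32b-4bit"   # placeholder: 4-bit MLX conversion of Qwen/Qwen3-32B
train: true
fine_tune_type: lora
data: "./data"              # placeholder: directory with train.jsonl / valid.jsonl
num_layers: 8
iters: 1000                 # placeholder: not stated in the README
lora_parameters:
  rank: 16
  scale: 20.0
  dropout: 0.0
EOF

# mlx_lm.lora --config prism_lora_config.yaml
```

The config has no notion of the README's 14 iterative rounds; each round would presumably be a separate run over an updated `data` directory.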

+ ## Usage

+ ### Ollama

+ ```bash
+ ollama pull dcostenco/prism-coder:32b
+ ollama run dcostenco/prism-coder:32b "Load context for the billing-service project."
+ ```

+ ### llama.cpp

+ ```bash
+ llama-cli -m prism-coder-32b-q4km.gguf \
+ -p "<|im_start|>system\nYou are Synalux...<|im_end|>\n<|im_start|>user\nLoad context for billing.<|im_end|>\n<|im_start|>assistant\n"
+ ```
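The `-p` argument above is a hand-written ChatML prompt with escaped newlines. A small helper can build the same prompt with real newlines and write it to a file for `llama-cli -f`, which reads the prompt from a file; going through a file also keeps the trailing newline after the assistant tag, which `$(...)` would strip. This is a sketch: `chatml_prompt` and `prompt.txt` are hypothetical names, and the system text stays as the README's elided placeholder.

```bash
# Hypothetical helper: build a single-turn ChatML prompt (the format used in
# the README's llama.cpp example), ending with an open assistant turn.
chatml_prompt() {
  local system="$1" user="$2"
  printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
    "$system" "$user"
}

# chatml_prompt 'You are Synalux...' 'Load context for billing.' > prompt.txt
# llama-cli -m prism-coder-32b-q4km.gguf -f prompt.txt
```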

+ ## Model Family

+ | Model | Size | eval_300 |
+ |-------|------|----------|
+ | prism-coder:1b7 | 2.2 GB | 100% |
+ | prism-coder:4b | 2.5 GB | 100% |
+ | prism-coder:14b | 9.0 GB | 99.7% |
+ | **prism-coder:32b** | **18 GB** | **100%** |

+ ## License

+ Apache 2.0

+ ## Author

+ [Synalux](https://synalux.com) - AI-powered clinical and development tools.
prism-coder-32b-q4km.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cd31306a03b67edfd9b7cb3f863f1c95e4fadee58f049bcbf7b9d4a9a953fed6
+ size 19762149120
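The three added lines above are a Git LFS pointer, not the weights: the 19,762,149,120-byte GGUF (about 19.8 GB, or 18.4 GiB) lives in LFS storage. After downloading the real file, its size and SHA-256 can be checked against the pointer. A sketch, assuming GNU `sha256sum` is available (on macOS, `shasum -a 256` is the usual substitute) and that the pointer is saved without the diff's `+` prefixes; `verify_lfs_pointer` is a hypothetical helper:

```bash
# Hypothetical helper: compare a downloaded file against a Git LFS pointer
# file containing "version ...", "oid sha256:<hex>" and "size <bytes>" lines.
verify_lfs_pointer() {
  local pointer="$1" blob="$2"
  local want_oid want_size got_oid got_size
  want_oid="$(awk '$1 == "oid" { sub(/^sha256:/, "", $2); print $2 }' "$pointer")"
  want_size="$(awk '$1 == "size" { print $2 }' "$pointer")"
  # Check the cheap byte count first; tr strips padding some wc builds add.
  got_size="$(wc -c < "$blob" | tr -d ' ')"
  if [ "$got_size" != "$want_size" ]; then
    echo "size mismatch: expected $want_size, got $got_size" >&2
    return 1
  fi
  got_oid="$(sha256sum "$blob" | awk '{ print $1 }')"
  if [ "$got_oid" != "$want_oid" ]; then
    echo "sha256 mismatch" >&2
    return 1
  fi
  echo "OK $blob"
}

# verify_lfs_pointer pointer.txt prism-coder-32b-q4km.gguf
```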