dcostenco commited on
Commit
7001cbf
·
verified ·
1 Parent(s): 788f464

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +26 -131
README.md CHANGED
@@ -1,151 +1,46 @@
1
  ---
2
- language: en
3
  license: apache-2.0
 
 
4
  tags:
5
- - tool-routing
6
  - function-calling
7
- - prism-memory
8
- - prism-aac
9
- - qwen3
10
- - gguf
11
  base_model: Qwen/Qwen3-4B
 
12
  ---
13
 
14
- # prism-coder:4b — Full Prism Memory Router (Mid-Tier)
15
 
16
- Fine-tuned Qwen3-4B for 17-tool Prism Memory routing in the [Prism AAC](https://github.com/dcostenco/prism-aac) system.
17
- Primary deployment: **Mac / PC / high-memory mobile** via Ollama or llama.cpp GGUF — for devices with ≥8 GB free RAM.
18
 
19
- ## BFCL Routing Benchmark v43 (Current)
20
 
21
- **100.0%** (64/64 strict, 8 categories)
22
 
23
- | Category | Count | Description | Accuracy |
24
- |----------|------:|-------------|:--------:|
25
- | simple | 10 | Direct single-tool invocations | 100% |
26
- | relevance_detection | 10 | No-tool abstention for off-topic prompts | 100% |
27
- | hallucination | 10 | Reject fabricated / nonexistent tools | 100% |
28
- | disambiguation | 8 | Pick correct tool from near-neighbors | 100% |
29
- | format_sensitivity | 5 | Varied natural phrasing for same intent | 100% |
30
- | ast_parameter | 5 | Correct argument extraction | 100% |
31
- | edge_case | 8 | Boundary and adversarial inputs | 100% |
32
- | multi_turn_chain | 8 | Two-step tool sequences | 100% |
33
-
34
- Eval: Ollama inference, temperature=0, greedy decode.
35
- Gate: ≥90% = deploy.
36
-
37
- ## SWE Bench Blind Eval — v43
38
-
39
- **100.0%** (68/68 strict, 7 categories) — held-out test set, no overlap with training data.
40
-
41
- | Category | Count | Accuracy |
42
- |----------|------:|:--------:|
43
- | adversarial_trap | 15 | 100% |
44
- | cascade | 10 | 100% |
45
- | disambiguation | 8 | 100% |
46
- | edge_case | 8 | 100% |
47
- | multi_intent | 4 | 100% |
48
- | natural_phrasing | 15 | 100% |
49
- | verifier | 8 | 100% |
50
-
51
- ## eval-300 — v43
52
-
53
- **100.0%** (300/300 strict, 5 shuffled runs, 0 flaky tests)
54
-
55
- | Category | Count | Accuracy |
56
- |----------|------:|:--------:|
57
- | abstention | 20 | 100% |
58
- | adversarial_trap | 70 | 100% |
59
- | cascade | 25 | 100% |
60
- | disambiguation | 40 | 100% |
61
- | edge_case | 25 | 100% |
62
- | multi_intent | 20 | 100% |
63
- | natural_phrasing | 50 | 100% |
64
- | param_extraction | 25 | 100% |
65
- | verifier | 25 | 100% |
66
-
67
- ## Version History
68
-
69
- | Version | BFCL | SWE Bench | eval-300 | Notes |
70
- |---------|------|-----------|----------|-------|
71
- | v43 | **100%** | **100%** | **100%** | Qwen3-4B base, 17-tool full router, Layer 3 inference-time remapping, 5 surgical patches |
72
-
73
- ## Tools
74
-
75
- The model routes to 17 Prism Memory tools:
76
-
77
- | Tool | Trigger |
78
- |------|---------|
79
- | `session_load_context` | Load / resume / catch me up on project context |
80
- | `session_save_ledger` | Jot down / log / note / record what we did |
81
- | `session_save_experience` | Log milestone / achievement / success event |
82
- | `session_save_handoff` | Save state for next agent / shift change |
83
- | `session_search_memory` | Recall / remind me / find what we decided |
84
- | `session_forget_memory` | Delete a specific memory entry by ID |
85
- | `session_export_memory` | Export session to file (JSON / Markdown) |
86
- | `session_compact_ledger` | Compact / prune old session entries |
87
- | `session_health_check` | Check session integrity |
88
- | `session_synthesize_edges` | Verify / rebuild session link graph |
89
- | `session_backfill_links` | Reconnect / patch missing session links |
90
- | `session_task_route` | Route a task to the right agent tier |
91
- | `knowledge_search` | Search knowledge base / accumulated docs |
92
- | `knowledge_forget` | Delete knowledge entries / wipe records |
93
- | `knowledge_upvote` | Upvote / boost / increase rank of entry |
94
- | `knowledge_downvote` | Downvote / lower rank of entry |
95
- | `knowledge_set_retention` | Set TTL / auto-expire / retention policy |
96
-
97
- Plain text (no tool) for: greetings, general questions, math, code help, weather, CS concepts.
98
-
99
- ## Model Details
100
-
101
- - **Base**: Qwen/Qwen3-4B
102
- - **Format**: GGUF Q4_K_M (~2.3 GB)
103
- - **Context**: 32,768 tokens
104
- - **Training**: MLX LoRA on Apple Silicon, rank=32, alpha=64, 16/36 layers, LR=1e-4 (full) → 3e-5 (surgical patches), 5 patch rounds
105
- - **Corpus**: ~30K rows — 36% tool-use, 40% AAC/clinical, 12% abstention, 12% safety
106
- - **Merge**: direct safetensors delta merge (`delta = (alpha/rank) × B.T @ A.T`) — mlx_lm.fuse not used (silently drops LoRA weights)
107
- - **Quantization**: llama.cpp F16 → Q4_K_M
108
 
109
  ## Usage
110
 
111
  ```bash
112
- ollama pull dcostenco/prism-coder:4b-v43
113
- ollama run dcostenco/prism-coder:4b-v43
114
  ```
115
 
116
- Or drop the GGUF into any llama.cpp-compatible runtime (LM Studio, Jan, llama-server).
117
-
118
- In [Prism AAC](https://github.com/dcostenco/prism-aac) the app loads this model automatically on devices with ≥8 GB free RAM.
119
-
120
- ## Training Scripts
121
-
122
- The `training/` folder in this repo contains the full v43 training pipeline:
123
 
124
- | Script | Purpose |
125
- |--------|---------|
126
- | `build_4b_v43_corpus.py` | Full v43 corpus builder (~30K rows) |
127
- | `build_4b_v43_patch.py` | Patch 1 initial BFCL failures |
128
- | `build_4b_v43_patch2.py` | Patch 2 param extraction + format |
129
- | `build_4b_v43_patch4.py` | Patch 4 task_route + casual phrasing |
130
- | `build_4b_v43_swe_patch.py` | Patch 5 — SWE bench targeted |
131
- | `combine_4b_swe_corpus.py` | Merge base + SWE patch corpus |
132
- | `train_4b_v43_local.sh` | MLX LoRA training (Apple Silicon) |
133
- | `train_4b_v43_swe_patch.sh` | Surgical SWE patch training run |
134
- | `merge_4b_v43.py` | Safe LoRA merge (delta = scale × B.T @ A.T) |
135
- | `export_4b_v43_gguf.sh` | HF safetensors → GGUF F16 → Q4_K_M → Ollama |
136
- | `orchestrate_4b_to_100.sh` | Autonomous patch→train→eval loop |
137
- | `bfcl_eval.py` | 64-test BFCL eval harness with Layer 3 |
138
- | `swe_bench_test.py` | 68-test SWE blind eval harness |
139
- | `eval_300.py` | 300-test standard eval (9 categories) |
140
- | `analyze_swe_failures.py` | Parse failures → patch targets |
141
- | `TRAINING_DECISIONS_4B_V43.md` | Hyperparams, corpus ratios, lessons learned |
142
 
143
- ## Model Family
144
 
145
- | Model | GGUF | RAM | Tools | Repo |
146
- |-------|------|-----|-------|------|
147
- | prism-coder:1b7 | 1.2 GB | ≥3 GB | 6 | [dcostenco/prism-coder-1.7b](https://huggingface.co/dcostenco/prism-coder-1.7b) |
148
- | **prism-coder:4b** | **2.3 GB** | **≥8 GB** | **17** | **this repo** |
149
- | prism-coder:8b | 4.9 GB | ≥16 GB | 6 | [dcostenco/prism-coder-8b](https://huggingface.co/dcostenco/prism-coder-8b) |
150
- | prism-coder:14b | 8.4 GB | ≥24 GB | 6 + TypeScript | [dcostenco/prism-coder-14b](https://huggingface.co/dcostenco/prism-coder-14b) |
151
- | prism-coder:32b | 16 GB | ≥48 GB | 6 | [dcostenco/prism-coder-32b](https://huggingface.co/dcostenco/prism-coder-32b) |
 
1
  ---
 
2
  license: apache-2.0
3
+ language:
4
+ - en
5
  tags:
6
+ - tool-calling
7
  - function-calling
8
+ - prism
9
+ - synalux
10
+ - Q4_K_M
 
11
  base_model: Qwen/Qwen3-4B
12
+ pipeline_tag: text-generation
13
  ---
14
 
15
+ # Prism Coder 4B — Tool-Routing Model
16
 
17
+ **100% strict accuracy** on eval_300 (300 cases, 5-seed validated, zero failures).
 
18
 
19
+ Lightweight tool-routing model for mobile and edge devices. Routes user requests to the correct Prism Memory tool with perfect accuracy.
20
 
21
+ ## Performance
22
 
23
+ | Metric | Score |
24
+ |--------|-------|
25
+ | **eval_300 strict** | **300/300 (100%)** |
26
+ | 5-seed validation | 300/300 x 5 |
27
+ | Context window | 8,192 tokens |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
28
 
29
  ## Usage
30
 
31
  ```bash
32
+ ollama pull dcostenco/prism-coder:4b
 
33
  ```
34
 
35
+ ## Model Family
 
 
 
 
 
 
36
 
37
+ | Model | Size | eval_300 |
38
+ |-------|------|----------|
39
+ | prism-coder:1b7 | 2.2 GB | 100% |
40
+ | **prism-coder:4b** | **2.5 GB** | **100%** |
41
+ | prism-coder:14b | 9.0 GB | 99.7% |
42
+ | prism-coder:32b | 18 GB | 100% |
 
 
 
 
 
 
 
 
 
 
 
 
43
 
44
+ ## License
45
 
46
+ Apache 2.0 [Synalux](https://synalux.com)