dcostenco commited on
Commit
9d3d252
·
verified ·
1 Parent(s): 9388767

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +35 -53
README.md CHANGED
@@ -5,7 +5,6 @@ language:
5
  tags:
6
  - tool-calling
7
  - function-calling
8
- - code-generation
9
  - prism
10
  - synalux
11
  - memory-augmented
@@ -15,76 +14,59 @@ base_model: Qwen/Qwen3-32B
15
  pipeline_tag: text-generation
16
  ---
17
 
18
- # Prism Coder 32B — Unified Tool-Routing & Code Generation Model
19
 
20
- **100% strict accuracy** on eval_300 (300 cases, 3-seed validated, zero failures).
21
 
22
- Prism Coder 32B is a fine-tuned Qwen3-32B model that handles both Prism Memory tool routing (17 tools + NO_TOOL abstention) and general code generation. One model, two jobs — no need for separate routing and IDE models.
 
 
 
 
 
 
 
 
 
23
 
24
  ## Performance
25
 
26
- | Metric | Score |
27
- |--------|-------|
28
- | **eval_300 strict** | **300/300 (100%)** |
29
- | 3-seed validation | 300/300 x 3 |
30
- | avg latency | 1.4s (M5 Max) |
31
- | hallucinations | 0 |
32
- | context window | **16,384 tokens** |
 
 
33
 
34
- ### Per-Category Breakdown (eval_300)
35
 
36
- | Category | Score |
37
- |----------|-------|
38
- | abstention | 20/20 |
39
- | adversarial_trap | 70/70 |
40
- | cascade | 25/25 |
41
- | disambiguation | 40/40 |
42
- | edge_case | 25/25 |
43
- | multi_intent | 20/20 |
44
- | natural_phrasing | 50/50 |
45
- | param_extraction | 25/25 |
46
- | verifier | 25/25 |
47
 
48
- ## Unified Model
49
 
50
- This model replaces both `prism-coder:32b` (routing) and `prism-ide:32b` (code generation). The LoRA fine-tuning only affects 8 of 64 layers, preserving the base model's general coding capability while adding 100% accurate tool routing.
51
 
52
  ## Usage
53
 
54
- ### Ollama
55
-
56
  ```bash
57
  ollama pull dcostenco/prism-coder:32b
58
- # Same model also available as:
59
- ollama pull dcostenco/prism-ide:32b
60
- ```
61
-
62
- ### Modelfile
63
-
64
- ```
65
- FROM prism-coder-32b-q4km.gguf
66
- PARAMETER temperature 0
67
- PARAMETER num_ctx 16384
68
- PARAMETER num_predict 512
69
- PARAMETER stop "<|im_end|>"
70
- PARAMETER stop "<|endoftext|>"
71
  ```
72
 
73
  ## Model Family
74
 
75
- | Model | Size | eval_300 | Context |
76
- |-------|------|----------|---------|
77
- | prism-coder:1b7 | 2.2 GB | 100% | 8K |
78
- | prism-coder:4b | 2.5 GB | 100% | 8K |
79
- | prism-coder:14b | 9.0 GB | 99.7% | 16K |
80
- | **prism-coder:32b** | **18 GB** | **100%** | **16K** |
81
-
82
- ## Training
83
-
84
- - **Base**: Qwen/Qwen3-32B (4-bit quantized for training)
85
- - **Method**: MLX LoRA SFT (rank=16, 8 layers, scale=20.0) x 14 rounds
86
- - **Quantization**: Q4_K_M via llama.cpp (18 GB)
87
- - **Hardware**: Apple M5 Max 48 GB
88
 
89
  ## License
90
 
 
5
  tags:
6
  - tool-calling
7
  - function-calling
 
8
  - prism
9
  - synalux
10
  - memory-augmented
 
14
  pipeline_tag: text-generation
15
  ---
16
 
17
+ # Prism Coder 32B — Tool-Routing Model
18
 
19
+ Fine-tuned Qwen3-32B for routing user requests to the correct Prism Memory tool. 17 tools + NO_TOOL abstention across 9 evaluation categories.
20
 
21
+ ## What this model does
22
+
23
+ Routes natural language requests to the correct Prism Memory tool (session_save_ledger, session_load_context, knowledge_search, etc.). This is a **classifier** — it decides which tool to call, not a general-purpose coding or clinical assistant.
24
+
25
+ ## What this model does NOT do
26
+
27
+ - General code generation (not trained on code)
28
+ - Clinical note writing (not trained on clinical data)
29
+ - Codebase understanding (does not know Synalux internals)
30
+ - General reasoning beyond base Qwen3-32B capability
31
 
32
  ## Performance
33
 
34
+ | Metric | Score | Notes |
35
+ |--------|-------|-------|
36
+ | eval_300 strict (model only) | **292/300 (97.3%)** | Model's raw accuracy |
37
+ | eval_300 strict (with post-processing) | **300/300 (100%)** | 8 cases fixed by validate_tool_call regex layer |
38
+ | 3-seed validation | 300/300 x 3 | With post-processing |
39
+ | avg latency | 1.4s | Apple M5 Max |
40
+ | context window | 16,384 tokens | |
41
+
42
+ The eval harness includes a `validate_tool_call` post-processing layer that remaps 8 edge cases the model gets wrong (e.g., "repair links" → backfill_links, "log a milestone" → save_experience). Without this layer, raw model accuracy is 97.3%.
43
 
44
+ ## Training
45
 
46
+ - **Base**: Qwen/Qwen3-32B (4-bit quantized for training via MLX)
47
+ - **Method**: LoRA SFT (rank=16, 8 of 64 layers, scale=20.0) x 14 iterative rounds
48
+ - **Training data**: eval_300 prompt→tool routing examples only. NOT trained on source code, clinical documents, or general instruction data.
49
+ - **Quantization**: Q4_K_M via llama.cpp (18 GB)
50
+ - **Hardware**: Apple M5 Max 48 GB unified memory
 
 
 
 
 
 
51
 
52
+ ## Upcoming
53
 
54
+ A stacked LoRA adapter (layers 1-16) trained on Synalux codebase, clinical protocols, and Prism Memory internals is in progress. This will add real code understanding and clinical capability without affecting routing accuracy.
55
 
56
  ## Usage
57
 
 
 
58
  ```bash
59
  ollama pull dcostenco/prism-coder:32b
 
 
 
 
 
 
 
 
 
 
 
 
 
60
  ```
61
 
62
  ## Model Family
63
 
64
+ | Model | Size | eval_300 (raw) | eval_300 (with post-processing) |
65
+ |-------|------|---------------|-------------------------------|
66
+ | prism-coder:1b7 | 2.2 GB | 100% | 100% |
67
+ | prism-coder:4b | 2.5 GB | 100% | 100% |
68
+ | prism-coder:14b | 9.0 GB | ~97% | 99.7% |
69
+ | **prism-coder:32b** | **18 GB** | **97.3%** | **100%** |
 
 
 
 
 
 
 
70
 
71
  ## License
72