File size: 2,676 Bytes
a7b4011
6b0fc59
4eb22d2
 
a7b4011
6b0fc59
4eb22d2
 
 
 
 
 
 
 
a7b4011
 
9d3d252
46e84ae
9d3d252
46e84ae
9d3d252
 
 
 
 
 
 
 
 
 
46e84ae
4eb22d2
e7a302c
9d3d252
 
 
 
 
 
 
 
 
e7a302c
9d3d252
e7a302c
9d3d252
 
 
 
 
e7a302c
9d3d252
be2ee28
9d3d252
58778b6
4eb22d2
be2ee28
4eb22d2
 
 
be2ee28
4eb22d2
3552f76
9d3d252
 
 
 
 
 
be2ee28
4eb22d2
3552f76
4eb22d2
3552f76
4eb22d2
a7b4011
9388767
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
---
license: apache-2.0
language:
  - en
tags:
  - tool-calling
  - function-calling
  - prism
  - synalux
  - memory-augmented
  - LoRA
  - Q4_K_M
base_model: Qwen/Qwen3-32B
pipeline_tag: text-generation
---

# Prism Coder 32B — Tool-Routing Model

Fine-tuned Qwen3-32B for routing user requests to the correct Prism Memory tool. 17 tools + NO_TOOL abstention across 9 evaluation categories.

## What this model does

Routes natural language requests to the correct Prism Memory tool (session_save_ledger, session_load_context, knowledge_search, etc.). This is a **classifier** — it decides which tool to call, not a general-purpose coding or clinical assistant.

## What this model does NOT do

- General code generation (not trained on code)
- Clinical note writing (not trained on clinical data)
- Codebase understanding (does not know Synalux internals)
- General reasoning beyond base Qwen3-32B capability

## Performance

| Metric | Score | Notes |
|--------|-------|-------|
| eval_300 strict (model only) | **292/300 (97.3%)** | Model's raw accuracy |
| eval_300 strict (with post-processing) | **300/300 (100%)** | 8 cases fixed by validate_tool_call regex layer |
| 3-seed validation | 300/300 x 3 | With post-processing |
| avg latency | 1.4s | Apple M5 Max |
| context window | 16,384 tokens | |

The eval harness includes a `validate_tool_call` post-processing layer that remaps 8 edge cases the model gets wrong (e.g., "repair links" → backfill_links, "log a milestone" → save_experience). Without this layer, raw model accuracy is 97.3%.

## Training

- **Base**: Qwen/Qwen3-32B (4-bit quantized for training via MLX)
- **Method**: LoRA SFT (rank=16, 8 of 64 layers, scale=20.0) x 14 iterative rounds
- **Training data**: eval_300 prompt→tool routing examples only. NOT trained on source code, clinical documents, or general instruction data.
- **Quantization**: Q4_K_M via llama.cpp (18 GB)
- **Hardware**: Apple M5 Max 48 GB unified memory

## Upcoming

A stacked LoRA adapter (layers 1-16) trained on Synalux codebase, clinical protocols, and Prism Memory internals is in progress. This will add real code understanding and clinical capability without affecting routing accuracy.

## Usage

```bash
ollama pull dcostenco/prism-coder:32b
```

## Model Family

| Model | Size | eval_300 (raw) | eval_300 (with post-processing) |
|-------|------|---------------|-------------------------------|
| prism-coder:1b7 | 2.2 GB | 100% | 100% |
| prism-coder:4b | 2.5 GB | 100% | 100% |
| prism-coder:14b | 9.0 GB | ~97% | 99.7% |
| **prism-coder:32b** | **18 GB** | **97.3%** | **100%** |

## License

Apache 2.0

## Author

[Synalux](https://synalux.com)