File size: 3,913 Bytes
04be453
 
a586bd2
04be453
6ee010d
 
625b3be
6ee010d
a586bd2
625b3be
8e12d5c
04be453
 
625b3be
04be453
625b3be
 
8e12d5c
625b3be
8e12d5c
625b3be
8e12d5c
 
 
625b3be
 
 
 
 
 
 
 
 
 
 
a82167d
6ee010d
 
625b3be
6ee010d
8e12d5c
 
625b3be
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
a82167d
625b3be
 
 
a82167d
8e12d5c
a82167d
625b3be
 
 
 
 
a82167d
8e12d5c
6ee010d
 
8e12d5c
625b3be
6ee010d
 
625b3be
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
---
language: en
license: apache-2.0
tags:
  - tool-routing
  - function-calling
  - prism-coder
  - qwen3
  - gguf
  - synalux
base_model: Qwen/Qwen3-1.7B
---

# prism-coder:1b7 β€” 17-Tool Memory Agent (Always-Fits Tier)

Fine-tuned Qwen3-1.7B for full Prism Memory tool routing in the [Prism Coder](https://ollama.com/dcostenco/prism-coder) system.
Primary deployment: **any device** via llama.cpp GGUF β€” the ultra-lightweight tier.

## eval_300 Benchmark β€” swe43 (Current)

**300/300 Γ— 3 shuffled runs = 100.0%, 0 flaky**

| Category | Count | Description | Accuracy |
|----------|------:|-------------|:--------:|
| natural_phrasing | 50 | Natural language β†’ correct tool | 100% |
| adversarial_trap | 70 | Coding/CS questions β†’ plain text (no tool) | 100% |
| disambiguation | 40 | Ambiguous session vs knowledge ops | 100% |
| edge_case | 25 | Self-description, capability queries β†’ plain text | 100% |
| verifier | 25 | Verify-then-act chains | 100% |
| param_extraction | 25 | Extract project/query from prompt | 100% |
| cascade | 25 | Multi-step tool chains | 100% |
| multi_intent | 20 | Compound instructions | 100% |
| abstention | 20 | Greetings, math, creative requests β†’ plain text | 100% |

300 test cases, 3 shuffled runs, temperature=0, 0 hallucinations across all runs.

## Tools

Routes to 17 Prism Memory tools + knows when NOT to call any tool:

| Tool | Trigger |
|------|---------|
| `session_load_context` | Load/resume project context, "starting fresh" |
| `session_save_ledger` | Log/record completed work |
| `session_save_handoff` | Create handoff note for next session |
| `session_search_memory` | Recall prior discussions |
| `session_forget_memory` | Delete a memory entry |
| `session_health_check` | Check session system health |
| `session_compact_ledger` | Compact/prune session ledger |
| `session_export_memory` | Export session data |
| `session_task_route` | Route task: local vs cloud |
| `session_save_experience` | Save a notable experience |
| `session_synthesize_edges` | Build session graph edges |
| `session_backfill_links` | Repair dangling session links |
| `knowledge_search` | Search stored knowledge base |
| `knowledge_forget` | Remove a knowledge entry |
| `knowledge_upvote` | Upvote knowledge entry |
| `knowledge_downvote` | Downvote knowledge entry |
| `knowledge_set_retention` | Set retention policy |

**Abstains (plain text)** for: coding questions, CS concepts, arithmetic, greetings, capability queries, creative requests, general knowledge.

## Version History

| Version | eval_300 | Notes |
|---------|---------|-------|
| swe43 | **300/300 Γ— 3 runs = 100.0%** | Fresh rank=32 LoRA + `<think>` routing, Q8_0 GGUF |
| swe30 | 280/300 = 93.3% | Q8_0 first round (fixed Q4KM quantization erasure) |
| v43l | 203/300 = 67.7% | Baseline before SWE training |
| v42 | 100% BFCL 6-tool | Previous 6-tool routing model |

## Key Training Insights

- **Q8_0 quantization required** β€” Q4KM erased LoRA deltas for soft abstain patterns (87%β†’93% at R30)
- **Adapter saturation** β€” After 39 cumulative rounds at rank=8, adapter was saturated. Fresh rank=32 on R39-merged base broke plateau in one round (93.3%β†’99.7%)  
- **`<think>` routing blocks** β€” Added CoT reasoning to abstain examples activates Qwen3's pretrained thinking circuit, providing explicit gradient path for the routing decision

## Model Details

- **Base**: Qwen/Qwen3-1.7B β†’ merged through 43 SWE training rounds
- **Format**: GGUF Q8_0 (2.2 GB)
- **Context**: 8,192 tokens
- **Final adapter**: MLX LoRA rank=32, all 28 layers, LR=3e-6β†’8e-7, 1,267 train rows/round
- **Total training**: 43 rounds of cumulative SFT + 4 fresh rank=32 rounds

## Usage

```bash
ollama pull dcostenco/prism-coder:1b7
ollama run dcostenco/prism-coder:1b7
```

Or via the [Synalux Prism MCP server](https://github.com/dcostenco/prism-mcp) which routes tool calls automatically.