dcostenco commited on
Commit
625b3be
Β·
verified Β·
1 Parent(s): 17886ae

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +59 -45
README.md CHANGED
@@ -4,76 +4,90 @@ license: apache-2.0
4
  tags:
5
  - tool-routing
6
  - function-calling
7
- - prism-aac
8
  - qwen3
9
  - gguf
 
10
  base_model: Qwen/Qwen3-1.7B
11
  ---
12
 
13
- # prism-coder:1.7b β€” Tool Routing Model (Always-Fits Tier)
14
 
15
- Fine-tuned Qwen3-1.7B for 6-tool routing in the [Prism AAC](https://github.com/dcostenco/prism-aac) system.
16
- Primary deployment: **any iOS device** via llama.cpp GGUF β€” the guaranteed fallback for all device tiers.
17
 
18
- ## BFCL Routing Benchmark β€” v42 (Current)
19
 
20
- **Mean: 100.0%** (3-seed average, seeds 2027/2028/2029, 102 cases each)
21
 
22
  | Category | Count | Description | Accuracy |
23
  |----------|------:|-------------|:--------:|
24
- | aac | 12 | AAC phrase requests β†’ plain text | 100% |
25
- | cmpct | 6 | Ledger compaction | 100% |
26
- | edge | 6 | Multi-step / compound requests | 100% |
27
- | hand | 8 | Agent handoff / relay | 100% |
28
- | info | 5 | General facts β†’ plain text | 100% |
29
- | irrel | 10 | Irrelevant / live queries β†’ plain text | 100% |
30
- | know | 7 | Knowledge base search | 100% |
31
- | load | 9 | Session context loading | 100% |
32
- | pred | 8 | Factual / knowledge queries β†’ plain text | 100% |
33
- | save | 13 | Session ledger save | 100% |
34
- | smem | 12 | Session memory search | 100% |
35
- | tran | 6 | Translation requests β†’ plain text | 100% |
36
-
37
- Eval: MLX inference + thinking, temperature=0, 3-seed mean.
38
- Gate: β‰₯90% = deploy.
39
-
40
- ## Version History
41
-
42
- | Version | BFCL | Notes |
43
- |---------|------|-------|
44
- | v42 | **100.0%** | Fixed 4 deterministic failures: cmpct tool name, compound edge, write-code irrel, pull-context load |
45
- | v41 | 96.1% | Proper safetensors merge β€” fixes mlx_lm.fuse LoRA loss |
46
- | v36 | 94.1% | LoRA rank=16, all 28 layers, mask-prompt |
47
- | v19 | ~88% | Baseline 1.7B routing |
48
 
49
  ## Tools
50
 
51
- The model routes to exactly 6 tools:
52
 
53
  | Tool | Trigger |
54
  |------|---------|
55
- | `session_load_context` | Load/resume/pull project context |
56
- | `session_save_ledger` | Note/log/record/remember something |
57
- | `session_save_handoff` | Pass state to next agent/session |
58
- | `session_compact_ledger` | Compact/shrink/prune ledger |
59
- | `session_search_memory` | Recall prior session discussions |
60
- | `knowledge_search` | Search stored knowledge base ("what do I know") |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
61
 
62
- Plain text (no tool) for: AAC phrases, translations, weather, general facts, code/regex/functions, math.
 
 
63
 
64
  ## Model Details
65
 
66
- - **Base**: Qwen/Qwen3-1.7B
67
- - **Format**: GGUF Q4_K_M (~1.2 GB)
68
- - **Context**: 32,768 tokens
69
- - **Training**: MLX LoRA, rank=16, all 28 layers, 800 iters, LR=5e-5, v42 corpus (1028 train / 79 valid)
70
- - **Merge**: direct safetensors merge (scale/rank Γ— B.T @ A.T) β†’ llama.cpp convert β†’ Q4_K_M quantization
71
 
72
  ## Usage
73
 
74
  ```bash
75
  ollama pull dcostenco/prism-coder:1b7
76
- ollama run prism-coder:1b7
77
  ```
78
 
79
- Or in [Prism AAC](https://github.com/dcostenco/prism-aac) β€” the app downloads and loads this model automatically on devices with <8 GB RAM.
 
4
  tags:
5
  - tool-routing
6
  - function-calling
7
+ - prism-coder
8
  - qwen3
9
  - gguf
10
+ - synalux
11
  base_model: Qwen/Qwen3-1.7B
12
  ---
13
 
14
+ # prism-coder:1b7 β€” 17-Tool Memory Agent (Always-Fits Tier)
15
 
16
+ Fine-tuned Qwen3-1.7B for full Prism Memory tool routing in the [Prism Coder](https://ollama.com/dcostenco/prism-coder) system.
17
+ Primary deployment: **any device** via llama.cpp GGUF β€” the ultra-lightweight tier.
18
 
19
+ ## eval_300 Benchmark β€” swe43 (Current)
20
 
21
+ **300/300 Γ— 3 shuffled runs = 100.0%, 0 flaky**
22
 
23
  | Category | Count | Description | Accuracy |
24
  |----------|------:|-------------|:--------:|
25
+ | natural_phrasing | 50 | Natural language β†’ correct tool | 100% |
26
+ | adversarial_trap | 70 | Coding/CS questions β†’ plain text (no tool) | 100% |
27
+ | disambiguation | 40 | Ambiguous session vs knowledge ops | 100% |
28
+ | edge_case | 25 | Self-description, capability queries β†’ plain text | 100% |
29
+ | verifier | 25 | Verify-then-act chains | 100% |
30
+ | param_extraction | 25 | Extract project/query from prompt | 100% |
31
+ | cascade | 25 | Multi-step tool chains | 100% |
32
+ | multi_intent | 20 | Compound instructions | 100% |
33
+ | abstention | 20 | Greetings, math, creative requests β†’ plain text | 100% |
34
+
35
+ 300 test cases, 3 shuffled runs, temperature=0, 0 hallucinations across all runs.
 
 
 
 
 
 
 
 
 
 
 
 
 
36
 
37
  ## Tools
38
 
39
+ Routes to 17 Prism Memory tools + knows when NOT to call any tool:
40
 
41
  | Tool | Trigger |
42
  |------|---------|
43
+ | `session_load_context` | Load/resume project context, "starting fresh" |
44
+ | `session_save_ledger` | Log/record completed work |
45
+ | `session_save_handoff` | Create handoff note for next session |
46
+ | `session_search_memory` | Recall prior discussions |
47
+ | `session_forget_memory` | Delete a memory entry |
48
+ | `session_health_check` | Check session system health |
49
+ | `session_compact_ledger` | Compact/prune session ledger |
50
+ | `session_export_memory` | Export session data |
51
+ | `session_task_route` | Route task: local vs cloud |
52
+ | `session_save_experience` | Save a notable experience |
53
+ | `session_synthesize_edges` | Build session graph edges |
54
+ | `session_backfill_links` | Repair dangling session links |
55
+ | `knowledge_search` | Search stored knowledge base |
56
+ | `knowledge_forget` | Remove a knowledge entry |
57
+ | `knowledge_upvote` | Upvote knowledge entry |
58
+ | `knowledge_downvote` | Downvote knowledge entry |
59
+ | `knowledge_set_retention` | Set retention policy |
60
+
61
+ **Abstains (plain text)** for: coding questions, CS concepts, arithmetic, greetings, capability queries, creative requests, general knowledge.
62
+
63
+ ## Version History
64
+
65
+ | Version | eval_300 | Notes |
66
+ |---------|---------|-------|
67
+ | swe43 | **300/300 Γ— 3 runs = 100.0%** | Fresh rank=32 LoRA + `<think>` routing, Q8_0 GGUF |
68
+ | swe30 | 280/300 = 93.3% | Q8_0 first round (fixed Q4KM quantization erasure) |
69
+ | v43l | 203/300 = 67.7% | Baseline before SWE training |
70
+ | v42 | 100% BFCL 6-tool | Previous 6-tool routing model |
71
+
72
+ ## Key Training Insights
73
 
74
+ - **Q8_0 quantization required** β€” Q4KM erased LoRA deltas for soft abstain patterns (87%β†’93% at R30)
75
+ - **Adapter saturation** β€” After 39 cumulative rounds at rank=8, adapter was saturated. Fresh rank=32 on R39-merged base broke plateau in one round (93.3%β†’99.7%)
76
+ - **`<think>` routing blocks** β€” Added CoT reasoning to abstain examples activates Qwen3's pretrained thinking circuit, providing explicit gradient path for the routing decision
77
 
78
  ## Model Details
79
 
80
+ - **Base**: Qwen/Qwen3-1.7B β†’ merged through 43 SWE training rounds
81
+ - **Format**: GGUF Q8_0 (2.2 GB)
82
+ - **Context**: 8,192 tokens
83
+ - **Final adapter**: MLX LoRA rank=32, all 28 layers, LR=3e-6β†’8e-7, 1,267 train rows/round
84
+ - **Total training**: 43 rounds of cumulative SFT + 4 fresh rank=32 rounds
85
 
86
  ## Usage
87
 
88
  ```bash
89
  ollama pull dcostenco/prism-coder:1b7
90
+ ollama run dcostenco/prism-coder:1b7
91
  ```
92
 
93
+ Or via the [Synalux Prism MCP server](https://github.com/dcostenco/prism-mcp) which routes tool calls automatically.