Commit 8e12d5c (verified) by dcostenco · 1 Parent(s): 6ee010d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +48 -53
README.md CHANGED
@@ -1,84 +1,79 @@
1
  ---
2
  language: en
3
  license: apache-2.0
4
- base_model: Qwen/Qwen3-1.7B
5
  tags:
6
  - tool-routing
7
  - function-calling
8
  - prism-aac
9
  - qwen3
10
  - gguf
 
11
  ---
12
 
13
- # prism-coder:1b7 - Tool Routing Model (Ultra-Compact / iOS Tier)
14
 
15
  Fine-tuned Qwen3-1.7B for 6-tool routing in the [Prism AAC](https://github.com/dcostenco/prism-aac) system.
16
- Primary deployment: **on-device iOS inference** via llama.cpp (1.1 GB GGUF, Q4_K_M).
17
-
18
- ## BFCL Routing Benchmark - v41 (Current)
19
-
20
- **Mean: 96.1%** (3-seed average, seeds 2027/2028/2029, 102 cases each)
21
-
22
- | Category | Description | Accuracy |
23
- |----------|-------------|:--------:|
24
- | aac | AAC phrase requests → plain text | 100% |
25
- | cmpct | Ledger compaction | 83% |
26
- | edge | Multi-step / compound requests | 83% |
27
- | hand | Agent handoff / relay | 100% |
28
- | info | General facts → plain text | 100% |
29
- | irrel | Irrelevant / live queries → plain text | 90% |
30
- | know | Knowledge base search | 100% |
31
- | load | Session context loading | 89% |
32
- | pred | Factual / knowledge queries → plain text | 100% |
33
- | save | Session ledger save | 100% |
34
- | smem | Session memory search | 100% |
35
- | tran | Translation requests → plain text | 100% |
36
-
37
- Eval: Ollama inference, temperature=0, Qwen3 thinking suppressed (`<think>\n\n</think>`), num_predict=160.
38
  Gate: ≥90% = deploy.
39
 
40
  ## Version History
41
 
42
  | Version | BFCL | Notes |
43
  |---------|------|-------|
44
- | v41 | 96.1% | Current - routing corpus v41 |
45
- | v40 | ~95% | Routing corpus v40 |
46
- | v39 | ~94% | Routing corpus v39 |
47
- | v36 | 100% | Previous - routing corpus v36 (small eval set) |
48
 
49
  ## Tools
50
 
51
- The model routes between exactly 6 tools:
52
 
53
- 1. `session_load_context` - load/fetch/resume project context
54
- 2. `session_save_ledger` - note/log/remember/record progress
55
- 3. `session_save_handoff` - handoff/relay to next agent/session
56
- 4. `session_compact_ledger` - compact/archive/shrink ledger
57
- 5. `session_search_memory` - recall past sessions/conversations
58
- 6. `knowledge_search` - search stored notes/knowledge base
 
 
59
 
60
- ## Files
61
 
62
- | File | Size | Use |
63
- |------|------|-----|
64
- | `prism-coder-1b7-v41-q4km.gguf` | 1.1 GB | Ollama / desktop |
65
- | `prism-aac-1b7-q4km.gguf` | 1.1 GB | iOS app download |
66
 
67
- ## Cascade Role
 
 
 
 
68
 
69
- Ultra-compact iOS/edge tier. Desktop cascade: **1.7B → 8B → 14B → 32B → cloud Claude**.
70
- 1.7B handles offline on-device routing where memory is tightly constrained (< 2 GB available).
71
-
72
- ## Usage (Ollama)
73
 
74
  ```bash
75
- ollama run dcostenco/prism-coder:1b7
 
76
  ```
77
 
78
- ## Training
79
-
80
- - **Base**: `Qwen/Qwen3-1.7B` (fp16, 1.7B params)
81
- - **Framework**: MLX-LM LoRA (rank=8, scale=20, 4 layers)
82
- - **Data**: v41 routing corpus
83
- - **Merge**: Direct safetensors manipulation (delta = scale/rank × B^T A^T)
84
- - **Peak memory**: ~4 GB (M-series Mac)
 
1
  ---
2
  language: en
3
  license: apache-2.0
 
4
  tags:
5
  - tool-routing
6
  - function-calling
7
  - prism-aac
8
  - qwen3
9
  - gguf
10
+ base_model: Qwen/Qwen3-1.7B
11
  ---
12
 
13
+ # prism-coder:1.7b - Tool Routing Model (Always-Fits Tier)
14
 
15
  Fine-tuned Qwen3-1.7B for 6-tool routing in the [Prism AAC](https://github.com/dcostenco/prism-aac) system.
16
+ Primary deployment: **any iOS device** via llama.cpp GGUF, the guaranteed fallback for all device tiers.
17
+
18
+ ## BFCL Routing Benchmark - v42 (Current)
19
+
20
+ **Mean: 100.0%** (3-seed average, seeds 2027/2028/2029, 102 cases each)
21
+
22
+ | Category | Count | Description | Accuracy |
23
+ |----------|------:|-------------|:--------:|
24
+ | aac | 12 | AAC phrase requests → plain text | 100% |
25
+ | cmpct | 6 | Ledger compaction | 100% |
26
+ | edge | 6 | Multi-step / compound requests | 100% |
27
+ | hand | 8 | Agent handoff / relay | 100% |
28
+ | info | 5 | General facts → plain text | 100% |
29
+ | irrel | 10 | Irrelevant / live queries → plain text | 100% |
30
+ | know | 7 | Knowledge base search | 100% |
31
+ | load | 9 | Session context loading | 100% |
32
+ | pred | 8 | Factual / knowledge queries → plain text | 100% |
33
+ | save | 13 | Session ledger save | 100% |
34
+ | smem | 12 | Session memory search | 100% |
35
+ | tran | 6 | Translation requests → plain text | 100% |
36
+
37
+ Eval: MLX inference + thinking, temperature=0, 3-seed mean.
38
  Gate: ≥90% = deploy.
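
As a sanity check on the v42 table, the per-category counts can be summed and turned into an overall accuracy that is compared against the deploy gate. The sketch below is illustrative only: the function names are hypothetical, and the "one miss in each of four categories" scenario is an assumption chosen to show how four failed cases out of 102 are consistent with the 96.1% figure listed for v41.

```python
# Per-category case counts copied from the v42 benchmark table.
CASE_COUNTS = {
    "aac": 12, "cmpct": 6, "edge": 6, "hand": 8, "info": 5, "irrel": 10,
    "know": 7, "load": 9, "pred": 8, "save": 13, "smem": 12, "tran": 6,
}

DEPLOY_GATE = 90.0  # percent, from the README's "Gate: >=90% = deploy"


def overall_accuracy(failures_by_category):
    """Return overall accuracy in percent, given failed-case counts per category."""
    total = sum(CASE_COUNTS.values())
    failed = sum(failures_by_category.get(name, 0) for name in CASE_COUNTS)
    return 100.0 * (total - failed) / total


def passes_gate(accuracy_percent):
    """True when the accuracy meets the deploy gate."""
    return accuracy_percent >= DEPLOY_GATE


total_cases = sum(CASE_COUNTS.values())
v42_accuracy = overall_accuracy({})  # no failures, as reported for v42
# Hypothetical scenario: one miss in each of cmpct, edge, irrel, and load.
four_misses = overall_accuracy({"cmpct": 1, "edge": 1, "irrel": 1, "load": 1})

print(total_cases, round(v42_accuracy, 1), round(four_misses, 1), passes_gate(four_misses))
# prints: 102 100.0 96.1 True
```

The counts sum to the 102 cases stated above, and both scenarios clear the 90% gate.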
39
 
40
  ## Version History
41
 
42
  | Version | BFCL | Notes |
43
  |---------|------|-------|
44
+ | v42 | **100.0%** | Fixed 4 deterministic failures: cmpct tool name, compound edge, write-code irrel, pull-context load |
45
+ | v41 | 96.1% | Proper safetensors merge - fixes mlx_lm.fuse LoRA loss |
46
+ | v36 | 94.1% | LoRA rank=16, all 28 layers, mask-prompt |
47
+ | v19 | ~88% | Baseline 1.7B routing |
48
 
49
  ## Tools
50
 
51
+ The model routes to exactly 6 tools:
52
 
53
+ | Tool | Trigger |
54
+ |------|---------|
55
+ | `session_load_context` | Load/resume/pull project context |
56
+ | `session_save_ledger` | Note/log/record/remember something |
57
+ | `session_save_handoff` | Pass state to next agent/session |
58
+ | `session_compact_ledger` | Compact/shrink/prune ledger |
59
+ | `session_search_memory` | Recall prior session discussions |
60
+ | `knowledge_search` | Search stored knowledge base ("what do I know") |
61
 
62
+ Plain text (no tool) for: AAC phrases, translations, weather, general facts, code/regex/functions, math.
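
A caller may want to guard against tool names outside the six listed above. The following is a minimal sketch of such an allow-list check. It assumes the caller has already extracted a tool name from the model output (or `None` when the model answered in plain text); the README does not specify the raw output format, so the parsing step is left out, and `classify_route` is a hypothetical helper name.

```python
from typing import Optional

# The six tool names listed in the Tools table.
ROUTABLE_TOOLS = frozenset({
    "session_load_context",
    "session_save_ledger",
    "session_save_handoff",
    "session_compact_ledger",
    "session_search_memory",
    "knowledge_search",
})


def classify_route(tool_name: Optional[str]) -> str:
    """Classify a routing decision as 'tool', 'plain_text', or 'invalid'.

    tool_name is assumed to be pre-extracted by the caller;
    None means the model replied in plain text without a tool call.
    """
    if tool_name is None:
        return "plain_text"
    if tool_name in ROUTABLE_TOOLS:
        return "tool"
    return "invalid"


print(classify_route("knowledge_search"))  # tool
print(classify_route(None))                # plain_text
print(classify_route("session_compact"))   # invalid (not one of the six names)
```

Treating an unknown name as `invalid` rather than silently falling back to plain text makes tool-name mistakes visible to the caller.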
63
 
64
+ ## Model Details
 
 
 
65
 
66
+ - **Base**: Qwen/Qwen3-1.7B
67
+ - **Format**: GGUF Q4_K_M (~1.2 GB)
68
+ - **Context**: 32,768 tokens
69
+ - **Training**: MLX LoRA, rank=16, all 28 layers, 800 iters, LR=5e-5, v42 corpus (1028 train / 79 valid)
70
+ - **Merge**: direct safetensors merge (scale/rank × B.T @ A.T) → llama.cpp convert → Q4_K_M quantization
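
The merge formula in the last bullet can be illustrated with a small worked example. This is a sketch under stated assumptions, not the author's merge script: it assumes `lora_a` has shape (in_dims, rank), `lora_b` has shape (rank, out_dims), and the base weight has shape (out_dims, in_dims), and it applies the `scale/rank` factor exactly as written in the README. Plain Python lists are used to keep the example dependency-free; a real merge would operate on tensors loaded from the safetensors files.

```python
def transpose(matrix):
    """Return the transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


def matmul(x, y):
    """Multiply two list-of-lists matrices."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


def merge_lora(weight, lora_a, lora_b, scale):
    """Fold a LoRA update into a base weight: W + (scale / rank) * B.T @ A.T.

    Assumed shapes: weight (out, in), lora_a (in, rank), lora_b (rank, out).
    """
    rank = len(lora_b)
    delta = matmul(transpose(lora_b), transpose(lora_a))  # shape (out, in)
    factor = scale / rank
    return [
        [w + factor * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: in_dims=3, out_dims=2, rank=2, scale=4 (so the factor is 2).
weight = [[1, 0, 0], [0, 1, 0]]
lora_a = [[1, 0], [0, 1], [1, 1]]
lora_b = [[1, 2], [3, 4]]
merged = merge_lora(weight, lora_a, lora_b, scale=4)
print(merged)  # [[3.0, 6.0, 8.0], [4.0, 9.0, 12.0]]
```

With these shapes, applying the merged weight to an input gives the same result as applying the base weight and adding the scaled low-rank path separately, which is the property a merge is meant to preserve.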
71
 
72
+ ## Usage
 
 
 
73
 
74
  ```bash
75
+ ollama pull dcostenco/prism-coder:1b7
76
+ ollama run prism-coder:1b7
77
  ```
78
 
79
+ Or in [Prism AAC](https://github.com/dcostenco/prism-aac): the app downloads and loads this model automatically on devices with <8 GB RAM.
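
For programmatic use, a locally running Ollama server can also be queried over its HTTP API. The sketch below assumes Ollama is listening on its default address (`http://localhost:11434`) and that the model is available locally under the tag used in the pull command; `build_request` and `route` are hypothetical helper names. Temperature is set to 0 to mirror the evaluation setting described above.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "dcostenco/prism-coder:1b7"  # tag from the pull command; adjust if yours differs


def build_request(prompt: str) -> dict:
    """Build a non-streaming generate payload with deterministic sampling."""
    return {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0},
    }


def route(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to the local Ollama server and return the raw model text."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Example (requires a running Ollama server with the model pulled):
# print(route("Save a note that the parser fix is done"))
```

The returned string is the model's raw output; how a tool call is encoded in it is not documented here, so any parsing is left to the caller.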