dcostenco committed on
Commit e445ce1 · verified · 1 Parent(s): beb32da

Update model card: v36 100.0% PERFECT — smem fix

Files changed (1)
  1. README.md +43 -48
README.md CHANGED
@@ -15,71 +15,66 @@ base_model: Qwen/Qwen3-8B
  Fine-tuned Qwen3-8B for 6-tool routing in the [Prism AAC](https://github.com/dcostenco/prism-aac) system.
  Primary deployment: **iOS and edge devices** via llama.cpp GGUF.
 
- ## BFCL Routing Benchmark — v35 (Current)
-
- **Mean: 98.0%** (3-seed average, seeds 2027/2028/2029, 102 cases each)
-
- | Category | Description | Accuracy |
- |----------|-------------|:--------:|
- | aac | AAC phrase requests → plain text | 100% |
- | cmpct | Ledger compaction | 100% |
- | edge | Multi-step / compound requests | 100% |
- | hand | Agent handoff / relay | 100% |
- | info | General facts → plain text | 100% |
- | irrel | Irrelevant / live queries → plain text | 100% |
- | know | Knowledge base search | 100% |
- | load | Session context loading | 100% |
- | pred | Factual / knowledge queries → plain text | 100% |
- | save | Session ledger save | 100% |
- | smem | Session memory search | 83% |
- | tran | Translation requests → plain text | 100% |
-
- Eval: Ollama inference, temperature=0, Qwen3 thinking suppressed (`<think>\n\n</think>`), num_predict=160.
  Gate: ≥90% = deploy.
 
  ## Version History
 
  | Version | BFCL | Notes |
  |---------|------|-------|
  | v35 | 98.0% | Proper safetensors merge — fixes mlx_lm.fuse LoRA loss |
  | v32 | 98.0% | Routing corpus v32_8b, direct safetensors merge |
- | v31 | 95.1% | Surgical smem/know boundary fixes |
- | v30 | 95.0% | Routing corpus v36_1b7 |
 
  ## Tools
 
- The model routes between exactly 6 tools:
 
- 1. `session_load_context` load/fetch/resume project context
- 2. `session_save_ledger` — note/log/remember/record progress
- 3. `session_save_handoff` handoff/relay to next agent/session
- 4. `session_compact_ledger` compact/archive/shrink ledger
- 5. `session_search_memory` recall past sessions/conversations
- 6. `knowledge_search` search stored notes/knowledge base
 
- ## Files
 
- | File | Size | Use |
- |------|------|-----|
- | `qwen3-8b-v35-q4km.gguf` | 4.7 GB | Ollama / desktop |
- | `prism-aac-8b-q4km.gguf` | 4.7 GB | iOS app download |
 
- ## Cascade Role
 
- Edge / iOS tier. Desktop cascade: **8B → 14B → 32B → cloud Claude**.
- 8B handles offline/low-RAM scenarios (< 6 GB available).
-
- ## Usage (Ollama)
 
  ```bash
- ollama run dcostenco/prism-coder:8b
  ```
 
- ## Training
-
- - **Base**: `Qwen/Qwen3-8B` (fp16, 8.2B params)
- - **Framework**: MLX-LM LoRA (rank=8, scale=20, 4 layers)
- - **Data**: v32_8b corpus (788 train, 44 valid, text format)
- - **Hyperparams**: LR=5e-5, 400 iters, seq=1024
- - **Merge**: Direct safetensors manipulation (delta = scale/rank × B^T A^T)
- - **Peak memory**: 18 GB (M-series Mac)
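
The removed **Merge** line states the weight update as delta = scale/rank × B^T A^T. A minimal pure-Python sketch of that merge is shown here for illustration. It assumes `lora_a` has shape (in, rank), `lora_b` has shape (rank, out), and the base weight has shape (out, in). The function names and matrix layout are hypothetical, not taken from the repository, and some LoRA implementations apply `scale` without dividing by `rank`, so the factor should be checked against the training framework.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    y_cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


def merge_lora(weight, lora_a, lora_b, scale, rank):
    """Return weight + (scale / rank) * B^T A^T, as written in the Merge line.

    Assumed shapes: weight (out x in), lora_a (in x rank), lora_b (rank x out).
    """
    factor = scale / rank
    delta = matmul(transpose(lora_b), transpose(lora_a))  # shape: out x in
    return [
        [w + factor * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example with the v35 settings quoted above (rank=8, scale=20) used only
# for the scaling factor; the matrices themselves are made up and use rank 1.
base = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0], [2.0]]   # in x rank
b = [[3.0, 4.0]]     # rank x out
merged = merge_lora(base, a, b, scale=20, rank=8)
print(merged)
```

Merging this way folds the adapter into the base weights, so the exported model needs no separate LoRA tensors at inference time.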
 
  Fine-tuned Qwen3-8B for 6-tool routing in the [Prism AAC](https://github.com/dcostenco/prism-aac) system.
  Primary deployment: **iOS and edge devices** via llama.cpp GGUF.
 
+ ## BFCL Routing Benchmark — v36 (Current)
+
+ **Mean: 100.0%** (3-seed average, seeds 2027/2028/2029, 102 cases each)
+
+ | Category | Count | Description | Accuracy |
+ |----------|------:|-------------|:--------:|
+ | aac | 12 | AAC phrase requests → plain text | 100% |
+ | cmpct | 6 | Ledger compaction | 100% |
+ | edge | 6 | Multi-step / compound requests | 100% |
+ | hand | 8 | Agent handoff / relay | 100% |
+ | info | 5 | General facts → plain text | 100% |
+ | irrel | 10 | Irrelevant / live queries → plain text | 100% |
+ | know | 7 | Knowledge base search | 100% |
+ | load | 9 | Session context loading | 100% |
+ | pred | 8 | Factual / knowledge queries → plain text | 100% |
+ | save | 13 | Session ledger save | 100% |
+ | smem | 12 | Session memory search | 100% |
+ | tran | 6 | Translation requests → plain text | 100% |
+
+ Eval: MLX inference + thinking, temperature=0, 3-seed mean.
  Gate: ≥90% = deploy.
 
  ## Version History
 
  | Version | BFCL | Notes |
  |---------|------|-------|
+ | v36 | **100.0%** | Fixed: smem "BFCL v4 notes" and "training loss" → session_search_memory |
  | v35 | 98.0% | Proper safetensors merge — fixes mlx_lm.fuse LoRA loss |
  | v32 | 98.0% | Routing corpus v32_8b, direct safetensors merge |
+ | v31 | 95.1% | Surgical smem/know boundary fix |
+ | v30 | ~93% | Baseline 8B routing |
 
  ## Tools
 
+ The model routes to exactly 6 tools:
 
+ | Tool | Trigger |
+ |------|---------|
+ | `session_load_context` | Load/resume project context |
+ | `session_save_ledger` | Note/log/record/remember something |
+ | `session_save_handoff` | Pass state to next agent/session |
+ | `session_compact_ledger` | Shrink/prune ledger (no relay) |
+ | `session_search_memory` | Recall prior session discussions |
+ | `knowledge_search` | Search stored knowledge base |
 
+ Plain text (no tool) for: AAC phrases, translations, weather, general facts, code, math.
 
+ ## Model Details
 
+ - **Base**: Qwen/Qwen3-8B
+ - **Format**: GGUF Q4_K_M (~4.9 GB)
+ - **Context**: 32,768 tokens
+ - **Training**: MLX LoRA, rank=16, 16 layers, 1000 iters, LR=2e-6, v36 corpus (806 examples)
+ - **Merge**: mlx_lm.fuse → llama.cpp convert → Q4_K_M quantization
 
+ ## Usage
 
  ```bash
+ ollama pull dcostenco/prism-coder-8b
+ ollama run prism-coder:8b
  ```
 
+ Or in the [Prism Coder IDE](https://github.com/dcostenco/prism-aac) — set model to `prism-coder:8b` in Settings.
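
The per-category counts in the v36 table sum to 102, which matches the stated number of cases per seed. A small sketch of how the overall accuracy and the ≥90% deploy gate could be computed from those counts follows. It assumes v35 was scored on the same 102 cases (the old table has no Count column) and that its 83% on smem corresponds to 10 of 12 correct. Both are assumptions for illustration, and the helper names are hypothetical.

```python
# Per-category case counts, copied from the v36 benchmark table.
COUNTS = {
    "aac": 12, "cmpct": 6, "edge": 6, "hand": 8, "info": 5, "irrel": 10,
    "know": 7, "load": 9, "pred": 8, "save": 13, "smem": 12, "tran": 6,
}
DEPLOY_GATE = 0.90  # "Gate: >=90% = deploy"


def overall_accuracy(correct):
    """Fraction of all cases answered correctly, given correct counts per category."""
    return sum(correct.values()) / sum(COUNTS.values())


def passes_gate(accuracy):
    """True when the accuracy meets the deploy threshold."""
    return accuracy >= DEPLOY_GATE


# v36: every case correct in every category.
v36_correct = dict(COUNTS)
# v35 (assumed): all categories perfect except smem at 10 of 12.
v35_correct = dict(COUNTS, smem=10)

for name, correct in (("v35", v35_correct), ("v36", v36_correct)):
    acc = overall_accuracy(correct)
    print(f"{name}: {acc:.1%} deploy={passes_gate(acc)}")
```

Under these assumptions, two missed smem cases out of 102 account for the 98.0% reported for v35.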