sameer-saraf-quant-ai committed on Commit d7cca15 · verified · 1 Parent(s): a9ee43a

v8: Stage 2 Context-Contract Planning (MLX LoRA, iter 1000, val=0.032)

0000200_adapters.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:67322f026246e071e283f1df0a52731c457a3b23f573868fb6be965fef6d1713
size 161523781
0000400_adapters.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d0af972dab5d6f496b81f8706ba1ac3c1131d3130fc578c9949322e72383633e
size 161523781
0000600_adapters.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1fceb28c77779930aed540ed89653261fcae365f4c9ebc784a295043c125837d
size 161523781
0000800_adapters.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c67b0f6a1ca8ffeb76a1d8e22ae842999b0fef063433289b59185d8a57213cb1
size 161523781
0001000_adapters.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:65fe2fd597efdf2d7124923a35dd1489254889e870ae543ed265d264c896967a
size 161523781
0001200_adapters.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ddc136f59c37c531b2f1f9e7ea49ccc20ffcf865e4216f5df04567091aa388bf
size 161523781
README.md ADDED
@@ -0,0 +1,171 @@
---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- workflow-planner
- slm
- lora
- mlx
- context-contract-planning
- tier-2-eligibility
language:
- en
pipeline_tag: text-generation
---

# SLM Workflow Planner v8 — Context-Contract Planning (MLX LoRA)

## Overview

**v8** is a Stage 2 enhancement of the SLM Workflow Planner. It extends the v3-best checkpoint with **context-contract planning** — the ability to make routing decisions based on the `required_context` and `produces_context` of ALL nodes in a workflow graph, not just directly connected edges.

This enables three new capabilities:
- **Recovery Routing (Backjump):** On failure, jump backward to an earlier context-satisfiable node
- **Stage Skipping:** Skip unnecessary stages when the required context is already available (e.g., walk-in customers)
- **Non-Adjacent Parallelism:** Fork two independent context-satisfiable nodes that aren't connected by fork-edges

## Model Details

| Property | Value |
|---|---|
| **Base Model** | Qwen/Qwen2.5-7B-Instruct |
| **Fine-tune Type** | LoRA (MLX format) |
| **LoRA Rank** | 16 |
| **LoRA Scale** | 2.0 |
| **LoRA Dropout** | 0.02 |
| **Tuned Layers** | 28/32 |
| **Trainable Parameters** | 40.37M (0.53%) |
| **Framework** | MLX (Apple Silicon) |

## Training

| Property | Value |
|---|---|
| **Lineage** | base(8000) → v2(100) → v3(200) → v3-cont → v3-best → **v8(1000)** |
| **Resume Checkpoint** | v3-best (59.2% on 76-scenario suite) |
| **Training Iterations** | 1000 (stopped early — val loss converged) |
| **Learning Rate** | 2e-5 (cosine decay to 1e-6, 100-step warmup) |
| **Batch Size** | 4 (effective 8 with grad accumulation) |
| **Max Sequence Length** | 768 tokens |
| **Dataset** | 696K samples from 150 workflows |
| **Val Loss** | 0.032 (down from 0.272 at the start) |

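The learning-rate schedule in the table can be written out explicitly. The sketch below is an illustrative reconstruction from the values in `adapter_config.json` (peak 2e-5, end 1e-6, 100-step warmup, 2000 configured iterations); exactly how MLX counts warmup steps against the decay horizon may differ slightly.

```python
import math

# Illustrative reconstruction: linear warmup to the peak LR, then
# cosine decay to the end LR over the remaining configured steps.
PEAK_LR, END_LR, WARMUP, TOTAL_ITERS = 2e-5, 1e-6, 100, 2000

def lr_at(step: int) -> float:
    if step < WARMUP:
        return PEAK_LR * step / WARMUP  # linear warmup from 0
    # cosine decay from PEAK_LR down to END_LR after warmup ends
    progress = min((step - WARMUP) / (TOTAL_ITERS - WARMUP), 1.0)
    return END_LR + 0.5 * (PEAK_LR - END_LR) * (1 + math.cos(math.pi * progress))
```

Since the run stopped at iteration 1000, training ended roughly halfway down the cosine curve rather than at the 1e-6 floor.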
### Training Data Distribution

| Category | Count | % | Description |
|---|---|---|---|
| META | 187K | 26.9% | Dead-end escalation |
| NEGATIVE | 187K | 26.9% | Tier-2 visible but edge chosen ("satisfiable ≠ sensible") |
| NEXT_EDGE | 116K | 16.7% | Normal edge progression |
| NEXT_SKIP 🛡 | 55K | 8.0% | Forward dead-end recovery (Tier-2) |
| RETRY | 36K | 5.2% | Edge retry on failure |
| JOIN | 30K | 4.3% | Parallel branch merge |
| NEXT_BACKJUMP 🛡 | 28K | 4.0% | Failure recovery to earlier node (Tier-2) |
| FORK_EDGE | 28K | 4.0% | Edge-adjacent fork |
| FORK_NONADJ 🛡 | 28K | 4.0% | Non-adjacent parallel fork (Tier-2) |

🛡 = Protected from downsampling during balancing

## Prompt Format

The model uses a **tiered prompt** with two candidate sections:

```
Current node: NODE_A (SYSTEM, stage 3)
Outcome: success
Failure type: none

State:
goal_progress=0.40
retry_count=0
...

Produced context: {ctx_start, intake_data, assessment_score}

Edge candidates (normal path):
1. NODE_B (AGENT) [processor] → requires: {assessment_score} → produces: {approval}

Context-eligible (off-path, invocable now):
1. NODE_X (SYSTEM, stage 5, gap=+2) [validator] → requires: {intake_data} ✓ → produces: {validation}

Forkable sets: []
Join-ready: []

What is the best action?
```

**Output format:** `DECISION_TYPE NODE_ID`
- `NEXT NODE_B` — advance to NODE_B
- `FORK NODE_A, NODE_B` — parallel fork
- `RETRY NODE_A` — retry current
- `JOIN NODE_A` — merge parallel branches
- `META` — escalate to human

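Downstream code has to turn this string back into a structured decision before it can be validated and executed. A minimal parser sketch, assuming the grammar above (the `parse_decision` helper is illustrative, not part of the released code):

```python
import re

# One decision type, optionally followed by one or more node IDs.
DECISION_RE = re.compile(r"^(NEXT|FORK|RETRY|JOIN|META)(?:\s+(.+))?$")

def parse_decision(text: str):
    """Parse 'DECISION_TYPE NODE_ID' into (type, [node_ids]).

    META carries no node; FORK may list several comma-separated nodes.
    """
    match = DECISION_RE.match(text.strip())
    if not match:
        raise ValueError(f"unparseable planner output: {text!r}")
    decision, rest = match.group(1), match.group(2)
    nodes = [n.strip() for n in rest.split(",")] if rest else []
    if decision == "META" and nodes:
        raise ValueError("META takes no node argument")
    if decision != "META" and not nodes:
        raise ValueError(f"{decision} requires at least one node")
    return decision, nodes
```

Rejecting anything the regex does not match gives the guardrail layer a cheap first line of defense against malformed generations.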
## Evaluation Results

### Section A: Stratified Test (100 held-out samples)

| Category | Exact Accuracy | Type Accuracy |
|---|---|---|
| META | 20/20 (100%) | 20/20 (100%) |
| NEGATIVE (Tier-2 visible, edge chosen) | 5/5 (100%) | 5/5 (100%) |
| SKIP_FORWARD | 7/7 (100%) | 7/7 (100%) |
| RETRY | 18/20 (90%) | 18/20 (90%) |
| JOIN | 16/20 (80%) | 16/20 (80%) |
| FORK (non-adjacent) | 12/18 (67%) | 14/18 (78%) |
| NEXT (edge) | 5/8 (63%) | 8/8 (100%) |
| **TOTAL** | **83/100 (83%)** | **88/100 (88%)** |

### Section B: Tier-2 Specific (90 held-out samples)

| Category | Exact Accuracy | Type Accuracy |
|---|---|---|
| Non-Adjacent Fork | 15/15 (100%) | 15/15 (100%) |
| META with Context | 15/15 (100%) | 15/15 (100%) |
| Negative Contrast | 14/15 (93%) | 14/15 (93%) |
| RETRY with Context | 14/15 (93%) | 14/15 (93%) |
| Skip Forward | 13/15 (87%) | 14/15 (93%) |
| JOIN with Context | 10/15 (67%) | 10/15 (67%) |
| **TOTAL** | **81/90 (90%)** | **82/90 (91%)** |

## Key Capabilities

1. **Context-Contract Reasoning:** Evaluates `required_context ⊆ produced_keys` to identify all invocable nodes
2. **Recovery Routing:** Backjumps on process/resource failure when no edge retry exists
3. **Stage Skipping:** Advances to forward context-eligible nodes at dead-ends
4. **Non-Adjacent Parallelism:** Forks independent context-eligible nodes with different actors
5. **Negative Contrast:** Learned "satisfiable ≠ sensible" — doesn't take the Tier-2 option when the edge path is correct

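Capability 1 is the subset test that the prompt's "Context-eligible" section is built from. A minimal sketch, assuming a hypothetical `Node` record (in the real system the graph definition lives in Neo4j):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    required_context: set = field(default_factory=set)
    produces_context: set = field(default_factory=set)

def context_eligible(nodes, produced_keys):
    """Nodes whose required_context is covered by the keys produced so
    far (required_context ⊆ produced_keys) are invocable right now."""
    return [n.node_id for n in nodes if n.required_context <= produced_keys]
```

With the `Produced context` from the prompt example above, a node requiring `{assessment_score}` and one requiring `{intake_data}` are both eligible, which is exactly what the two candidate sections enumerate; the NEGATIVE training category then teaches the model that eligible does not always mean it should be chosen.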
## Usage (MLX)

```python
from mlx_lm import load, generate

model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path="sameer-saraf-quant-ai/slm-workflow-planner-v8-mlx",
)

messages = [
    {"role": "system", "content": "You are a workflow planner..."},
    {"role": "user", "content": "<tiered prompt>"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=30)
print(response)  # "NEXT ESTIMATION_AND_APPROVAL"
```

## Ensemble Recommendation

For production use, combine v8 with a GPT-4.1 arbiter to cover the ~10% of edge cases it still misses (mainly JOIN confusion):
- v8 handles 90%+ of decisions autonomously
- The arbiter validates uncertain decisions (estimated 5-10% of traffic)

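One way to implement the split is a simple gate on decision type and model confidence. The sketch below is illustrative only: the 0.85 threshold is a placeholder rather than a tuned value, and singling out JOIN simply reflects its weakest exact accuracy in the tables above.

```python
# JOIN had the lowest exact accuracy in both evaluation sections, so it
# is a natural candidate for arbiter review.  Threshold is a placeholder.
UNCERTAIN_TYPES = {"JOIN"}
CONFIDENCE_THRESHOLD = 0.85

def needs_arbiter(decision_type: str, confidence: float) -> bool:
    """Route the decision to the arbiter model when the SLM is likely
    to be wrong; otherwise execute it autonomously."""
    return decision_type in UNCERTAIN_TYPES or confidence < CONFIDENCE_THRESHOLD
```

Gating on type plus confidence keeps arbiter traffic in the estimated 5-10% band while leaving the common NEXT/RETRY/META decisions fully autonomous.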
## Architecture Context

This adapter is part of the **Agentic OS** system:
- **Temporal** handles durable execution and state management
- **Neo4j** stores workflow graph definitions
- **SLM (this model)** makes real-time routing decisions
- **Guardrails** validate SLM output before execution

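The guardrail step can be as simple as checking a parsed decision against the graph before Temporal executes it. A hypothetical sketch (`validate_decision` is not part of the released code):

```python
VALID_TYPES = {"NEXT", "FORK", "RETRY", "JOIN", "META"}

def validate_decision(decision_type: str, nodes: list, graph_nodes: set) -> bool:
    """Reject malformed decision types and decisions that reference
    nodes absent from the workflow graph definition."""
    if decision_type not in VALID_TYPES:
        return False
    return all(n in graph_nodes for n in nodes)
```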
adapter_config.json ADDED
@@ -0,0 +1,49 @@
{
  "adapter_path": "src_slm/training/adapters_7b_v8",
  "batch_size": 4,
  "config": "src_slm/training/lora_config_v8.yaml",
  "data": "src_slm/training/data",
  "fine_tune_type": "lora",
  "grad_accumulation_steps": 2,
  "grad_checkpoint": true,
  "iters": 2000,
  "learning_rate": 2e-05,
  "lora_parameters": {
    "rank": 16,
    "dropout": 0.02,
    "scale": 2.0
  },
  "lr_schedule": {
    "name": "cosine_decay",
    "arguments": [
      2e-05,
      2000,
      1e-06
    ],
    "warmup": 100,
    "warmup_init": 0.0
  },
  "mask_prompt": true,
  "max_seq_length": 768,
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "num_layers": 28,
  "optimizer": "adam",
  "optimizer_config": {
    "adam": {},
    "adamw": {},
    "muon": {},
    "sgd": {},
    "adafactor": {}
  },
  "project_name": null,
  "report_to": null,
  "resume_adapter_file": "src_slm/training/adapters_7b_v3_best/adapters.safetensors",
  "save_every": 200,
  "seed": 42,
  "steps_per_eval": 100,
  "steps_per_report": 20,
  "test": true,
  "test_batches": 200,
  "train": true,
  "val_batches": 100
}
adapters.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:65fe2fd597efdf2d7124923a35dd1489254889e870ae543ed265d264c896967a
size 161523781