---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- workflow-planning
- slm
- lora
- mlx
- apple-silicon
- policy-learning
- qwen2
- text-classification
- contrastive-alignment
- meta-strengthening
- ensemble
library_name: mlx
pipeline_tag: text-generation
language:
- en
datasets:
- ssaraf1/slm-workflow-planner-policy-v2
- ssaraf1/slm-workflow-planner-alignment-v2
---

# SLM Workflow Planner 7B v7: META-Strengthened Alignment (Iter 110)

## Model Description

LoRA adapter for **Qwen/Qwen2.5-7B-Instruct** fine-tuned as a **workflow execution planner**.
This is the **v7 model**, specialized for JOIN and META detection and produced by progressive
alignment starting from the base policy checkpoint (main stages listed below).

### Training Lineage

1. **Stage A**: Base policy training on 554K samples from 89 diverse workflow graphs (iter 800)
2. **Stage B**: Contrastive alignment on 20K samples (iter 100) → v2
3. **Stage C**: Fork-suppression alignment on 4.6K samples (iter 200) → v3-best
4. **Stage D**: Signal-overlap restoration on 4K samples (80 iters) → v6-100
5. **Stage E**: META-strengthened + risk-weighted alignment on 3.2K samples (110 iters) → **v7-110**

### Decision Types

| Decision | Description |
|----------|-------------|
| **NEXT** | Proceed to the next sequential step |
| **RETRY** | Retry the current step (within budget) |
| **FORK** | Launch parallel execution branches |
| **JOIN** | Synchronize parallel branches |
| **META** | Escalate: anomaly detected, human intervention needed |

## Performance (76-scenario evaluation suite)

### v7-110 Standalone

| Category | **v7-110** | v3-best | GPT-4.1 |
|----------|-----------|---------|---------|
| **NEXT** | 17/22 (77%) | 12/22 (55%) | 6/22 (27%) |
| **RETRY** | 0/12 (0%) | 12/12 (100%) | 11/12 (92%) |
| **FORK** | 1/14 (7%) | 14/14 (100%) | 14/14 (100%) |
| **JOIN** | 14/15 (93%) | 15/15 (100%) | 10/15 (67%) |
| **META** | 10/13 (77%) | 0/13 (0%) | 0/13 (0%) |
| **TOTAL** | 42/76 (55.3%) | 53/76 (69.7%) | 41/76 (53.9%) |

### Key Strengths
- 🔥 **META 77%**: the only model in this comparison that detects anomalies (v3-best and GPT-4.1 both score 0%)
- 🔥 **JOIN 93%**: near-perfect synchronization detection
- 🔥 **NEXT 77%**: strong sequential progression (v3-best 55%, GPT-4.1 27%)
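
### Ensemble with v3-best (Vote)
v7-110 is weak on RETRY (0/12) and FORK (1/14), so it is intended to be combined with v3-best in a vote ensemble:
- v3-best covers RETRY (100%), FORK (100%), JOIN (100%)
- v7-110 covers META (77%), NEXT (77%)
- **Combined:** the two models' strengths are complementary. Routing each category to its stronger model would score 68/76 (89.5%) on this suite; this is an idealized reference point rather than a measured ensemble result, because a deployed ensemble does not know the true category in advance.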
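
The totals in the table, and the category-routing figure for the two-model ensemble, can be reproduced with a few lines of Python. This is a sketch: the counts are copied from the v7-110 Standalone table, and `category_oracle` is a hypothetical helper that assumes the true category of each scenario is known, which a deployed ensemble cannot assume.

```python
from typing import Dict, List, Tuple

# (correct, total) per category, copied from the v7-110 Standalone table
RESULTS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "v7-110": {"NEXT": (17, 22), "RETRY": (0, 12), "FORK": (1, 14), "JOIN": (14, 15), "META": (10, 13)},
    "v3-best": {"NEXT": (12, 22), "RETRY": (12, 12), "FORK": (14, 14), "JOIN": (15, 15), "META": (0, 13)},
    "GPT-4.1": {"NEXT": (6, 22), "RETRY": (11, 12), "FORK": (14, 14), "JOIN": (10, 15), "META": (0, 13)},
}


def overall(model: str) -> Tuple[int, int]:
    """Sum correct and total counts across all categories for one model."""
    rows = RESULTS[model].values()
    return sum(c for c, _ in rows), sum(t for _, t in rows)


def category_oracle(models: List[str]) -> Tuple[int, int]:
    """Score if every category were routed to whichever model did best on it.

    This needs the true category up front, so it is an idealized reference
    point, not something a deployed ensemble can measure directly.
    """
    categories = RESULTS[models[0]]
    correct = sum(max(RESULTS[m][cat][0] for m in models) for cat in categories)
    total = sum(t for _, t in RESULTS[models[0]].values())
    return correct, total


for name in RESULTS:
    c, t = overall(name)
    print(f"{name}: {c}/{t} ({100 * c / t:.1f}%)")

c, t = category_oracle(["v7-110", "v3-best"])
print(f"v7-110 + v3-best, category routing: {c}/{t} ({100 * c / t:.1f}%)")
```

A real vote ensemble picks between the two models' answers scenario by scenario, so its measured score can land above or below this figure; only an actual ensemble evaluation on the suite settles it.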

## v7 Training Dataset Design

The v7 dataset was designed to address four goals:

1. **META manifold strengthening**: quality-filtered META samples with anomaly outcomes
2. **Synthetic anomaly patterns**: synthetic samples patterned after the evaluation suite's anomaly scenarios
3. **Risk-weighted allocation**: more samples for high-risk misclassifications (RETRY→NEXT and JOIN→NEXT, i.e. a true RETRY or JOIN predicted as NEXT)
4. **Rehearsal for all classes**: broad sampling across all five decision types to prevent catastrophic forgetting

### Dataset Distribution
| Category | Samples | Share |
|----------|---------|-------|
| META | 904 | 28.2% |
| NEXT | 800 | 25.0% |
| RETRY | 500 | 15.6% |
| FORK | 500 | 15.6% |
| JOIN | 500 | 15.6% |
| **Total** | **3,204** | 100% |

## LoRA Configuration

| Parameter | Value |
|-----------|-------|
| Rank | 16 |
| Alpha (scale) | 32 (scale = alpha / rank = 2.0) |
| Dropout | 0.02 |
| Target layers | Last 28 (all 28 decoder layers of Qwen2.5-7B) |
| Target modules | q_proj, k_proj, v_proj, o_proj |
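
To make the LoRA settings concrete, the sketch below counts the trainable parameters they imply. It assumes the standard LoRA update with MLX's `scale` applied as a direct multiplier (alpha / rank = 2.0), and it uses Qwen2.5-7B attention shapes (hidden size 3584, 28 query heads, 4 key/value heads of dimension 128) taken from that model's public config rather than from this card.

```python
# LoRA adds a low-rank update to each frozen weight W (shape d_out x d_in):
#   y = W x + scale * B (A x),  with A: r x d_in and B: d_out x r
# so each adapted linear layer trains r * (d_in + d_out) parameters.

RANK = 16
ALPHA = 32
SCALE = ALPHA / RANK  # 2.0, the multiplier MLX calls "scale"

# Assumed Qwen2.5-7B shapes (not stated in this card)
HIDDEN = 3584           # hidden_size
HEAD_DIM = 128          # hidden_size / num_attention_heads (28)
KV_DIM = 4 * HEAD_DIM   # num_key_value_heads (4) * head_dim, grouped-query attention
NUM_LAYERS = 28         # LoRA applied to all decoder layers

# (d_in, d_out) for each targeted attention projection
MODULES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int = RANK) -> int:
    """Trainable parameters (A plus B) for one LoRA-adapted linear layer."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out) for d_in, d_out in MODULES.values())
total = per_layer * NUM_LAYERS
print(f"scale={SCALE}, per layer={per_layer:,}, total={total:,}")
```

Under these assumptions the adapter trains about 10.1M parameters (360,448 per layer across 28 layers), while the 7B base weights stay frozen.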

## Training Configuration

| Parameter | Value |
|-----------|-------|
| Framework | MLX (Apple Silicon native) |
| Hardware | Apple M4 Pro, 48 GB unified memory |
| Iterations | 110 (checkpoint with the best validation loss) |
| Batch size | 4 |
| Learning rate | 1e-5 |
| Sequence length | 512 tokens |
| Prompt masking | Yes (loss computed on completion tokens only) |
| Resume from | v6-100 checkpoint |
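
The training table maps fairly directly onto an `mlx_lm.lora` config. The sketch below builds such a config as a Python dict and works out how much of the 3,204-sample v7 dataset 110 iterations at batch size 4 cover. Key names follow mlx_lm's LoRA config file as we understand it and may differ between versions; the `data` and `resume_adapter_file` paths are hypothetical placeholders, not the author's actual paths.

```python
import json

ITERS, BATCH_SIZE, DATASET_SIZE = 110, 4, 3204

# Hypothetical mlx_lm.lora config mirroring the LoRA and training tables.
config = {
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "train": True,
    "fine_tune_type": "lora",
    "data": "data/v7",  # placeholder dataset directory
    "resume_adapter_file": "adapters/v6-100/adapters.safetensors",  # placeholder path
    "num_layers": 28,
    "batch_size": BATCH_SIZE,
    "iters": ITERS,
    "learning_rate": 1e-5,
    "max_seq_length": 512,
    "mask_prompt": True,  # train on completion tokens only
    "lora_parameters": {
        "keys": ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj", "self_attn.o_proj"],
        "rank": 16,
        "scale": 2.0,
        "dropout": 0.02,
    },
}

# Each iteration is assumed to be one optimizer step over one batch.
samples_seen = ITERS * BATCH_SIZE
print(json.dumps(config, indent=2))
print(f"samples seen: {samples_seen} ({samples_seen / DATASET_SIZE:.1%} of one epoch)")
```

If each iteration is one optimizer step over a batch of 4, Stage E saw 440 sequences, roughly 13.7% of a single pass over the v7 dataset, which is consistent with a light-touch alignment stage resumed from v6-100.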

## Usage

### With MLX (Apple Silicon)

```python
from huggingface_hub import snapshot_download
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Fetch the adapter repo locally; depending on the mlx_lm version,
# adapter_path may need to be a local directory rather than a Hub repo id.
adapter_dir = snapshot_download("ssaraf1/slm-workflow-planner-7b-v7")

model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path=adapter_dir,
)

messages = [
    {"role": "system", "content": "You are a workflow planner. Given the current workflow state, eligible nodes, and topology information, classify the decision type. Respond with exactly one of: NEXT, RETRY, FORK, JOIN, META"},
    {"role": "user", "content": "Current node: VERIFY_CLAIM\nOutcome: anomaly_detected\nState: goal_progress=0.15 | uncertainty=0.85 | retry_count=3\nEligible: [ESCALATE, MANUAL_REVIEW]\nForkable sets: none\nJoin-ready: False\nWhat decision type?"},
]

# Render the chat template as text and open the assistant turn
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
sampler = make_sampler(temp=0.0)  # greedy decoding for a deterministic label
response = generate(model, tokenizer, prompt=prompt, max_tokens=10, sampler=sampler)
print(response)  # Expected: META (high uncertainty + anomaly)
```
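
When calling the model programmatically, it helps to build the user message in the same layout as the example and to map the raw generation back to one of the five labels. The helpers below are hypothetical: `build_user_prompt` copies the field layout from the example message (how non-empty forkable sets should be written is a guess), and `parse_decision` simply takes the first decision label that appears in the output.

```python
import re
from typing import Dict, List, Optional

LABEL_PATTERN = re.compile(r"\b(NEXT|RETRY|FORK|JOIN|META)\b")


def build_user_prompt(
    node: str,
    outcome: str,
    state: Dict[str, object],
    eligible: List[str],
    forkable: Optional[List[List[str]]] = None,
    join_ready: bool = False,
) -> str:
    """Hypothetical helper: render a workflow state in the example's layout."""
    state_str = " | ".join(f"{key}={value}" for key, value in state.items())
    # Assumption: non-empty forkable sets are written as bracketed groups.
    forkable_str = "; ".join(f"[{', '.join(group)}]" for group in forkable) if forkable else "none"
    return (
        f"Current node: {node}\n"
        f"Outcome: {outcome}\n"
        f"State: {state_str}\n"
        f"Eligible: [{', '.join(eligible)}]\n"
        f"Forkable sets: {forkable_str}\n"
        f"Join-ready: {join_ready}\n"
        "What decision type?"
    )


def parse_decision(text: str) -> Optional[str]:
    """Return the first decision label in a raw generation, or None if absent."""
    match = LABEL_PATTERN.search(text.upper())
    return match.group(1) if match else None


user_msg = build_user_prompt(
    "VERIFY_CLAIM",
    "anomaly_detected",
    {"goal_progress": 0.15, "uncertainty": 0.85, "retry_count": 3},
    ["ESCALATE", "MANUAL_REVIEW"],
)
print(user_msg)
print(parse_decision("META<|im_end|>"))  # -> META
```

Returning `None` rather than guessing lets the caller treat an unparseable answer as its own failure case, for example by escalating or falling back to the other ensemble member.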

## Recommended: Ensemble with v3-best

For production use, combine v7-110 with v3-best in a vote ensemble:
- Use v3-best (`ssaraf1/slm-workflow-planner-7b-v3`) for RETRY/FORK/JOIN decisions
- Use v7-110 for META/NEXT decisions
- Resolve disagreements with confidence-weighted voting
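
One way to implement the confidence-weighted vote is sketched below. It is a hypothetical scheme, not necessarily the author's: each model's vote is weighted by its own confidence times its accuracy on that label in the 76-scenario table, used as a rough proxy for how trustworthy that label is when that model emits it. How confidence is obtained (for example, from the probability of the label token) is left out.

```python
from typing import Dict, Tuple

# Hypothetical trust weights: each model's per-category accuracy from the
# 76-scenario table, used as a rough proxy for how reliable that label is
# when that model emits it.
TRUST: Dict[str, Dict[str, float]] = {
    "v7-110": {"NEXT": 17 / 22, "RETRY": 0 / 12, "FORK": 1 / 14, "JOIN": 14 / 15, "META": 10 / 13},
    "v3-best": {"NEXT": 12 / 22, "RETRY": 12 / 12, "FORK": 14 / 14, "JOIN": 15 / 15, "META": 0 / 13},
}


def vote(predictions: Dict[str, Tuple[str, float]]) -> str:
    """Confidence-weighted vote over model predictions.

    predictions maps a model name to (label, confidence in [0, 1]). Each
    prediction adds confidence * TRUST[model][label] to its label's score;
    the label with the highest total wins (ties go to the first label seen).
    """
    scores: Dict[str, float] = {}
    for model, (label, confidence) in predictions.items():
        scores[label] = scores.get(label, 0.0) + confidence * TRUST[model][label]
    return max(scores, key=scores.get)


print(vote({"v7-110": ("META", 0.9), "v3-best": ("NEXT", 0.9)}))  # META
print(vote({"v7-110": ("NEXT", 0.8), "v3-best": ("FORK", 0.8)}))  # FORK
```

At comparable confidence, a META vote from v7-110 outweighs a NEXT vote from v3-best, while RETRY, FORK, and JOIN votes from v3-best dominate, which matches the routing above. Weights measured on the same suite used for evaluation will look optimistic; a held-out calibration set would be a safer source.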

## Files

- `adapters.safetensors`: LoRA adapter weights (v7-110 checkpoint)
- `adapter_config.json`: LoRA configuration for MLX

## Citation

Part of the **Agentic Factory** project, building autonomous workflow orchestration
with SLM-powered planning on Apple Silicon.