ssaraf1 committed on
Commit 9a99bbe · verified · 1 Parent(s): f65bdcc

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +153 -0
README.md ADDED
@@ -0,0 +1,153 @@
+ ---
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-7B-Instruct
+ tags:
+ - workflow-planning
+ - slm
+ - lora
+ - mlx
+ - apple-silicon
+ - policy-learning
+ - qwen2
+ - text-classification
+ library_name: mlx
+ pipeline_tag: text-generation
+ language:
+ - en
+ datasets:
+ - ssaraf1/slm-workflow-planner-policy-v2
+ ---
+
+ # SLM Workflow Planner 7B v1 - LoRA Adapter
+
+ ## Model Description
+
+ LoRA adapter for **Qwen/Qwen2.5-7B-Instruct** fine-tuned as a **workflow execution planner**.
+ The model makes real-time decisions about workflow transitions by analyzing state signals,
+ eligible nodes, and topology information.
+
+ ### Decision Types
+
+ | Decision | Description |
+ |----------|-------------|
+ | **NEXT** | Proceed to the next sequential step |
+ | **RETRY** | Retry the current step (within budget) |
+ | **FORK** | Launch parallel execution branches |
+ | **JOIN** | Synchronize parallel branches |
+ | **META** | Escalate: anomaly detected, human intervention needed |
+
+ ## Training Details
+
+ ### Base Model
+ - **Model**: Qwen/Qwen2.5-7B-Instruct
+ - **Parameters**: 7.6B (40.4M trainable via LoRA = 0.53%)
+
+ ### LoRA Configuration
+ | Parameter | Value |
+ |-----------|-------|
+ | Rank | 16 |
+ | Alpha (scale) | 32 (2.0x) |
+ | Dropout | 0.02 |
+ | Target layers | Last 28 of 32 |
+ | Target modules | q_proj, k_proj, v_proj, o_proj |
+
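The trainable-parameter figure can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the layer dimensions are assumptions taken from the public Qwen2.5-7B configuration as I understand it (they are not stated in this README), and `lora_params` is a hypothetical helper, not part of any library.

```python
# Assumed dimensions of Qwen2.5-7B (not stated in this README).
HIDDEN = 3584          # hidden_size
KV_DIM = 512           # 4 key/value heads * head_dim 128 (grouped-query attention)
INTERMEDIATE = 18944   # MLP intermediate_size
NUM_LAYERS = 28        # number of adapted layers, per the table above
RANK = 16              # LoRA rank, per the table above

# (in_features, out_features) of each linear projection.
ATTENTION = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}
MLP = {
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(modules, rank=RANK, layers=NUM_LAYERS):
    """Count LoRA weights: each adapted matrix adds A (in x r) and B (r x out)."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in modules.values())
    return per_layer * layers


attention_only = lora_params(ATTENTION)
all_linear = lora_params({**ATTENTION, **MLP})
print(f"attention projections only: {attention_only / 1e6:.1f}M")
print(f"attention + MLP projections: {all_linear / 1e6:.1f}M")
```

Under these assumptions, the four attention projections alone account for roughly 10.1M LoRA weights, while adapting the three MLP projections as well gives roughly 40.4M, which matches the figure quoted above. If the assumed dimensions are right, the adapter may therefore cover more modules than the table lists; `adapter_config.json` is the authoritative source.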
+ ### Training Configuration
+ | Parameter | Value |
+ |-----------|-------|
+ | Framework | MLX (Apple Silicon native) |
+ | Hardware | Apple M4 Pro, 48 GB unified memory |
+ | Iterations | 2,600 (converged) |
+ | Batch size | 4 (effective 8 with grad accumulation) |
+ | Learning rate | 8e-5 → 1e-6 (cosine decay) |
+ | Warmup | 400 steps |
+ | Sequence length | 512 |
+ | Precision | bfloat16 |
+ | Prompt masking | Yes (loss only on assistant tokens) |
+
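The learning-rate row can be read as a warmup followed by a cosine decay. The following is a minimal sketch of one such schedule, not the training script itself: it assumes the warmup is linear, and that the cosine phase runs from the end of warmup to iteration 2,600. `learning_rate` is a hypothetical helper name.

```python
import math

PEAK_LR = 8e-5       # from the table above
FINAL_LR = 1e-6      # from the table above
WARMUP_STEPS = 400   # from the table above
TOTAL_STEPS = 2600   # from the table above


def learning_rate(step: int) -> float:
    """Linear warmup (assumed), then cosine decay from PEAK_LR to FINAL_LR."""
    if step < WARMUP_STEPS:
        # Ramp up so that the last warmup step reaches PEAK_LR.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine


for step in (0, 399, 400, 1500, 2600):
    print(step, f"{learning_rate(step):.2e}")
```

The schedule peaks at 8e-5 once warmup ends and reaches 1e-6 at the final iteration; the exact shape used in training may differ in how warmup and decay steps are counted.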
+ ### Training Data
+ - **Dataset**: [ssaraf1/slm-workflow-planner-policy-v2](https://huggingface.co/datasets/ssaraf1/slm-workflow-planner-policy-v2)
+ - **Note**: This v1 model was trained on the original (pre-policy-correction) data
+ from 89 synthetic workflow graphs. Decision labels were topology-based, not
+ policy-conditioned. A v2 model trained on policy-corrected data is forthcoming.
+
+ ### Training Curve
+ | Iteration | Val Loss | Train Loss |
+ |-----------|----------|------------|
+ | 1 | 14.058 | n/a |
+ | 100 | 0.222 | 0.753 |
+ | 200 | 0.037 | 0.031 |
+ | 500 | 0.016 | 0.012 |
+ | 1000 | 0.011 | 0.009 |
+ | 2000 | 0.010 | 0.006 |
+ | 2600 | 0.009 | 0.005 |
+
+ ## Performance (v1, Pre-Policy-Correction)
+
+ Evaluated on 20 representative scenarios across Workshop and Insurance Claim domains:
+
+ | Category | Accuracy | Notes |
+ |----------|----------|-------|
+ | NEXT | 5/5 (100%) | Including policy-boundary cases |
+ | RETRY | 0/4 (0%) | NEXT-collapse: class imbalance |
+ | FORK | 0/4 (0%) | NEXT-collapse: topology-only labels |
+ | JOIN | 0/3 (0%) | NEXT-collapse: topology-only labels |
+ | META | 0/4 (0%) | NEXT-collapse: insufficient coverage |
+
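For context, the per-category counts above can be rolled up into aggregate numbers. This is a small worked calculation using only the table's figures; the variable names are illustrative.

```python
# (correct, total) per decision type, copied from the table above.
results = {
    "NEXT": (5, 5),
    "RETRY": (0, 4),
    "FORK": (0, 4),
    "JOIN": (0, 3),
    "META": (0, 4),
}

total_correct = sum(correct for correct, _ in results.values())
total_cases = sum(count for _, count in results.values())

# Overall accuracy weights every scenario equally.
overall_accuracy = total_correct / total_cases

# Macro accuracy weights every decision type equally.
macro_accuracy = sum(correct / count for correct, count in results.values()) / len(results)

print(f"overall: {total_correct}/{total_cases} = {overall_accuracy:.0%}")
print(f"macro average over decision types: {macro_accuracy:.0%}")
```

Overall accuracy comes to 5/20 (25%) and the macro average to 20%. A predictor that always answers NEXT would score the same on these 20 scenarios, which is consistent with the NEXT-collapse described in the notes column.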
+ ### Key Insight
+ This model learned **protocol** (single-token planner output) and **NEXT progression**
+ perfectly, but suffers from NEXT-dominance due to imbalanced pre-correction training data.
+ The v2 model addresses this with policy-corrected labels and counterfactual negatives.
+
+ ### What This Model Does Well
+ - ✅ Produces valid planner vocabulary (NEXT/RETRY/FORK/JOIN/META)
+ - ✅ Single-token structured output
+ - ✅ 4× faster inference than base model
+ - ✅ Perfect NEXT decision accuracy
+ - ✅ Recognizes policy boundaries (forkable set + high resource pressure → NEXT)
+
+ ### Known Limitations
+ - ❌ NEXT-dominant collapse on non-NEXT decisions
+ - ❌ Trained on topology-only labels (not state-conditioned)
+ - ❌ Single-workflow overfitting (89 synthetic graphs)
+
+ ## Usage
+
+ ### With MLX (Apple Silicon)
+
+ ```python
+ from huggingface_hub import snapshot_download
+ from mlx_lm import load, generate
+ from mlx_lm.sample_utils import make_sampler
+
+ # mlx_lm reads adapter_path as a local directory, so download the adapter repo first.
+ adapter_dir = snapshot_download("ssaraf1/slm-workflow-planner-7b-v1")
+
+ # Load the base model and apply the LoRA adapter on top of it.
+ model, tokenizer = load(
+     "Qwen/Qwen2.5-7B-Instruct",
+     adapter_path=adapter_dir,
+ )
+
+ messages = [
+     {"role": "system", "content": "You are a workflow planner. Given the current workflow state, eligible nodes, and topology information, classify the decision type."},
+     {"role": "user", "content": "Current node: TRIAGE_AND_ASSIGN (AGENT)\nOutcome: assigned\n\nState:\n goal_progress=0.15\n parallel_active=0\n resource_pressure=0.1\n\nEligible nodes:\n 1. VERIFY_POLICY (SYSTEM) → produces: policy_status\n 2. FRAUD_SCREENING (SYSTEM) → produces: fraud_score\n 3. DAMAGE_ASSESSMENT (AGENT) → produces: damage_report\n\nForkable sets: [{VERIFY_POLICY, FRAUD_SCREENING, DAMAGE_ASSESSMENT}]\nJoin-ready: []\n\nWhat decision type?"},
+ ]
+
+ # Render the chat template, then decode greedily (temp=0.0).
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ sampler = make_sampler(temp=0.0)
+ response = generate(model, tokenizer, prompt=prompt, max_tokens=10, sampler=sampler)
+ print(response)  # Reference label: FORK. v1 scored 0/4 on FORK in the evaluation above, so it may print NEXT.
+ ```
+
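Because the planner is expected to answer with a single label, downstream code may want to validate the generated text before acting on it. The sketch below is one possible approach and not part of this repository: `parse_decision` is a hypothetical helper, and returning `None` for unrecognized output is an assumption about how a caller might want to handle it.

```python
from typing import Optional

# The five planner labels listed under "Decision Types".
VALID_DECISIONS = ("NEXT", "RETRY", "FORK", "JOIN", "META")


def parse_decision(response: str) -> Optional[str]:
    """Return the decision label that starts the response, or None if there is none."""
    tokens = response.strip().split()
    if not tokens:
        return None
    # Tolerate lowercase output and trailing punctuation on the first token.
    first = tokens[0].strip(".,:;!\"'`").upper()
    return first if first in VALID_DECISIONS else None


print(parse_decision("FORK"))
print(parse_decision(" next\n"))
print(parse_decision("unclear"))
```

A caller could treat `None` as a signal to retry generation or to escalate, depending on the workflow's own policy.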
+ ## Architecture
+
+ This is a **two-stage planner**:
+ 1. **Stage 1**: Classify decision type → NEXT / RETRY / FORK / JOIN / META
+ 2. **Stage 2**: Select node(s) from eligible candidates based on decision type
+
+ The adapter handles both stages via the same LoRA weights.
+
+ ## Files
+
+ - `adapters.safetensors`: LoRA adapter weights (checkpoint at iteration 2600)
+ - `adapter_config.json`: LoRA configuration for MLX
+
+ ## Citation
+
+ Part of the **Agentic Factory** project, which builds autonomous workflow orchestration
+ with SLM-powered planning on Apple Silicon.