+---
+license: apache-2.0
+base_model: Qwen/Qwen2.5-7B-Instruct
+tags:
+  - workflow-planning
+  - slm
+  - lora
+  - mlx
+  - apple-silicon
+  - policy-learning
+  - qwen2
+  - text-classification
+  - contrastive-alignment
+  - meta-strengthening
+  - ensemble
+library_name: mlx
+pipeline_tag: text-generation
+language:
+  - en
+datasets:
+  - ssaraf1/slm-workflow-planner-policy-v2
+  - ssaraf1/slm-workflow-planner-alignment-v2
+---
+# SLM Workflow Planner 7B v7 — META-Strengthened Alignment (Iter 110)
+## Model Description
+LoRA adapter for **Qwen/Qwen2.5-7B-Instruct** fine-tuned as a **workflow execution planner**.
+This is the **v7 model**, specialized for JOIN and META detection. It was trained through progressive
+alignment stages starting from the base policy checkpoint; the lineage below lists the five stages (A to E) that lead to this checkpoint.
+### Training Lineage
+1. **Stage A**: Base policy training on 554K samples from 89 diverse workflow graphs (iter 800)
+2. **Stage B**: Contrastive alignment on 20K samples (iter 100) → v2
+3. **Stage C**: Fork-suppression alignment on 4.6K samples (iter 200) → v3-best
+4. **Stage D**: Signal-overlap restoration on 4K samples (80 iters) → v6-100
+5. **Stage E**: META-strengthened + risk-weighted alignment on 3.2K samples (110 iters) → **v7-110**
+### Decision Types
+| Decision | Description |
+|----------|-------------|
+| **NEXT** | Proceed to the next sequential step |
+| **RETRY** | Retry the current step (within budget) |
+| **FORK** | Launch parallel execution branches |
+| **JOIN** | Synchronize parallel branches |
+| **META** | Escalate — anomaly detected, human intervention needed |
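The five decision types above can be turned into a small parser for the model's raw text output. This is a minimal sketch, not part of the released code: the `Decision` enum and the `parse_decision` helper are hypothetical names, and the choice to return `None` for unrecognized output is an assumption about how a caller would want to handle it.

```python
import re
from enum import Enum
from typing import Optional


class Decision(Enum):
    """The five decision types listed in the table above."""
    NEXT = "NEXT"
    RETRY = "RETRY"
    FORK = "FORK"
    JOIN = "JOIN"
    META = "META"


# Match a decision label as a whole word anywhere in the text.
_LABEL_PATTERN = re.compile(r"\b(NEXT|RETRY|FORK|JOIN|META)\b")


def parse_decision(text: str) -> Optional[Decision]:
    """Return the first decision label found in the model output, or None."""
    match = _LABEL_PATTERN.search(text.upper())
    return Decision(match.group(1)) if match else None


if __name__ == "__main__":
    print(parse_decision(" META\n"))
    print(parse_decision("I am not sure"))
```

Returning `None` rather than raising lets the caller decide what to do with malformed output, for example escalating it to a human.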
+## Performance (76-scenario evaluation suite)
+### v7-110 Standalone
+| Category | **v7-110** | v3-best | GPT-4.1 |
+|----------|-----------|---------|---------|
+| **NEXT** | 17/22 (77%) | 12/22 (55%) | 6/22 (27%) |
+| **RETRY** | 0/12 (0%) | 12/12 (100%) | 11/12 (92%) |
+| **FORK** | 1/14 (7%) | 14/14 (100%) | 14/14 (100%) |
+| **JOIN** | 14/15 (93%) | 15/15 (100%) | 10/15 (67%) |
+| **META** | 10/13 (77%) | 0/13 (0%) | 0/13 (0%) |
+| **TOTAL** | 42/76 (55.3%) | 53/76 (69.7%) | 41/76 (53.9%) |
+### Key Strengths
+- 🔥 **META: 77%** (10/13): the only model in this comparison that detects anomalies; v3-best and GPT-4.1 both score 0/13
+- 🔥 **JOIN: 93%** (14/15): near-perfect synchronization detection
+- 🔥 **NEXT: 77%** (17/22): the highest NEXT score in this comparison
+### Ensemble with v3-best (Vote)
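+Standalone, v7-110 scores 0/12 on RETRY and 1/14 on FORK, so it is meant to be combined with v3-best in a vote ensemble:
+- v3-best covers RETRY (100%), FORK (100%), JOIN (100%)
+- v7-110 covers META (77%), NEXT (77%)
+- **Combined**: routing each category to its stronger model would give 68/76 (89.5%) on this suite. This is an idealized figure that assumes the true category is known in advance; no measured ensemble score is reported here.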
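The per-category counts in the standalone table can be tallied to check the totals and to see what a perfect category router would achieve. This is a sketch of that arithmetic only: the counts are copied from the table, while `accuracy` and `idealized_routing` are hypothetical helper names. The routed figure assumes the true category is known in advance, so it is not a measured ensemble result.

```python
# Number of scenarios per category in the 76-scenario suite.
TOTALS = {"NEXT": 22, "RETRY": 12, "FORK": 14, "JOIN": 15, "META": 13}

# Correct predictions per category, copied from the standalone table.
CORRECT = {
    "v7-110": {"NEXT": 17, "RETRY": 0, "FORK": 1, "JOIN": 14, "META": 10},
    "v3-best": {"NEXT": 12, "RETRY": 12, "FORK": 14, "JOIN": 15, "META": 0},
    "GPT-4.1": {"NEXT": 6, "RETRY": 11, "FORK": 14, "JOIN": 10, "META": 0},
}


def accuracy(correct: dict) -> float:
    """Overall accuracy: total correct divided by total scenarios."""
    return sum(correct.values()) / sum(TOTALS.values())


def idealized_routing(a: dict, b: dict) -> dict:
    """Per category, take the better of two models (assumes oracle routing)."""
    return {category: max(a[category], b[category]) for category in TOTALS}


if __name__ == "__main__":
    for name, correct in CORRECT.items():
        print(f"{name}: {sum(correct.values())}/76 ({accuracy(correct):.1%})")
    routed = idealized_routing(CORRECT["v7-110"], CORRECT["v3-best"])
    print(f"idealized routing: {sum(routed.values())}/76 ({accuracy(routed):.1%})")
```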
+## v7 Training Dataset Design
+The v7 dataset was specifically designed to address:
+1. **META manifold strengthening** — quality-filtered META samples with anomaly outcomes
+2. **Synthetic anomaly patterns** — matching evaluation suite scenarios
+3. **Risk-weighted allocation** — more samples for high-risk misclassifications (RETRY→NEXT, JOIN→NEXT)
+4. **Rehearsal for all classes** — broad sampling to prevent catastrophic forgetting
+### Dataset Distribution
+| Category | Samples | % |
+|----------|---------|---|
+| META | 904 | 28.2% |
+| NEXT | 800 | 25.0% |
+| RETRY | 500 | 15.6% |
+| FORK | 500 | 15.6% |
+| JOIN | 500 | 15.6% |
+| **Total** | **3,204** | 100% |
+## LoRA Configuration
+| Parameter | Value |
+|-----------|-------|
+| Rank | 16 |
+| Alpha | 32 (scale = alpha / rank = 2.0) |
+| Dropout | 0.02 |
+| Target layers | 28 (Qwen2.5-7B-Instruct has 28 transformer layers, so this covers all of them) |
+| Target modules | q_proj, k_proj, v_proj, o_proj |
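As a rough worked example, the table above implies a LoRA scale and an approximate trainable-parameter count. The rank, alpha, number of adapted layers, and target modules come from the table. The projection shapes (hidden size 3584, key/value width 512) are assumptions taken from the public Qwen2.5-7B configuration, not from this card, so treat the total as an estimate.

```python
RANK = 16
ALPHA = 32
NUM_ADAPTED_LAYERS = 28  # number of adapted layers from the table

# Assumed projection shapes (d_in, d_out) for Qwen2.5-7B attention.
# These dimensions are NOT stated in this model card.
HIDDEN = 3584
KV_DIM = 512
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int = RANK) -> int:
    """A LoRA pair is A (d_in x rank) plus B (rank x d_out)."""
    return rank * (d_in + d_out)


scale = ALPHA / RANK
per_layer = sum(lora_params(d_in, d_out) for d_in, d_out in PROJECTIONS.values())
total = per_layer * NUM_ADAPTED_LAYERS

if __name__ == "__main__":
    print(f"scale = {scale}")
    print(f"trainable LoRA parameters per layer = {per_layer:,}")
    print(f"trainable LoRA parameters in total  = {total:,}")
```

Under these assumptions the adapter trains roughly 10 million parameters, a small fraction of the 7B base model.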
+## Training Configuration
+| Parameter | Value |
+|-----------|-------|
+| Framework | MLX (Apple Silicon native) |
+| Hardware | Apple M4 Pro, 48 GB unified memory |
+| Iterations | 110 (checkpoint with the best validation loss) |
+| Batch size | 4 |
+| Learning rate | 1e-5 |
+| Sequence length | 512 |
+| Prompt masking | Yes |
+| Resumed from | v6-100 checkpoint |
+## Usage
+### With MLX (Apple Silicon)
+```python
+from huggingface_hub import snapshot_download
+from mlx_lm import load, generate
+from mlx_lm.sample_utils import make_sampler
+# Download the adapter files to a local directory first. Assumption: your
+# mlx_lm version expects adapter_path to be a local folder, not a Hub repo ID.
+adapter_dir = snapshot_download("ssaraf1/slm-workflow-planner-7b-v7")
+# Load the base model and apply the LoRA adapter on top of it.
+model, tokenizer = load(
+    "Qwen/Qwen2.5-7B-Instruct",
+    adapter_path=adapter_dir,
+)
+messages = [
+    {"role": "system", "content": "You are a workflow planner. Given the current workflow state, eligible nodes, and topology information, classify the decision type. Respond with exactly one of: NEXT, RETRY, FORK, JOIN, META"},
+    {"role": "user", "content": "Current node: VERIFY_CLAIM\nOutcome: anomaly_detected\nState: goal_progress=0.15 | uncertainty=0.85 | retry_count=3\nEligible: [ESCALATE, MANUAL_REVIEW]\nForkable sets: none\nJoin-ready: False\nWhat decision type?"}
+]
+# Render the chat as a prompt string that ends at the start of the assistant turn.
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+# temp=0.0 selects greedy decoding, so the output is deterministic.
+sampler = make_sampler(temp=0.0)
+response = generate(model, tokenizer, prompt=prompt, max_tokens=10, sampler=sampler)
+print(response)  # Expected: META (high uncertainty + anomaly)
+```
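The user message in the example above follows a fixed line-by-line layout, so it can be assembled from workflow state. The sketch below mirrors that one example only: `build_user_prompt` and its parameter names are hypothetical, and because the card does not show how non-empty forkable sets are rendered, that field is passed through as a preformatted string.

```python
from typing import Sequence


def build_user_prompt(
    node: str,
    outcome: str,
    goal_progress: float,
    uncertainty: float,
    retry_count: int,
    eligible: Sequence[str],
    forkable_sets: str = "none",  # preformatted; non-empty format is not documented
    join_ready: bool = False,
) -> str:
    """Assemble the user message in the layout used by the usage example."""
    lines = [
        f"Current node: {node}",
        f"Outcome: {outcome}",
        f"State: goal_progress={goal_progress} | uncertainty={uncertainty} "
        f"| retry_count={retry_count}",
        f"Eligible: [{', '.join(eligible)}]",
        f"Forkable sets: {forkable_sets}",
        f"Join-ready: {join_ready}",
        "What decision type?",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_user_prompt(
        "VERIFY_CLAIM", "anomaly_detected", 0.15, 0.85, 3,
        ["ESCALATE", "MANUAL_REVIEW"],
    ))
```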
+## Recommended: Ensemble with v3-best
+For production use, combine v7-110 with v3-best in a vote ensemble:
+- Use v3-best (`ssaraf1/slm-workflow-planner-7b-v3`) for RETRY/FORK/JOIN decisions
+- Use v7-110 for META/NEXT decisions
+- Confidence-weighted voting resolves disagreements
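One hypothetical way to implement the confidence-weighted vote described above is to scale each model's confidence by its per-category accuracy on the 76-scenario suite. This is a sketch under stated assumptions, not the project's actual ensemble: the `vote` function and `RELIABILITY` table are illustrative names, the card does not say how confidence is obtained (it could be a token probability), and per-category accuracy is used here as a rough stand-in for how much to trust a predicted label.

```python
from typing import Dict, Tuple

# Per-category accuracy from the standalone evaluation table,
# used as a trust weight for each model's predicted label.
RELIABILITY: Dict[str, Dict[str, float]] = {
    "v3-best": {"NEXT": 12 / 22, "RETRY": 12 / 12, "FORK": 14 / 14,
                "JOIN": 15 / 15, "META": 0 / 13},
    "v7-110": {"NEXT": 17 / 22, "RETRY": 0 / 12, "FORK": 1 / 14,
               "JOIN": 14 / 15, "META": 10 / 13},
}


def vote(predictions: Dict[str, Tuple[str, float]]) -> str:
    """Pick the label with the highest sum of confidence * reliability.

    predictions maps a model name to (predicted_label, confidence in [0, 1]).
    """
    scores: Dict[str, float] = {}
    for model, (label, confidence) in predictions.items():
        weight = RELIABILITY[model][label]
        scores[label] = scores.get(label, 0.0) + confidence * weight
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # v7-110 flags an anomaly while v3-best wants to continue.
    print(vote({"v3-best": ("NEXT", 0.6), "v7-110": ("META", 0.7)}))
```

With these weights a META prediction from v7-110 outweighs a NEXT prediction from v3-best at similar confidence, while RETRY, FORK, and JOIN predictions from v3-best carry full weight.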
+## Files
+- `adapters.safetensors` — LoRA adapter weights (v7-110 checkpoint)
+- `adapter_config.json` — LoRA configuration for MLX
+## Citation
+Part of the **Agentic Factory** project — building autonomous workflow orchestration
+with SLM-powered planning on Apple Silicon.