+---
+license: apache-2.0
+base_model: Qwen/Qwen2.5-7B-Instruct
+tags:
+  - workflow-planning
+  - slm
+  - lora
+  - mlx
+  - apple-silicon
+  - policy-learning
+  - qwen2
+  - text-classification
+library_name: mlx
+pipeline_tag: text-generation
+language:
+  - en
+datasets:
+  - ssaraf1/slm-workflow-planner-policy-v2
+---
+# SLM Workflow Planner 7B v1 — LoRA Adapter
+## Model Description
+LoRA adapter for **Qwen/Qwen2.5-7B-Instruct** fine-tuned as a **workflow execution planner**.
+The model makes real-time decisions about workflow transitions by analyzing state signals,
+eligible nodes, and topology information.
+### Decision Types
+| Decision | Description |
+|----------|-------------|
+| **NEXT** | Proceed to the next sequential step |
+| **RETRY** | Retry the current step (within budget) |
+| **FORK** | Launch parallel execution branches |
+| **JOIN** | Synchronize parallel branches |
+| **META** | Escalate — anomaly detected, human intervention needed |
+## Training Details
+### Base Model
+- **Model**: Qwen/Qwen2.5-7B-Instruct
+- **Parameters**: 7.6B total; 40.4M trainable via LoRA (0.53%)
+### LoRA Configuration
+| Parameter | Value |
+|-----------|-------|
+| Rank | 16 |
+| Alpha (scale) | 32 (scale = alpha / rank = 2.0) |
+| Dropout | 0.02 |
+| Target layers | 28 (all transformer layers; Qwen2.5-7B has 28) |
+| Target modules | q_proj, k_proj, v_proj, o_proj |
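As a rough sanity check on the trainable-parameter figure, the sketch below counts LoRA parameters per adapted matrix as `rank * (in_features + out_features)`. The layer dimensions are assumptions taken from the public Qwen2.5-7B configuration, not from this card, and the helper name `lora_params` is illustrative.

```python
# Assumed Qwen2.5-7B dimensions (public model config, not stated in this card).
HIDDEN = 3584          # hidden_size
KV_DIM = 512           # 4 key/value heads x 128 head_dim
INTERMEDIATE = 18944   # MLP intermediate_size
NUM_LAYERS = 28        # adapted decoder layers
RANK = 16              # LoRA rank from the table above

# (in_features, out_features) of each linear projection in one decoder layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(modules, rank=RANK, layers=NUM_LAYERS):
    """LoRA adds A (in x rank) and B (rank x out) for every adapted matrix."""
    per_layer = sum(rank * (SHAPES[m][0] + SHAPES[m][1]) for m in modules)
    return per_layer * layers


attention_only = lora_params(["q_proj", "k_proj", "v_proj", "o_proj"])
all_linear = lora_params(list(SHAPES))

print(f"attention projections only: {attention_only / 1e6:.1f}M")
print(f"all seven projections:      {all_linear / 1e6:.1f}M")
```

Under these assumptions the four attention projections account for roughly 10.1M parameters, while adapting all seven linear projections gives roughly 40.4M, which is the figure quoted under Base Model. The arithmetic alone does not establish which module set was used in training.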
+### Training Configuration
+| Parameter | Value |
+|-----------|-------|
+| Framework | MLX (Apple Silicon native) |
+| Hardware | Apple M4 Pro, 48GB unified memory |
+| Iterations | 2,600 (converged) |
+| Batch size | 4 (effective 8 with 2-step gradient accumulation) |
+| Learning rate | 8e-5 → 1e-6 (cosine decay) |
+| Warmup | 400 steps |
+| Sequence length | 512 |
+| Precision | bfloat16 |
+| Prompt masking | Yes (loss only on assistant tokens) |
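The learning-rate row can be read as a warmup followed by cosine decay. A minimal sketch of such a schedule is shown below, assuming a linear warmup that counts toward the 2,600 iterations and a decay that reaches the floor at the final step; the card does not confirm either detail, and `learning_rate` is a hypothetical helper rather than the training script's own function.

```python
import math

PEAK_LR = 8e-5       # from the table above
FINAL_LR = 1e-6      # from the table above
WARMUP_STEPS = 400   # from the table above
TOTAL_STEPS = 2600   # assumed to include the warmup steps


def learning_rate(step: int) -> float:
    """Linear warmup (assumed shape), then cosine decay from PEAK_LR to FINAL_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine


for step in (0, 400, 1500, 2600):
    print(step, f"{learning_rate(step):.2e}")
```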
+### Training Data
+- **Dataset**: [ssaraf1/slm-workflow-planner-policy-v2](https://huggingface.co/datasets/ssaraf1/slm-workflow-planner-policy-v2)
+- **Note**: This v1 model was trained on the original (pre-policy-correction) data
+  from 89 synthetic workflow graphs. Decision labels were topology-based, not
+  policy-conditioned. A v2 model trained on policy-corrected data is forthcoming.
+### Training Curve
+| Iteration | Val Loss | Train Loss |
+|-----------|----------|------------|
+| 1 | 14.058 | — |
+| 100 | 0.222 | 0.753 |
+| 200 | 0.037 | 0.031 |
+| 500 | 0.016 | 0.012 |
+| 1000 | 0.011 | 0.009 |
+| 2000 | 0.010 | 0.006 |
+| 2600 | 0.009 | 0.005 |
+## Performance (v1 — Pre-Policy-Correction)
+Evaluated on 20 representative scenarios across Workshop and Insurance Claim domains:
+| Category | Accuracy | Notes |
+|----------|----------|-------|
+| NEXT | 5/5 (100%) | Including policy-boundary cases |
+| RETRY | 0/4 (0%) | NEXT-collapse — class imbalance |
+| FORK | 0/4 (0%) | NEXT-collapse — topology-only labels |
+| JOIN | 0/3 (0%) | NEXT-collapse — topology-only labels |
+| META | 0/4 (0%) | NEXT-collapse — insufficient coverage |
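The per-category counts above can be aggregated into overall figures. The short calculation below uses only the numbers in the table; the variable names are illustrative.

```python
# (correct, total) per decision type, copied from the table above.
RESULTS = {
    "NEXT": (5, 5),
    "RETRY": (0, 4),
    "FORK": (0, 4),
    "JOIN": (0, 3),
    "META": (0, 4),
}

correct = sum(c for c, _ in RESULTS.values())
total = sum(n for _, n in RESULTS.values())

# Overall accuracy weights every scenario equally.
overall_accuracy = correct / total
# Macro accuracy weights every decision type equally, regardless of case count.
macro_accuracy = sum(c / n for c, n in RESULTS.values()) / len(RESULTS)

print(f"overall: {correct}/{total} = {overall_accuracy:.0%}")
print(f"macro:   {macro_accuracy:.0%}")
```

Overall accuracy is 5/20 (25%) and macro accuracy is 20%. On this scenario set that equals the score of a baseline that always answers NEXT.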
+### Key Insight
+This model learned the output **protocol** (a single-token planner label) and **NEXT progression**:
+it is correct on all 5 NEXT scenarios. It also answers NEXT on the 15 non-NEXT scenarios, a
+NEXT-dominance caused by imbalanced pre-correction training data.
+The v2 model addresses this with policy-corrected labels and counterfactual negatives.
+### What This Model Does Well
+- ✅ Produces valid planner vocabulary (NEXT/RETRY/FORK/JOIN/META)
+- ✅ Single-token structured output
+- ✅ 4× faster inference than base model
+- ✅ Perfect NEXT decision accuracy (5/5 on the evaluation scenarios)
+- ✅ Recognizes policy boundaries (forkable set + high resource → NEXT)
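Because the output vocabulary is closed, a caller can validate each response before acting on it. The helper below is a minimal sketch of such a check; `parse_decision` is a hypothetical name, and the punctuation stripping and case normalization are assumptions about what a tolerant parser might accept, not documented model behavior.

```python
from typing import Optional

# Planner vocabulary listed in the Decision Types table.
DECISIONS = ("NEXT", "RETRY", "FORK", "JOIN", "META")


def parse_decision(response: str) -> Optional[str]:
    """Return the planner label if the response starts with one, else None."""
    tokens = response.strip().split()
    if not tokens:
        return None
    # Tolerate trailing punctuation and lowercase output.
    candidate = tokens[0].strip(".,:;!\"'").upper()
    return candidate if candidate in DECISIONS else None


print(parse_decision("FORK"))
print(parse_decision("I think we should fork"))
```

A `None` result signals output outside the vocabulary; how to handle it (for example, escalating as META) is left to the caller.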
+### Known Limitations
+- ❌ NEXT-dominant collapse on non-NEXT decisions
+- ❌ Trained on topology-only labels (not state-conditioned)
+- ❌ Single-workflow overfitting (89 synthetic graphs)
+## Usage
+### With MLX (Apple Silicon)
+```python
+from huggingface_hub import snapshot_download
+from mlx_lm import load, generate
+from mlx_lm.sample_utils import make_sampler
+
+# mlx_lm reads adapters from a local directory, so download the adapter repo first.
+adapter_dir = snapshot_download("ssaraf1/slm-workflow-planner-7b-v1")
+model, tokenizer = load("Qwen/Qwen2.5-7B-Instruct", adapter_path=adapter_dir)
+
+messages = [
+    {"role": "system", "content": "You are a workflow planner. Given the current workflow state, eligible nodes, and topology information, classify the decision type."},
+    {"role": "user", "content": "Current node: TRIAGE_AND_ASSIGN (AGENT)\nOutcome: assigned\n\nState:\n  goal_progress=0.15\n  parallel_active=0\n  resource_pressure=0.1\n\nEligible nodes:\n  1. VERIFY_POLICY (SYSTEM) → produces: policy_status\n  2. FRAUD_SCREENING (SYSTEM) → produces: fraud_score\n  3. DAMAGE_ASSESSMENT (AGENT) → produces: damage_report\n\nForkable sets: [{VERIFY_POLICY, FRAUD_SCREENING, DAMAGE_ASSESSMENT}]\nJoin-ready: []\n\nWhat decision type?"}
+]
+
+# Render the chat template as text, ending with the start of the assistant turn.
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+# temp=0.0 gives greedy decoding, so repeated runs return the same label.
+sampler = make_sampler(temp=0.0)
+response = generate(model, tokenizer, prompt=prompt, max_tokens=10, sampler=sampler)
+print(response)  # Intended label: FORK (v1 often answers NEXT; see Known Limitations)
+```
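The user message in the example above follows a fixed layout. The sketch below assembles that layout from structured inputs; the function name `build_user_prompt` and its argument shapes are illustrative assumptions, and the layout is inferred from the single example rather than from a documented schema.

```python
def build_user_prompt(node, node_type, outcome, state, eligible, forkable_sets, join_ready):
    """Format workflow state in the layout used by the usage example."""
    state_lines = "\n".join(f"  {key}={value}" for key, value in state.items())
    eligible_lines = "\n".join(
        f"  {i}. {name} ({kind}) → produces: {output}"
        for i, (name, kind, output) in enumerate(eligible, start=1)
    )
    forkable = ", ".join("{" + ", ".join(group) + "}" for group in forkable_sets)
    joinable = ", ".join(join_ready)
    return (
        f"Current node: {node} ({node_type})\n"
        f"Outcome: {outcome}\n\n"
        f"State:\n{state_lines}\n\n"
        f"Eligible nodes:\n{eligible_lines}\n\n"
        f"Forkable sets: [{forkable}]\n"
        f"Join-ready: [{joinable}]\n\n"
        "What decision type?"
    )


# Rebuild the user message from the usage example.
example = build_user_prompt(
    node="TRIAGE_AND_ASSIGN",
    node_type="AGENT",
    outcome="assigned",
    state={"goal_progress": 0.15, "parallel_active": 0, "resource_pressure": 0.1},
    eligible=[
        ("VERIFY_POLICY", "SYSTEM", "policy_status"),
        ("FRAUD_SCREENING", "SYSTEM", "fraud_score"),
        ("DAMAGE_ASSESSMENT", "AGENT", "damage_report"),
    ],
    forkable_sets=[["VERIFY_POLICY", "FRAUD_SCREENING", "DAMAGE_ASSESSMENT"]],
    join_ready=[],
)
print(example)
```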
+## Architecture
+This is a **two-stage planner**:
+1. **Stage 1**: Classify decision type → NEXT / RETRY / FORK / JOIN / META
+2. **Stage 2**: Select node(s) from eligible candidates based on decision type
+The adapter handles both stages via the same LoRA weights.
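The two stages can be chained as sketched below. This is an illustration only: the card does not document the stage-2 prompt, so the `Which node(s)?` wording, the `plan` function, the choice to skip node selection for RETRY and META, and the fallback to META on invalid output are all assumptions. The `ask` argument stands in for any function that sends a prompt to the adapted model and returns its text.

```python
import re
from typing import Callable, List, Tuple

DECISIONS = ("NEXT", "RETRY", "FORK", "JOIN", "META")


def plan(ask: Callable[[str], str], state_prompt: str, eligible: List[str]) -> Tuple[str, List[str]]:
    """Run stage 1 (decision type), then stage 2 (node selection) if needed."""
    # Stage 1: classify the decision type.
    decision = ask(state_prompt + "\n\nWhat decision type?").strip().upper()
    if decision not in DECISIONS:
        return "META", []  # assumption: escalate on out-of-vocabulary output
    if decision in ("RETRY", "META"):
        return decision, []  # assumption: no node selection for these types

    # Stage 2: hypothetical follow-up prompt asking for node names.
    reply = ask(f"{state_prompt}\n\nDecision: {decision}\nWhich node(s)?")
    named = set(re.split(r"[,\s]+", reply.strip()))
    # Keep only names that are actually eligible, in their original order.
    return decision, [node for node in eligible if node in named]


def scripted_model(prompt: str) -> str:
    """Stand-in for the real model, returning canned answers for the demo."""
    if prompt.endswith("What decision type?"):
        return "FORK"
    return "VERIFY_POLICY, FRAUD_SCREENING"


print(plan(scripted_model, "state...", ["VERIFY_POLICY", "FRAUD_SCREENING", "DAMAGE_ASSESSMENT"]))
```

Filtering the reply against the eligible list keeps a malformed stage-2 answer from selecting a node that the topology does not allow.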
+## Files
+- `adapters.safetensors` — LoRA adapter weights (checkpoint iter 2600)
+- `adapter_config.json` — LoRA configuration for MLX
+## Citation
+Part of the **Agentic Factory** project — building autonomous workflow orchestration
+with SLM-powered planning on Apple Silicon.