# SLM Workflow Planner 7B v1 – LoRA Adapter

## Model Description
LoRA adapter for Qwen/Qwen2.5-7B-Instruct fine-tuned as a workflow execution planner. The model makes real-time decisions about workflow transitions by analyzing state signals, eligible nodes, and topology information.
## Decision Types
| Decision | Description |
|---|---|
| NEXT | Proceed to the next sequential step |
| RETRY | Retry the current step (within budget) |
| FORK | Launch parallel execution branches |
| JOIN | Synchronize parallel branches |
| META | Escalate: anomaly detected, human intervention needed |
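The five-label vocabulary above lends itself to strict validation on the consumer side. The enum and parser below are an illustrative sketch (not part of the released adapter) showing how a caller might reject off-vocabulary completions:

```python
from enum import Enum

class PlannerDecision(str, Enum):
    """The planner's single-token output vocabulary."""
    NEXT = "NEXT"    # proceed to the next sequential step
    RETRY = "RETRY"  # retry the current step (within budget)
    FORK = "FORK"    # launch parallel execution branches
    JOIN = "JOIN"    # synchronize parallel branches
    META = "META"    # escalate: human intervention needed

def parse_decision(raw: str) -> PlannerDecision:
    """Take the first whitespace-delimited token of a completion;
    raise ValueError if it is not in the planner vocabulary."""
    token = raw.strip().split()[0].upper()
    return PlannerDecision(token)
```

Lookup by value (`PlannerDecision("FORK")`) raises `ValueError` on unknown tokens, so malformed completions fail loudly instead of silently steering the workflow.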
## Training Details

### Base Model
- Model: Qwen/Qwen2.5-7B-Instruct
- Parameters: 7.6B (40.4M trainable via LoRA = 0.53%)
### LoRA Configuration
| Parameter | Value |
|---|---|
| Rank | 16 |
| Alpha (scale) | 32 (2.0x) |
| Dropout | 0.02 |
| Target layers | Last 28 of 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
### Training Configuration
| Parameter | Value |
|---|---|
| Framework | MLX (Apple Silicon native) |
| Hardware | Apple M4 Pro, 48GB unified memory |
| Iterations | 2,600 (converged) |
| Batch size | 4 (effective 8 with grad accumulation) |
| Learning rate | 8e-5 → 1e-6 (cosine decay) |
| Warmup | 400 steps |
| Sequence length | 512 |
| Precision | bfloat16 |
| Prompt masking | Yes (loss only on assistant tokens) |
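The schedule in the table (linear warmup over 400 steps to 8e-5, then cosine decay to 1e-6 by iteration 2,600) can be reproduced in a few lines. This is a sketch of the schedule's shape, not the exact MLX implementation:

```python
import math

PEAK_LR, FINAL_LR = 8e-5, 1e-6
WARMUP_STEPS, TOTAL_STEPS = 400, 2600

def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay down to FINAL_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay over the remaining (TOTAL_STEPS - WARMUP_STEPS) steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return FINAL_LR + 0.5 * (PEAK_LR - FINAL_LR) * (1 + math.cos(math.pi * progress))
```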
### Training Data
- Dataset: ssaraf1/slm-workflow-planner-policy-v2
- Note: This v1 model was trained on the original (pre-policy-correction) data from 89 synthetic workflow graphs. Decision labels were topology-based, not policy-conditioned. A v2 model trained on policy-corrected data is forthcoming.
### Training Curve
| Iteration | Val Loss | Train Loss |
|---|---|---|
| 1 | 14.058 | – |
| 100 | 0.222 | 0.753 |
| 200 | 0.037 | 0.031 |
| 500 | 0.016 | 0.012 |
| 1000 | 0.011 | 0.009 |
| 2000 | 0.010 | 0.006 |
| 2600 | 0.009 | 0.005 |
## Performance (v1, Pre-Policy-Correction)
Evaluated on 20 representative scenarios across Workshop and Insurance Claim domains:
| Category | Accuracy | Notes |
|---|---|---|
| NEXT | 5/5 (100%) | Including policy-boundary cases |
| RETRY | 0/4 (0%) | NEXT-collapse (class imbalance) |
| FORK | 0/4 (0%) | NEXT-collapse (topology-only labels) |
| JOIN | 0/3 (0%) | NEXT-collapse (topology-only labels) |
| META | 0/4 (0%) | NEXT-collapse (insufficient coverage) |
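Aggregating the table, the v1 adapter answers 5 of 20 scenarios correctly (25% overall, driven entirely by the NEXT class). A quick tally, with the counts copied from the table above:

```python
# Per-class (correct, total) counts from the v1 evaluation table.
results = {
    "NEXT":  (5, 5),
    "RETRY": (0, 4),
    "FORK":  (0, 4),
    "JOIN":  (0, 3),
    "META":  (0, 4),
}

correct = sum(c for c, _ in results.values())
total = sum(t for _, t in results.values())
print(f"overall accuracy: {correct}/{total} = {correct / total:.0%}")  # 5/20 = 25%
```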
### Key Insight
This model learned the output protocol (a single valid planner token) and NEXT progression perfectly, but collapses to NEXT on every other decision type because of the imbalanced, pre-correction training data. The v2 model addresses this with policy-corrected labels and counterfactual negatives.
## What This Model Does Well
- ✅ Produces valid planner vocabulary (NEXT/RETRY/FORK/JOIN/META)
- ✅ Single-token structured output
- ✅ 4× faster inference than base model
- ✅ Perfect NEXT decision accuracy
- ✅ Recognizes policy boundaries (forkable set + high resource pressure → NEXT)
## Known Limitations
- ❌ NEXT-dominant collapse on non-NEXT decisions
- ❌ Trained on topology-only labels (not state-conditioned)
- ❌ Single-workflow overfitting (89 synthetic graphs)
## Usage

### With MLX (Apple Silicon)
```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path="ssaraf1/slm-workflow-planner-7b-v1",
)

messages = [
    {"role": "system", "content": "You are a workflow planner. Given the current workflow state, eligible nodes, and topology information, classify the decision type."},
    {"role": "user", "content": "Current node: TRIAGE_AND_ASSIGN (AGENT)\nOutcome: assigned\n\nState:\n goal_progress=0.15\n parallel_active=0\n resource_pressure=0.1\n\nEligible nodes:\n 1. VERIFY_POLICY (SYSTEM) → produces: policy_status\n 2. FRAUD_SCREENING (SYSTEM) → produces: fraud_score\n 3. DAMAGE_ASSESSMENT (AGENT) → produces: damage_report\n\nForkable sets: [{VERIFY_POLICY, FRAUD_SCREENING, DAMAGE_ASSESSMENT}]\nJoin-ready: []\n\nWhat decision type?"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
sampler = make_sampler(temp=0.0)  # greedy decoding for a deterministic planner token
response = generate(model, tokenizer, prompt=prompt, max_tokens=10, sampler=sampler)
print(response)  # Ground-truth label: FORK (v1 may collapse to NEXT; see Known Limitations)
```
## Architecture
This is a two-stage planner:
- Stage 1: Classify decision type → NEXT / RETRY / FORK / JOIN / META
- Stage 2: Select node(s) from eligible candidates based on decision type
The adapter handles both stages via the same LoRA weights.
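A minimal sketch of that two-stage loop, with a caller-supplied `generate_token` standing in for the MLX generate call; the helper names, prompt suffixes, and fallback policy below are illustrative assumptions, not part of the released adapter:

```python
VOCAB = {"NEXT", "RETRY", "FORK", "JOIN", "META"}

def plan(state_prompt: str, eligible: list[str], generate_token) -> tuple[str, list[str]]:
    """Two-stage planning: classify the decision type, then select node(s)."""
    # Stage 1: single-token decision classification.
    decision = generate_token(f"{state_prompt}\nWhat decision type?").strip()
    if decision not in VOCAB:
        decision = "META"  # off-vocabulary output -> escalate to a human

    # Stage 2: node selection conditioned on the decision type.
    if decision == "NEXT":
        choice = generate_token(f"{state_prompt}\nDecision: NEXT. Which node?").strip()
        return decision, [choice] if choice in eligible else eligible[:1]
    if decision == "FORK":
        return decision, eligible  # launch all eligible branches in parallel
    return decision, []  # RETRY/JOIN/META select no new node

# Example with a stub "model" that always answers FORK:
decision, nodes = plan("State: ...", ["VERIFY_POLICY", "FRAUD_SCREENING"], lambda p: "FORK")
print(decision, nodes)  # FORK ['VERIFY_POLICY', 'FRAUD_SCREENING']
```

Routing off-vocabulary output to META is one reasonable fail-safe: an orchestrator should never act on a token the planner protocol does not define.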
## Files

- `adapters.safetensors` – LoRA adapter weights (checkpoint iter 2600)
- `adapter_config.json` – LoRA configuration for MLX
## Citation

Part of the Agentic Factory project: building autonomous workflow orchestration with SLM-powered planning on Apple Silicon.
Quantized