# SLM Workflow Planner 7B v1 – LoRA Adapter

## Model Description
LoRA adapter for Qwen/Qwen2.5-7B-Instruct fine-tuned as a workflow execution planner. The model makes real-time decisions about workflow transitions by analyzing state signals, eligible nodes, and topology information.
## Decision Types
| Decision | Description |
|---|---|
| NEXT | Proceed to the next sequential step |
| RETRY | Retry the current step (within budget) |
| FORK | Launch parallel execution branches |
| JOIN | Synchronize parallel branches |
| META | Escalate: anomaly detected, human intervention needed |
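The five-label vocabulary above lends itself to strict validation on the consumer side. The enum and parser below are an illustrative sketch (not part of the released adapter) showing how a caller might reject off-vocabulary completions:

```python
from enum import Enum

class PlannerDecision(str, Enum):
    """The planner's single-token output vocabulary."""
    NEXT = "NEXT"    # proceed to the next sequential step
    RETRY = "RETRY"  # retry the current step (within budget)
    FORK = "FORK"    # launch parallel execution branches
    JOIN = "JOIN"    # synchronize parallel branches
    META = "META"    # escalate: human intervention needed

def parse_decision(raw: str) -> PlannerDecision:
    """Take the first whitespace-delimited token of a completion;
    raise ValueError if it is not in the planner vocabulary."""
    token = raw.strip().split()[0].upper()
    return PlannerDecision(token)
```

Lookup by value (`PlannerDecision("FORK")`) raises `ValueError` on unknown tokens, so malformed completions fail loudly instead of silently steering the workflow.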
## Training Details

### Base Model
- Model: Qwen/Qwen2.5-7B-Instruct
- Parameters: 7.6B (40.4M trainable via LoRA = 0.53%)
### LoRA Configuration
| Parameter | Value |
|---|---|
| Rank | 16 |
| Alpha (scale) | 32 (2.0x) |
| Dropout | 0.02 |
| Target layers | Last 28 of 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
### Training Configuration
| Parameter | Value |
|---|---|
| Framework | MLX (Apple Silicon native) |
| Hardware | Apple M4 Pro, 48GB unified memory |
| Iterations | 2,600 (converged) |
| Batch size | 4 (effective 8 with grad accumulation) |
| Learning rate | 8e-5 → 1e-6 (cosine decay) |
| Warmup | 400 steps |
| Sequence length | 512 |
| Precision | bfloat16 |
| Prompt masking | Yes (loss only on assistant tokens) |
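The schedule in the table (linear warmup over 400 steps to 8e-5, then cosine decay to 1e-6 by iteration 2,600) can be reproduced in a few lines. This is a sketch of the schedule's shape, not the exact MLX implementation:

```python
import math

PEAK_LR, FINAL_LR = 8e-5, 1e-6
WARMUP_STEPS, TOTAL_STEPS = 400, 2600

def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay down to FINAL_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay over the remaining (TOTAL_STEPS - WARMUP_STEPS) steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return FINAL_LR + 0.5 * (PEAK_LR - FINAL_LR) * (1 + math.cos(math.pi * progress))
```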
### Training Data
- Dataset: ssaraf1/slm-workflow-planner-policy-v2
- Note: This v1 model was trained on the original (pre-policy-correction) data from 89 synthetic workflow graphs. Decision labels were topology-based, not policy-conditioned. A v2 model trained on policy-corrected data is forthcoming.
### Training Curve
| Iteration | Val Loss | Train Loss |
|---|---|---|
| 1 | 14.058 | – |
| 100 | 0.222 | 0.753 |
| 200 | 0.037 | 0.031 |
| 500 | 0.016 | 0.012 |
| 1000 | 0.011 | 0.009 |
| 2000 | 0.010 | 0.006 |
| 2600 | 0.009 | 0.005 |
## Performance (v1, Pre-Policy-Correction)
Evaluated on 20 representative scenarios across Workshop and Insurance Claim domains:
| Category | Accuracy | Notes |
|---|---|---|
| NEXT | 5/5 (100%) | Including policy-boundary cases |
| RETRY | 0/4 (0%) | NEXT-collapse (class imbalance) |
| FORK | 0/4 (0%) | NEXT-collapse (topology-only labels) |
| JOIN | 0/3 (0%) | NEXT-collapse (topology-only labels) |
| META | 0/4 (0%) | NEXT-collapse (insufficient coverage) |
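Aggregating the table, the v1 adapter answers 5 of 20 scenarios correctly (25% overall, driven entirely by the NEXT class). A quick tally, with the counts copied from the table above:

```python
# Per-class (correct, total) counts from the v1 evaluation table.
results = {
    "NEXT":  (5, 5),
    "RETRY": (0, 4),
    "FORK":  (0, 4),
    "JOIN":  (0, 3),
    "META":  (0, 4),
}

correct = sum(c for c, _ in results.values())
total = sum(t for _, t in results.values())
print(f"overall accuracy: {correct}/{total} = {correct / total:.0%}")  # 5/20 = 25%
```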
### Key Insight
This model learned the output protocol (a single valid planner token) and NEXT progression perfectly, but collapses to NEXT on every other decision type because of the imbalanced, pre-correction training data. The v2 model addresses this with policy-corrected labels and counterfactual negatives.
## What This Model Does Well
- ✅ Produces valid planner vocabulary (NEXT/RETRY/FORK/JOIN/META)
- ✅ Single-token structured output
- ✅ 4× faster inference than base model
- ✅ Perfect NEXT decision accuracy
- ✅ Recognizes policy boundaries (forkable set + high resource pressure → NEXT)
## Known Limitations
- ❌ NEXT-dominant collapse on non-NEXT decisions
- ❌ Trained on topology-only labels (not state-conditioned)
- ❌ Single-workflow overfitting (89 synthetic graphs)
## Usage

### With MLX (Apple Silicon)
```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path="ssaraf1/slm-workflow-planner-7b-v1",
)

messages = [
    {"role": "system", "content": "You are a workflow planner. Given the current workflow state, eligible nodes, and topology information, classify the decision type."},
    {"role": "user", "content": "Current node: TRIAGE_AND_ASSIGN (AGENT)\nOutcome: assigned\n\nState:\n goal_progress=0.15\n parallel_active=0\n resource_pressure=0.1\n\nEligible nodes:\n 1. VERIFY_POLICY (SYSTEM) → produces: policy_status\n 2. FRAUD_SCREENING (SYSTEM) → produces: fraud_score\n 3. DAMAGE_ASSESSMENT (AGENT) → produces: damage_report\n\nForkable sets: [{VERIFY_POLICY, FRAUD_SCREENING, DAMAGE_ASSESSMENT}]\nJoin-ready: []\n\nWhat decision type?"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
sampler = make_sampler(temp=0.0)  # greedy decoding for a deterministic planner token
response = generate(model, tokenizer, prompt=prompt, max_tokens=10, sampler=sampler)
print(response)  # Ground-truth label: FORK (v1 may collapse to NEXT; see Known Limitations)
```
## Architecture
This is a two-stage planner:
- Stage 1: Classify decision type → NEXT / RETRY / FORK / JOIN / META
- Stage 2: Select node(s) from eligible candidates based on decision type
The adapter handles both stages via the same LoRA weights.
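A minimal sketch of that two-stage loop, with a caller-supplied `generate_token` standing in for the MLX generate call; the helper names, prompt suffixes, and fallback policy below are illustrative assumptions, not part of the released adapter:

```python
VOCAB = {"NEXT", "RETRY", "FORK", "JOIN", "META"}

def plan(state_prompt: str, eligible: list[str], generate_token) -> tuple[str, list[str]]:
    """Two-stage planning: classify the decision type, then select node(s)."""
    # Stage 1: single-token decision classification.
    decision = generate_token(f"{state_prompt}\nWhat decision type?").strip()
    if decision not in VOCAB:
        decision = "META"  # off-vocabulary output -> escalate to a human

    # Stage 2: node selection conditioned on the decision type.
    if decision == "NEXT":
        choice = generate_token(f"{state_prompt}\nDecision: NEXT. Which node?").strip()
        return decision, [choice] if choice in eligible else eligible[:1]
    if decision == "FORK":
        return decision, eligible  # launch all eligible branches in parallel
    return decision, []  # RETRY/JOIN/META select no new node

# Example with a stub "model" that always answers FORK:
decision, nodes = plan("State: ...", ["VERIFY_POLICY", "FRAUD_SCREENING"], lambda p: "FORK")
print(decision, nodes)  # FORK ['VERIFY_POLICY', 'FRAUD_SCREENING']
```

Routing off-vocabulary output to META is one reasonable fail-safe: an orchestrator should never act on a token the planner protocol does not define.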
## Files

- `adapters.safetensors` – LoRA adapter weights (checkpoint iter 2600)
- `adapter_config.json` – LoRA configuration for MLX
## Citation

Part of the Agentic Factory project: building autonomous workflow orchestration with SLM-powered planning on Apple Silicon.
Quantized