SLM Workflow Planner 7B v3 — Fork-Suppression Alignment (Best Overall)

Model Description

LoRA adapter for Qwen/Qwen2.5-7B-Instruct fine-tuned as a workflow execution planner. This is the v3 model — the best-performing checkpoint across all training phases, trained in three stages:

Stage A: Base policy training on 554K samples from 89 diverse workflow graphs (iter 800)
Stage B: Contrastive alignment on 20K curated samples with clean decision boundaries (iter 100)
Stage C: Fork-suppression alignment on 4.6K targeted samples to fix FORK over-triggering (iter 200)

The model makes real-time decisions about workflow transitions by analyzing state signals, eligible nodes, and topology information.

Decision Types

Decision	Description
NEXT	Proceed to the next sequential step
RETRY	Retry the current step (within budget)
FORK	Launch parallel execution branches
JOIN	Synchronize parallel branches
META	Escalate — anomaly detected, human intervention needed
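
Because the model answers with a single label, downstream code usually needs to map the raw generation back onto one of the five decision types. The following is a minimal sketch of that step; the `Decision` enum and `parse_decision` helper are illustrative names, not part of the released adapter.

```python
import re
from enum import Enum
from typing import Optional


class Decision(Enum):
    NEXT = "NEXT"
    RETRY = "RETRY"
    FORK = "FORK"
    JOIN = "JOIN"
    META = "META"


# Match any decision label as a whole word, case-insensitively.
_LABEL_RE = re.compile(
    r"\b(" + "|".join(d.value for d in Decision) + r")\b", re.IGNORECASE
)


def parse_decision(response: str) -> Optional[Decision]:
    """Return the first decision label found in the model output, or None."""
    match = _LABEL_RE.search(response)
    return Decision(match.group(1).upper()) if match else None
```

Returning `None` instead of raising leaves it to the caller to decide how to handle an off-format generation, for example by treating it as an escalation.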

Performance (76-scenario evaluation suite)

Category	v3 SLM	v2 SLM	GPT-4.1	GPT-4o-mini	Base SLM
NEXT	15/22 (68%)	8/22 (36%)	6/22 (27%)	2/22 (9%)	16/22 (73%)
RETRY	12/12 (100%)	7/12 (58%)	11/12 (92%)	12/12 (100%)	3/12 (25%)
FORK	12/14 (86%)	14/15 (93%)	14/14 (100%)	14/14 (100%)	1/14 (7%)
JOIN	6/15 (40%)	10/15 (67%)	10/15 (67%)	12/15 (80%)	0/15 (0%)
META	0/13 (0%)	3/12 (25%)	0/13 (0%)	0/13 (0%)	8/13 (62%)
TOTAL	45/76 (59.2%)	42/76 (55.3%)	41/76 (53.9%)	40/76 (52.6%)	28/76 (36.8%)
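
The TOTAL row can be reproduced from the per-category counts. The sketch below recomputes it for the v3 column; the macro average (the unweighted mean of per-category accuracy) is an extra illustrative metric that the card itself does not report.

```python
# Correct/total counts per category for the v3 SLM column of the table above.
v3_counts = {
    "NEXT": (15, 22),
    "RETRY": (12, 12),
    "FORK": (12, 14),
    "JOIN": (6, 15),
    "META": (0, 13),
}


def summarize(counts):
    """Return (correct, total, micro accuracy, macro accuracy)."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    micro = correct / total  # every scenario weighted equally
    macro = sum(c / t for c, t in counts.values()) / len(counts)  # every category weighted equally
    return correct, total, micro, macro


correct, total, micro, macro = summarize(v3_counts)
print(f"{correct}/{total} = {micro:.1%} overall, {macro:.1%} macro-averaged")
```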

Key Results

🏆 Best overall accuracy: 59.2% — outperforms all previous versions and GPT-4.1
🔥 RETRY: 100% — perfect retry handling (was 58% in v2)
🔥 FORK: 86% — strong parallel execution decisions with correct suppression
🔥 NEXT: 68%, up from 36% in v2 and close to the base model's 73%, without collapsing into always predicting NEXT
⚡ Balanced policy — the only checkpoint that achieves strong NEXT + RETRY + FORK simultaneously
⚡ 4x faster inference than base model, runs locally on Apple Silicon

Architecture Evolution

Version	Strategy	Total	NEXT	RETRY	FORK	JOIN	META
v1 (base)	800-iter policy training	36.8%	73%	25%	7%	0%	62%
v2	+ contrastive alignment	55.3%	36%	58%	93%	67%	25%
v3	+ fork suppression	59.2%	68%	100%	86%	40%	0%

v3 fixes v2's FORK over-triggering problem. v2 had learned "forkable → FORK" blindly. v3 correctly learns "forkable AND conditions favorable → FORK, otherwise NEXT".

Training Details

Three-Stage Training

Stage A: Base Policy (iter 800)

Dataset: 554K instruction pairs from 89 workflow graphs
8 structural families (linear, retry, fork-join, escalation, etc.)
Balanced decision distribution: NEXT 36%, JOIN 27%, META 13%, FORK 12%, RETRY 12%

Stage B: Contrastive Alignment (iter 100)

Dataset: 20K curated samples with clean decision boundaries
Contrastive pairs: FORK positives + hard negatives, JOIN positives + hard negatives
Proportional representation across all decision types

Stage C: Fork Suppression (iter 200)

Dataset: 4,600 targeted samples
Focus: "forkable but blocked → NEXT" hard negatives
Teaches: resource pressure, parallel depth, uncertainty block FORK
Stabilizers: RETRY and NEXT anchors to prevent forgetting
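
To make the "forkable but blocked → NEXT" idea concrete, here is a sketch of how one such hard negative might be serialized. It assumes the chat-style `messages` JSONL layout accepted by mlx_lm's LoRA trainer; the `make_sample` helper and the `resource_pressure=0.9` value are hypothetical and do not come from the actual training set.

```python
import json

SYSTEM_PROMPT = (
    "You are a workflow planner. Given the current workflow state, eligible nodes, "
    "and topology information, classify the decision type. "
    "Respond with exactly one of: NEXT, RETRY, FORK, JOIN, META"
)


def make_sample(user_prompt: str, label: str) -> str:
    """Serialize one training example as a JSONL line in chat format."""
    record = {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
            {"role": "assistant", "content": label},
        ]
    }
    return json.dumps(record, ensure_ascii=False)


# Hypothetical hard negative: a forkable set exists, but resource pressure is high,
# so the target label is NEXT rather than FORK.
blocked_prompt = (
    "Current node: VERIFY_POLICY (SYSTEM)\nOutcome: success\n\n"
    "State:\n  goal_progress=0.35\n  parallel_active=0\n"
    "  resource_pressure=0.9\n  retry_count=0\n\n"
    "Eligible nodes:\n"
    "  1. FRAUD_SCREENING (SYSTEM) → produces: fraud_score\n"
    "  2. DAMAGE_ASSESSMENT (AGENT) → produces: damage_report\n\n"
    "Forkable sets: [{FRAUD_SCREENING, DAMAGE_ASSESSMENT}]\nJoin-ready: []\n\n"
    "What decision type?"
)
hard_negative = make_sample(blocked_prompt, "NEXT")
```

Pairing each blocked example with an otherwise identical low-pressure example labeled FORK would give the contrast the stage is meant to teach.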

LoRA Configuration

Parameter	Value
Rank	16
Alpha (scale)	32 (2.0x)
Dropout	0.02
Target layers	Last 28 (all 28 transformer blocks of Qwen2.5-7B)
Target modules	q_proj, k_proj, v_proj, o_proj
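
As a rough size check, the configuration above implies the following adapter parameter count and effective scale. The attention shapes are assumptions taken from the published Qwen2.5-7B config (hidden size 3584, 4 key/value heads of dimension 128), not from this card.

```python
# Assumed Qwen2.5-7B attention shapes (from the published model config, not this card).
HIDDEN = 3584        # hidden size
KV_DIM = 4 * 128     # 4 key/value heads x head dim 128 (grouped-query attention)
NUM_LAYERS = 28      # adapted transformer blocks, per the table above
RANK, ALPHA = 16, 32

# (input dim, output dim) of each adapted projection.
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

# A rank-r adapter adds A (in x r) and B (r x out): r * (in + out) parameters.
per_layer = sum(RANK * (d_in + d_out) for d_in, d_out in shapes.values())
total_params = per_layer * NUM_LAYERS
scale = ALPHA / RANK

print(f"{per_layer:,} params per layer, {total_params:,} total, scale {scale}")
```

Under these assumptions the adapter holds roughly 10M trainable parameters, a small fraction of the 7B base model.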

Training Configuration

Parameter	Value
Framework	MLX (Apple Silicon native)
Hardware	Apple M4 Pro, 48GB unified memory
Stage A iters	800
Stage B iters	100
Stage C iters	200
Batch size	4
Learning rate	2e-5 (fork-suppression stage)
Sequence length	512
Prompt masking	Yes (loss only on assistant tokens)
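
Prompt masking means that only the assistant's tokens contribute to the loss. The pure-Python sketch below illustrates the idea on a list of per-token losses; it is not MLX's implementation, and `masked_mean_loss` is a hypothetical helper.

```python
def masked_mean_loss(token_losses, prompt_len):
    """Average per-token loss over completion (assistant) tokens only.

    token_losses: per-token losses for prompt + completion, in order.
    prompt_len:   number of leading tokens that belong to the prompt.
    """
    # 0.0 for prompt positions, 1.0 for completion positions.
    mask = [0.0 if i < prompt_len else 1.0 for i in range(len(token_losses))]
    kept = sum(mask)
    if kept == 0:
        raise ValueError("no completion tokens to train on")
    return sum(loss * m for loss, m in zip(token_losses, mask)) / kept
```

With a one-token target such as `FORK`, nearly all of a 512-token sequence is prompt, so masking keeps the gradient focused on the decision label.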

Usage

With MLX (Apple Silicon)

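from huggingface_hub import snapshot_download
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Assumption: mlx_lm reads adapters from a local directory, so download the adapter repo first
adapter_dir = snapshot_download("ssaraf1/slm-workflow-planner-7b-v3")

model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path=adapter_dir,
)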

messages = [
    {"role": "system", "content": "You are a workflow planner. Given the current workflow state, eligible nodes, and topology information, classify the decision type. Respond with exactly one of: NEXT, RETRY, FORK, JOIN, META"},
    {"role": "user", "content": "Current node: VERIFY_POLICY (SYSTEM)\nOutcome: success\n\nState:\n  goal_progress=0.35\n  parallel_active=0\n  resource_pressure=0.1\n  retry_count=0\n\nEligible nodes:\n  1. FRAUD_SCREENING (SYSTEM) → produces: fraud_score\n  2. DAMAGE_ASSESSMENT (AGENT) → produces: damage_report\n\nForkable sets: [{FRAUD_SCREENING, DAMAGE_ASSESSMENT}]\nJoin-ready: []\n\nWhat decision type?"}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
sampler = make_sampler(temp=0.0)
response = generate(model, tokenizer, prompt=prompt, max_tokens=10, sampler=sampler)
print(response)  # Expected: FORK (low pressure, independent actors)
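
The user message above follows a fixed layout, so it can be generated from structured workflow state. The helper below is a sketch that reproduces that layout; `build_user_prompt` and its parameter names are illustrative, and the rendering of a non-empty `Join-ready` list is an assumption, since the card only shows the empty case.

```python
def build_user_prompt(node, actor, outcome, state, eligible, forkable_sets, join_ready):
    """Render workflow state into the prompt layout used in the example above.

    state:         dict of signal name -> value, rendered in insertion order.
    eligible:      list of (node_name, actor, produced_artifact) tuples.
    forkable_sets: list of lists of node names.
    join_ready:    list of node names (comma-separated rendering is assumed).
    """
    state_lines = [f"  {key}={value}" for key, value in state.items()]
    node_lines = [
        f"  {i}. {name} ({who}) → produces: {artifact}"
        for i, (name, who, artifact) in enumerate(eligible, start=1)
    ]
    forkable = ", ".join("{" + ", ".join(group) + "}" for group in forkable_sets)
    joinable = ", ".join(join_ready)
    return "\n".join([
        f"Current node: {node} ({actor})",
        f"Outcome: {outcome}",
        "",
        "State:",
        *state_lines,
        "",
        "Eligible nodes:",
        *node_lines,
        "",
        f"Forkable sets: [{forkable}]",
        f"Join-ready: [{joinable}]",
        "",
        "What decision type?",
    ])


user_prompt = build_user_prompt(
    node="VERIFY_POLICY",
    actor="SYSTEM",
    outcome="success",
    state={"goal_progress": 0.35, "parallel_active": 0,
           "resource_pressure": 0.1, "retry_count": 0},
    eligible=[("FRAUD_SCREENING", "SYSTEM", "fraud_score"),
              ("DAMAGE_ASSESSMENT", "AGENT", "damage_report")],
    forkable_sets=[["FRAUD_SCREENING", "DAMAGE_ASSESSMENT"]],
    join_ready=[],
)
```

The resulting string can be passed as the `content` of the user message in place of the hand-written prompt.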

What Makes v3 Special

Fork Suppression — Correct Policy Boundaries

v2 over-triggered FORK whenever forkable_sets was present. v3 learned the correct policy:

Scenario	Topology	Implication	v2 Decision	v3 Decision
Low pressure + independent	Forkable	Go parallel	FORK ✅	FORK ✅
High resource pressure	Forkable	Don't parallelize	FORK ❌	NEXT ✅
Already in parallel	Forkable	Too deep	FORK ❌	NEXT ✅
High uncertainty	Forkable	Risky	FORK ❌	NEXT ✅
First retry failure	Not forkable	Retry available	NEXT ❌	RETRY ✅
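
The fork rows of this table can be summarized as a rule-based reference policy. The sketch below is only an illustration of the boundary v3 is meant to approximate: the model learns it from data, and the thresholds, the `uncertainty` field, and the `fork_or_next` name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkflowState:
    forkable: bool            # topology offers at least one forkable set
    resource_pressure: float  # 0.0 (idle) to 1.0 (saturated)
    parallel_active: int      # number of branches already running
    uncertainty: float        # hypothetical signal, 0.0 to 1.0


def fork_or_next(state, max_pressure=0.7, max_parallel=0, max_uncertainty=0.5):
    """Return "FORK" only when forking is possible and nothing blocks it."""
    if not state.forkable:
        return "NEXT"
    blocked = (
        state.resource_pressure > max_pressure    # high resource pressure
        or state.parallel_active > max_parallel   # already in parallel
        or state.uncertainty > max_uncertainty    # too risky to parallelize
    )
    return "NEXT" if blocked else "FORK"
```

The RETRY row is outside this rule; a fuller reference policy would check the retry budget before considering FORK or NEXT.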

Remaining Challenges (v4 targets)

JOIN: 40% (down from 67% in v2): the model struggles with join synchronization
META: 0% (down from 62% in v1 and 25% in v2): anomaly escalation was lost during alignment
Both regressions call for a unified alignment approach rather than sequential patching

Files

adapters.safetensors — LoRA adapter weights (Stage A + B + C)
adapter_config.json — LoRA configuration for MLX

Citation

Part of the Agentic Factory project — building autonomous workflow orchestration with SLM-powered planning on Apple Silicon.
