SLM Workflow Planner 7B v3 – Fork-Suppression Alignment (Best Overall)

Model Description

LoRA adapter for Qwen/Qwen2.5-7B-Instruct, fine-tuned as a workflow execution planner. This is the v3 model, the best-performing checkpoint across all training phases, trained in three stages:

  1. Stage A: Base policy training on 554K samples from 89 diverse workflow graphs (iter 800)
  2. Stage B: Contrastive alignment on 20K curated samples with clean decision boundaries (iter 100)
  3. Stage C: Fork-suppression alignment on 4.6K targeted samples to fix FORK over-triggering (iter 200)

The model makes real-time decisions about workflow transitions by analyzing state signals, eligible nodes, and topology information.

Decision Types

| Decision | Description |
|----------|-------------|
| NEXT | Proceed to the next sequential step |
| RETRY | Retry the current step (within budget) |
| FORK | Launch parallel execution branches |
| JOIN | Synchronize parallel branches |
| META | Escalate: anomaly detected, human intervention needed |

Performance (76-scenario evaluation suite)

| Category | v3 SLM | v2 SLM | GPT-4.1 | GPT-4o-mini | Base SLM |
|----------|--------|--------|---------|-------------|----------|
| NEXT | 15/22 (68%) | 8/22 (36%) | 6/22 (27%) | 2/22 (9%) | 16/22 (73%) |
| RETRY | 12/12 (100%) | 7/12 (58%) | 11/12 (92%) | 12/12 (100%) | 3/12 (25%) |
| FORK | 12/14 (86%) | 14/15 (93%) | 14/14 (100%) | 14/14 (100%) | 1/14 (7%) |
| JOIN | 6/15 (40%) | 10/15 (67%) | 10/15 (67%) | 12/15 (80%) | 0/15 (0%) |
| META | 0/13 (0%) | 3/12 (25%) | 0/13 (0%) | 0/13 (0%) | 8/13 (62%) |
| TOTAL | 45/76 (59.2%) | 42/76 (55.3%) | 41/76 (53.9%) | 40/76 (52.6%) | 28/76 (36.8%) |

Key Results

  • πŸ† Best overall accuracy: 59.2% β€” outperforms all previous versions and GPT-4.1
  • πŸ”₯ RETRY: 100% β€” perfect retry handling (was 58% in v2)
  • πŸ”₯ FORK: 86% β€” strong parallel execution decisions with correct suppression
  • πŸ”₯ NEXT: 68% β€” massive improvement over v2 (36%) without collapse to NEXT
  • ⚑ Balanced policy β€” the only checkpoint that achieves strong NEXT + RETRY + FORK simultaneously
  • ⚑ 4x faster inference than base model, runs locally on Apple Silicon

Architecture Evolution

| Version | Strategy | Total | NEXT | RETRY | FORK | JOIN | META |
|---------|----------|-------|------|-------|------|------|------|
| v1 (base) | 800-iter policy training | 36.8% | 73% | 25% | 7% | 0% | 62% |
| v2 | + contrastive alignment | 55.3% | 36% | 58% | 93% | 67% | 25% |
| v3 | + fork suppression | 59.2% | 68% | 100% | 86% | 40% | 0% |

v3 fixes v2's FORK over-triggering problem: v2 had blindly learned "forkable → FORK", whereas v3 correctly learns "forkable AND conditions favorable → FORK, otherwise NEXT".

Training Details

Three-Stage Training

Stage A: Base Policy (iter 800)

  • Dataset: 554K instruction pairs from 89 workflow graphs
  • 8 structural families (linear, retry, fork-join, escalation, etc.)
  • Balanced decision distribution: NEXT 36%, JOIN 27%, META 13%, FORK 12%, RETRY 12%

Stage B: Contrastive Alignment (iter 100)

  • Dataset: 20K curated samples with clean decision boundaries
  • Contrastive pairs: FORK positives + hard negatives, JOIN positives + hard negatives
  • Proportional representation across all decision types

Stage C: Fork Suppression (iter 200)

  • Dataset: 4,600 targeted samples
  • Focus: "forkable but blocked → NEXT" hard negatives
  • Teaches that high resource pressure, parallel depth, and uncertainty should block FORK
  • Stabilizers: RETRY and NEXT anchor samples to prevent forgetting

LoRA Configuration

| Parameter | Value |
|-----------|-------|
| Rank | 16 |
| Alpha (scale) | 32 (2.0x) |
| Dropout | 0.02 |
| Target layers | Last 28 of 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
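The alpha-to-rank ratio explains the "2.0x" in the table: LoRA scales its low-rank update by alpha / rank, so the effective weight becomes W + (alpha / rank) * B @ A. A quick arithmetic check of the configuration above:

```python
# LoRA applies W' = W + (alpha / rank) * (B @ A).
# With this card's settings, the update is scaled by 32 / 16 = 2.0x.
rank, alpha = 16, 32
scale = alpha / rank
print(scale)  # 2.0

# 28 of the 32 transformer layers receive adapters, each on the
# four attention projections listed above (q/k/v/o).
adapted_layers = 28
adapted_modules = 4 * adapted_layers
print(adapted_modules)  # 112 adapter pairs in total
```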

Training Configuration

| Parameter | Value |
|-----------|-------|
| Framework | MLX (Apple Silicon native) |
| Hardware | Apple M4 Pro, 48GB unified memory |
| Stage A iters | 800 |
| Stage B iters | 100 |
| Stage C iters | 200 |
| Batch size | 4 |
| Learning rate | 2e-5 (fork-suppression stage) |
| Sequence length | 512 |
| Prompt masking | Yes (loss only on assistant tokens) |
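For reference, a Stage-C-style run with this configuration would look roughly like the following mlx_lm LoRA CLI invocation. This is a sketch, not the project's actual command: the data path and adapter path are placeholders, and the flag names reflect mlx_lm's CLI as of recent releases, which may differ in your installed version.

```shell
# Hypothetical fork-suppression training run (paths are placeholders).
python -m mlx_lm.lora \
  --model Qwen/Qwen2.5-7B-Instruct \
  --train \
  --data data/stage_c \
  --iters 200 \
  --batch-size 4 \
  --learning-rate 2e-5 \
  --max-seq-length 512 \
  --adapter-path adapters/
```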

Usage

With MLX (Apple Silicon)

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the base model with the v3 LoRA adapter applied on top.
model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path="ssaraf1/slm-workflow-planner-7b-v3"
)

messages = [
    {"role": "system", "content": "You are a workflow planner. Given the current workflow state, eligible nodes, and topology information, classify the decision type. Respond with exactly one of: NEXT, RETRY, FORK, JOIN, META"},
    {"role": "user", "content": "Current node: VERIFY_POLICY (SYSTEM)\nOutcome: success\n\nState:\n  goal_progress=0.35\n  parallel_active=0\n  resource_pressure=0.1\n  retry_count=0\n\nEligible nodes:\n  1. FRAUD_SCREENING (SYSTEM) → produces: fraud_score\n  2. DAMAGE_ASSESSMENT (AGENT) → produces: damage_report\n\nForkable sets: [{FRAUD_SCREENING, DAMAGE_ASSESSMENT}]\nJoin-ready: []\n\nWhat decision type?"}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
sampler = make_sampler(temp=0.0)  # greedy decoding for deterministic decisions
response = generate(model, tokenizer, prompt=prompt, max_tokens=10, sampler=sampler)
print(response)  # Expected: FORK (low pressure, independent actors)
```
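With greedy decoding and a small max_tokens the reply should be a bare label, but downstream code is more robust with a light normalization step. The helper below is a hypothetical addition, not part of the model card, and its fallback-to-META choice is one possible policy:

```python
# Valid decision labels per the planner's system prompt.
VALID_DECISIONS = {"NEXT", "RETRY", "FORK", "JOIN", "META"}

def parse_decision(response: str) -> str:
    """Extract the first valid decision label from the model's reply."""
    for token in response.upper().replace(".", " ").split():
        if token in VALID_DECISIONS:
            return token
    # Unparseable output: escalate rather than guess a transition.
    return "META"

print(parse_decision("FORK"))        # FORK
print(parse_decision(" fork.\n"))    # FORK
print(parse_decision("unclear???"))  # META
```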

What Makes v3 Special

Fork Suppression β€” Correct Policy Boundaries

v2 over-triggered FORK whenever forkable_sets was present. v3 learned the correct policy:

| Scenario | Topology | State | v2 decision | v3 decision |
|----------|----------|-------|-------------|-------------|
| Low pressure + independent | Forkable | Go parallel | FORK ✅ | FORK ✅ |
| High resource pressure | Forkable | Don't parallelize | FORK ❌ | NEXT ✅ |
| Already in parallel | Forkable | Too deep | FORK ❌ | NEXT ✅ |
| High uncertainty | Forkable | Risky | FORK ❌ | NEXT ✅ |
| First retry failure | Not forkable | Retry available | NEXT ❌ | RETRY ✅ |
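The boundary the table describes can be sketched as an explicit gating rule. This is an illustrative approximation of the policy v3 learns implicitly from state signals; the threshold values here are hypothetical, not constants used in training:

```python
# Illustrative sketch of v3's fork-suppression boundary.
# Thresholds are hypothetical; the model infers them from state signals.
def fork_decision(forkable: bool, resource_pressure: float,
                  parallel_active: int, uncertainty: float) -> str:
    if not forkable:
        return "NEXT"
    if resource_pressure > 0.7:   # high pressure blocks FORK
        return "NEXT"
    if parallel_active > 0:       # already in a parallel region: too deep
        return "NEXT"
    if uncertainty > 0.5:         # risky state: stay sequential
        return "NEXT"
    return "FORK"

print(fork_decision(True, 0.1, 0, 0.0))   # FORK  (low pressure, independent)
print(fork_decision(True, 0.9, 0, 0.0))   # NEXT  (high resource pressure)
print(fork_decision(True, 0.1, 2, 0.0))   # NEXT  (already in parallel)
```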

Remaining Challenges (v4 targets)

  • JOIN: 40% – the model still struggles with join synchronization
  • META: 0% – anomaly detection is not yet learned
  • Both require a unified alignment approach rather than further sequential patching

Files

  • adapters.safetensors – LoRA adapter weights (Stages A + B + C)
  • adapter_config.json – LoRA configuration for MLX

Citation

Part of the Agentic Factory project: building autonomous workflow orchestration with SLM-powered planning on Apple Silicon.
