SLM Workflow Planner 7B v1 - LoRA Adapter

Model Description

LoRA adapter for Qwen/Qwen2.5-7B-Instruct fine-tuned as a workflow execution planner. The model makes real-time decisions about workflow transitions by analyzing state signals, eligible nodes, and topology information.

Decision Types

Decision  Description
NEXT      Proceed to the next sequential step
RETRY     Retry the current step (within budget)
FORK      Launch parallel execution branches
JOIN      Synchronize parallel branches
META      Escalate: anomaly detected, human intervention needed
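Because the planner's entire output vocabulary is these five tokens, an orchestrator can validate every completion before acting on it. A minimal sketch (the `Decision` enum and `parse_decision` helper are illustrative, not part of the released adapter):

```python
from enum import Enum

class Decision(Enum):
    """The planner's five-token output vocabulary."""
    NEXT = "NEXT"    # proceed to the next sequential step
    RETRY = "RETRY"  # retry the current step (within budget)
    FORK = "FORK"    # launch parallel execution branches
    JOIN = "JOIN"    # synchronize parallel branches
    META = "META"    # escalate for human intervention

def parse_decision(raw: str) -> Decision:
    """Map a raw model completion onto the planner vocabulary.

    Anything outside the vocabulary is treated as an anomaly and
    escalated to META, so the orchestrator never acts on noise.
    """
    token = raw.strip().split()[0].upper() if raw.strip() else ""
    try:
        return Decision(token)
    except ValueError:
        return Decision.META

print(parse_decision("FORK"))      # Decision.FORK
print(parse_decision("???"))       # falls back to Decision.META
```

Escalating unknown tokens to META (rather than guessing) matches the card's framing of META as the anomaly/human-intervention path.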

Training Details

Base Model

  • Model: Qwen/Qwen2.5-7B-Instruct
  • Parameters: 7.6B total, of which 40.4M (0.53%) are trainable via LoRA

LoRA Configuration

Parameter       Value
Rank            16
Alpha (scale)   32 (2.0x)
Dropout         0.02
Target layers   Last 28 of 32
Target modules  q_proj, k_proj, v_proj, o_proj
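The figures in parentheses above follow directly from the table's own numbers: the LoRA scaling factor is alpha divided by rank, and the trainable fraction is the adapter parameter count over the total. A quick check:

```python
# Values taken from the LoRA configuration and Base Model sections above
rank, alpha = 16, 32
scale = alpha / rank  # LoRA scaling factor applied to the low-rank update
print(scale)  # 2.0

trainable, total = 40.4e6, 7.6e9
print(f"{100 * trainable / total:.2f}%")  # 0.53%
```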

Training Configuration

Parameter        Value
Framework        MLX (Apple Silicon native)
Hardware         Apple M4 Pro, 48 GB unified memory
Iterations       2,600 (converged)
Batch size       4 (effective 8 with gradient accumulation)
Learning rate    8e-5 → 1e-6 (cosine decay)
Warmup           400 steps
Sequence length  512 tokens
Precision        bfloat16
Prompt masking   Yes (loss computed only on assistant tokens)
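The learning-rate row describes linear warmup followed by cosine decay. The exact schedule implementation used in training is not published, so the sketch below is an assumption that simply matches the stated endpoints (peak 8e-5 at step 400, floor 1e-6 at step 2,600):

```python
import math

PEAK_LR, FLOOR_LR = 8e-5, 1e-6
WARMUP, TOTAL = 400, 2600

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to FLOOR_LR."""
    if step < WARMUP:
        return PEAK_LR * step / WARMUP
    progress = (step - WARMUP) / (TOTAL - WARMUP)  # 0.0 at warmup end, 1.0 at final step
    return FLOOR_LR + 0.5 * (PEAK_LR - FLOOR_LR) * (1 + math.cos(math.pi * progress))

print(lr_at(400))   # 8e-05 (end of warmup)
print(lr_at(2600))  # 1e-06 (final iteration)
```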

Training Data

  • Dataset: ssaraf1/slm-workflow-planner-policy-v2
  • Note: This v1 model was trained on the original (pre-policy-correction) data from 89 synthetic workflow graphs. Decision labels were topology-based, not policy-conditioned. A v2 model trained on policy-corrected data is forthcoming.

Training Curve

Iteration  Val Loss  Train Loss
1          14.058    -
100        0.222     0.753
200        0.037     0.031
500        0.016     0.012
1000       0.011     0.009
2000       0.010     0.006
2600       0.009     0.005

Performance (v1, Pre-Policy-Correction)

Evaluated on 20 representative scenarios across Workshop and Insurance Claim domains:

Category  Accuracy    Notes
NEXT      5/5 (100%)  Including policy-boundary cases
RETRY     0/4 (0%)    NEXT-collapse: class imbalance
FORK      0/4 (0%)    NEXT-collapse: topology-only labels
JOIN      0/3 (0%)    NEXT-collapse: topology-only labels
META      0/4 (0%)    NEXT-collapse: insufficient coverage
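Aggregating the rows above shows that the headline figure is carried entirely by the NEXT class. A minimal tally (counts taken directly from the evaluation table):

```python
# (correct, total) per decision class, from the evaluation table above
results = {"NEXT": (5, 5), "RETRY": (0, 4), "FORK": (0, 4), "JOIN": (0, 3), "META": (0, 4)}

correct = sum(c for c, _ in results.values())
total = sum(t for _, t in results.values())
print(f"overall: {correct}/{total} = {100 * correct / total:.0f}%")  # overall: 5/20 = 25%
```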

Key Insight

This model learned the output protocol (a single planner token) and NEXT progression perfectly, but collapses to NEXT on every other decision type because the pre-correction training data was heavily NEXT-imbalanced. The v2 model addresses this with policy-corrected labels and counterfactual negatives.

What This Model Does Well

  • ✅ Produces valid planner vocabulary (NEXT/RETRY/FORK/JOIN/META)
  • ✅ Single-token structured output
  • ✅ 4× faster inference than the base model
  • ✅ Perfect NEXT decision accuracy
  • ✅ Recognizes policy boundaries (forkable set + high resource pressure → NEXT)

Known Limitations

  • ❌ NEXT-dominant collapse on non-NEXT decisions
  • ❌ Trained on topology-only labels (not state-conditioned)
  • ❌ Overfits to the training workflows (89 synthetic graphs)

Usage

With MLX (Apple Silicon)

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the base model with the LoRA adapter applied
model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path="ssaraf1/slm-workflow-planner-7b-v1",
)

messages = [
    {"role": "system", "content": "You are a workflow planner. Given the current workflow state, eligible nodes, and topology information, classify the decision type."},
    {"role": "user", "content": "Current node: TRIAGE_AND_ASSIGN (AGENT)\nOutcome: assigned\n\nState:\n  goal_progress=0.15\n  parallel_active=0\n  resource_pressure=0.1\n\nEligible nodes:\n  1. VERIFY_POLICY (SYSTEM) → produces: policy_status\n  2. FRAUD_SCREENING (SYSTEM) → produces: fraud_score\n  3. DAMAGE_ASSESSMENT (AGENT) → produces: damage_report\n\nForkable sets: [{VERIFY_POLICY, FRAUD_SCREENING, DAMAGE_ASSESSMENT}]\nJoin-ready: []\n\nWhat decision type?"}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
sampler = make_sampler(temp=0.0)  # greedy decoding for deterministic planner output
response = generate(model, tokenizer, prompt=prompt, max_tokens=10, sampler=sampler)
print(response)  # Ideal label: FORK (v1 often collapses to NEXT; see Known Limitations)

Architecture

This is a two-stage planner:

  1. Stage 1: Classify decision type → NEXT / RETRY / FORK / JOIN / META
  2. Stage 2: Select node(s) from eligible candidates based on decision type

The adapter handles both stages via the same LoRA weights.
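The two-stage flow can be sketched as a thin orchestration loop. Everything here is illustrative: `plan` stands in for a `generate()` call with the appropriate prompt, and the stage-2 prompt wording and node-selection format are assumptions, not the card's published protocol:

```python
from typing import Callable, List, Tuple

def two_stage_plan(plan: Callable[[str], str],
                   state_prompt: str,
                   eligible: List[str]) -> Tuple[str, List[str]]:
    """Stage 1: classify the decision type; Stage 2: pick node(s) for it."""
    decision = plan(f"{state_prompt}\n\nWhat decision type?").strip()
    if decision == "META":
        return decision, []  # escalate; no node selection needed
    if decision == "FORK":
        # Stage 2 for FORK: ask for the set of branches to launch
        raw = plan(f"{state_prompt}\n\nDecision: FORK. Which nodes?")
        return decision, [n for n in eligible if n in raw]
    # NEXT / RETRY / JOIN: a single target node
    raw = plan(f"{state_prompt}\n\nDecision: {decision}. Which node?")
    return decision, [n for n in eligible if n in raw][:1]

# Usage with a stub planner that forks across all eligible nodes:
stub = lambda p: "FORK" if "decision type" in p else "VERIFY_POLICY, FRAUD_SCREENING"
print(two_stage_plan(stub, "State: ...", ["VERIFY_POLICY", "FRAUD_SCREENING"]))
# ('FORK', ['VERIFY_POLICY', 'FRAUD_SCREENING'])
```

Because both stages reuse the same LoRA weights, only the prompt changes between calls.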

Files

  • adapters.safetensors - LoRA adapter weights (checkpoint at iteration 2,600)
  • adapter_config.json β€” LoRA configuration for MLX

Citation

Part of the Agentic Factory project - building autonomous workflow orchestration with SLM-powered planning on Apple Silicon.
