SLM Workflow Planner v8 β€” Context-Contract Planning (MLX LoRA)

Overview

v8 is a Stage 2 enhancement of the SLM Workflow Planner. It extends the v3-best checkpoint with context-contract planning: the ability to make routing decisions based on the required_context and produces_context of all nodes in a workflow graph, not just the directly connected edges.

This enables three new capabilities:

  • Recovery Routing (Backjump): On failure, jump backward to an earlier context-satisfiable node
  • Stage Skipping: Skip unnecessary stages when required context is already available (e.g., walk-in customers)
  • Non-Adjacent Parallelism: Fork two independent context-satisfiable nodes that aren't connected by fork-edges

Model Details

| Property             | Value                     |
|----------------------|---------------------------|
| Base Model           | Qwen/Qwen2.5-7B-Instruct  |
| Fine-tune Type       | LoRA (MLX format)         |
| LoRA Rank            | 16                        |
| LoRA Scale           | 2.0                       |
| LoRA Dropout         | 0.02                      |
| Tuned Layers         | 28/32                     |
| Trainable Parameters | 40.37M (0.53%)            |
| Framework            | MLX (Apple Silicon)       |

Training

| Property            | Value                                                   |
|---------------------|---------------------------------------------------------|
| Lineage             | base(8000) → v2(100) → v3(200) → v3-cont → v3-best → v8(1000) |
| Resume Checkpoint   | v3-best (59.2% on the 76-scenario suite)                |
| Training Iterations | 1000 (stopped early; val loss converged)                |
| Learning Rate       | 2e-5 (cosine decay to 1e-6, 100-step warmup)            |
| Batch Size          | 4 (effective 8 with gradient accumulation)              |
| Max Sequence Length | 768 tokens                                              |
| Dataset             | 696K samples from 150 workflows                         |
| Val Loss            | 0.032 (down from 0.272)                                 |

Training Data Distribution

| Category        | Count | %     | Description                                               |
|-----------------|-------|-------|-----------------------------------------------------------|
| META            | 187K  | 26.9% | Dead-end escalation                                       |
| NEGATIVE        | 187K  | 26.9% | Tier-2 visible but edge chosen ("satisfiable ≠ sensible") |
| NEXT_EDGE       | 116K  | 16.7% | Normal edge progression                                   |
| NEXT_SKIP †     | 55K   | 8.0%  | Forward dead-end recovery (Tier-2)                        |
| RETRY           | 36K   | 5.2%  | Edge retry on failure                                     |
| JOIN            | 30K   | 4.3%  | Parallel branch merge                                     |
| NEXT_BACKJUMP † | 28K   | 4.0%  | Failure recovery to earlier node (Tier-2)                 |
| FORK_EDGE       | 28K   | 4.0%  | Edge-adjacent fork                                        |
| FORK_NONADJ †   | 28K   | 4.0%  | Non-adjacent parallel fork (Tier-2)                       |

† = Protected from downsampling during balancing
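The protected-category balancing noted above can be sketched as follows; the `balance` helper, the cap parameter, and the category keys are illustrative, not the actual training pipeline:

```python
import random

# Tier-2 categories exempt from downsampling during class balancing
PROTECTED = {"NEXT_SKIP", "NEXT_BACKJUMP", "FORK_NONADJ"}

def balance(samples_by_category, cap, seed=0):
    """Downsample majority categories to `cap` samples each,
    keeping protected Tier-2 categories intact."""
    rng = random.Random(seed)
    balanced = {}
    for category, samples in samples_by_category.items():
        if category in PROTECTED or len(samples) <= cap:
            balanced[category] = list(samples)      # keep in full
        else:
            balanced[category] = rng.sample(samples, cap)
    return balanced
```

This keeps the rarer Tier-2 behaviors (skip, backjump, non-adjacent fork) fully represented while capping the dominant META/NEGATIVE classes.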

Prompt Format

The model uses a tiered prompt with two candidate sections:

```
Current node: NODE_A (SYSTEM, stage 3)
Outcome: success
Failure type: none

State:
  goal_progress=0.40
  retry_count=0
  ...

Produced context: {ctx_start, intake_data, assessment_score}

Edge candidates (normal path):
  1. NODE_B (AGENT) [processor] → requires: {assessment_score} → produces: {approval}

Context-eligible (off-path, invocable now):
  1. NODE_X (SYSTEM, stage 5, gap=+2) [validator] → requires: {intake_data} ✓ → produces: {validation}

Forkable sets: []
Join-ready: []

What is the best action?
```

Output format: DECISION_TYPE NODE_ID

  • NEXT NODE_B β€” advance to NODE_B
  • FORK NODE_A, NODE_B β€” parallel fork
  • RETRY NODE_A β€” retry current
  • JOIN NODE_A β€” merge parallel branches
  • META β€” escalate to human

Evaluation Results

Section A: Stratified Test (100 held-out samples)

| Category                               | Exact Accuracy | Type Accuracy |
|----------------------------------------|----------------|---------------|
| META                                   | 20/20 (100%)   | 20/20 (100%)  |
| NEGATIVE (Tier-2 visible, edge chosen) | 5/5 (100%)     | 5/5 (100%)    |
| SKIP_FORWARD                           | 7/7 (100%)     | 7/7 (100%)    |
| RETRY                                  | 18/20 (90%)    | 18/20 (90%)   |
| JOIN                                   | 16/20 (80%)    | 16/20 (80%)   |
| FORK (non-adjacent)                    | 12/18 (67%)    | 14/18 (78%)   |
| NEXT (edge)                            | 5/8 (63%)      | 8/8 (100%)    |
| TOTAL                                  | 83/100 (83%)   | 88/100 (88%)  |

Section B: Tier-2 Specific (90 held-out samples)

| Category          | Exact Accuracy | Type Accuracy |
|-------------------|----------------|---------------|
| Non-Adjacent Fork | 15/15 (100%)   | 15/15 (100%)  |
| META with Context | 15/15 (100%)   | 15/15 (100%)  |
| Negative Contrast | 14/15 (93%)    | 14/15 (93%)   |
| RETRY with Context| 14/15 (93%)    | 14/15 (93%)   |
| Skip Forward      | 13/15 (87%)    | 14/15 (93%)   |
| JOIN with Context | 10/15 (67%)    | 10/15 (67%)   |
| TOTAL             | 81/90 (90%)    | 82/90 (91%)   |

Key Capabilities

  1. Context-Contract Reasoning: Evaluates required_context ⊆ produced_keys to identify all invocable nodes
  2. Recovery Routing: Backjumps on process/resource failure when no edge retry exists
  3. Stage Skipping: Advances to forward context-eligible nodes at dead-ends
  4. Non-Adjacent Parallelism: Forks independent context-eligible nodes with different actors
  5. Negative Contrast: Learned "satisfiable ≠ sensible"; does not take a Tier-2 action when the edge path is correct
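Capability 1's eligibility check (required_context ⊆ produced_keys) is plain set containment. A minimal sketch, with an illustrative node schema that is not the training format:

```python
def context_eligible(nodes, produced_keys, current):
    """Return the names of off-path nodes whose required_context is
    already satisfied by the keys produced so far."""
    return [
        name for name, spec in nodes.items()
        if name != current and spec["requires"] <= produced_keys
    ]
```

Candidates returned here populate the "Context-eligible (off-path, invocable now)" section of the prompt; the model then decides whether taking one is actually sensible.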

Usage (MLX)

```python
from mlx_lm import load, generate

model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path="sameer-saraf-quant-ai/slm-workflow-planner-v8-mlx",
)

messages = [
    {"role": "system", "content": "You are a workflow planner..."},
    {"role": "user", "content": "<tiered prompt>"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=30)
print(response)  # e.g. "NEXT ESTIMATION_AND_APPROVAL"
```

Ensemble Recommendation

For production use, combine v8 with a GPT-4.1 arbiter to cover the roughly 10% of cases it gets wrong (mainly JOIN confusion):

  • v8 handles 90%+ of decisions autonomously
  • GPT validates uncertain decisions (estimated 5-10% of traffic)

Architecture Context

This adapter is part of the Agentic OS system:

  • Temporal handles durable execution and state management
  • Neo4j stores workflow graph definitions
  • SLM (this model) makes real-time routing decisions
  • Guardrails validate SLM output before execution