SLM Workflow Planner 7B v7: META-Strengthened Alignment (Iter 110)

Model Description

LoRA adapter for Qwen/Qwen2.5-7B-Instruct fine-tuned as a workflow execution planner. This is the v7 model, specialized for JOIN and META detection and trained through 7 stages of progressive alignment from the base policy checkpoint.

Training Lineage

  1. Stage A: Base policy training on 554K samples from 89 diverse workflow graphs (iter 800)
  2. Stage B: Contrastive alignment on 20K samples (iter 100) → v2
  3. Stage C: Fork-suppression alignment on 4.6K samples (iter 200) → v3-best
  4. Stage D: Signal-overlap restoration on 4K samples (iter 80) → v6-100
  5. Stage E: META-strengthened + risk-weighted alignment on 3.2K samples (iter 110) → v7-110

Decision Types

| Decision | Description |
|----------|-------------|
| NEXT | Proceed to the next sequential step |
| RETRY | Retry the current step (within budget) |
| FORK | Launch parallel execution branches |
| JOIN | Synchronize parallel branches |
| META | Escalate: anomaly detected, human intervention needed |
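
Downstream orchestration code needs to turn the model's one-token reply into one of these five labels. A minimal sketch, assuming you want strict parsing with META as the fail-safe for unparseable output (the `Decision` enum and `parse_decision` helper are hypothetical, not part of the released adapter):

```python
from enum import Enum

class Decision(Enum):
    NEXT = "NEXT"    # proceed to next sequential step
    RETRY = "RETRY"  # retry current step within budget
    FORK = "FORK"    # launch parallel branches
    JOIN = "JOIN"    # synchronize parallel branches
    META = "META"    # escalate for human intervention

def parse_decision(raw: str) -> Decision:
    """Strictly parse a planner reply into one of the five labels."""
    stripped = raw.strip()
    token = stripped.split()[0].upper() if stripped else ""
    try:
        return Decision(token)
    except ValueError:
        # Treat unparseable output as an anomaly signal rather than guessing.
        return Decision.META
```

Falling back to META on garbage output is a design choice: it routes uncertainty to a human instead of silently advancing the workflow.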

Performance (76-scenario evaluation suite)

v7-110 Standalone

| Category | v7-110 | v3-best | GPT-4.1 |
|----------|--------|---------|---------|
| NEXT | 17/22 (77%) | 12/22 (55%) | 6/22 (27%) |
| RETRY | 0/12 (0%) | 12/12 (100%) | 11/12 (92%) |
| FORK | 1/14 (7%) | 14/14 (100%) | 14/14 (100%) |
| JOIN | 14/15 (93%) | 15/15 (100%) | 10/15 (67%) |
| META | 10/13 (77%) | 0/13 (0%) | 0/13 (0%) |
| **TOTAL** | 42/76 (55.3%) | 53/76 (69.7%) | 41/76 (53.9%) |

Key Strengths

  • πŸ”₯ META: 77% β€” only model that detects anomalies (all others at 0%)
  • πŸ”₯ JOIN: 93% β€” near-perfect synchronization detection
  • πŸ”₯ NEXT: 77% β€” strong sequential progression

Ensemble with v3-best (Vote)

When combined with v3-best in a vote ensemble:

  - v3-best covers RETRY (100%), FORK (100%), and JOIN (100%)
  - v7-110 covers META (77%) and NEXT (77%)
  - Combined: significantly stronger than either model alone

v7 Training Dataset Design

The v7 dataset was specifically designed to address:

  1. META manifold strengthening: quality-filtered META samples with anomaly outcomes
  2. Synthetic anomaly patterns: matching evaluation-suite scenarios
  3. Risk-weighted allocation: more samples for high-risk misclassifications (RETRY→NEXT, JOIN→NEXT)
  4. Rehearsal for all classes: broad sampling to prevent catastrophic forgetting
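
The risk-weighted allocation in point 3 can be sketched as a per-sample weight lookup keyed on the confusion pair. The multipliers and the `sample_weight` helper below are illustrative assumptions, not the actual training values:

```python
from typing import Optional

# Hypothetical risk multipliers for high-cost confusion pairs:
# (true_label, mispredicted_label) -> weight; anything else defaults to 1.0.
RISK_WEIGHTS = {
    ("RETRY", "NEXT"): 3.0,  # skipping a needed retry is high-risk
    ("JOIN", "NEXT"): 3.0,   # missing a synchronization barrier is high-risk
}

def sample_weight(true_label: str, model_error: Optional[str]) -> float:
    """Weight a training sample by the risk of its observed misclassification."""
    if model_error is None:
        return 1.0  # correctly handled samples keep baseline weight
    return RISK_WEIGHTS.get((true_label, model_error), 1.0)
```

Upweighting the RETRY→NEXT and JOIN→NEXT pairs concentrates training signal on the failure modes that silently advance a workflow past a required retry or barrier.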

Dataset Distribution

| Category | Samples | % |
|----------|---------|---|
| META | 904 | 28.2% |
| NEXT | 800 | 25.0% |
| RETRY | 500 | 15.6% |
| FORK | 500 | 15.6% |
| JOIN | 500 | 15.6% |
| **Total** | 3,204 | 100% |

LoRA Configuration

| Parameter | Value |
|-----------|-------|
| Rank | 16 |
| Alpha (scale) | 32 (2.0x) |
| Dropout | 0.02 |
| Target layers | Last 28 of 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
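
The "(2.0x)" in the Alpha row follows from the standard LoRA scaling convention, where the adapter update is scaled by alpha/rank:

```python
# LoRA applies W' = W + (alpha / rank) * (B @ A),
# so alpha=32 at rank=16 gives an effective 2.0x scale.
rank = 16
alpha = 32
scale = alpha / rank
print(scale)  # 2.0
```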

Training Configuration

| Parameter | Value |
|-----------|-------|
| Framework | MLX (Apple Silicon native) |
| Hardware | Apple M4 Pro, 48 GB unified memory |
| Iterations | 110 (best validation loss) |
| Batch size | 4 |
| Learning rate | 1e-5 |
| Sequence length | 512 |
| Prompt masking | Yes |
| Resume from | v6-100 checkpoint |

Usage

With MLX (Apple Silicon)

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the base model and apply the v7 LoRA adapter
model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path="ssaraf1/slm-workflow-planner-7b-v7",
)

messages = [
    {"role": "system", "content": "You are a workflow planner. Given the current workflow state, eligible nodes, and topology information, classify the decision type. Respond with exactly one of: NEXT, RETRY, FORK, JOIN, META"},
    {"role": "user", "content": "Current node: VERIFY_CLAIM\nOutcome: anomaly_detected\nState: goal_progress=0.15 | uncertainty=0.85 | retry_count=3\nEligible: [ESCALATE, MANUAL_REVIEW]\nForkable sets: none\nJoin-ready: False\nWhat decision type?"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
sampler = make_sampler(temp=0.0)  # greedy decoding for deterministic labels
response = generate(model, tokenizer, prompt=prompt, max_tokens=10, sampler=sampler)
print(response)  # Expected: META (high uncertainty + anomaly)
```
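
The user message above follows a fixed field layout. A helper to render structured state into that layout can keep callers from drifting from the format the adapter was trained on; the function name and parameter names below are assumptions inferred from the example, not a published API:

```python
def build_planner_prompt(node, outcome, goal_progress, uncertainty,
                         retry_count, eligible, forkable="none",
                         join_ready=False):
    """Render workflow state into the planner's expected user-message layout."""
    return (
        f"Current node: {node}\n"
        f"Outcome: {outcome}\n"
        f"State: goal_progress={goal_progress} | uncertainty={uncertainty} "
        f"| retry_count={retry_count}\n"
        f"Eligible: [{', '.join(eligible)}]\n"
        f"Forkable sets: {forkable}\n"
        f"Join-ready: {join_ready}\n"
        f"What decision type?"
    )
```

For example, `build_planner_prompt("VERIFY_CLAIM", "anomaly_detected", 0.15, 0.85, 3, ["ESCALATE", "MANUAL_REVIEW"])` reproduces the user message shown in the snippet above.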

Recommended: Ensemble with v3-best

For production use, combine v7-110 with v3-best in a vote ensemble:

  - Use v3-best (ssaraf1/slm-workflow-planner-7b-v3) for RETRY/FORK/JOIN decisions
  - Use v7-110 for META/NEXT decisions
  - Resolve disagreements with confidence-weighted voting

Files

  • adapters.safetensors β€” LoRA adapter weights (v7-110 checkpoint)
  • adapter_config.json β€” LoRA configuration for MLX

Citation

Part of the Agentic Factory project: building autonomous workflow orchestration with SLM-powered planning on Apple Silicon.
