---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
  - workflow-planning
  - slm
  - lora
  - mlx
  - apple-silicon
  - policy-learning
  - qwen2
  - text-classification
  - contrastive-alignment
  - meta-strengthening
  - ensemble
library_name: mlx
pipeline_tag: text-generation
language:
  - en
datasets:
  - ssaraf1/slm-workflow-planner-policy-v2
  - ssaraf1/slm-workflow-planner-alignment-v2
---

# SLM Workflow Planner 7B v7 — META-Strengthened Alignment (Iter 110)

## Model Description

LoRA adapter for Qwen/Qwen2.5-7B-Instruct, fine-tuned as a workflow execution planner. This is the v7 model — specialized for JOIN and META detection, trained through five stages of progressive alignment (Stages A–E below) starting from the base policy checkpoint.

## Training Lineage

  1. Stage A: Base policy training on 554K samples from 89 diverse workflow graphs (iter 800)
  2. Stage B: Contrastive alignment on 20K samples (iter 100) → v2
  3. Stage C: Fork-suppression alignment on 4.6K samples (iter 200) → v3-best
  4. Stage D: Signal-overlap restoration on 4K samples (80 iters) → v6-100
  5. Stage E: META-strengthened + risk-weighted alignment on 3.2K samples (110 iters) → v7-110

## Decision Types

| Decision | Description |
|---|---|
| `NEXT` | Proceed to the next sequential step |
| `RETRY` | Retry the current step (within budget) |
| `FORK` | Launch parallel execution branches |
| `JOIN` | Synchronize parallel branches |
| `META` | Escalate — anomaly detected, human intervention needed |
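Downstream code typically has to normalize the model's raw completion into one of these five labels. A minimal sketch follows; the parsing rules (uppercasing, token scan) are assumptions for illustration, not part of the model's documented contract:

```python
# Minimal output normalizer for the five decision labels.
# The parsing rules here are illustrative assumptions, not a shipped API.
DECISIONS = {"NEXT", "RETRY", "FORK", "JOIN", "META"}

def parse_decision(raw: str) -> str:
    """Map a raw model completion to a decision label, or raise ValueError."""
    for token in raw.upper().replace(",", " ").split():
        if token in DECISIONS:
            return token
    raise ValueError(f"no decision label in output: {raw!r}")

print(parse_decision(" META\n"))  # META
```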

## Performance (76-scenario evaluation suite)

### v7-110 Standalone

| Category | v7-110 | v3-best | GPT-4.1 |
|---|---|---|---|
| NEXT | 17/22 (77%) | 12/22 (55%) | 6/22 (27%) |
| RETRY | 0/12 (0%) | 12/12 (100%) | 11/12 (92%) |
| FORK | 1/14 (7%) | 14/14 (100%) | 14/14 (100%) |
| JOIN | 14/15 (93%) | 15/15 (100%) | 10/15 (67%) |
| META | 10/13 (77%) | 0/13 (0%) | 0/13 (0%) |
| **TOTAL** | **42/76 (55.3%)** | **53/76 (69.7%)** | **41/76 (53.9%)** |

### Key Strengths

- 🔥 **META: 77%** — only model that detects anomalies (all others at 0%)
- 🔥 **JOIN: 93%** — near-perfect synchronization detection
- 🔥 **NEXT: 77%** — strong sequential progression

### Ensemble with v3-best (Vote)

When combined with v3-best in a vote ensemble:

- v3-best covers RETRY (100%), FORK (100%), JOIN (100%)
- v7-110 covers META (77%), NEXT (77%)
- Combined: significantly better than any individual model
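As a rough illustration of the headroom, if every scenario were routed to whichever model scores higher on its category (an oracle-routing upper bound computed directly from the table above, not a measured ensemble result):

```python
# Oracle-routing upper bound for the two-model ensemble, computed from the
# per-category scores in the standalone table above. This is an upper bound,
# not a measured result: real routing cannot pick the winner perfectly.
v7 = {"NEXT": 17, "RETRY": 0, "FORK": 1, "JOIN": 14, "META": 10}
v3 = {"NEXT": 12, "RETRY": 12, "FORK": 14, "JOIN": 15, "META": 0}
totals = {"NEXT": 22, "RETRY": 12, "FORK": 14, "JOIN": 15, "META": 13}

# Route each category to whichever model scores higher on it.
best = {cat: max(v7[cat], v3[cat]) for cat in totals}
correct = sum(best.values())  # 68
total = sum(totals.values())  # 76
print(f"{correct}/{total} = {correct / total:.1%}")  # 68/76 = 89.5%
```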

## v7 Training Dataset Design

The v7 dataset was specifically designed to address:

  1. META manifold strengthening — quality-filtered META samples with anomaly outcomes
  2. Synthetic anomaly patterns — matching evaluation suite scenarios
  3. Risk-weighted allocation — more samples for high-risk misclassifications (RETRY→NEXT, JOIN→NEXT)
  4. Rehearsal for all classes — broad sampling to prevent catastrophic forgetting

### Dataset Distribution

| Category | Samples | % |
|---|---|---|
| META | 904 | 28.2% |
| NEXT | 800 | 25.0% |
| RETRY | 500 | 15.6% |
| FORK | 500 | 15.6% |
| JOIN | 500 | 15.6% |
| **Total** | **3,204** | **100%** |
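The distribution above can be read as a rehearsal floor per class plus risk-driven oversampling of META and NEXT. The multipliers below are reverse-engineered from the published counts purely for illustration; the actual weighting scheme used to build the v7 dataset is not specified in this card:

```python
# Illustrative risk-weighted allocation. Multipliers are chosen so the
# resulting counts reproduce the published v7 distribution; the real
# weighting scheme used in training is not documented here.
BASE = 500  # rehearsal floor per class (prevents catastrophic forgetting)
risk_multiplier = {
    "META": 904 / BASE,   # strengthened manifold: heaviest oversampling
    "NEXT": 800 / BASE,   # target of high-risk confusions (RETRY→NEXT, JOIN→NEXT)
    "RETRY": 1.0,
    "FORK": 1.0,
    "JOIN": 1.0,
}
counts = {cat: round(BASE * m) for cat, m in risk_multiplier.items()}
print(counts, "total:", sum(counts.values()))  # total: 3204
```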

## LoRA Configuration

| Parameter | Value |
|---|---|
| Rank | 16 |
| Alpha (scale) | 32 (2.0x) |
| Dropout | 0.02 |
| Target layers | Last 28 of 32 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj` |

## Training Configuration

| Parameter | Value |
|---|---|
| Framework | MLX (Apple Silicon native) |
| Hardware | Apple M4 Pro, 48GB unified memory |
| Iterations | 110 (best val loss) |
| Batch size | 4 |
| Learning rate | 1e-5 |
| Sequence length | 512 |
| Prompt masking | Yes |
| Resume from | v6-100 checkpoint |
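For reference, a configuration of this shape could be expressed as an `mlx_lm.lora` YAML file. This is a sketch only: the data and resume paths are placeholders, and key names (e.g. `num_layers`, `mask_prompt`, `lora_parameters`) vary across mlx-lm releases, so verify them against the version you have installed.

```yaml
# Sketch of an mlx_lm.lora fine-tuning config matching the tables above.
# Paths are placeholders; key names may differ between mlx-lm versions.
model: "Qwen/Qwen2.5-7B-Instruct"
train: true
data: "path/to/v7-dataset"        # placeholder
num_layers: 28                    # last 28 of 32 transformer layers
batch_size: 4
iters: 110
learning_rate: 1e-5
max_seq_length: 512
mask_prompt: true                 # compute loss on completion tokens only
resume_adapter_file: "path/to/v6-100/adapters.safetensors"  # placeholder
lora_parameters:
  keys: ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj", "self_attn.o_proj"]
  rank: 16
  scale: 2.0                      # alpha 32 / rank 16
  dropout: 0.02
```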

## Usage

### With MLX (Apple Silicon)

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the base model with the v7 LoRA adapter applied.
model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path="ssaraf1/slm-workflow-planner-7b-v7"
)

messages = [
    {"role": "system", "content": "You are a workflow planner. Given the current workflow state, eligible nodes, and topology information, classify the decision type. Respond with exactly one of: NEXT, RETRY, FORK, JOIN, META"},
    {"role": "user", "content": "Current node: VERIFY_CLAIM\nOutcome: anomaly_detected\nState: goal_progress=0.15 | uncertainty=0.85 | retry_count=3\nEligible: [ESCALATE, MANUAL_REVIEW]\nForkable sets: none\nJoin-ready: False\nWhat decision type?"}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
sampler = make_sampler(temp=0.0)  # greedy decoding for deterministic labels
response = generate(model, tokenizer, prompt=prompt, max_tokens=10, sampler=sampler)
print(response)  # Expected: META (high uncertainty + anomaly)
```

### Recommended: Ensemble with v3-best

For production use, combine v7-110 with v3-best in a vote ensemble:

- Use v3-best (`ssaraf1/slm-workflow-planner-7b-v3`) for RETRY/FORK/JOIN decisions
- Use v7-110 for META/NEXT decisions
- Confidence-weighted voting resolves disagreements
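A minimal routing sketch of this scheme, where `classify_v3` and `classify_v7` are hypothetical stand-ins for inference calls against the two adapters (each returning a `(label, confidence)` pair); neither function nor the tie-breaking rule is an API shipped with these adapters:

```python
# Hypothetical vote-ensemble router. classify_v3 / classify_v7 stand in for
# inference against the two adapters and are NOT a shipped API.
V3_STRENGTHS = {"RETRY", "FORK", "JOIN"}  # v3-best: 100% on these categories
V7_STRENGTHS = {"META", "NEXT"}           # v7-110: strongest on these

def ensemble_decide(classify_v3, classify_v7, prompt: str) -> str:
    label3, conf3 = classify_v3(prompt)
    label7, conf7 = classify_v7(prompt)
    if label3 == label7:
        return label3
    # On disagreement, boost the model whose predicted label falls inside
    # its own strength region, then break remaining ties by confidence.
    score3 = conf3 + (1.0 if label3 in V3_STRENGTHS else 0.0)
    score7 = conf7 + (1.0 if label7 in V7_STRENGTHS else 0.0)
    return label3 if score3 >= score7 else label7

# Example with stub classifiers:
decision = ensemble_decide(
    lambda p: ("RETRY", 0.8),  # stub for the v3-best call
    lambda p: ("META", 0.6),   # stub for the v7-110 call
    "workflow state prompt",
)
print(decision)  # RETRY
```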

## Files

- `adapters.safetensors` — LoRA adapter weights (v7-110 checkpoint)
- `adapter_config.json` — LoRA configuration for MLX

## Citation

Part of the Agentic Factory project — building autonomous workflow orchestration with SLM-powered planning on Apple Silicon.