---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
  - workflow-planning
  - slm
  - lora
  - mlx
  - apple-silicon
  - policy-learning
  - qwen2
  - text-classification
  - contrastive-alignment
  - meta-strengthening
  - ensemble
library_name: mlx
pipeline_tag: text-generation
language:
  - en
datasets:
  - ssaraf1/slm-workflow-planner-policy-v2
  - ssaraf1/slm-workflow-planner-alignment-v2
---

# SLM Workflow Planner 7B v7 — META-Strengthened Alignment (Iter 110)

## Model Description

LoRA adapter for Qwen/Qwen2.5-7B-Instruct, fine-tuned as a workflow execution planner. This is the v7 model — specialized for JOIN and META detection, trained through five stages of progressive alignment (Stages A–E below) starting from the base policy checkpoint.

## Training Lineage

  1. Stage A: Base policy training on 554K samples from 89 diverse workflow graphs (iter 800)
  2. Stage B: Contrastive alignment on 20K samples (iter 100) → v2
  3. Stage C: Fork-suppression alignment on 4.6K samples (iter 200) → v3-best
  4. Stage D: Signal-overlap restoration on 4K samples (80 iters) → v6-100
  5. Stage E: META-strengthened + risk-weighted alignment on 3.2K samples (110 iters) → v7-110

## Decision Types

| Decision | Description |
|---|---|
| `NEXT` | Proceed to the next sequential step |
| `RETRY` | Retry the current step (within budget) |
| `FORK` | Launch parallel execution branches |
| `JOIN` | Synchronize parallel branches |
| `META` | Escalate — anomaly detected, human intervention needed |
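Downstream code typically has to normalize the model's raw completion into one of these five labels. A minimal sketch follows; the parsing rules (uppercasing, token scan) are assumptions for illustration, not part of the model's documented contract:

```python
# Minimal output normalizer for the five decision labels.
# The parsing rules here are illustrative assumptions, not a shipped API.
DECISIONS = {"NEXT", "RETRY", "FORK", "JOIN", "META"}

def parse_decision(raw: str) -> str:
    """Map a raw model completion to a decision label, or raise ValueError."""
    for token in raw.upper().replace(",", " ").split():
        if token in DECISIONS:
            return token
    raise ValueError(f"no decision label in output: {raw!r}")

print(parse_decision(" META\n"))  # META
```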

## Performance (76-scenario evaluation suite)

### v7-110 Standalone

| Category | v7-110 | v3-best | GPT-4.1 |
|---|---|---|---|
| NEXT | 17/22 (77%) | 12/22 (55%) | 6/22 (27%) |
| RETRY | 0/12 (0%) | 12/12 (100%) | 11/12 (92%) |
| FORK | 1/14 (7%) | 14/14 (100%) | 14/14 (100%) |
| JOIN | 14/15 (93%) | 15/15 (100%) | 10/15 (67%) |
| META | 10/13 (77%) | 0/13 (0%) | 0/13 (0%) |
| **TOTAL** | **42/76 (55.3%)** | **53/76 (69.7%)** | **41/76 (53.9%)** |

### Key Strengths

- 🔥 **META: 77%** — only model that detects anomalies (all others at 0%)
- 🔥 **JOIN: 93%** — near-perfect synchronization detection
- 🔥 **NEXT: 77%** — strong sequential progression

### Ensemble with v3-best (Vote)

When combined with v3-best in a vote ensemble:

- v3-best covers RETRY (100%), FORK (100%), JOIN (100%)
- v7-110 covers META (77%), NEXT (77%)
- Combined: significantly better than any individual model
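As a rough illustration of the headroom, if every scenario were routed to whichever model scores higher on its category (an oracle-routing upper bound computed directly from the table above, not a measured ensemble result):

```python
# Oracle-routing upper bound for the two-model ensemble, computed from the
# per-category scores in the standalone table above. This is an upper bound,
# not a measured result: real routing cannot pick the winner perfectly.
v7 = {"NEXT": 17, "RETRY": 0, "FORK": 1, "JOIN": 14, "META": 10}
v3 = {"NEXT": 12, "RETRY": 12, "FORK": 14, "JOIN": 15, "META": 0}
totals = {"NEXT": 22, "RETRY": 12, "FORK": 14, "JOIN": 15, "META": 13}

# Route each category to whichever model scores higher on it.
best = {cat: max(v7[cat], v3[cat]) for cat in totals}
correct = sum(best.values())  # 68
total = sum(totals.values())  # 76
print(f"{correct}/{total} = {correct / total:.1%}")  # 68/76 = 89.5%
```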

## v7 Training Dataset Design

The v7 dataset was specifically designed to address:

  1. META manifold strengthening — quality-filtered META samples with anomaly outcomes
  2. Synthetic anomaly patterns — matching evaluation suite scenarios
  3. Risk-weighted allocation — more samples for high-risk misclassifications (RETRY→NEXT, JOIN→NEXT)
  4. Rehearsal for all classes — broad sampling to prevent catastrophic forgetting

### Dataset Distribution

| Category | Samples | % |
|---|---|---|
| META | 904 | 28.2% |
| NEXT | 800 | 25.0% |
| RETRY | 500 | 15.6% |
| FORK | 500 | 15.6% |
| JOIN | 500 | 15.6% |
| **Total** | **3,204** | **100%** |
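The distribution above can be read as a rehearsal floor per class plus risk-driven oversampling of META and NEXT. The multipliers below are reverse-engineered from the published counts purely for illustration; the actual weighting scheme used to build the v7 dataset is not specified in this card:

```python
# Illustrative risk-weighted allocation. Multipliers are chosen so the
# resulting counts reproduce the published v7 distribution; the real
# weighting scheme used in training is not documented here.
BASE = 500  # rehearsal floor per class (prevents catastrophic forgetting)
risk_multiplier = {
    "META": 904 / BASE,   # strengthened manifold: heaviest oversampling
    "NEXT": 800 / BASE,   # target of high-risk confusions (RETRY→NEXT, JOIN→NEXT)
    "RETRY": 1.0,
    "FORK": 1.0,
    "JOIN": 1.0,
}
counts = {cat: round(BASE * m) for cat, m in risk_multiplier.items()}
print(counts, "total:", sum(counts.values()))  # total: 3204
```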

## LoRA Configuration

| Parameter | Value |
|---|---|
| Rank | 16 |
| Alpha (scale) | 32 (2.0x) |
| Dropout | 0.02 |
| Target layers | Last 28 of 32 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj` |

## Training Configuration

| Parameter | Value |
|---|---|
| Framework | MLX (Apple Silicon native) |
| Hardware | Apple M4 Pro, 48GB unified memory |
| Iterations | 110 (best val loss) |
| Batch size | 4 |
| Learning rate | 1e-5 |
| Sequence length | 512 |
| Prompt masking | Yes |
| Resume from | v6-100 checkpoint |
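For reference, a configuration of this shape could be expressed as an `mlx_lm.lora` YAML file. This is a sketch only: the data and resume paths are placeholders, and key names (e.g. `num_layers`, `mask_prompt`, `lora_parameters`) vary across mlx-lm releases, so verify them against the version you have installed.

```yaml
# Sketch of an mlx_lm.lora fine-tuning config matching the tables above.
# Paths are placeholders; key names may differ between mlx-lm versions.
model: "Qwen/Qwen2.5-7B-Instruct"
train: true
data: "path/to/v7-dataset"        # placeholder
num_layers: 28                    # last 28 of 32 transformer layers
batch_size: 4
iters: 110
learning_rate: 1e-5
max_seq_length: 512
mask_prompt: true                 # compute loss on completion tokens only
resume_adapter_file: "path/to/v6-100/adapters.safetensors"  # placeholder
lora_parameters:
  keys: ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj", "self_attn.o_proj"]
  rank: 16
  scale: 2.0                      # alpha 32 / rank 16
  dropout: 0.02
```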

## Usage

### With MLX (Apple Silicon)

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the base model with the v7 LoRA adapter applied.
model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path="ssaraf1/slm-workflow-planner-7b-v7"
)

messages = [
    {"role": "system", "content": "You are a workflow planner. Given the current workflow state, eligible nodes, and topology information, classify the decision type. Respond with exactly one of: NEXT, RETRY, FORK, JOIN, META"},
    {"role": "user", "content": "Current node: VERIFY_CLAIM\nOutcome: anomaly_detected\nState: goal_progress=0.15 | uncertainty=0.85 | retry_count=3\nEligible: [ESCALATE, MANUAL_REVIEW]\nForkable sets: none\nJoin-ready: False\nWhat decision type?"}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
sampler = make_sampler(temp=0.0)  # greedy decoding for deterministic labels
response = generate(model, tokenizer, prompt=prompt, max_tokens=10, sampler=sampler)
print(response)  # Expected: META (high uncertainty + anomaly)
```

### Recommended: Ensemble with v3-best

For production use, combine v7-110 with v3-best in a vote ensemble:

- Use v3-best (`ssaraf1/slm-workflow-planner-7b-v3`) for RETRY/FORK/JOIN decisions
- Use v7-110 for META/NEXT decisions
- Confidence-weighted voting resolves disagreements
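A minimal routing sketch of this scheme, where `classify_v3` and `classify_v7` are hypothetical stand-ins for inference calls against the two adapters (each returning a `(label, confidence)` pair); neither function nor the tie-breaking rule is an API shipped with these adapters:

```python
# Hypothetical vote-ensemble router. classify_v3 / classify_v7 stand in for
# inference against the two adapters and are NOT a shipped API.
V3_STRENGTHS = {"RETRY", "FORK", "JOIN"}  # v3-best: 100% on these categories
V7_STRENGTHS = {"META", "NEXT"}           # v7-110: strongest on these

def ensemble_decide(classify_v3, classify_v7, prompt: str) -> str:
    label3, conf3 = classify_v3(prompt)
    label7, conf7 = classify_v7(prompt)
    if label3 == label7:
        return label3
    # On disagreement, boost the model whose predicted label falls inside
    # its own strength region, then break remaining ties by confidence.
    score3 = conf3 + (1.0 if label3 in V3_STRENGTHS else 0.0)
    score7 = conf7 + (1.0 if label7 in V7_STRENGTHS else 0.0)
    return label3 if score3 >= score7 else label7

# Example with stub classifiers:
decision = ensemble_decide(
    lambda p: ("RETRY", 0.8),  # stub for the v3-best call
    lambda p: ("META", 0.6),   # stub for the v7-110 call
    "workflow state prompt",
)
print(decision)  # RETRY
```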

## Files

- `adapters.safetensors` — LoRA adapter weights (v7-110 checkpoint)
- `adapter_config.json` — LoRA configuration for MLX

## Citation

Part of the Agentic Factory project — building autonomous workflow orchestration with SLM-powered planning on Apple Silicon.