# SLM Workflow Planner 7B v7: META-Strengthened Alignment (Iter 110)

## Model Description

LoRA adapter for Qwen/Qwen2.5-7B-Instruct fine-tuned as a workflow execution planner. This is the v7 model, specialized for JOIN and META detection, trained through 7 stages of progressive alignment from the base policy checkpoint.
## Training Lineage

- Stage A: Base policy training on 554K samples from 89 diverse workflow graphs (iter 800)
- Stage B: Contrastive alignment on 20K samples (iter 100) → v2
- Stage C: Fork-suppression alignment on 4.6K samples (iter 200) → v3-best
- Stage D: Signal-overlap restoration on 4K samples (80 iters) → v6-100
- Stage E: META-strengthened + risk-weighted alignment on 3.2K samples (110 iters) → v7-110
## Decision Types
| Decision | Description |
|---|---|
| NEXT | Proceed to the next sequential step |
| RETRY | Retry the current step (within budget) |
| FORK | Launch parallel execution branches |
| JOIN | Synchronize parallel branches |
| META | Escalate: anomaly detected, human intervention needed |
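The model is prompted to emit exactly one of these five labels. A downstream consumer may still want to normalize the raw completion before acting on it; the helper below is an illustrative sketch (not part of the released adapter), which falls back to META so that ambiguous outputs escalate rather than silently proceed.

```python
# Hypothetical post-processing helper: map a raw model completion to one of
# the five decision labels. Not part of the released adapter.
DECISIONS = {"NEXT", "RETRY", "FORK", "JOIN", "META"}

def parse_decision(raw: str) -> str:
    """Return the first recognized decision token in the model output."""
    for token in raw.strip().upper().split():
        token = token.strip(".,:;")
        if token in DECISIONS:
            return token
    # Fall back to META so unparseable outputs escalate to a human.
    return "META"
```

Falling back to META is a conservative choice: a misrouted NEXT is cheap to recover from, but a missed escalation is not.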
## Performance (76-scenario evaluation suite)

### v7-110 Standalone
| Category | v7-110 | v3-best | GPT-4.1 |
|---|---|---|---|
| NEXT | 17/22 (77%) | 12/22 (55%) | 6/22 (27%) |
| RETRY | 0/12 (0%) | 12/12 (100%) | 11/12 (92%) |
| FORK | 1/14 (7%) | 14/14 (100%) | 14/14 (100%) |
| JOIN | 14/15 (93%) | 15/15 (100%) | 10/15 (67%) |
| META | 10/13 (77%) | 0/13 (0%) | 0/13 (0%) |
| TOTAL | 42/76 (55.3%) | 53/76 (69.7%) | 41/76 (53.9%) |
### Key Strengths

- 🔥 META: 77% (the only model that detects anomalies; all others at 0%)
- 🔥 JOIN: 93% (near-perfect synchronization detection)
- 🔥 NEXT: 77% (strong sequential progression)
### Ensemble with v3-best (Vote)
When combined with v3-best in a vote ensemble:
- v3-best covers RETRY (100%), FORK (100%), JOIN (100%)
- v7-110 covers META (77%), NEXT (77%)
- Combined: significantly better than any individual model
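The complementary coverage is visible directly in the standalone table: a perfect per-category router over the two models would recover the better score in every category. A quick back-of-envelope upper bound, using the raw counts from the table above:

```python
# Per-category correct counts from the 76-scenario evaluation table above.
v3_best = {"NEXT": 12, "RETRY": 12, "FORK": 14, "JOIN": 15, "META": 0}
v7_110  = {"NEXT": 17, "RETRY": 0,  "FORK": 1,  "JOIN": 14, "META": 10}

# Upper bound for an oracle router that always picks the stronger model
# per category: 68 of 76 scenarios.
oracle = sum(max(v3_best[c], v7_110[c]) for c in v3_best)
```

This 68/76 (89.5%) figure is an oracle upper bound, not a measured ensemble result; a real vote ensemble will land somewhere between the best single model and this ceiling.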
## v7 Training Dataset Design
The v7 dataset was specifically designed to address:
- META manifold strengthening: quality-filtered META samples with anomaly outcomes
- Synthetic anomaly patterns: matching evaluation suite scenarios
- Risk-weighted allocation: more samples for high-risk misclassifications (RETRY→NEXT, JOIN→NEXT)
- Rehearsal for all classes: broad sampling to prevent catastrophic forgetting
### Dataset Distribution
| Category | Samples | % |
|---|---|---|
| META | 904 | 28.2% |
| NEXT | 800 | 25.0% |
| RETRY | 500 | 15.6% |
| FORK | 500 | 15.6% |
| JOIN | 500 | 15.6% |
| Total | 3,204 | 100% |
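The percentages in the table follow directly from the raw counts; a quick consistency check over the 3,204 v7 training samples:

```python
# Per-class sample counts from the distribution table above.
counts = {"META": 904, "NEXT": 800, "RETRY": 500, "FORK": 500, "JOIN": 500}
total = sum(counts.values())

# Share of each class, rounded to one decimal place as in the table.
shares = {k: round(100 * v / total, 1) for k, v in counts.items()}
```

Note the deliberate skew: META and NEXT together hold over half the budget, matching the risk-weighted allocation described above.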
## LoRA Configuration
| Parameter | Value |
|---|---|
| Rank | 16 |
| Alpha (scale) | 32 (effective scale alpha/rank = 2.0) |
| Dropout | 0.02 |
| Target layers | Last 28 of 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
## Training Configuration
| Parameter | Value |
|---|---|
| Framework | MLX (Apple Silicon native) |
| Hardware | Apple M4 Pro, 48GB unified memory |
| Iterations | 110 (best val loss) |
| Batch size | 4 |
| Learning rate | 1e-5 |
| Sequence length | 512 |
| Prompt masking | Yes |
| Resume from | v6-100 checkpoint |
## Usage

### With MLX (Apple Silicon)
```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the base model with the v7 LoRA adapter applied.
model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path="ssaraf1/slm-workflow-planner-7b-v7",
)

messages = [
    {"role": "system", "content": "You are a workflow planner. Given the current workflow state, eligible nodes, and topology information, classify the decision type. Respond with exactly one of: NEXT, RETRY, FORK, JOIN, META"},
    {"role": "user", "content": "Current node: VERIFY_CLAIM\nOutcome: anomaly_detected\nState: goal_progress=0.15 | uncertainty=0.85 | retry_count=3\nEligible: [ESCALATE, MANUAL_REVIEW]\nForkable sets: none\nJoin-ready: False\nWhat decision type?"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
sampler = make_sampler(temp=0.0)  # greedy decoding for deterministic labels
response = generate(model, tokenizer, prompt=prompt, max_tokens=10, sampler=sampler)
print(response)  # Expected: META (high uncertainty + anomaly)
```
### Recommended: Ensemble with v3-best

For production use, combine v7-110 with v3-best in a vote ensemble:

- Use v3-best (`ssaraf1/slm-workflow-planner-7b-v3`) for RETRY/FORK/JOIN decisions
- Use v7-110 for META/NEXT decisions
- Confidence-weighted voting resolves disagreements
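A minimal sketch of such a router, assuming each model exposes a predicted label and a confidence score (the function and its signature are illustrative, not part of the released adapters):

```python
# Hypothetical two-model routing sketch for the recommended ensemble.
def ensemble_decide(v3_pred: str, v3_conf: float,
                    v7_pred: str, v7_conf: float) -> str:
    # v7-110 is the only model that detects anomalies: trust its META calls.
    if v7_pred == "META":
        return "META"
    # v3-best is near-perfect on RETRY/FORK/JOIN: defer to it there.
    if v3_pred in {"RETRY", "FORK", "JOIN"}:
        return v3_pred
    # Otherwise resolve remaining disagreements by confidence.
    return v7_pred if v7_conf >= v3_conf else v3_pred
```

The fixed routing rules encode the per-category strengths measured in the evaluation suite; the confidence comparison only handles the residual cases.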
## Files

- `adapters.safetensors`: LoRA adapter weights (v7-110 checkpoint)
- `adapter_config.json`: LoRA configuration for MLX
## Citation

Part of the Agentic Factory project: building autonomous workflow orchestration with SLM-powered planning on Apple Silicon.