---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- workflow-planner
- slm
- lora
- mlx
- context-contract-planning
- tier-2-eligibility
language:
- en
pipeline_tag: text-generation
---

# SLM Workflow Planner v8 — Context-Contract Planning (MLX LoRA)

## Overview

**v8** is a Stage 2 enhancement of the SLM Workflow Planner. It extends the v3-best checkpoint with **context-contract planning** — the ability to make routing decisions based on the `required_context` and `produces_context` of ALL nodes in a workflow graph, not just directly connected edges.

This enables three new capabilities:

- **Recovery Routing (Backjump):** On failure, jump backward to an earlier context-satisfiable node
- **Stage Skipping:** Skip unnecessary stages when the required context is already available (e.g., walk-in customers)
- **Non-Adjacent Parallelism:** Fork two independent context-satisfiable nodes that aren't connected by fork-edges

## Model Details

| Property | Value |
|---|---|
| **Base Model** | Qwen/Qwen2.5-7B-Instruct |
| **Fine-tune Type** | LoRA (MLX format) |
| **LoRA Rank** | 16 |
| **LoRA Scale** | 2.0 |
| **LoRA Dropout** | 0.02 |
| **Tuned Layers** | 28/32 |
| **Trainable Parameters** | 40.37M (0.53%) |
| **Framework** | MLX (Apple Silicon) |

## Training

| Property | Value |
|---|---|
| **Lineage** | base(8000) → v2(100) → v3(200) → v3-cont → v3-best → **v8(1000)** |
| **Resume Checkpoint** | v3-best (59.2% on 76-scenario suite) |
| **Training Iterations** | 1000 (stopped early — val loss converged) |
| **Learning Rate** | 2e-5 (cosine decay to 1e-6, 100-step warmup) |
| **Batch Size** | 4 (effective 8 with gradient accumulation) |
| **Max Sequence Length** | 768 tokens |
| **Dataset** | 696K samples from 150 workflows |
| **Val Loss** | 0.032 (down from 0.272 at start) |

### Training Data Distribution

| Category | Count | % | Description |
|---|---|---|---|
| META | 187K | 26.9% | Dead-end escalation |
| NEGATIVE | 187K | 26.9% | Tier-2 visible but edge chosen ("satisfiable ≠ sensible") |
| NEXT_EDGE | 116K | 16.7% | Normal edge progression |
| NEXT_SKIP 🛡 | 55K | 8.0% | Forward dead-end recovery (Tier-2) |
| RETRY | 36K | 5.2% | Edge retry on failure |
| JOIN | 30K | 4.3% | Parallel branch merge |
| NEXT_BACKJUMP 🛡 | 28K | 4.0% | Failure recovery to earlier node (Tier-2) |
| FORK_EDGE | 28K | 4.0% | Edge-adjacent fork |
| FORK_NONADJ 🛡 | 28K | 4.0% | Non-adjacent parallel fork (Tier-2) |

🛡 = Protected from downsampling during balancing

## Prompt Format

The model uses a **tiered prompt** with two candidate sections:

```
Current node: NODE_A (SYSTEM, stage 3)
Outcome: success
Failure type: none
State: goal_progress=0.40 retry_count=0 ...
Produced context: {ctx_start, intake_data, assessment_score}

Edge candidates (normal path):
1. NODE_B (AGENT) [processor]
   → requires: {assessment_score}
   → produces: {approval}

Context-eligible (off-path, invocable now):
1. NODE_X (SYSTEM, stage 5, gap=+2) [validator]
   → requires: {intake_data} ✓
   → produces: {validation}

Forkable sets: []
Join-ready: []

What is the best action?
```

**Output format:** `DECISION_TYPE NODE_ID`

- `NEXT NODE_B` — advance to NODE_B
- `FORK NODE_A, NODE_B` — parallel fork
- `RETRY NODE_A` — retry current node
- `JOIN NODE_A` — merge parallel branches
- `META` — escalate to human

## Evaluation Results

### Section A: Stratified Test (100 held-out samples)

| Category | Exact Accuracy | Type Accuracy |
|---|---|---|
| META | 20/20 (100%) | 20/20 (100%) |
| NEGATIVE (Tier-2 visible, edge chosen) | 5/5 (100%) | 5/5 (100%) |
| SKIP_FORWARD | 7/7 (100%) | 7/7 (100%) |
| RETRY | 18/20 (90%) | 18/20 (90%) |
| JOIN | 16/20 (80%) | 16/20 (80%) |
| FORK (non-adjacent) | 12/18 (67%) | 14/18 (78%) |
| NEXT (edge) | 5/8 (63%) | 8/8 (100%) |
| **TOTAL** | **83/100 (83%)** | **88/100 (88%)** |

### Section B: Tier-2 Specific (90 held-out samples)

| Category | Exact Accuracy | Type Accuracy |
|---|---|---|
| Non-Adjacent Fork | 15/15 (100%) | 15/15 (100%) |
| META with Context | 15/15 (100%) | 15/15 (100%) |
| Negative Contrast | 14/15 (93%) | 14/15 (93%) |
| RETRY with Context | 14/15 (93%) | 14/15 (93%) |
| Skip Forward | 13/15 (87%) | 14/15 (93%) |
| JOIN with Context | 10/15 (67%) | 10/15 (67%) |
| **TOTAL** | **81/90 (90%)** | **82/90 (91%)** |

## Key Capabilities

1. **Context-Contract Reasoning:** Evaluates `required_context ⊆ produced_keys` to identify all invocable nodes
2. **Recovery Routing:** Backjumps on process/resource failure when no edge retry exists
3. **Stage Skipping:** Advances to forward context-eligible nodes at dead-ends
4. **Non-Adjacent Parallelism:** Forks independent context-eligible nodes with different actors
5.
**Negative Contrast:** Learned "satisfiable ≠ sensible" — does not take a Tier-2 action when the edge path is correct

## Usage (MLX)

```python
from mlx_lm import load, generate

model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path="sameer-saraf-quant-ai/slm-workflow-planner-v8-mlx"
)

messages = [
    {"role": "system", "content": "You are a workflow planner..."},
    {"role": "user", "content": ""},  # fill with the tiered prompt shown above
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=30)
print(response)  # "NEXT ESTIMATION_AND_APPROVAL"
```

## Ensemble Recommendation

For production use, combine with a GPT-4.1 arbiter for the ~10% of edge cases (mainly JOIN confusion):

- v8 handles 90%+ of decisions autonomously
- GPT-4.1 validates uncertain decisions (an estimated 5-10% of traffic)

## Architecture Context

This adapter is part of the **Agentic OS** system:

- **Temporal** handles durable execution and state management
- **Neo4j** stores workflow graph definitions
- **SLM (this model)** makes real-time routing decisions
- **Guardrails** validate SLM output before execution
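The context-contract eligibility rule from Key Capabilities (`required_context ⊆ produced_keys`) can be sketched in plain Python. The `Node` structure, field names, and example graph below are illustrative assumptions, not the model's actual data format:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Illustrative workflow node with a context contract (hypothetical schema)."""
    node_id: str
    required_context: frozenset
    produces_context: frozenset


def eligible_nodes(nodes, produced_keys):
    """Return every node whose required_context is a subset of produced_keys."""
    produced = set(produced_keys)
    return [n for n in nodes if n.required_context <= produced]


# Toy graph echoing the prompt-format example above
nodes = [
    Node("NODE_B", frozenset({"assessment_score"}), frozenset({"approval"})),
    Node("NODE_X", frozenset({"intake_data"}), frozenset({"validation"})),
    Node("NODE_Y", frozenset({"approval"}), frozenset({"ctx_done"})),
]
produced = {"ctx_start", "intake_data", "assessment_score"}

# NODE_B and NODE_X are invocable now; NODE_Y still needs "approval"
print([n.node_id for n in eligible_nodes(nodes, produced)])
```

This is the set-inclusion test only; as the NEGATIVE training category emphasizes, a node being satisfiable does not mean invoking it is the sensible decision — that judgment is what the model is trained for.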
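The Guardrails component above must validate the model's `DECISION_TYPE NODE_ID` output before execution. A minimal sketch of such a validator — the decision types come from the output-format section; the function name, error handling, and `known_nodes` parameter are assumptions:

```python
VALID_TYPES = {"NEXT", "FORK", "RETRY", "JOIN", "META"}


def parse_decision(raw, known_nodes):
    """Parse and validate a 'DECISION_TYPE NODE_ID' string.

    Returns (decision_type, [node_ids]); raises ValueError on malformed output.
    """
    parts = raw.strip().split(maxsplit=1)
    dtype = parts[0].upper() if parts else ""
    if dtype not in VALID_TYPES:
        raise ValueError(f"unknown decision type: {raw!r}")
    if dtype == "META":
        return dtype, []  # escalation carries no target node
    if len(parts) < 2:
        raise ValueError(f"missing node id: {raw!r}")
    node_ids = [p.strip() for p in parts[1].split(",")]
    if dtype != "FORK" and len(node_ids) != 1:
        raise ValueError(f"{dtype} takes exactly one node: {raw!r}")
    unknown = [n for n in node_ids if n not in known_nodes]
    if unknown:
        raise ValueError(f"unknown node(s) {unknown} in {raw!r}")
    return dtype, node_ids


print(parse_decision("NEXT NODE_B", {"NODE_A", "NODE_B"}))       # ('NEXT', ['NODE_B'])
print(parse_decision("FORK NODE_A, NODE_B", {"NODE_A", "NODE_B"}))  # ('FORK', ['NODE_A', 'NODE_B'])
```

Rejected outputs would then be routed to the arbiter or retried, rather than executed.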