---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- workflow-planner
- slm
- lora
- mlx
- context-contract-planning
- tier-2-eligibility
language:
- en
pipeline_tag: text-generation
---
# SLM Workflow Planner v8: Context-Contract Planning (MLX LoRA)
## Overview
**v8** is a Stage 2 enhancement of the SLM Workflow Planner. It extends the v3-best checkpoint with **context-contract planning**: the ability to make routing decisions based on the `required_context` and `produces_context` contracts of all nodes in a workflow graph, not just directly connected edges.
This enables three new capabilities:
- **Recovery Routing (Backjump):** On failure, jump backward to an earlier context-satisfiable node
- **Stage Skipping:** Skip unnecessary stages when required context is already available (e.g., walk-in customers)
- **Non-Adjacent Parallelism:** Fork two independent context-satisfiable nodes that aren't connected by fork-edges
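All three capabilities rest on the same eligibility test: a node is invocable whenever its required context keys are a subset of the keys produced so far, whether or not an edge reaches it. A minimal sketch of that check (the node schema and names here are illustrative, not the production format):

```python
# Context-contract eligibility: a node is invocable when its required
# context keys are a subset of the keys produced so far, regardless of
# whether an edge connects it to the current node.
# The contract schema below is illustrative, not the production format.

def eligible_nodes(contracts, produced):
    """Return every node whose required_context is satisfied by `produced`."""
    return [
        name
        for name, c in contracts.items()
        if c["required_context"] <= produced  # set subset test
    ]

contracts = {
    "VALIDATOR": {"required_context": {"intake_data"}},       # off-path node
    "APPROVAL":  {"required_context": {"assessment_score"}},  # next edge target
    "NOTIFY":    {"required_context": {"approval"}},          # not yet satisfiable
}
produced = {"ctx_start", "intake_data", "assessment_score"}

print(sorted(eligible_nodes(contracts, produced)))  # ['APPROVAL', 'VALIDATOR']
```

Off-path but satisfiable nodes like `VALIDATOR` are what make backjumps, stage skips, and non-adjacent forks possible.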
## Model Details
| Property | Value |
|---|---|
| **Base Model** | Qwen/Qwen2.5-7B-Instruct |
| **Fine-tune Type** | LoRA (MLX format) |
| **LoRA Rank** | 16 |
| **LoRA Scale** | 2.0 |
| **LoRA Dropout** | 0.02 |
| **Tuned Layers** | 28/32 |
| **Trainable Parameters** | 40.37M (0.53%) |
| **Framework** | MLX (Apple Silicon) |
## Training
| Property | Value |
|---|---|
| **Lineage** | base(8000) → v2(100) → v3(200) → v3-cont → v3-best → **v8(1000)** |
| **Resume Checkpoint** | v3-best (59.2% on 76-scenario suite) |
| **Training Iterations** | 1000 (stopped early; validation loss had converged) |
| **Learning Rate** | 2e-5 (cosine decay to 1e-6, 100-step warmup) |
| **Batch Size** | 4 (effective 8 with gradient accumulation) |
| **Max Sequence Length** | 768 tokens |
| **Dataset** | 696K samples from 150 workflows |
| **Val Loss** | 0.032 (down from 0.272 at start) |
### Training Data Distribution
| Category | Count | % | Description |
|---|---|---|---|
| META | 187K | 26.9% | Dead-end escalation |
| NEGATIVE | 187K | 26.9% | Tier-2 visible but edge chosen ("satisfiable ≠ sensible") |
| NEXT_EDGE | 116K | 16.7% | Normal edge progression |
| NEXT_SKIP 🛑 | 55K | 8.0% | Forward dead-end recovery (Tier-2) |
| RETRY | 36K | 5.2% | Edge retry on failure |
| JOIN | 30K | 4.3% | Parallel branch merge |
| NEXT_BACKJUMP 🛑 | 28K | 4.0% | Failure recovery to earlier node (Tier-2) |
| FORK_EDGE | 28K | 4.0% | Edge-adjacent fork |
| FORK_NONADJ 🛑 | 28K | 4.0% | Non-adjacent parallel fork (Tier-2) |

🛑 = protected from downsampling during balancing
## Prompt Format
The model uses a **tiered prompt** with two candidate sections:
```
Current node: NODE_A (SYSTEM, stage 3)
Outcome: success
Failure type: none
State:
goal_progress=0.40
retry_count=0
...
Produced context: {ctx_start, intake_data, assessment_score}
Edge candidates (normal path):
1. NODE_B (AGENT) [processor] → requires: {assessment_score} → produces: {approval}
Context-eligible (off-path, invocable now):
1. NODE_X (SYSTEM, stage 5, gap=+2) [validator] → requires: {intake_data} ✓ → produces: {validation}
Forkable sets: []
Join-ready: []
What is the best action?
```
**Output format:** `DECISION_TYPE NODE_ID`
- `NEXT NODE_B` - advance to NODE_B
- `FORK NODE_A, NODE_B` - fork NODE_A and NODE_B in parallel
- `RETRY NODE_A` - retry the current node
- `JOIN NODE_A` - merge parallel branches at NODE_A
- `META` - escalate to a human
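Because the output grammar is a single line, downstream code can validate it with a small parser before execution. A sketch of such a parser (an illustration of the format above, not the shipped guardrail code):

```python
import re

# Parse the model's one-line output into (decision_type, node_ids).
# The decision vocabulary comes from the output format documented above;
# the parser itself is an illustrative sketch, not the production guardrail.

DECISIONS = {"NEXT", "FORK", "RETRY", "JOIN", "META"}

def parse_decision(text):
    m = re.match(r"^([A-Z_]+)(?:\s+(.+))?$", text.strip())
    if not m or m.group(1) not in DECISIONS:
        raise ValueError(f"unparseable decision: {text!r}")
    decision, rest = m.group(1), m.group(2)
    nodes = [n.strip() for n in rest.split(",")] if rest else []
    if decision == "META" and nodes:
        raise ValueError("META takes no node argument")
    if decision == "FORK" and len(nodes) < 2:
        raise ValueError("FORK needs at least two nodes")
    return decision, nodes

print(parse_decision("FORK NODE_A, NODE_B"))  # ('FORK', ['NODE_A', 'NODE_B'])
```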
## Evaluation Results
### Section A: Stratified Test (100 held-out samples)
| Category | Exact Accuracy | Type Accuracy |
|---|---|---|
| META | 20/20 (100%) | 20/20 (100%) |
| NEGATIVE (Tier-2 visible, edge chosen) | 5/5 (100%) | 5/5 (100%) |
| SKIP_FORWARD | 7/7 (100%) | 7/7 (100%) |
| RETRY | 18/20 (90%) | 18/20 (90%) |
| JOIN | 16/20 (80%) | 16/20 (80%) |
| FORK (non-adjacent) | 12/18 (67%) | 14/18 (78%) |
| NEXT (edge) | 5/8 (63%) | 8/8 (100%) |
| **TOTAL** | **83/100 (83%)** | **88/100 (88%)** |
### Section B: Tier-2 Specific (90 held-out samples)
| Category | Exact Accuracy | Type Accuracy |
|---|---|---|
| Non-Adjacent Fork | 15/15 (100%) | 15/15 (100%) |
| META with Context | 15/15 (100%) | 15/15 (100%) |
| Negative Contrast | 14/15 (93%) | 14/15 (93%) |
| RETRY with Context | 14/15 (93%) | 14/15 (93%) |
| Skip Forward | 13/15 (87%) | 14/15 (93%) |
| JOIN with Context | 10/15 (67%) | 10/15 (67%) |
| **TOTAL** | **81/90 (90%)** | **82/90 (91%)** |
## Key Capabilities
1. **Context-Contract Reasoning:** Evaluates `required_context ⊆ produced_keys` to identify all invocable nodes
2. **Recovery Routing:** Backjumps on process/resource failure when no edge retry exists
3. **Stage Skipping:** Advances to forward context-eligible nodes at dead-ends
4. **Non-Adjacent Parallelism:** Forks independent context-eligible nodes with different actors
5. **Negative Contrast:** Learned that "satisfiable ≠ sensible"; does not take a Tier-2 action when the edge path is correct
## Usage (MLX)
```python
from mlx_lm import load, generate

# Load the base model and apply this LoRA adapter on top of it.
model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path="sameer-saraf-quant-ai/slm-workflow-planner-v8-mlx",
)

# Build the tiered prompt (see "Prompt Format" above).
messages = [
    {"role": "system", "content": "You are a workflow planner..."},
    {"role": "user", "content": "<tiered prompt>"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Decisions are one short line ("DECISION_TYPE NODE_ID"), so a small
# token budget is enough.
response = generate(model, tokenizer, prompt=prompt, max_tokens=30)
print(response)  # e.g. "NEXT ESTIMATION_AND_APPROVAL"
```
## Ensemble Recommendation
For production use, combine with a GPT-4.1 arbiter for the ~10% of edge cases (mainly JOIN confusion):
- v8 handles 90%+ of decisions autonomously
- GPT validates uncertain decisions (estimated 5-10% of traffic)
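One way to implement the split is confidence-gated routing, assuming some confidence signal is available (e.g., the decision's mean token log-probability). The threshold and the arbiter hook below are illustrative assumptions, not a shipped configuration:

```python
# Illustrative ensemble routing: accept the SLM decision when its
# confidence clears a threshold, otherwise defer to the arbiter.
# CONFIDENCE_THRESHOLD and the arbiter callable are assumptions,
# not the production configuration.

CONFIDENCE_THRESHOLD = 0.85  # hypothetical; tune on held-out traffic

def route(slm_decision, slm_confidence, arbiter):
    if slm_confidence >= CONFIDENCE_THRESHOLD:
        return slm_decision        # ~90% of traffic: SLM acts autonomously
    return arbiter(slm_decision)   # uncertain cases: arbiter validates

accepted = route("JOIN NODE_A", 0.62, arbiter=lambda d: "META")
print(accepted)  # 'META': low-confidence JOIN escalated to the arbiter
```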
## Architecture Context
This adapter is part of the **Agentic OS** system:
- **Temporal** handles durable execution and state management
- **Neo4j** stores workflow graph definitions
- **SLM (this model)** makes real-time routing decisions
- **Guardrails** validate SLM output before execution
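A guardrail of the kind described above can reuse the same context contracts the planner reasons over: block any decision that names an unknown node or one whose required context is not yet produced. A minimal sketch (the schema is illustrative; the production guardrails are not published here):

```python
# Guardrail sketch: reject decisions that reference unknown nodes or
# nodes whose required context is not yet produced. The contract schema
# is illustrative; the production guardrails are not published here.

def guard(decision_type, node_ids, contracts, produced):
    if decision_type == "META":
        return True                    # escalation is always permitted
    for node in node_ids:
        if node not in contracts:
            return False               # unknown node: block execution
        if not contracts[node]["required_context"] <= produced:
            return False               # contract unsatisfied: block
    return True

contracts = {"NODE_B": {"required_context": {"assessment_score"}}}
print(guard("NEXT", ["NODE_B"], contracts, {"assessment_score"}))  # True
print(guard("NEXT", ["NODE_X"], contracts, {"assessment_score"}))  # False
```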