---
license: apache-2.0
tags:
  - routing
  - code
  - mlx
  - pid
  - cascade
library_name: mlx
---

# Vibe Coding Router v5

A three-tier cascaded router for coding tasks. It sends each prompt to one of three models:

- **Local**: Qwen3-Coder-Next (80B/3B active MoE, on-device via MLX)
- **Sonnet**: Claude Sonnet 4.6 (medium-complexity cloud)
- **Opus**: Claude Opus 4.6 (max-capability cloud)

## What's New in v5

v4 suffered from **inverted routing**: simple queries went to cloud while complex ones stayed local. The root cause was a length-quality anti-correlation in the training data, amplified by the reward weighting in the PID loss. v5 fixes this with:

1. **7 new complexity features** (45 handcrafted total): `is_coding_task`, `junk_score`, `scope_breadth`, `imperative_verb_density`, `noun_phrase_density`, `interaction_complexity`, `requirement_clause_count`
2. **Centered complexity premium**: Adjusts training margins by `premium * (complexity_score - center)` so complex tasks push toward cloud and simple tasks push toward local
3. **Junk prompt clamping**: 75 junk/greeting prompts neutralized (p_teacher=0.5, margin=0.0)
4. **Reward weight cap**: PID loss reward_weight capped at 0.5 to prevent outlier margin dominance
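
Fixes 2 to 4 can be sketched as a small target-adjustment step applied before training. This is a minimal illustration, not the actual training code: the function names, the sign convention (positive margin favors cloud), and the use of `|margin|` as the reward weight are assumptions; only the constants come from this card.

```python
PREMIUM = 2.0      # complexity premium (from the Training section)
CENTER = 0.3       # centering point (from the Training section)
REWARD_CAP = 0.5   # reward weight cap (from the Training section)


def adjust_margin(margin, complexity_score):
    """Shift the margin by premium * (complexity_score - center).

    Assumed convention: positive margin favors cloud, so prompts above
    the center move toward cloud and prompts below it move toward local.
    """
    return margin + PREMIUM * (complexity_score - CENTER)


def clamp_junk(p_teacher, margin, is_junk):
    """Neutralize junk/greeting prompts so they carry no routing signal."""
    if is_junk:
        return 0.5, 0.0
    return p_teacher, margin


def reward_weight(margin):
    """Assumed weighting: |margin|, capped so outlier margins cannot dominate."""
    return min(abs(margin), REWARD_CAP)


# A simple prompt (complexity 0.1) is pushed toward local,
# a complex one (complexity 0.8) toward cloud.
simple = adjust_margin(0.0, 0.1)
hard = adjust_margin(0.0, 0.8)
```

Centering matters here: with an uncentered premium every prompt would move toward cloud, whereas subtracting the center lets the same term pull simple prompts toward local.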

## Architecture

Two cascaded binary MLP routers trained with **Privileged Information Distillation (PID)**:

- **Router A** (local vs cloud): 77-dim → [32, 16] → 1, dropout=0.2, LayerNorm+ReLU
- **Router B** (sonnet vs opus): 77-dim → [128, 64] → 1, dropout=0.0, LayerNorm+ReLU

Features: 45 handcrafted code features + 32 PCA-reduced sentence embeddings (all-MiniLM-L6-v2).
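
To make the layer shapes concrete, here is a dependency-free sketch of the inference-time forward pass. It assumes the block order Linear → LayerNorm → ReLU, omits the LayerNorm affine parameters, skips dropout (a no-op at inference), and uses random weights; the released routers run on MLX with trained weights, so this only illustrates the data flow.

```python
import math
import random


def linear(x, weight, bias):
    """y = W x + b, with weight stored as a list of output rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def layer_norm(x, eps=1e-5):
    """Normalize to zero mean and unit variance (affine parameters omitted)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def init_mlp(sizes, seed=0):
    """Random placeholder weights for layer sizes such as [77, 32, 16, 1]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weight = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weight, [0.0] * n_out))
    return layers


def forward(layers, features):
    """Hidden blocks are Linear -> LayerNorm -> ReLU; the head is Linear -> sigmoid."""
    x = features
    for weight, bias in layers[:-1]:
        x = [max(0.0, v) for v in layer_norm(linear(x, weight, bias))]
    logit = linear(x, *layers[-1])[0]
    return 1.0 / (1.0 + math.exp(-logit))


router_a = init_mlp([77, 32, 16, 1])    # local vs cloud
router_b = init_mlp([77, 128, 64, 1])   # sonnet vs opus

# 45 handcrafted features followed by 32 PCA-reduced embedding dims (dummy values).
features = [0.0] * 45 + [0.1] * 32
p_cloud = forward(router_a, features)
p_opus = forward(router_b, features)
```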

## Training

- **Data**: 1,644 coding prompts with real quality scores from all three models
- **Judge**: GPT-5.4 scoring correctness, completeness, code quality, explanation
- **Loss**: PID (reward-weighted CE + KL divergence), β_kl=0.02, reward_cap=0.5
- **Label smoothing**: ε=0.05
- **Cost-aware margin** (Router B): cost_premium=0.03
- **Complexity premium**: 2.0, centered at 0.3
- **HP sweep**: 108 configurations, 3-way split (1150 train / 247 val / 247 test)
- **Threshold A**: 0.60 (manually tuned for routing behavior — see note below)
- **Threshold B**: 0.474 (calibrated on validation set)
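
The card does not spell out the exact form of the PID loss, so the following is one plausible per-sample reading of "reward-weighted CE + KL divergence" for a binary router. The function name, the choice of `|margin|` as the reward, the smoothing formula, and the KL direction (teacher to student) are all assumptions; only β_kl, the reward cap, and ε are taken from the list above.

```python
import math

BETA_KL = 0.02          # β_kl
REWARD_CAP = 0.5        # reward_cap
LABEL_SMOOTHING = 0.05  # ε


def _clip(p, eps=1e-7):
    """Keep probabilities away from 0 and 1 so the logs stay finite."""
    return min(max(p, eps), 1.0 - eps)


def pid_loss(p_student, label, p_teacher, margin):
    """Assumed per-sample loss: capped-reward * smoothed CE + beta * KL(teacher || student)."""
    p = _clip(p_student)
    q = _clip(p_teacher)
    # Smooth the hard 0/1 label toward 0.5.
    target = label * (1.0 - LABEL_SMOOTHING) + 0.5 * LABEL_SMOOTHING
    ce = -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    # Cap the reward so a few large-margin samples cannot dominate the batch.
    weight = min(abs(margin), REWARD_CAP)
    kl = q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return weight * ce + BETA_KL * kl


# A clamped junk prompt (p_teacher=0.5, margin=0.0) contributes nothing
# when the student also predicts 0.5.
junk_loss = pid_loss(0.5, 1, 0.5, 0.0)
```

Under this reading, the junk clamp and the reward cap work together: a zero margin removes the CE term entirely, and the cap bounds the CE contribution of every other sample.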

### Threshold Note

The utility-optimal Router A threshold (0.01) routes almost nothing to local because cloud quality is equal or better on nearly all prompts. The manual threshold of 0.60 gives up ~1.4% utility in exchange for intuitive routing: simple, fast tasks run locally with no network round trip, while complex tasks go to cloud.
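
The effect of the two thresholds on the cascade can be sketched as follows. The function name and the comparison direction (`>=` routes upward) are assumptions; the actual logic lives in `ThreeTierRouter`.

```python
THRESHOLD_A = 0.60   # local vs cloud (manually tuned)
THRESHOLD_B = 0.474  # sonnet vs opus (calibrated on validation)


def decide(p_cloud, p_opus, threshold_a=THRESHOLD_A, threshold_b=THRESHOLD_B):
    """Router A gates local vs cloud; Router B is consulted only for cloud prompts."""
    if p_cloud < threshold_a:
        return "local"
    return "opus" if p_opus >= threshold_b else "sonnet"


# With the manual threshold, a prompt scored p_cloud=0.3 stays local.
manual = decide(0.3, 0.9)
# With the utility-optimal threshold (0.01), the same prompt goes to cloud.
optimal = decide(0.3, 0.9, threshold_a=0.01)
```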

## Real-World Routing (28 test queries, threshold_a=0.60)

| Category | Local | Sonnet | Opus |
|----------|-------|--------|------|
| Simple (8) | 5 (62%) | 0 | 3 (38%) |
| Medium (8) | 3 (38%) | 0 | 5 (62%) |
| Complex (6) | 1 (17%) | 1 (17%) | 4 (67%) |

v4 comparison: simple→local was 0/8 (now 5/8), complex→local was 6/6 (now 1/6).

## Test Set Results (calibrated thresholds)

| Metric | Value |
|--------|-------|
| Utility | 0.6205 |
| Oracle Utility | 0.7179 |
| Regret | 0.0973 |
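
For reference, the three metrics relate as regret = oracle utility − utility. The sketch below shows that relationship on made-up numbers; the per-prompt utility values and the `evaluate` helper are illustrative only, and the card does not define how utility combines quality and cost.

```python
def evaluate(utilities, decisions):
    """Return (utility, oracle_utility, regret), averaged over prompts.

    utilities: one dict per prompt mapping tier name -> utility of that tier.
    decisions: the tier the router chose for each prompt.
    """
    n = len(decisions)
    utility = sum(u[d] for u, d in zip(utilities, decisions)) / n
    # The oracle always picks the best tier for each prompt.
    oracle = sum(max(u.values()) for u in utilities) / n
    return utility, oracle, oracle - utility


# Illustrative values, not taken from the test set.
toy_utilities = [
    {"local": 0.60, "sonnet": 0.70, "opus": 0.80},
    {"local": 0.50, "sonnet": 0.40, "opus": 0.45},
]
toy_decisions = ["opus", "opus"]
utility, oracle, regret = evaluate(toy_utilities, toy_decisions)
```

Applied to the rounded values in the table, 0.7179 − 0.6205 gives 0.0974; the reported 0.0973 presumably comes from the unrounded figures.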

## Files

- `router_a.safetensors` — Router A weights (32×16 MLP, 13KB)
- `router_b.safetensors` — Router B weights (128×64 MLP, 76KB)
- `config.json` — Model config, thresholds, HP, training results
- `scaler.pkl` — StandardScaler for feature normalization
- `embedding_extractor.pkl` — PCA-reduced sentence-transformers extractor
- `sweep_results.json` — Full 108-config HP sweep results
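
As a sanity check on the weight file sizes, the parameter counts can be estimated from the layer shapes. This sketch assumes float32 storage and a LayerNorm with scale and bias after each hidden layer; neither is confirmed by the card, and the helper name is hypothetical.

```python
def mlp_param_count(sizes, layer_norm_affine=True):
    """Count weights and biases for an MLP with the given layer sizes.

    Hidden layers optionally add 2 * width LayerNorm parameters (scale and bias).
    """
    total = 0
    last_hidden = len(sizes) - 2
    for i, (n_in, n_out) in enumerate(zip(sizes, sizes[1:])):
        total += n_in * n_out + n_out
        if i < last_hidden and layer_norm_affine:
            total += 2 * n_out
    return total


params_a = mlp_param_count([77, 32, 16, 1])    # Router A
params_b = mlp_param_count([77, 128, 64, 1])   # Router B

# Raw tensor bytes under the float32 assumption (4 bytes per parameter).
bytes_a = params_a * 4
bytes_b = params_b * 4
```

Under these assumptions Router A holds about 3.1k parameters (roughly 12.5 KB) and Router B about 18.7k (roughly 75 KB), which is in line with the listed 13KB and 76KB once the safetensors header is included.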

## Usage

```python
from router.three_tier_inference import ThreeTierRouter

router = ThreeTierRouter("models/three_tier_v5")
result = router.route("Write a Python function to sort a list")
# result.decision: "local", "sonnet", or "opus"
# result.p_cloud: probability of cloud routing
# result.p_opus: probability of opus (if routed to cloud)
```