---
license: apache-2.0
tags:
- routing
- code
- mlx
- pid
- cascade
library_name: mlx
---
# Vibe Coding Router v5
A three-tier cascaded router for coding tasks that sends each prompt to one of:
- **Local**: Qwen3-Coder-Next (80B-total / 3B-active MoE, on-device via MLX)
- **Sonnet**: Claude Sonnet 4.6 (cloud tier for medium-complexity tasks)
- **Opus**: Claude Opus 4.6 (highest-capability cloud tier)
## What's New in v5
v4 suffered from **inverted routing**: simple queries went to the cloud while complex ones stayed local. The root cause was a length-quality anti-correlation in the training data, amplified by the PID loss reward weights. v5 fixes this with:
1. **7 new complexity features** (45 handcrafted total): `is_coding_task`, `junk_score`, `scope_breadth`, `imperative_verb_density`, `noun_phrase_density`, `interaction_complexity`, `requirement_clause_count`
2. **Centered complexity premium**: Shifts each training margin by `premium * (complexity_score - center)`, so prompts more complex than the center push toward cloud and simpler prompts push toward local
3. **Junk prompt clamping**: 75 junk/greeting prompts neutralized (p_teacher=0.5, margin=0.0)
4. **Reward weight cap**: PID loss reward_weight capped at 0.5 to prevent outlier margin dominance
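The sketch below shows one way fixes 2–4 could compose for a single Router A training example. It is not the project's code: it assumes the margin is cloud quality minus local quality (positive favours cloud), that the premium is added to the margin, and that the reward weight is the absolute margin capped at 0.5. `RouterALabel`, `adjust_label`, and `reward_weight` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RouterALabel:
    margin: float     # assumed: quality(cloud) - quality(local); > 0 favours cloud
    p_teacher: float  # soft teacher target for "route to cloud"


def adjust_label(margin, p_teacher, complexity_score, is_junk,
                 premium=2.0, center=0.3):
    """Hypothetical helper applying junk clamping and the centered complexity premium."""
    if is_junk:
        # Junk/greeting prompts carry no routing signal: neutral target, zero margin.
        return RouterALabel(margin=0.0, p_teacher=0.5)
    # Above `center` the shift is positive (toward cloud); below it, negative (toward local).
    shifted = margin + premium * (complexity_score - center)
    return RouterALabel(margin=shifted, p_teacher=p_teacher)


def reward_weight(margin, cap=0.5):
    """Hypothetical PID reward weight: |margin|, capped so outliers cannot dominate."""
    return min(abs(margin), cap)


simple = adjust_label(margin=0.05, p_teacher=0.6, complexity_score=0.1, is_junk=False)
complex_ = adjust_label(margin=0.05, p_teacher=0.6, complexity_score=0.8, is_junk=False)
print(simple.margin, complex_.margin, reward_weight(complex_.margin))
```

With the same raw margin, the simple prompt ends up favouring local (about -0.35) and the complex one favouring cloud (about 1.05), while the cap keeps the complex example's weight at 0.5.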
## Architecture
Two cascaded binary MLP routers trained with **Privileged Information Distillation (PID)**:
- **Router A** (local vs cloud): 77-dim → [32, 16] → 1, dropout=0.2, LayerNorm+ReLU
- **Router B** (sonnet vs opus): 77-dim → [128, 64] → 1, dropout=0.0, LayerNorm+ReLU
Features: 45 handcrafted code features + 32 PCA components of sentence embeddings (all-MiniLM-L6-v2), for a 77-dimensional input to both routers.
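A minimal pure-Python sketch of a single router's inference path, to make the layer shapes concrete. It assumes each hidden block is Linear → LayerNorm → ReLU, that dropout is inactive at inference, and that a sigmoid turns the final logit into a probability. The weights are random placeholders, and `init_mlp` and `forward` are hypothetical; the actual routers run in MLX and load trained weights from the safetensors files.

```python
import math
import random


def linear(x, weight, bias):
    """Dense layer: weight holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize a vector to zero mean / unit variance, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (xi - mean) * inv_std + b for xi, g, b in zip(x, gamma, beta)]


def relu(x):
    return [max(0.0, xi) for xi in x]


def init_mlp(in_dim, hidden, seed=0):
    """Random placeholder weights with the router's shapes (NOT the trained weights)."""
    rng = random.Random(seed)
    blocks, prev = [], in_dim
    for h in hidden:
        weight = [[rng.gauss(0.0, 1.0 / math.sqrt(prev)) for _ in range(prev)] for _ in range(h)]
        # (weight, bias, LayerNorm gamma, LayerNorm beta)
        blocks.append((weight, [0.0] * h, [1.0] * h, [0.0] * h))
        prev = h
    head = ([[rng.gauss(0.0, 1.0 / math.sqrt(prev)) for _ in range(prev)]], [0.0])
    return blocks, head


def forward(params, x):
    """Inference pass; dropout is a training-time op, so it is skipped here."""
    blocks, (head_w, head_b) = params
    for weight, bias, gamma, beta in blocks:
        x = relu(layer_norm(linear(x, weight, bias), gamma, beta))
    logit = linear(x, head_w, head_b)[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> probability of the "upper" tier


# 45 handcrafted features + 32 PCA embedding components = 77 inputs,
# assumed already standardized with the saved scaler.
features = [0.0] * 45 + [0.1] * 32
router_a = init_mlp(len(features), [32, 16])
router_b = init_mlp(len(features), [128, 64])
p_cloud = forward(router_a, features)
p_opus = forward(router_b, features)
```

The two routers differ only in hidden widths; the small [32, 16] network plus dropout=0.2 for Router A is a natural choice given only ~1,150 training prompts.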
## Training
- **Data**: 1,644 coding prompts with real quality scores from all three models
- **Judge**: GPT-5.4, scoring correctness, completeness, code quality, and explanation quality
- **Loss**: PID (reward-weighted CE + KL divergence), β_kl=0.02, reward_cap=0.5
- **Label smoothing**: ε=0.05, cost-aware margin for Router B (cost_premium=0.03)
- **Complexity premium**: 2.0, centered at 0.3
- **HP sweep**: 108 configurations, 3-way split (1150 train / 247 val / 247 test)
- **Threshold A**: 0.60 (manually tuned for routing behavior; see the note below)
- **Threshold B**: 0.474 (calibrated on validation set)
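One plausible reading of the loss bullets, written as a per-example function. None of the details below are confirmed by this card: the hard target is assumed to be 1 when the margin favours the more capable tier, label smoothing pulls it toward 0.5 by ε, the reward weight is the absolute margin capped at `reward_cap`, and the KL term compares Bernoulli teacher and student distributions. For Router B the margin is assumed to be Opus quality minus Sonnet quality minus `cost_premium`. All function names are hypothetical.

```python
import math

EPS = 1e-7


def _clip(p):
    return min(max(p, EPS), 1.0 - EPS)


def bce(target, p):
    """Binary cross-entropy of student probability p against a (soft) target."""
    p = _clip(p)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def bernoulli_kl(p_teacher, p_student):
    """KL(Bern(p_teacher) || Bern(p_student))."""
    pt, ps = _clip(p_teacher), _clip(p_student)
    return pt * math.log(pt / ps) + (1.0 - pt) * math.log((1.0 - pt) / (1.0 - ps))


def pid_loss(p_student, p_teacher, margin, beta_kl=0.02, reward_cap=0.5, smoothing=0.05):
    """Hypothetical per-example loss: reward-weighted smoothed CE + beta_kl * KL."""
    hard = 1.0 if margin > 0 else 0.0            # 1 = the more capable tier wins
    target = hard * (1.0 - smoothing) + smoothing / 2.0
    weight = min(abs(margin), reward_cap)        # reward weight, capped at 0.5
    return weight * bce(target, p_student) + beta_kl * bernoulli_kl(p_teacher, p_student)


def router_b_margin(q_sonnet, q_opus, cost_premium=0.03):
    """Assumed cost-aware margin: Opus must beat Sonnet by more than cost_premium."""
    return q_opus - q_sonnet - cost_premium
```

Under these assumptions a clamped junk prompt (p_teacher=0.5, margin=0.0) contributes only the KL term, which vanishes once the student also predicts 0.5.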
### Threshold Note
The utility-optimal Router A threshold (0.01) routes almost nothing to local, because cloud quality is equal to or better than local quality on nearly all prompts. The manual threshold of 0.60 gives up about 1.4% utility in exchange for intuitive routing: simple, fast tasks run locally with no network round-trip, while complex tasks go to the cloud.
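The cascade and the threshold trade-off can be read as the decision rule below. This is a sketch, not the `ThreeTierRouter` implementation: it assumes a prompt stays local when `p_cloud < threshold_a` (consistent with a 0.01 threshold sending almost everything to the cloud) and that Router B is consulted only for cloud-bound prompts. `route` and `local_share` are hypothetical names, and the probabilities are made up.

```python
def route(p_cloud, p_opus, threshold_a=0.60, threshold_b=0.474):
    """Two-stage cascade: Router A picks local vs cloud, Router B picks sonnet vs opus."""
    # Assumed convention: stay local unless the cloud probability reaches threshold A.
    if p_cloud < threshold_a:
        return "local"
    # Only cloud-bound prompts reach Router B.
    return "opus" if p_opus >= threshold_b else "sonnet"


def local_share(p_clouds, threshold_a):
    """Fraction of prompts kept on-device at a given Router A threshold."""
    return sum(p < threshold_a for p in p_clouds) / len(p_clouds)


# Illustrative (made-up) Router A outputs for five prompts.
example_p_cloud = [0.05, 0.20, 0.45, 0.70, 0.95]
print(local_share(example_p_cloud, 0.60))  # manual threshold
print(local_share(example_p_cloud, 0.01))  # utility-optimal threshold
```

On these illustrative values the manual threshold keeps three of five prompts local, while the utility-optimal threshold keeps none.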
## Real-World Routing (28 test queries, threshold_a=0.60)
| Category | Local | Sonnet | Opus |
|----------|-------|--------|------|
| Simple (8) | 5 (62%) | 0 | 3 (38%) |
| Medium (8) | 3 (38%) | 0 | 5 (62%) |
| Complex (6) | 1 (17%) | 1 (17%) | 4 (67%) |
v4 comparison: simple→local was 0/8 (now 5/8), complex→local was 6/6 (now 1/6).
## Test Set Results (calibrated thresholds)
| Metric | Value |
|--------|-------|
| Utility | 0.6205 |
| Oracle Utility | 0.7179 |
| Regret | 0.0973 |
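Regret is the gap between oracle utility (choosing the best tier for every prompt) and the router's achieved utility; 0.7179 - 0.6205 = 0.0974, consistent with the reported 0.0973 once rounding of the displayed values is taken into account. A minimal sketch, assuming per-prompt utilities for each tier are already available (how judge scores and cost combine into a utility is not specified here); `evaluate` is a hypothetical helper and the data is a toy example.

```python
def evaluate(per_prompt_utility, decisions):
    """Return (utility, oracle_utility, regret) averaged over prompts.

    per_prompt_utility: one dict per prompt mapping tier name -> utility (assumed given).
    decisions: the tier the router picked for each prompt.
    """
    n = len(per_prompt_utility)
    utility = sum(u[d] for u, d in zip(per_prompt_utility, decisions)) / n
    oracle = sum(max(u.values()) for u in per_prompt_utility) / n  # best tier per prompt
    return utility, oracle, oracle - utility


# Toy example with made-up utilities (not the card's data).
toy = [
    {"local": 0.5, "sonnet": 0.7, "opus": 0.8},
    {"local": 0.6, "sonnet": 0.6, "opus": 0.4},
]
print(evaluate(toy, ["sonnet", "local"]))
```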
## Files
- `router_a.safetensors` — Router A weights (32×16 MLP, 13KB)
- `router_b.safetensors` — Router B weights (128×64 MLP, 76KB)
- `config.json` — Model config, thresholds, HP, training results
- `scaler.pkl` — StandardScaler for feature normalization
- `embedding_extractor.pkl` — PCA-reduced sentence-transformers extractor
- `sweep_results.json` — Full 108-config HP sweep results
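As a sanity check on the listed file sizes, the parameter count of each router follows from its layer widths. This sketch assumes one bias per Linear layer and a learnable scale and shift per LayerNorm; `mlp_param_count` is a hypothetical helper.

```python
def mlp_param_count(in_dim, hidden, out_dim=1, layer_norm=True):
    """Count parameters of a Linear(+LayerNorm) MLP with a final Linear head."""
    total, prev = 0, in_dim
    for h in hidden:
        total += prev * h + h          # Linear weight + bias
        if layer_norm:
            total += 2 * h             # LayerNorm scale + shift
        prev = h
    total += prev * out_dim + out_dim  # output Linear
    return total


for name, hidden in [("router_a", [32, 16]), ("router_b", [128, 64])]:
    n = mlp_param_count(77, hidden)
    print(f"{name}: {n} params, ~{n * 4 / 1000:.1f} KB as float32")
```

Under these assumptions Router A has 3,137 parameters (about 12.5 KB as float32) and Router B has 18,689 (about 74.8 KB), roughly in line with the 13KB and 76KB files once safetensors header overhead is included.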
## Usage
```python
from router.three_tier_inference import ThreeTierRouter
router = ThreeTierRouter("models/three_tier_v5")
result = router.route("Write a Python function to sort a list")
# result.decision: "local", "sonnet", or "opus"
# result.p_cloud: probability of cloud routing
# result.p_opus: probability of opus (if routed to cloud)
```