Update to v2: 50K pairwise-labeled dataset, dual-judge, temperature scaling
Files changed:
- README.md (+25 −31)
- model.safetensors (+1 −1)
- router_config.json (+9 −5)
- sweep_results.json (+12 −12)
README.md CHANGED

@@ -17,7 +17,7 @@ language:
 - en
 ---
 
-# Vibe Router — ModernBERT
+# Vibe Router — ModernBERT v2
 
 A tiny LLM router that decides whether a chat request should run **locally** (on-device) or in the **cloud**, built on [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base).
 
@@ -28,13 +28,20 @@ Given a user prompt, the model outputs a single logit. After sigmoid, values abo
 - **Device model**: [LiquidAI/LFM2.5-1.2B-Instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct) (runs locally via MLX)
 - **Cloud model**: GPT-5.2
 
+## Changes from v1
+
+- **10x more data**: 50K prompts from LMSYS-Chat-1M, WildChat-1M, UltraChat, OpenAssistant, Alpaca, No Robots (v1 used 5.3K)
+- **Pairwise judging**: Dual-judge system (GPT-4o + Claude Sonnet 4) with randomized presentation order, replacing single-judge absolute scoring
+- **Temperature scaling**: Post-training calibration for well-calibrated probabilities
+- **GPU device inference**: Device model (LFM2.5-1.2B) run on an H100 GPU via HuggingFace transformers instead of a local API
+
 ## Training
 
-Fine-tuned end-to-end from `answerdotai/ModernBERT-base` using **Privileged Information Distillation (PID)** loss on
+Fine-tuned end-to-end from `answerdotai/ModernBERT-base` using **Privileged Information Distillation (PID)** loss on 50K labeled prompt pairs with soft teacher labels derived from pairwise dual-judge comparison.
 
 | Hyperparameter | Value |
 |----------------|-------|
-| Learning rate |
+| Learning rate | 5e-5 |
 | β_kl | 0.05 |
 | Weight decay | 0.01 |
 | Warmup ratio | 0.1 |
@@ -44,22 +51,22 @@ Fine-tuned end-to-end from `answerdotai/ModernBERT-base` using **Privileged Info
 
 ## Performance
 
-| Metric |
-|--------|-------|
-| Utility | 0.9762 |
-| Cloud rate | 79.4% |
-| Regret | 0.0064 |
-| Catastrophic miss rate | 0.0% |
-| ECE | 0.173 |
-
-
-| Always cloud | 0.894 | 100% | 0.089 |
-| **ModernBERT (PID)** | **0.976** | **79.4%** | **0.006** |
+| Metric | v2 | v1 |
+|--------|-----|-----|
+| Utility | 0.8734 | 0.9762 |
+| Cloud rate | 80.3% | 79.4% |
+| Regret | 0.1119 | 0.0064 |
+| Catastrophic miss rate | 5.1% | 0.0% |
+| ECE (uncalibrated) | 0.026 | 0.173 |
+| ECE (calibrated) | 0.028 | — |
+| Temperature (T) | 1.083 | — |
+| Best threshold | 0.371 | 0.371 |
+
+**Note**: v1 and v2 utility/regret metrics are not directly comparable — v1 used absolute quality scores (0-1) while v2 uses pairwise win rates, which produces a different scale. ECE improved dramatically (0.173 → 0.026).
+
+### Known issue
+
+~19% of cloud model (GPT-5.2) responses were empty in the training data, which caused those prompts to be incorrectly labeled as "device-preferred". This introduced a bias in the routing logic. A v3 retraining with filtered data is planned.
 
 ## Latency
 
@@ -71,7 +78,7 @@ Fine-tuned end-to-end from `answerdotai/ModernBERT-base` using **Privileged Info
 from transformers import AutoModelForSequenceClassification, AutoTokenizer
 import torch
 
-model_id = "
+model_id = "darkolorin/vibe-router-modernbert"
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
 model.eval()
@@ -88,19 +95,6 @@ decision = "cloud" if p_cloud > threshold else "device"
 print(f"p(cloud)={p_cloud:.3f} → {decision}")
 ```
 
-## Routing examples
-
-| Prompt | p(cloud) | Decision |
-|--------|----------|----------|
-| hi | 0.011 | device |
-| 2+2 | 0.009 | device |
-| tell me a joke | 0.012 | device |
-| hello | 0.011 | device |
-| Write a Python B-tree with insert, delete, search | 0.911 | cloud |
-| Implement a REST API with auth and rate limiting | 0.762 | cloud |
-| Derive the volume of a sphere using integration | 0.900 | cloud |
-| Who was the first host of Top Chef? | 0.946 | cloud |
-
 ## License
 
 Apache 2.0
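The calibrated routing rule described in the README diff above (a sigmoid over a single logit, with the `temperature` and `calibrated_threshold` values from `router_config.json`) can be sketched as follows. The `route` helper is illustrative, and applying `T` by dividing the logit before the sigmoid is an assumption: the card's own snippet only shows the uncalibrated sigmoid.

```python
import math

# Constants copied from router_config.json in this commit.
TEMPERATURE = 1.0832775919732442   # "temperature"
THRESHOLD = 0.37105263157894736    # "calibrated_threshold"

def route(logit: float) -> str:
    """Illustrative helper: map the router's raw logit to a decision.

    Temperature scaling divides the logit by T before the sigmoid;
    T > 1 softens overconfident probabilities (assumed application point).
    """
    p_cloud = 1.0 / (1.0 + math.exp(-logit / TEMPERATURE))
    return "cloud" if p_cloud > THRESHOLD else "device"

print(route(3.0))   # confident cloud-leaning logit -> "cloud"
print(route(-4.0))  # confident device-leaning logit -> "device"
```

Note that because the threshold sits below 0.5, borderline prompts (p ≈ 0.4) still route to the cloud; the threshold was tuned for utility, not symmetry.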
model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:463d9c4384baa384785ea5b4545410ed6f1fe1a116748717263adccf7d6022a1
 size 598436708
router_config.json CHANGED

@@ -1,9 +1,11 @@
 {
   "base_model": "answerdotai/ModernBERT-base",
   "best_threshold": 0.37105263157894736,
+  "temperature": 1.0832775919732442,
+  "calibrated_threshold": 0.37105263157894736,
   "loss": "PID",
   "hp": {
-    "lr":
+    "lr": 5e-05,
     "beta_kl": 0.05,
     "weight_decay": 0.01,
     "warmup_ratio": 0.1
@@ -11,9 +13,11 @@
   "device_model": "LiquidAI/LFM2.5-1.2B-Instruct",
   "cloud_model": "gpt-5.2",
   "test_results": {
-    "utility": 0.
-    "cloud_rate": 0.
-    "regret": 0.
-    "cat_miss": 0.
+    "utility": 0.8734409213066101,
+    "cloud_rate": 0.8029333333333334,
+    "regret": 0.1118897870182991,
+    "cat_miss": 0.0508,
+    "ece_uncalibrated": 0.025725143598516808,
+    "ece_calibrated": 0.027517356761029774
   }
 }
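For context on the `temperature` field added to `router_config.json`: the standard way to obtain such a value is to fit the single parameter T by minimizing binary negative log-likelihood on held-out (logit, label) pairs. The commit does not show the fitting code, so this is a minimal sketch of plain temperature scaling, not the repository's actual procedure.

```python
import math

def nll(logits, labels, t):
    """Mean binary negative log-likelihood of sigmoid(logit / t)."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z / t))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp for log stability
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)

def fit_temperature(logits, labels, lo=0.05, hi=10.0, iters=200):
    """Ternary search over T; the NLL is unimodal in this 1-D problem."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if nll(logits, labels, m1) < nll(logits, labels, m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0

# Overconfident toy logits: one of every three high-confidence calls is
# wrong, so the fitted T comes out well above 1 (softening probabilities).
zs = [4.0, 4.0, 4.0, -4.0, -4.0, -4.0]
ys = [1, 1, 0, 0, 0, 1]
t = fit_temperature(zs, ys)
```

Because T only rescales the logit, it never changes the decision made at the 0.5 boundary; that is why `calibrated_threshold` can stay equal to `best_threshold` while the reported probabilities become better calibrated.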
sweep_results.json CHANGED

@@ -6,8 +6,8 @@
     "weight_decay": 0.01,
     "warmup_ratio": 0.1
   },
-  "val_loss": 0.
-  "time_s":
+  "val_loss": 0.359864952657737,
+  "time_s": 2119.8089495899912
 },
 {
   "hp": {
@@ -16,8 +16,8 @@
     "weight_decay": 0.01,
     "warmup_ratio": 0.1
   },
-  "val_loss": 0.
-  "time_s":
+  "val_loss": 0.39046762092440734,
+  "time_s": 2135.0269561820023
 },
 {
   "hp": {
@@ -26,8 +26,8 @@
     "weight_decay": 0.01,
     "warmup_ratio": 0.1
   },
-  "val_loss": 0.
-  "time_s":
+  "val_loss": 0.3433239631793078,
+  "time_s": 1772.8355041669856
 },
 {
   "hp": {
@@ -36,8 +36,8 @@
     "weight_decay": 0.01,
     "warmup_ratio": 0.1
   },
-  "val_loss": 0.
-  "time_s":
+  "val_loss": 0.3631987929577921,
+  "time_s": 1771.8289752060082
 },
 {
   "hp": {
@@ -46,8 +46,8 @@
     "weight_decay": 0.01,
     "warmup_ratio": 0.1
   },
-  "val_loss": 0.
-  "time_s":
+  "val_loss": 0.33285403746015885,
+  "time_s": 1769.0101560780022
 },
 {
   "hp": {
@@ -56,7 +56,7 @@
     "weight_decay": 0.01,
     "warmup_ratio": 0.1
   },
-  "val_loss": 0.
-  "time_s":
+  "val_loss": 0.36301534577911976,
+  "time_s": 1770.7788193659944
 }
 ]
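The sweep above is post-processed by selecting the run with the lowest validation loss. A minimal sketch using the six `val_loss`/`time_s` values from this diff; the per-run hyperparameters are elided in the hunks, so they are omitted here rather than guessed.

```python
# val_loss / time_s pairs as recorded in sweep_results.json (new side).
sweep = [
    {"val_loss": 0.359864952657737, "time_s": 2119.8089495899912},
    {"val_loss": 0.39046762092440734, "time_s": 2135.0269561820023},
    {"val_loss": 0.3433239631793078, "time_s": 1772.8355041669856},
    {"val_loss": 0.3631987929577921, "time_s": 1771.8289752060082},
    {"val_loss": 0.33285403746015885, "time_s": 1769.0101560780022},
    {"val_loss": 0.36301534577911976, "time_s": 1770.7788193659944},
]

# Pick the configuration that minimizes validation loss.
best = min(sweep, key=lambda run: run["val_loss"])
print(f"best val_loss: {best['val_loss']:.4f}")  # -> best val_loss: 0.3329
```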