darkolorin committed
Commit 09aaf73 · verified · 1 Parent(s): 2d9cea5

Update to v2: 50K pairwise-labeled dataset, dual-judge, temperature scaling

Files changed (4)
  1. README.md +25 -31
  2. model.safetensors +1 -1
  3. router_config.json +9 -5
  4. sweep_results.json +12 -12
README.md CHANGED
@@ -17,7 +17,7 @@ language:
 - en
 ---
 
-# Vibe Router — ModernBERT
+# Vibe Router — ModernBERT v2
 
 A tiny LLM router that decides whether a chat request should run **locally** (on-device) or in the **cloud**, built on [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base).
 
@@ -28,13 +28,20 @@ Given a user prompt, the model outputs a single logit. After sigmoid, values abo
 - **Device model**: [LiquidAI/LFM2.5-1.2B-Instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct) (runs locally via MLX)
 - **Cloud model**: GPT-5.2
 
+## Changes from v1
+
+- **10x more data**: 50K prompts from LMSYS-Chat-1M, WildChat-1M, UltraChat, OpenAssistant, Alpaca, No Robots (v1 used 5.3K)
+- **Pairwise judging**: dual-judge system (GPT-4o + Claude Sonnet 4) with randomized presentation order, replacing single-judge absolute scoring
+- **Temperature scaling**: post-training calibration for well-calibrated probabilities
+- **GPU device inference**: the device model (LFM2.5-1.2B) was run on an H100 GPU via HuggingFace transformers instead of a local API
+
 ## Training
 
-Fine-tuned end-to-end from `answerdotai/ModernBERT-base` using **Privileged Information Distillation (PID)** loss on 5,318 labeled prompt pairs with soft teacher labels derived from a GPT-4o judge.
+Fine-tuned end-to-end from `answerdotai/ModernBERT-base` using **Privileged Information Distillation (PID)** loss on 50K labeled prompt pairs with soft teacher labels derived from pairwise dual-judge comparison.
 
 | Hyperparameter | Value |
 |----------------|-------|
-| Learning rate | 2e-5 |
+| Learning rate | 5e-5 |
 | β_kl | 0.05 |
 | Weight decay | 0.01 |
 | Warmup ratio | 0.1 |
@@ -44,22 +51,22 @@ Fine-tuned end-to-end from `answerdotai/ModernBERT-base` using **Privileged Info
 
 ## Performance
 
-| Metric | Value |
-|--------|-------|
-| Utility | 0.9762 |
-| Cloud rate | 79.4% |
-| Regret | 0.0064 |
-| Catastrophic miss rate | 0.0% |
-| ECE | 0.173 |
-| Best threshold | 0.371 |
+| Metric | v2 | v1 |
+|--------|-----|-----|
+| Utility | 0.8734 | 0.9762 |
+| Cloud rate | 80.3% | 79.4% |
+| Regret | 0.1119 | 0.0064 |
+| Catastrophic miss rate | 5.1% | 0.0% |
+| ECE (uncalibrated) | 0.026 | 0.173 |
+| ECE (calibrated) | 0.028 | — |
+| Temperature (T) | 1.083 | — |
+| Best threshold | 0.371 | 0.371 |
 
-### Baselines
+**Note**: v1 and v2 utility/regret metrics are not directly comparable — v1 used absolute quality scores (0-1), while v2 uses pairwise win rates, which produce a different scale. ECE improved dramatically (0.173 → 0.026).
 
-| Model | Utility | Cloud% | Regret |
-|-------|---------|--------|--------|
-| Always device | 0.879 | 0% | 0.104 |
-| Always cloud | 0.894 | 100% | 0.089 |
-| **ModernBERT (PID)** | **0.976** | **79.4%** | **0.006** |
+### Known issue
+
+~19% of cloud model (GPT-5.2) responses were empty in the training data, which caused those prompts to be incorrectly labeled as "device-preferred". This introduced a bias into the routing logic. A v3 retraining with filtered data is planned.
 
 ## Latency
 
@@ -71,7 +78,7 @@ Fine-tuned end-to-end from `answerdotai/ModernBERT-base` using **Privileged Info
 from transformers import AutoModelForSequenceClassification, AutoTokenizer
 import torch
 
-model_id = "trymirai/vibe-router-modernbert"
+model_id = "darkolorin/vibe-router-modernbert"
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
 model.eval()
@@ -88,19 +95,6 @@ decision = "cloud" if p_cloud > threshold else "device"
 print(f"p(cloud)={p_cloud:.3f} → {decision}")
 ```
 
-## Routing examples
-
-| Prompt | p(cloud) | Decision |
-|--------|----------|----------|
-| hi | 0.011 | device |
-| 2+2 | 0.009 | device |
-| tell me a joke | 0.012 | device |
-| hello | 0.011 | device |
-| Write a Python B-tree with insert, delete, search | 0.911 | cloud |
-| Implement a REST API with auth and rate limiting | 0.762 | cloud |
-| Derive the volume of a sphere using integration | 0.900 | cloud |
-| Who was the first host of Top Chef? | 0.946 | cloud |
-
 ## License
 
 Apache 2.0
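The dual-judge pairwise labeling described in the new README can be sketched as follows. This is a hypothetical illustration, not the repo's actual pipeline: the `judge` callables stand in for the GPT-4o and Claude Sonnet 4 judges, and it assumes the soft teacher label is the cloud response's win rate averaged over judges, with presentation order randomized to control position bias.

```python
import random

def pairwise_soft_label(judges, prompt, device_resp, cloud_resp, rng=random):
    """Average dual-judge votes, randomizing presentation order per judge.

    Each judge is a callable (prompt, response_a, response_b) -> "A" or "B"
    naming the preferred response. Returns the cloud win rate in [0, 1],
    usable as a soft teacher label.
    """
    wins = 0
    for judge in judges:
        # Randomize which response is shown as "A" to mitigate position bias.
        if rng.random() < 0.5:
            a, b, cloud_slot = device_resp, cloud_resp, "B"
        else:
            a, b, cloud_slot = cloud_resp, device_resp, "A"
        if judge(prompt, a, b) == cloud_slot:
            wins += 1
    return wins / len(judges)
```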
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bb128103dab9e2938e447b4079d4f0bb3034e2f26cfd2668159f37aeaa54f67f
+oid sha256:463d9c4384baa384785ea5b4545410ed6f1fe1a116748717263adccf7d6022a1
 size 598436708
router_config.json CHANGED
@@ -1,9 +1,11 @@
 {
 "base_model": "answerdotai/ModernBERT-base",
 "best_threshold": 0.37105263157894736,
+"temperature": 1.0832775919732442,
+"calibrated_threshold": 0.37105263157894736,
 "loss": "PID",
 "hp": {
-"lr": 2e-05,
+"lr": 5e-05,
 "beta_kl": 0.05,
 "weight_decay": 0.01,
 "warmup_ratio": 0.1
@@ -11,9 +13,11 @@
 "device_model": "LiquidAI/LFM2.5-1.2B-Instruct",
 "cloud_model": "gpt-5.2",
 "test_results": {
-"utility": 0.9762406349182129,
-"cloud_rate": 0.7944862155388471,
-"regret": 0.006434837356209755,
-"cat_miss": 0.0
+"utility": 0.8734409213066101,
+"cloud_rate": 0.8029333333333334,
+"regret": 0.1118897870182991,
+"cat_miss": 0.0508,
+"ece_uncalibrated": 0.025725143598516808,
+"ece_calibrated": 0.027517356761029774
 }
 }
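A minimal sketch of how the new `temperature` and `calibrated_threshold` fields could be consumed at inference time, assuming standard temperature scaling (the raw logit divided by T before the sigmoid); the `route` helper and the example logit values are illustrative, not part of the repo.

```python
import math

# Values from router_config.json
TEMPERATURE = 1.0832775919732442
THRESHOLD = 0.37105263157894736  # calibrated_threshold

def route(logit: float) -> str:
    """Temperature-scale the raw logit, apply sigmoid, then threshold."""
    p_cloud = 1.0 / (1.0 + math.exp(-logit / TEMPERATURE))
    return "cloud" if p_cloud > THRESHOLD else "device"

print(route(2.3))   # high logit routes to cloud
print(route(-3.1))  # low logit stays on device
```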
sweep_results.json CHANGED
@@ -6,8 +6,8 @@
 "weight_decay": 0.01,
 "warmup_ratio": 0.1
 },
-"val_loss": 0.05074503788000751,
-"time_s": 95.0573191291187
+"val_loss": 0.359864952657737,
+"time_s": 2119.8089495899912
 },
 {
 "hp": {
@@ -16,8 +16,8 @@
 "weight_decay": 0.01,
 "warmup_ratio": 0.1
 },
-"val_loss": 0.0569811669310373,
-"time_s": 107.19165365281515
+"val_loss": 0.39046762092440734,
+"time_s": 2135.0269561820023
 },
 {
 "hp": {
@@ -26,8 +26,8 @@
 "weight_decay": 0.01,
 "warmup_ratio": 0.1
 },
-"val_loss": 0.04958628546137836,
-"time_s": 106.77600225992501
+"val_loss": 0.3433239631793078,
+"time_s": 1772.8355041669856
 },
 {
 "hp": {
@@ -36,8 +36,8 @@
 "weight_decay": 0.01,
 "warmup_ratio": 0.1
 },
-"val_loss": 0.05651537539578055,
-"time_s": 145.05425760895014
+"val_loss": 0.3631987929577921,
+"time_s": 1771.8289752060082
 },
 {
 "hp": {
@@ -46,8 +46,8 @@
 "weight_decay": 0.01,
 "warmup_ratio": 0.1
 },
-"val_loss": 0.04995061208804449,
-"time_s": 89.70805354882032
+"val_loss": 0.33285403746015885,
+"time_s": 1769.0101560780022
 },
 {
 "hp": {
@@ -56,7 +56,7 @@
 "weight_decay": 0.01,
 "warmup_ratio": 0.1
 },
-"val_loss": 0.05411159153170129,
-"time_s": 125.99459161888808
+"val_loss": 0.36301534577911976,
+"time_s": 1770.7788193659944
 }
 ]
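The updated sweep is consumed by picking the configuration with the lowest validation loss. A sketch using the new `val_loss`/`time_s` values from the diff; the `hp` fields are omitted here because the varying hyperparameters are not shown in these hunks.

```python
import json

# Illustrative subset of sweep_results.json: the six updated runs,
# with hp fields omitted (not visible in the diff hunks above).
sweep = json.loads("""[
  {"val_loss": 0.359864952657737,   "time_s": 2119.8089495899912},
  {"val_loss": 0.39046762092440734, "time_s": 2135.0269561820023},
  {"val_loss": 0.3433239631793078,  "time_s": 1772.8355041669856},
  {"val_loss": 0.3631987929577921,  "time_s": 1771.8289752060082},
  {"val_loss": 0.33285403746015885, "time_s": 1769.0101560780022},
  {"val_loss": 0.36301534577911976, "time_s": 1770.7788193659944}
]""")

# Select the run with the lowest validation loss.
best = min(sweep, key=lambda run: run["val_loss"])
print(f"best val_loss: {best['val_loss']:.4f}")
```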