darkolorin committed · verified
Commit 2d9cea5 · 1 Parent(s): e407ec3

Upload ModernBERT router checkpoint (PID loss, utility=0.9762)

README.md ADDED
@@ -0,0 +1,117 @@
+ ---
+ library_name: transformers
+ license: apache-2.0
+ base_model: answerdotai/ModernBERT-base
+ tags:
+ - router
+ - llm-routing
+ - modernbert
+ - text-classification
+ - on-device
+ pipeline_tag: text-classification
+ datasets:
+ - custom
+ metrics:
+ - accuracy
+ language:
+ - en
+ ---
+
+ # Vibe Router — ModernBERT
+
+ A tiny LLM router that decides whether a chat request should run **locally** (on-device) or in the **cloud**, built on [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base).
+
+ ## How it works
+
+ Given a user prompt, the model outputs a single logit. After a sigmoid, probabilities above the threshold (0.371) route the request to the cloud; at or below, it stays on device.
+
+ - **Device model**: [LiquidAI/LFM2.5-1.2B-Instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct) (runs locally via MLX)
+ - **Cloud model**: GPT-5.2
+
+ ## Training
+
+ Fine-tuned end-to-end from `answerdotai/ModernBERT-base` using **Privileged Information Distillation (PID)** loss on 5,318 labeled prompt pairs with soft teacher labels derived from a GPT-4o judge.
+
+ | Hyperparameter | Value |
+ |----------------|-------|
+ | Learning rate | 2e-5 |
+ | β_kl | 0.05 |
+ | Weight decay | 0.01 |
+ | Warmup ratio | 0.1 |
+ | Epochs | 3 (early stopping) |
+ | Batch size | 32 |
+ | Hardware | NVIDIA H100 80GB |
+
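+ The exact form of the PID objective is not spelled out in this repo; below is a minimal sketch of one plausible formulation, assuming a binary cross-entropy term on the hard route label plus a β_kl-weighted KL term pulling the student toward the judge's soft label (the `pid_loss` name and tensor shapes are illustrative, not the training code):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def pid_loss(logits, hard_labels, soft_labels, beta_kl=0.05):
+     """Hypothetical PID sketch: BCE on hard labels + beta_kl * KL(teacher || student)."""
+     logits = logits.squeeze(-1)                        # single-logit head
+     bce = F.binary_cross_entropy_with_logits(logits, hard_labels.float())
+     p = torch.sigmoid(logits).clamp(1e-6, 1 - 1e-6)   # student p(cloud)
+     q = soft_labels.clamp(1e-6, 1 - 1e-6)             # GPT-4o judge soft label
+     kl = (q * (q / p).log() + (1 - q) * ((1 - q) / (1 - p)).log()).mean()
+     return bce + beta_kl * kl
+ ```
+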
+ ## Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | Utility | 0.9762 |
+ | Cloud rate | 79.4% |
+ | Regret | 0.0064 |
+ | Catastrophic miss rate | 0.0% |
+ | ECE | 0.173 |
+ | Best threshold | 0.371 |
+
+ ### Baselines
+
+ | Model | Utility | Cloud% | Regret |
+ |-------|---------|--------|--------|
+ | Always device | 0.879 | 0% | 0.104 |
+ | Always cloud | 0.894 | 100% | 0.089 |
+ | **ModernBERT (PID)** | **0.976** | **79.4%** | **0.006** |
+
+ ## Latency
+
+ ~7 ms per inference on an NVIDIA GPU; ~10 ms on Apple Silicon (MPS backend).
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+ import torch
+
+ model_id = "trymirai/vibe-router-modernbert"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
+ model.eval()
+
+ prompt = "Write a Python B-tree implementation"
+ inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
+
+ with torch.no_grad():
+     logits = model(**inputs).logits          # single-logit head
+     p_cloud = torch.sigmoid(logits).item()   # probability of routing to cloud
+
+ threshold = 0.371  # best_threshold from router_config.json
+ decision = "cloud" if p_cloud > threshold else "device"
+ print(f"p(cloud)={p_cloud:.3f} → {decision}")
+ ```
+
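+ To sanity-check the latency figures above on your own hardware, a rough timing loop continuing from the snippet (illustrative only; move `model` and `inputs` to `"cuda"` or `"mps"` to approximate the GPU numbers):
+
+ ```python
+ import time
+
+ with torch.no_grad():
+     for _ in range(10):    # warmup
+         model(**inputs)
+     t0 = time.perf_counter()
+     for _ in range(100):
+         model(**inputs)
+ print(f"{(time.perf_counter() - t0) / 100 * 1e3:.1f} ms/inference")
+ ```
+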
+ ## Routing examples
+
+ | Prompt | p(cloud) | Decision |
+ |--------|----------|----------|
+ | hi | 0.011 | device |
+ | 2+2 | 0.009 | device |
+ | tell me a joke | 0.012 | device |
+ | hello | 0.011 | device |
+ | Write a Python B-tree with insert, delete, search | 0.911 | cloud |
+ | Implement a REST API with auth and rate limiting | 0.762 | cloud |
+ | Derive the volume of a sphere using integration | 0.900 | cloud |
+ | Who was the first host of Top Chef? | 0.946 | cloud |
+
+ ## License
+
+ Apache 2.0
+
+ ## Citation
+
+ ```bibtex
+ @misc{vibe-router-2026,
+   title={Vibe Router: On-Device LLM Routing with Privileged Information Distillation},
+   author={Mirai},
+   year={2026},
+   url={https://github.com/trymirai/vibe_router}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,83 @@
+ {
+   "architectures": [
+     "ModernBertForSequenceClassification"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 50281,
+   "classifier_activation": "gelu",
+   "classifier_bias": false,
+   "classifier_dropout": 0.0,
+   "classifier_pooling": "mean",
+   "cls_token_id": 50281,
+   "decoder_bias": true,
+   "deterministic_flash_attn": false,
+   "dtype": "float32",
+   "embedding_dropout": 0.0,
+   "eos_token_id": 50282,
+   "global_attn_every_n_layers": 3,
+   "gradient_checkpointing": false,
+   "hidden_activation": "gelu",
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_cutoff_factor": 2.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 1152,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-05,
+   "layer_types": [
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention"
+   ],
+   "local_attention": 128,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "mlp_dropout": 0.0,
+   "model_type": "modernbert",
+   "norm_bias": false,
+   "norm_eps": 1e-05,
+   "num_attention_heads": 12,
+   "num_hidden_layers": 22,
+   "pad_token_id": 50283,
+   "position_embedding_type": "absolute",
+   "rope_parameters": {
+     "full_attention": {
+       "rope_theta": 160000.0,
+       "rope_type": "default"
+     },
+     "sliding_attention": {
+       "rope_theta": 10000.0,
+       "rope_type": "default"
+     }
+   },
+   "sep_token_id": 50282,
+   "sparse_pred_ignore_index": -100,
+   "sparse_prediction": false,
+   "tie_word_embeddings": true,
+   "transformers_version": "5.2.0",
+   "vocab_size": 50368
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb128103dab9e2938e447b4079d4f0bb3034e2f26cfd2668159f37aeaa54f67f
+ size 598436708
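A quick consistency check: at float32 (per `config.json`, 4 bytes per parameter), 598,436,708 bytes works out to roughly 149.6M parameters, in line with ModernBERT-base plus the single-logit classification head.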
router_config.json ADDED
@@ -0,0 +1,19 @@
+ {
+   "base_model": "answerdotai/ModernBERT-base",
+   "best_threshold": 0.37105263157894736,
+   "loss": "PID",
+   "hp": {
+     "lr": 2e-05,
+     "beta_kl": 0.05,
+     "weight_decay": 0.01,
+     "warmup_ratio": 0.1
+   },
+   "device_model": "LiquidAI/LFM2.5-1.2B-Instruct",
+   "cloud_model": "gpt-5.2",
+   "test_results": {
+     "utility": 0.9762406349182129,
+     "cloud_rate": 0.7944862155388471,
+     "regret": 0.006434837356209755,
+     "cat_miss": 0.0
+   }
+ }
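The README's 0.371 is this `best_threshold` rounded; rather than hard-coding it, the shipped config can be read at runtime. A minimal sketch, assuming only `huggingface_hub` and the repo id from the Usage section:

```python
import json
from huggingface_hub import hf_hub_download

# Fetch the router config shipped with the checkpoint and read its threshold.
path = hf_hub_download("trymirai/vibe-router-modernbert", "router_config.json")
with open(path) as f:
    cfg = json.load(f)
threshold = cfg["best_threshold"]  # 0.37105..., rounded to 0.371 in the README
```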
sweep_results.json ADDED
@@ -0,0 +1,62 @@
+ [
+   {
+     "hp": {
+       "lr": 1e-05,
+       "beta_kl": 0.05,
+       "weight_decay": 0.01,
+       "warmup_ratio": 0.1
+     },
+     "val_loss": 0.05074503788000751,
+     "time_s": 95.0573191291187
+   },
+   {
+     "hp": {
+       "lr": 1e-05,
+       "beta_kl": 0.1,
+       "weight_decay": 0.01,
+       "warmup_ratio": 0.1
+     },
+     "val_loss": 0.0569811669310373,
+     "time_s": 107.19165365281515
+   },
+   {
+     "hp": {
+       "lr": 2e-05,
+       "beta_kl": 0.05,
+       "weight_decay": 0.01,
+       "warmup_ratio": 0.1
+     },
+     "val_loss": 0.04958628546137836,
+     "time_s": 106.77600225992501
+   },
+   {
+     "hp": {
+       "lr": 2e-05,
+       "beta_kl": 0.1,
+       "weight_decay": 0.01,
+       "warmup_ratio": 0.1
+     },
+     "val_loss": 0.05651537539578055,
+     "time_s": 145.05425760895014
+   },
+   {
+     "hp": {
+       "lr": 5e-05,
+       "beta_kl": 0.05,
+       "weight_decay": 0.01,
+       "warmup_ratio": 0.1
+     },
+     "val_loss": 0.04995061208804449,
+     "time_s": 89.70805354882032
+   },
+   {
+     "hp": {
+       "lr": 5e-05,
+       "beta_kl": 0.1,
+       "weight_decay": 0.01,
+       "warmup_ratio": 0.1
+     },
+     "val_loss": 0.05411159153170129,
+     "time_s": 125.99459161888808
+   }
+ ]
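The hyperparameters recorded in `router_config.json` (lr=2e-5, β_kl=0.05) correspond to the sweep's lowest validation loss (0.0496). A quick check, assuming the file has been downloaded locally:

```python
import json

with open("sweep_results.json") as f:
    runs = json.load(f)

# Lowest validation loss wins: {'lr': 2e-05, 'beta_kl': 0.05, ...}
best = min(runs, key=lambda r: r["val_loss"])
print(best["hp"], best["val_loss"])
```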
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "backend": "tokenizers",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "is_local": false,
+   "mask_token": "[MASK]",
+   "model_input_names": [
+     "input_ids",
+     "attention_mask"
+   ],
+   "model_max_length": 8192,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "tokenizer_class": "TokenizersBackend",
+   "unk_token": "[UNK]"
+ }