---
language:
- en
library_name: adaptive-classifier
license: apache-2.0
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- llm
- routing
- multi-model
- bert
- router-arena
- model-selection
---

# Chayan: Multi-Model LLM Router

This model is a high-performance LLM router presented in the paper [RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers](https://huggingface.co/papers/2510.00202).

-   📚 Paper (Hugging Face): [RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers](https://huggingface.co/papers/2510.00202)
-   📚 Paper (arXiv): https://arxiv.org/abs/2510.00202
-   💻 Library Code: https://github.com/codelion/adaptive-classifier
-   🌐 RouterArena Project Page: https://routeworks.github.io/

**Chayan** routes each query to one of four models (gpt-4o-mini, gemini-2.5-flash-lite, gemini-2.5-flash, and gpt-4o) to optimize the accuracy-cost tradeoff.

## 🏆 RouterArena Performance

**Official Leaderboard Results** (8,400 queries):
-   🥇 **#1 Optimal Accuracy Score: 88.7%** - SOTA! (Best routing decision quality)
-   🥈 **#2 Optimal Selection Score: 43.0%** - Silver! (Second-best model selection)
-   **#7 Overall** (#5 open-source): 64.9% accuracy, 63.8 arena score
-   **$0.60 per 1K queries** - Cost-efficient routing

![RouterArena Leaderboard](routerarena_leaderboard.png)

**What do these metrics mean?**
-   **Optimal Accuracy**: When Chayan routes to a model, that model gives the correct answer 88.7% of the time
-   **Optimal Selection**: Chayan selects the best available model 43% of the time

View full leaderboard: [RouterArena](https://routeworks.github.io/) | [PR #24](https://github.com/RouteWorks/RouterArena/pull/24)

## Quick Start

```bash
pip install adaptive-classifier
```

```python
from adaptive_classifier import AdaptiveClassifier

# Load router
router = AdaptiveClassifier.load("adaptive-classifier/chayan")

# Get routing decision
query = "What is the capital of France?"
predictions = router.predict(query, k=4)

# Route to top model
selected_model = predictions[0][0]  # e.g., "openai/gpt-4o-mini"
```

### Recommended: Use with Calibration

```python
# Apply calibration factors for best performance
# (reuses `router` and `query` from the Quick Start snippet)
calibration = {
    "openai/gpt-4o-mini": 0.9,
    "google/gemini-2.5-flash-lite": 1.5,
    "google/gemini-2.5-flash": 1.8,
    "openai/gpt-4o": 1.5
}

predictions = router.predict(query, k=4)
calibrated_scores = {model: score * calibration[model] for model, score in predictions}
selected_model = max(calibrated_scores.items(), key=lambda x: x[1])[0]
```

## Architecture

**Core Components:**
-   **Base Model**: BERT-base-uncased embeddings
-   **Classifier**: Adaptive K-NN with prototype memory (FAISS-backed)
-   **Innovation**: Calibrated confidence scores to correct training data imbalance
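
To make the classifier component more concrete, here is a minimal sketch of prototype-based nearest-neighbor routing with a temperature-scaled softmax. It is not the adaptive-classifier implementation: the function names `cosine` and `knn_route` are hypothetical, the toy 2-D vectors stand in for BERT embeddings, and the FAISS index is replaced by a plain sort. The default temperature of 0.4 mirrors the training setting listed under How It Works.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_route(query_vec, prototypes, k=3, temperature=0.4):
    """Hypothetical sketch. prototypes is a list of (embedding, label) pairs.

    Returns [(label, probability), ...] sorted from most to least likely.
    """
    # Keep the k prototypes most similar to the query.
    nearest = sorted(
        ((cosine(query_vec, emb), label) for emb, label in prototypes),
        reverse=True,
    )[:k]
    # Score each label by its best-matching prototype among the neighbors.
    best = {}
    for sim, label in nearest:
        best[label] = max(best.get(label, -1.0), sim)
    # Temperature-scaled softmax; a lower temperature sharpens the distribution.
    top = max(best.values())
    weights = {label: math.exp((s - top) / temperature) for label, s in best.items()}
    total = sum(weights.values())
    return sorted(
        ((label, w / total) for label, w in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Toy prototypes (assumed data, for illustration only).
prototypes = [
    ([1.0, 0.0], "openai/gpt-4o-mini"),
    ([0.9, 0.1], "openai/gpt-4o-mini"),
    ([0.0, 1.0], "openai/gpt-4o"),
]
ranked = knn_route([1.0, 0.05], prototypes, k=3)
print(ranked[0][0])
```

The returned list has the same `(label, score)` shape that the Quick Start snippet reads from `router.predict`, so the calibration step can be applied to it unchanged.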

**Supported Models:**

| Model | Use Case | Cost/1M tokens |
|-------|----------|----------------|
| openai/gpt-4o-mini | Simple queries | $0.15 |
| google/gemini-2.5-flash-lite | Medium complexity | $0.075 |
| google/gemini-2.5-flash | Higher complexity | $0.30 |
| openai/gpt-4o | Complex queries | $2.50 |

## How It Works

### Training
-   **Dataset**: RouterArena sub_10 (809 queries)
-   **Oracle Labels**: 4-model cascade strategy (select cheapest successful model)
-   **Training Time**: 19.2 minutes
-   **Method**: K-NN classifier with 3000 prototypes, temperature 0.4
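
The cascade labeling rule ("select cheapest successful model") can be sketched as follows. This is an illustration under stated assumptions, not the training script: `oracle_label` and `CASCADE` are hypothetical names, the cascade order simply follows the row order of the Supported Models table, and returning `None` when no model succeeds is an assumed convention.

```python
# Assumed cascade order: the row order of the Supported Models table.
CASCADE = [
    "openai/gpt-4o-mini",
    "google/gemini-2.5-flash-lite",
    "google/gemini-2.5-flash",
    "openai/gpt-4o",
]

def oracle_label(correct_by_model, cascade_order=CASCADE):
    """Return the first model in the cascade that answered correctly.

    correct_by_model maps model name -> bool (did it answer this query correctly?).
    Returns None when no model in the cascade succeeded (assumed convention).
    """
    for model in cascade_order:
        if correct_by_model.get(model, False):
            return model
    return None

# Hypothetical per-query results: the two cheaper models fail, the rest succeed.
results = {
    "openai/gpt-4o-mini": False,
    "google/gemini-2.5-flash-lite": False,
    "google/gemini-2.5-flash": True,
    "openai/gpt-4o": True,
}
label = oracle_label(results)
print(label)  # google/gemini-2.5-flash
```

A rule of this shape tends to produce many labels for whichever model sits first in the cascade, since that model is chosen whenever it succeeds.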

### The Calibration Breakthrough

The uncalibrated router achieved 61.76% accuracy but was biased toward gpt-4o-mini, routing 83% of queries to it. The bias came from class imbalance in the training data:
-   57% gpt-4o-mini examples
-   27% gpt-4o examples
-   12% gemini-2.5-flash-lite examples
-   4% gemini-2.5-flash examples

**Solution**: Apply post-training calibration factors to correct the bias without retraining.

**Result**: +7.29pp improvement (61.76% → 69.05% on sub_10 benchmark)
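
To see how calibration changes a routing decision, the sketch below applies the factors from "Recommended: Use with Calibration" to a set of made-up scores. The raw scores are hypothetical and chosen only to show a flip. The helper `calibrated_choice` is an illustrative name, and its fallback factor of 1.0 for unknown labels is an assumption.

```python
CALIBRATION = {
    "openai/gpt-4o-mini": 0.9,
    "google/gemini-2.5-flash-lite": 1.5,
    "google/gemini-2.5-flash": 1.8,
    "openai/gpt-4o": 1.5,
}

def calibrated_choice(predictions, calibration):
    """Pick the model with the highest calibrated score.

    predictions is a list of (model, score) pairs; models missing from
    the calibration table keep their raw score (assumed fallback of 1.0).
    """
    scaled = {model: score * calibration.get(model, 1.0) for model, score in predictions}
    return max(scaled, key=scaled.get)

# Hypothetical raw scores that lean toward the majority class.
raw = [
    ("openai/gpt-4o-mini", 0.45),
    ("openai/gpt-4o", 0.35),
    ("google/gemini-2.5-flash", 0.12),
    ("google/gemini-2.5-flash-lite", 0.08),
]
uncalibrated = max(raw, key=lambda pair: pair[1])[0]
calibrated = calibrated_choice(raw, CALIBRATION)
print(uncalibrated)  # openai/gpt-4o-mini
print(calibrated)    # openai/gpt-4o
```

With these scores, gpt-4o-mini drops to 0.405 after scaling while gpt-4o rises to 0.525, so a near-tie that previously defaulted to the majority class now goes to the stronger model.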

## Performance Benchmarks

**Sub_10 Benchmark (809 queries):**

| Router | Accuracy | Cost/1K |
|--------|----------|---------|
| All gpt-4o-mini (baseline) | 56.98% | $0.088 |
| 2-model router | 61.43% | $0.217 |
| Chayan (uncalibrated) | 61.76% | $0.269 |
| **Chayan (calibrated)** | **69.05%** | **$0.333** |
| Perfect 2-model oracle | 69.84% | $0.784 |

**Key Insight**: Chayan reaches 99% of the perfect 2-model oracle's accuracy (69.05% vs. 69.84%) at roughly 57% lower cost ($0.333 vs. $0.784 per 1K queries).
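
The figures in the key insight follow from the Sub_10 table. The snippet below only restates that arithmetic using the table values; the variable names are illustrative.

```python
# Values copied from the Sub_10 benchmark table.
chayan_acc, oracle_acc = 69.05, 69.84      # accuracy in percent
chayan_cost, oracle_cost = 0.333, 0.784    # dollars per 1K queries
uncalibrated_acc = 61.76

accuracy_ratio = chayan_acc / oracle_acc        # share of oracle accuracy reached
cost_reduction = 1 - chayan_cost / oracle_cost  # fraction of oracle cost saved
gain_pp = chayan_acc - uncalibrated_acc         # gain from calibration, in points

print(f"{accuracy_ratio:.1%} of oracle accuracy")
print(f"{cost_reduction:.1%} lower cost")
print(f"+{gain_pp:.2f}pp from calibration")
```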

**Full Dataset (8,400 queries):**
-   **Optimal Accuracy**: 88.7% (🥇 #1)
-   **Optimal Selection**: 43.0% (🥈 #2)
-   **Overall Accuracy**: 64.9% (#7 overall, #5 open-source)
-   **Cost**: $0.60/1K queries

## Advanced Usage

### Feature Augmentation

Chayan was trained with query features prepended as tokens:

```python
from adaptive_classifier.complexity_features import augment_query_with_features

query = "What is 2+2?"
augmented = augment_query_with_features(query)
# Returns: "[LEN:12][WORDS:3][MATH:1][SENT:1][MC:0] What is 2+2?"

predictions = router.predict(augmented, k=4)
```
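
For readers who want a feel for what the feature prefix encodes, here is a hypothetical re-implementation that reproduces the example output above. The heuristics for `MATH`, `SENT`, and `MC` are assumptions made for illustration, and the library's `augment_query_with_features` may compute them differently. Use the library function for real routing so the features match what the router was trained on.

```python
import re

def sketch_augment(query):
    """Hypothetical sketch of feature augmentation (not the library's code)."""
    length = len(query)          # LEN: character count
    words = len(query.split())   # WORDS: whitespace-separated tokens
    # MATH (assumed heuristic): a digit, an arithmetic operator, then a digit.
    has_math = int(bool(re.search(r"\d\s*[-+*/=^]\s*\d", query)))
    # SENT (assumed heuristic): runs of sentence-ending punctuation, at least 1.
    sentences = max(1, len(re.findall(r"[.!?]+", query)))
    # MC (assumed heuristic): two or more lines starting with "A)", "B.", etc.
    options = re.findall(r"^\s*[A-D][).]\s", query, flags=re.MULTILINE)
    is_mc = int(len(options) >= 2)
    return (
        f"[LEN:{length}][WORDS:{words}][MATH:{has_math}]"
        f"[SENT:{sentences}][MC:{is_mc}] {query}"
    )

augmented_sketch = sketch_augment("What is 2+2?")
print(augmented_sketch)
```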

## Limitations

-   Calibration factors optimized on RouterArena sub_10; may require adjustment for other domains
-   Requires the 4 specific models to be available via API
-   Performance depends on query distribution similar to RouterArena benchmark
-   Cost estimates assume ~500 tokens per query

## Citation

```bibtex
@software{adaptive_classifier,
  title = {Adaptive Classifier: Dynamic Text Classification with Continuous Learning},
  author = {Sharma, Asankhaya},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/codelion/adaptive-classifier}
}
```