---
language:
- en
library_name: adaptive-classifier
license: apache-2.0
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- llm
- routing
- multi-model
- bert
- router-arena
- model-selection
---
# Chayan: Multi-Model LLM Router
This model is a high-performance LLM router presented in the paper [RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers](https://huggingface.co/papers/2510.00202).
- 📄 Paper (Hugging Face): [RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers](https://huggingface.co/papers/2510.00202)
- 📄 Paper (arXiv): https://arxiv.org/abs/2510.00202
- 💻 Library Code: https://github.com/codelion/adaptive-classifier
- 🏆 RouterArena Project Page: https://routeworks.github.io/
**Chayan** intelligently routes between 4 models (gpt-4o-mini, gemini-2.5-flash-lite, gemini-2.5-flash, and gpt-4o) to optimize the accuracy-cost tradeoff.
## 🏆 RouterArena Performance
**Official Leaderboard Results** (8,400 queries):
- 🥇 **#1 Optimal Accuracy Score: 88.7%** (state of the art; best routing-decision quality)
- 🥈 **#2 Optimal Selection Score: 43.0%** (second-best model selection)
- **#7 Overall** (#5 open-source): 64.9% accuracy, 63.8 arena score
- **$0.60 per 1K queries** - Cost-efficient routing

**What do these metrics mean?**
- **Optimal Accuracy**: When Chayan routes to a model, that model gives the correct answer 88.7% of the time
- **Optimal Selection**: Chayan selects the best available model 43% of the time
View full leaderboard: [RouterArena](https://routeworks.github.io/) | [PR #24](https://github.com/RouteWorks/RouterArena/pull/24)
## Quick Start
```bash
pip install adaptive-classifier
```
```python
from adaptive_classifier import AdaptiveClassifier

# Load the router
router = AdaptiveClassifier.load("adaptive-classifier/chayan")

# Get a routing decision
query = "What is the capital of France?"
predictions = router.predict(query, k=4)

# Route to the top-ranked model
selected_model = predictions[0][0]  # e.g., "openai/gpt-4o-mini"
```
### Recommended: Use with Calibration
```python
# Apply calibration factors for best performance
# (continues from Quick Start: `router` and `query` are defined there)
calibration = {
    "openai/gpt-4o-mini": 0.9,
    "google/gemini-2.5-flash-lite": 1.5,
    "google/gemini-2.5-flash": 1.8,
    "openai/gpt-4o": 1.5,
}

predictions = router.predict(query, k=4)
calibrated_scores = {model: score * calibration[model] for model, score in predictions}
selected_model = max(calibrated_scores.items(), key=lambda x: x[1])[0]
```
## Architecture
**Core Components:**
- **Base Model**: BERT-base-uncased embeddings
- **Classifier**: Adaptive K-NN with prototype memory (FAISS-backed)
- **Innovation**: Calibrated confidence scores to correct training data imbalance
**Supported Models:**
| Model | Use Case | Cost/1M tokens |
|-------|----------|----------------|
| openai/gpt-4o-mini | Simple queries | $0.15 |
| google/gemini-2.5-flash-lite | Medium complexity | $0.075 |
| google/gemini-2.5-flash | Higher complexity | $0.30 |
| openai/gpt-4o | Complex queries | $2.50 |
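As a rough illustration of how the price table translates into routing cost, the sketch below estimates cost per 1K queries from a routing mix. The per-query token count and the example mix are assumptions for illustration, not measured values from Chayan:

```python
# Hypothetical sketch: estimate routing cost per 1K queries from the
# price table above. Token count and routing mix are illustrative
# assumptions, not measurements.
COST_PER_1M_TOKENS = {
    "openai/gpt-4o-mini": 0.15,
    "google/gemini-2.5-flash-lite": 0.075,
    "google/gemini-2.5-flash": 0.30,
    "openai/gpt-4o": 2.50,
}

def cost_per_1k_queries(routing_mix, tokens_per_query=500):
    """routing_mix maps model name -> fraction of queries routed there."""
    tokens_per_1k_queries = 1000 * tokens_per_query
    return sum(
        frac * COST_PER_1M_TOKENS[model] * tokens_per_1k_queries / 1_000_000
        for model, frac in routing_mix.items()
    )

# Example mix: mostly cheap models, occasional escalation to gpt-4o
mix = {
    "openai/gpt-4o-mini": 0.55,
    "google/gemini-2.5-flash-lite": 0.15,
    "google/gemini-2.5-flash": 0.15,
    "openai/gpt-4o": 0.15,
}
print(f"${cost_per_1k_queries(mix):.3f} per 1K queries")
```

This is why routing a large share of traffic to the cheap models dominates the cost profile: escalating even 15% of queries to gpt-4o accounts for most of the spend.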
## How It Works
### Training
- **Dataset**: RouterArena sub_10 (809 queries)
- **Oracle Labels**: 4-model cascade strategy (select cheapest successful model)
- **Training Time**: 19.2 minutes
- **Method**: K-NN classifier with 3000 prototypes, temperature 0.4
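The cascade oracle labeling above can be sketched as follows. The model ordering and the per-query correctness records are illustrative assumptions; the actual training pipeline lives in the adaptive-classifier repository:

```python
# Hypothetical sketch of the 4-model cascade oracle: label each training
# query with the cheapest model that answered it correctly.
MODELS_CHEAPEST_FIRST = [
    "google/gemini-2.5-flash-lite",  # $0.075 / 1M tokens
    "openai/gpt-4o-mini",            # $0.15
    "google/gemini-2.5-flash",       # $0.30
    "openai/gpt-4o",                 # $2.50
]

def oracle_label(correct_by_model):
    """correct_by_model maps model name -> bool (answered correctly).
    Returns the cheapest successful model, falling back to the
    strongest model when none succeeded."""
    for model in MODELS_CHEAPEST_FIRST:
        if correct_by_model.get(model):
            return model
    return MODELS_CHEAPEST_FIRST[-1]

# Illustrative record: flash-lite failed, so the label escalates
# to the next-cheapest model that succeeded.
label = oracle_label({
    "google/gemini-2.5-flash-lite": False,
    "openai/gpt-4o-mini": True,
    "openai/gpt-4o": True,
})
# label == "openai/gpt-4o-mini"
```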
### The Calibration Breakthrough
The uncalibrated router achieved 61.76% accuracy but was biased toward gpt-4o-mini, routing 83% of queries to it. The cause was class imbalance in the training data:
- 57% gpt-4o-mini examples
- 27% gpt-4o examples
- 12% gemini-flash-lite examples
- 4% gemini-flash examples
**Solution**: Apply post-training calibration factors to correct the bias without retraining.
**Result**: +7.29pp improvement (61.76% → 69.05% on the sub_10 benchmark)
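One simple way to see where such correction factors come from is inverse class frequency. The factors shipped with Chayan were tuned empirically, so this sketch shows the underlying idea rather than the exact derivation:

```python
# Hypothetical sketch: derive per-class correction factors from the
# training-label imbalance via inverse frequency. Chayan's published
# factors were tuned empirically; this illustrates the principle.
label_fractions = {
    "openai/gpt-4o-mini": 0.57,
    "openai/gpt-4o": 0.27,
    "google/gemini-2.5-flash-lite": 0.12,
    "google/gemini-2.5-flash": 0.04,
}

mean_frac = sum(label_fractions.values()) / len(label_fractions)
inverse_freq = {m: mean_frac / f for m, f in label_fractions.items()}
# Over-represented classes get a factor below 1, rare classes above 1,
# pushing routing decisions away from the majority class.
```

Note the same ordering holds in Chayan's actual factors: gpt-4o-mini (the majority class) is downweighted to 0.9, while the under-represented Gemini models get factors above 1.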
## Performance Benchmarks
**Sub_10 Benchmark (809 queries):**
| Router | Accuracy | Cost/1K |
|--------|----------|---------|
| All gpt-4o-mini (baseline) | 56.98% | $0.088 |
| 2-model router | 61.43% | $0.217 |
| Chayan (uncalibrated) | 61.76% | $0.269 |
| **Chayan (calibrated)** | **69.05%** | **$0.333** |
| Perfect 2-model oracle | 69.84% | $0.784 |
**Key Insight**: Chayan achieves 99% of perfect oracle performance at 57% lower cost.
**Full Dataset (8,400 queries):**
- **Optimal Accuracy**: 88.7% (🥇 #1)
- **Optimal Selection**: 43.0% (🥈 #2)
- **Overall Accuracy**: 64.9% (#7 overall, #5 open-source)
- **Cost**: $0.60/1K queries
## Advanced Usage
### Feature Augmentation
Chayan was trained with query features prepended as tokens:
```python
from adaptive_classifier.complexity_features import augment_query_with_features

# Prepend complexity features before routing
query = "What is 2+2?"
augmented = augment_query_with_features(query)
# Returns: "[LEN:12][WORDS:3][MATH:1][SENT:1][MC:0] What is 2+2?"

predictions = router.predict(augmented, k=4)
```
## Limitations
- Calibration factors optimized on RouterArena sub_10; may require adjustment for other domains
- Requires the 4 specific models to be available via API
- Performance depends on query distribution similar to RouterArena benchmark
- Cost estimates assume ~500 tokens per query
## Citation
```bibtex
@software{adaptive_classifier,
  title     = {Adaptive Classifier: Dynamic Text Classification with Continuous Learning},
  author    = {Sharma, Asankhaya},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/codelion/adaptive-classifier}
}
```