---
license: apache-2.0
tags:
- llm-routing
- model-selection
- budget-optimization
- knn
language:
- en
library_name: sklearn
pipeline_tag: text-classification
---
# R2-Router: LLM Router with Joint Model-Budget Optimization

**R2-Router** intelligently routes each query to the optimal (LLM, token budget) pair, jointly optimizing accuracy and inference cost. It ranks **#1** on the [RouterArena](https://routerarena.github.io/) leaderboard.

**Paper**: [R2-Router (arXiv)](https://arxiv.org/abs/TODO)
## RouterArena Performance

Official leaderboard results on 8,400 queries:

| Metric | Value |
|--------|-------|
| Accuracy | 71.23% |
| Cost per 1K Queries | $0.061 |
| Arena Score (beta=0.1) | **71.60** |
| Robustness Score | 45.71% |
| Rank | **#1** |
## Quick Start

### Installation

```bash
pip install scikit-learn numpy joblib huggingface_hub sentence-transformers
```
### Complete Example

```python
from huggingface_hub import snapshot_download
from sentence_transformers import SentenceTransformer
import sys

# 1. Download the router
path = snapshot_download("JiaqiXue/r2-router")
sys.path.insert(0, path)
from router import R2Router

# 2. Load the pre-trained KNN checkpoints
router = R2Router.from_pretrained(path)

# 3. Embed your query with Qwen3-0.6B (1024-dim)
embedder = SentenceTransformer("Qwen/Qwen3-0.6B")
embedding = embedder.encode("What is the capital of France?")

# 4. Route!
result = router.route(embedding)
print(f"Model: {result['model_full_name']}")
print(f"Token Budget: {result['token_limit']}")
print(f"Predicted Quality: {result['predicted_quality']:.3f}")
```
### Train from Scratch

```python
from huggingface_hub import snapshot_download
import sys

path = snapshot_download("JiaqiXue/r2-router")
sys.path.insert(0, path)
from router import R2Router

# Fit the KNN from the provided sub_10 training data (custom hyperparameters)
router = R2Router.from_training_data(path, k=80)
```
### Alternative: vLLM Embeddings (Faster for Batches)

```python
from vllm import LLM

llm = LLM(model="Qwen/Qwen3-0.6B", runner="pooling")
outputs = llm.embed(["What is the capital of France?"])
embedding = outputs[0].outputs.embedding
```
## Architecture

R2-Router jointly optimizes **which model** to use and **how many tokens** to allocate per query.

### Routing Formula
```
risk(M, b) = (1 - lambda) * predicted_quality(query, M, b) - lambda * predicted_tokens(query, M) * price_M / 1e6

(M*, b*) = argmax_{M, b} risk(M, b)
```
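To make the selection rule concrete, here is a minimal sketch of that argmax in plain Python. The predictor callables (`predict_quality`, `predict_tokens`) and the `prices_per_m` table are stand-ins for the router's fitted KNN models and the prices in `config.json`, not the actual `router.py` API.

```python
def route(embedding, models, budgets, prices_per_m,
          predict_quality, predict_tokens, lam=0.999):
    """Sketch of the (model, budget) selection rule above."""
    best, best_score = None, float("-inf")
    for m in models:
        # Token count is predicted per model, independent of budget.
        expected_cost = predict_tokens(embedding, m) * prices_per_m[m] / 1e6
        for b in budgets:
            quality = predict_quality(embedding, m, b)
            score = (1 - lam) * quality - lam * expected_cost
            if score > best_score:
                best, best_score = (m, b), score
    return best  # (model_name, token_budget) with the highest score
```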
### Pipeline

```
Input Query
     |
[1] Embed with Qwen3-0.6B -> 1024-dim vector
     |
[2] For each (model, budget) pair:
      - KNN predicts quality (accuracy)
      - KNN predicts output token count
      - Compute risk = (1 - lambda) * quality - lambda * cost
     |
[3] Select the (model, budget) pair with the highest risk
     |
Output: (model_name, token_budget)
```
### Model Pool (6 LLMs)

| Model | Output $/M tokens |
|-------|-------------------|
| Qwen3-235B-A22B | $0.463 |
| Qwen3-Next-80B-A3B | $1.10 |
| Qwen3-30B-A3B | $0.33 |
| Qwen3-Coder-Next | $0.30 |
| Gemini 2.5 Flash | $2.50 |
| Claude 3 Haiku | $1.25 |
### Token Budgets

4 output token limits: **100, 200, 400, 800** tokens.
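For example, at the prices above, a query routed to Qwen3-30B-A3B with a 400-token budget costs at most 400 × $0.33 / 1e6 ≈ $0.00013, while the same budget on Gemini 2.5 Flash can cost up to 400 × $2.50 / 1e6 = $0.001.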
### Key Parameters

| Parameter | Value |
|-----------|-------|
| KNN K | 80 |
| Lambda | 0.999 |
| Distance Metric | Cosine |
| KNN Weights | Distance-weighted |
| Embedding Dim | 1024 |
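These settings map onto scikit-learn roughly as follows; this is a sketch of the configuration only, not the exact construction inside `router.py`:

```python
from sklearn.neighbors import KNeighborsRegressor

# One regressor per prediction target, configured with the
# parameters from the table above.
knn = KNeighborsRegressor(
    n_neighbors=80,      # KNN K
    metric="cosine",     # distance metric
    weights="distance",  # closer neighbors count more
)
# knn.fit(embeddings, targets)  # embeddings: (n_queries, 1024)
```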
## Repository Contents

```
config.json              # Router configuration (models, budgets, prices, hyperparams)
router.py                # Self-contained inference code
training_data/
  embeddings.npy         # Sub_10 training embeddings (809 x 1024)
  labels.json            # Per-(model, budget) accuracy & token labels
checkpoints/
  quality_knn_*.joblib   # Pre-fitted KNN quality predictors (18 total)
  token_knn_*.joblib     # Pre-fitted KNN token predictors (6 total)
```
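To inspect the shipped configuration without loading the full router, `config.json` can be fetched and read on its own. The snippet below only lists the file's top-level keys rather than assuming a particular schema:

```python
import json
from huggingface_hub import hf_hub_download

# Fetch just config.json (models, budgets, prices, hyperparams)
config_path = hf_hub_download("JiaqiXue/r2-router", "config.json")
with open(config_path) as f:
    config = json.load(f)
print(list(config.keys()))
```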
### Two Ways to Use

1. **Load checkpoints** (`from_pretrained`): Directly load the pre-fitted KNN models. No training needed.
2. **Train from data** (`from_training_data`): Use the provided training embeddings and labels to fit your own KNN with custom hyperparameters (e.g., different K, distance metric).
## Training Details

- **Training Data**: RouterArena sub_10 split (809 queries, ~10% of the full 8,400); see the loading sketch below
- **Method**: KNeighborsRegressor with cosine distance, distance-weighted
- **Evaluation**: Full 8,400 RouterArena queries (no data leakage)
- **Training Time**: < 1 second (KNN fitting)
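For readers who want to fit predictors outside `router.py`, the raw sub_10 artifacts listed under Repository Contents can be loaded directly. The sketch stops at loading, since the label schema is whatever `labels.json` defines:

```python
import json
import numpy as np
from huggingface_hub import snapshot_download

# Download the repo and load the raw sub_10 training artifacts.
path = snapshot_download("JiaqiXue/r2-router")
X = np.load(f"{path}/training_data/embeddings.npy")
print(X.shape)  # expected: (809, 1024)

with open(f"{path}/training_data/labels.json") as f:
    labels = json.load(f)  # per-(model, budget) accuracy & token labels
```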
## Citation

```bibtex
@article{r2router2026,
  title={R2-Router: A New Paradigm for LLM Routing with Reasoning},
  author={TODO},
  year={2026},
  url={https://arxiv.org/abs/TODO}
}
```
## License

Apache 2.0