# IRouterLM
A lightweight query-aware router that dynamically selects the optimal retrieval modality and architecture per query. IRouterLM achieves state-of-the-art accuracy (0.76 nDCG@5) while reducing latency by 90% compared to the strongest baseline.
IRouterLM is a fine-tuned Qwen3-0.6B model that classifies queries into optimal RAG retrieval strategies. Given a user query, the model predicts which retrieval pipeline will yield the best results while balancing accuracy and latency.
| Strategy ID | Strategy Name | Description |
|---|---|---|
| 0 | MULTIMODAL_RERANK | Multimodal dense retrieval + late-interaction reranking |
| 1 | MULTIMODAL-SINGLE | Single-stage multimodal dense retrieval |
| 2 | TEXT_RERANK | Text dense retrieval + late-interaction reranking |
| 3 | TEXT-SINGLE | Single-stage text dense retrieval |
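Once a strategy ID is predicted, it can be wired to the corresponding retrieval pipeline with a simple dispatch table. This is a sketch: the pipeline functions below are hypothetical placeholders, not part of the released model.

```python
# Strategy names in the order of the router's output IDs (0-3).
STRATEGIES = ["MULTIMODAL_RERANK", "MULTIMODAL-SINGLE", "TEXT_RERANK", "TEXT-SINGLE"]

def route(strategy_id: int, query: str) -> str:
    """Dispatch a query to the retrieval pipeline the router selected.

    The lambdas below are stand-ins; replace them with your own
    retriever/reranker implementations.
    """
    pipelines = {
        "MULTIMODAL_RERANK": lambda q: f"multimodal+rerank({q})",
        "MULTIMODAL-SINGLE": lambda q: f"multimodal({q})",
        "TEXT_RERANK": lambda q: f"text+rerank({q})",
        "TEXT-SINGLE": lambda q: f"text({q})",
    }
    return pipelines[STRATEGIES[strategy_id]](query)
```

Keeping the ID-to-name list identical to the one used at training time matters here: the router emits class indices, so any reordering silently routes queries to the wrong pipeline.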
## Usage

```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModel.from_pretrained("ananoymous/IRouterLM", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ananoymous/IRouterLM")

# Example query
query = "What was the revenue growth in Q3 2024?"
inputs = tokenizer(query, return_tensors="pt")

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs["logits"], dim=-1)
    prediction = probs.argmax(dim=-1).item()

# Strategy mapping
strategies = ["MULTIMODAL_RERANK", "MULTIMODAL-SINGLE", "TEXT_RERANK", "TEXT-SINGLE"]
print(f"Predicted strategy: {strategies[prediction]}")
print(f"Confidence: {probs[0][prediction]:.2%}")
```
### `predict` Method

The model also exposes a convenience `predict` method that returns strategy names and probabilities directly:

```python
result = model.predict(inputs["input_ids"], inputs["attention_mask"])
print(f"Strategy: {result['strategy_names'][0]}")
print(f"Probabilities: {result['probabilities']}")
```
## Architecture

```
Query → Qwen3-0.6B (LoRA) → Mean Pooling → Classifier → Strategy Prediction
```
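The pooling and classification stages of the pipeline above can be sketched in PyTorch. This is an illustrative reconstruction, not the released implementation: the hidden size (1024, Qwen3-0.6B's) and the single linear classifier are assumptions, and the backbone is replaced by random hidden states.

```python
import torch
import torch.nn as nn

class RouterHead(nn.Module):
    """Mask-aware mean pooling followed by a linear classifier.

    Assumptions: hidden_size=1024 (Qwen3-0.6B) and a single nn.Linear head;
    the actual IRouterLM head may differ.
    """
    def __init__(self, hidden_size: int = 1024, num_strategies: int = 4):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_strategies)

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Zero out padding positions, then average over real tokens only.
        mask = attention_mask.unsqueeze(-1).float()                      # (B, T, 1)
        pooled = (hidden_states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        return self.classifier(pooled)                                   # (B, num_strategies)

# Smoke test: random tensors stand in for the backbone's last hidden states.
head = RouterHead()
hidden = torch.randn(2, 8, 1024)
attn = torch.ones(2, 8, dtype=torch.long)
logits = head(hidden, attn)
print(logits.shape)  # torch.Size([2, 4])
```

Masked mean pooling (rather than taking the last token) makes the query embedding invariant to padding, which matters when queries are batched at different lengths.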
The model was trained on 80,000+ queries from 11 benchmarks:
| Domain | Datasets |
|---|---|
| Financial | FinReport, FinSlides, FinQA, ConvFinQA, TAT-DQA |
| Scientific | ArxivQA, SciQAG |
| General | Wiki-SS, MP-DocVQA, DUDE, VQAnBD |
Training hyperparameters:

| Parameter | Value |
|---|---|
| Learning Rate | 1e-4 |
| Batch Size | 16 |
| Epochs | 2 |
| Weight Decay | 0.01 |
| Warmup Ratio | 0.1 |
| Scheduler | Cosine |
| Precision | bfloat16 |
| λ (trade-off) | 0.0 (accuracy-focused) |
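The table above maps onto a standard Hugging Face fine-tuning configuration roughly as follows. This is a sketch under assumptions: `output_dir` is a hypothetical path, and the λ trade-off is a custom loss weight, not a `TrainingArguments` field.

```python
from transformers import TrainingArguments

# Hyperparameters from the table above; λ (accuracy/latency trade-off)
# lives in the custom training loss, not in TrainingArguments.
args = TrainingArguments(
    output_dir="irouterlm-checkpoints",  # hypothetical path
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    warmup_ratio=0.1,
    lr_scheduler_type="cosine",
    bf16=True,
)
```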
IRouterLM is designed for:
## License

MIT License
This work builds on: