license: apache-2.0
language:
- en
tags:
- text-classification
- distilbert
- query-complexity
- agent-routing
- llm-routing
- ai-agents
- tool-use
pipeline_tag: text-classification
QueryComplexityRouter
A fast, lightweight 3-class classifier that decides how much LLM power a query needs, before you spend tokens on it.
Built on DistilBERT (66M params), fine-tuned to classify any user message into one of three complexity tiers:
| Label | Meaning | Suggested Action |
|---|---|---|
| no_llm | Answerable with rules, lookup, or regex | Skip the LLM entirely |
| small_llm | A 1-3B model (Phi-3, Gemma-2B) is sufficient | Route to a cheap local model |
| large_llm | Requires a 7B+ or frontier model (GPT-4, Claude) | Route to a powerful model |
Why This Exists
Running every query through a frontier LLM is expensive and slow. But you also don't want to under-serve complex queries with a tiny model.
QueryComplexityRouter sits at the top of your pipeline and makes this decision in ~10ms on CPU, before any LLM call is made.
Pair it with AgentIntentRouter for a full 2-stage routing pipeline:
User Message
     │
     ▼
AgentIntentRouter: What does the user want? (code, search, chat, ...)
     │
     ▼
QueryComplexityRouter: How hard is it? (no_llm / small_llm / large_llm)
     │
     ▼
Route to the right tool/model
Quick Start
from transformers import pipeline
router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")
# Single prediction
result = router("What is 15% of 4500?")
print(result)
# [{'label': 'no_llm', 'score': 0.98}]
# Batch
messages = [
    "What is the capital of France?",          # no_llm
    "Explain recursion in simple terms.",      # small_llm
    "Write a 1000-word blog post about AI.",   # large_llm
    "Design a distributed caching system.",    # large_llm
    "Fix this bug: def add(a,b): return a-b",  # small_llm
]
results = router(messages)
for msg, res in zip(messages, results):
    print(f"  {res['label']:>12} ({res['score']:.2f}) - {msg}")
2-Stage Routing Pipeline
from transformers import pipeline
intent_router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")
complexity_router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")
# handle_with_rules, call_small_model, and call_large_model are placeholders
# for your own handlers; they are not part of this model or of transformers.
def route(user_message: str):
    intent = intent_router(user_message)[0]
    complexity = complexity_router(user_message)[0]
    print(f"Intent: {intent['label']} ({intent['score']:.2f})")
    print(f"Complexity: {complexity['label']} ({complexity['score']:.2f})")
    if complexity["label"] == "no_llm":
        return handle_with_rules(user_message, intent["label"])
    elif complexity["label"] == "small_llm":
        return call_small_model(user_message)
    else:
        return call_large_model(user_message)
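As an illustration of what the `handle_with_rules` placeholder might look like, here is a minimal sketch that covers three of the `no_llm` patterns (percentages, basic arithmetic, km to miles) with regular expressions. The rules, the helper `_fmt`, and the choice to return `None` when nothing matches are all assumptions made for this example, not part of the model.

```python
import operator
import re
from typing import Optional

KM_PER_MILE = 1.609344  # exact length of the international mile in kilometres
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
NUM = r"(\d+(?:\.\d+)?)"  # unsigned integer or decimal


def _fmt(value: float) -> str:
    """Render whole numbers without a decimal point, others to two places."""
    return str(int(value)) if float(value).is_integer() else f"{value:.2f}"


def handle_with_rules(message: str, intent: Optional[str] = None) -> Optional[str]:
    """Return an answer for a few rigidly structured queries, else None.

    `intent` is accepted to match the call in route() but is unused here.
    """
    text = message.lower()

    m = re.search(NUM + r"\s*%\s*of\s+" + NUM, text)      # "15% of 4500"
    if m:
        return _fmt(float(m.group(1)) * float(m.group(2)) / 100)

    m = re.search(NUM + r"\s*km\s+to\s+miles", text)      # "100km to miles"
    if m:
        return _fmt(float(m.group(1)) / KM_PER_MILE) + " miles"

    m = re.search(NUM + r"\s*([+\-*/])\s*" + NUM, text)   # "42 * 7"
    if m:
        a, op, b = float(m.group(1)), m.group(2), float(m.group(3))
        if op == "/" and b == 0:
            return None  # leave division by zero to the fallback path
        return _fmt(OPS[op](a, b))

    return None  # no rule matched; the caller should fall back to a model
```

Returning `None` instead of raising lets the caller fall through to `call_small_model` when a query is labelled `no_llm` but no rule applies.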
Complexity Labels
no_llm: No LLM needed
- Simple math: "What is 42 * 7?"
- Unit conversion: "Convert 100km to miles"
- Factual lookup: "What is the capital of Japan?"
- Date/time: "What day is March 15 2026?"
- Simple commands: "Set a timer for 5 minutes"
small_llm: 1-3B model sufficient
- Short summarization: "Summarize this paragraph..."
- Basic explanation: "Explain recursion to a 10-year-old"
- Simple code: "Write a Python function to reverse a string"
- Short generation: "Write a one-line bio for a software engineer"
- Simple classification: "Is this email spam?"
large_llm: 7B+ / frontier model required
- Deep reasoning: "Analyze the ethical implications of AI replacing jobs"
- Long-form writing: "Write a 1000-word blog post about quantum computing"
- Complex code: "Build a REST API with auth, error handling, and tests"
- Multi-doc synthesis: "Given these 5 documents, synthesize an answer..."
- System design: "Design a distributed caching system with eventual consistency"
Performance
- Inference speed: ~10ms on CPU, ~2ms on GPU
- Model size: ~260MB (DistilBERT-base)
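Latency depends heavily on hardware and input length, so it is worth measuring on your own machine. The helper below is a rough sketch: `median_latency_ms` and its `warmup` and `repeats` parameters are names invented for this example, and it accepts any callable, so the `router` pipeline from Quick Start can be passed in directly.

```python
import statistics
import time
from typing import Callable, Sequence


def median_latency_ms(
    classify: Callable[[str], object],
    messages: Sequence[str],
    warmup: int = 3,
    repeats: int = 20,
) -> float:
    """Median wall-clock time of one single-message call, in milliseconds."""
    # Warm-up calls are not timed; the first calls often pay one-off setup costs.
    for _ in range(warmup):
        classify(messages[0])

    timings = []
    for i in range(repeats):
        msg = messages[i % len(messages)]  # cycle through the sample messages
        start = time.perf_counter()
        classify(msg)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Example (assumes `router` was created as in Quick Start):
# print(median_latency_ms(router, ["What is 15% of 4500?"]))
```

The median is reported rather than the mean so that a single slow call does not distort the figure.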
Evaluation Results
Results on held-out test set:
| Metric | Score |
|---|---|
| Accuracy | ~0.99 |
| F1 (weighted) | ~0.99 |
Per-class performance:
| Class | Precision | Recall | F1 |
|---|---|---|---|
| no_llm | ~1.00 | ~1.00 | ~1.00 |
| small_llm | ~0.98 | ~0.98 | ~0.98 |
| large_llm | ~0.99 | ~0.99 | ~0.99 |
Note: These results were measured on synthetic test data drawn from the same distribution as the training data, so real-world performance will vary. Use a confidence-score threshold to handle ambiguous inputs gracefully.
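Because the reported numbers come from synthetic data, you may want to re-check them on a small hand-labelled sample of your own traffic. The sketch below computes per-class precision, recall, and F1 in plain Python. The function name `per_class_report` is made up for this example, and it assumes you already have parallel lists of true and predicted labels.

```python
from collections import Counter
from typing import Dict, Sequence

LABELS = ("no_llm", "small_llm", "large_llm")


def per_class_report(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    labels: Sequence[str] = LABELS,
) -> Dict[str, Dict[str, float]]:
    """Per-class precision, recall and F1 from two parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed an example of the true class

    report = {}
    for lab in labels:
        predicted = tp[lab] + fp[lab]
        actual = tp[lab] + fn[lab]
        precision = tp[lab] / predicted if predicted else 0.0
        recall = tp[lab] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[lab] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```

Predicted labels can be collected with `[r["label"] for r in router(messages)]`, using the `router` pipeline from Quick Start.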
Training Details
- Base model: distilbert-base-uncased
- Training data: 1,400 synthetic examples per class (4,200 total), template-generated with natural language variation
- Epochs: 5 (with early stopping, patience=2)
- Learning rate: 2e-5
- Batch size: 32
- Max sequence length: 128
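To give a feel for what "template-generated" data can look like, here is a small sketch of a generator. The real templates and slot values are not published in this card, so `TEMPLATES`, `SLOTS`, and `generate_examples` below are purely hypothetical stand-ins that borrow phrasing from the example queries in this README.

```python
import random
from typing import Dict, List

# Hypothetical templates; the actual training templates are not published.
TEMPLATES = {
    "no_llm": [
        "What is {a} * {b}?",
        "Convert {a}km to miles",
        "What is the capital of {country}?",
    ],
    "small_llm": [
        "Explain {topic} in simple terms.",
        "Write a Python function to {task}.",
    ],
    "large_llm": [
        "Write a 1000-word blog post about {topic}.",
        "Design a {system} with eventual consistency.",
    ],
}

# Hypothetical slot values used to fill the templates.
SLOTS = {
    "a": [7, 42, 100],
    "b": [3, 15, 250],
    "country": ["France", "Japan", "Brazil"],
    "topic": ["recursion", "quantum computing", "AI"],
    "task": ["reverse a string", "sum a list"],
    "system": ["distributed caching system", "message queue"],
}


def generate_examples(per_class: int, seed: int = 0) -> List[Dict[str, str]]:
    """Build `per_class` labelled examples for every class, then shuffle."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    rows = []
    for label, templates in TEMPLATES.items():
        for _ in range(per_class):
            template = rng.choice(templates)
            # Draw one value per slot; str.format ignores the unused ones.
            values = {name: rng.choice(options) for name, options in SLOTS.items()}
            rows.append({"text": template.format(**values), "label": label})
    rng.shuffle(rows)
    return rows
```

Calling `generate_examples(1400)` would produce a balanced set of the same size as the one described above, although with this few templates it would contain many duplicates.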
Use in Agent Pipelines
# Minimum confidence required to trust each predicted label.
COMPLEXITY_THRESHOLDS = {
    "no_llm": 0.7,
    "small_llm": 0.6,
    "large_llm": 0.6,
}

def smart_route(message: str) -> str:
    # `router` is the text-classification pipeline created in Quick Start.
    result = router(message)[0]
    label, score = result["label"], result["score"]
    if score < COMPLEXITY_THRESHOLDS[label]:
        # Low confidence: default to large_llm for safety
        label = "large_llm"
    return label
Limitations
- Trained on English text only
- Template-generated data may not cover all edge cases
- Borderline queries (e.g., "explain quantum entanglement") may get lower confidence; use the threshold fallback
- Complexity is query-level only; does not account for context window length or domain expertise needed
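One possible way to work around the last limitation is to adjust the predicted tier after classification, based on how much context accompanies the query. The sketch below is an assumption-laden illustration: `escalate_for_context` and the `max_chars` budget of 4000 characters are invented for this example and are not tuned against any model's context window.

```python
TIERS = ("no_llm", "small_llm", "large_llm")  # ordered from cheapest to most capable


def escalate_for_context(label: str, context: str = "", max_chars: int = 4000) -> str:
    """Move the predicted tier up by one step when the attached context is long."""
    if label not in TIERS:
        raise ValueError(f"unknown label: {label!r}")
    if len(context) <= max_chars:
        return label  # short context: keep the classifier's decision
    idx = TIERS.index(label)
    # Clamp at the top tier so large_llm stays large_llm.
    return TIERS[min(idx + 1, len(TIERS) - 1)]
```

For example, a query labelled `small_llm` that arrives with a long retrieved document would be routed to `large_llm`, while the same query with no context would stay on the small model.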
Related Models
- tripathyShaswata/AgentIntentRouter: companion intent classifier (8 categories, ~10ms on CPU)
License
Apache 2.0: use it however you want, commercial use included.
Citation
If this helps you, a star is appreciated!