---
license: apache-2.0
language:
- en
tags:
- text-classification
- distilbert
- query-complexity
- agent-routing
- llm-routing
- ai-agents
- tool-use
pipeline_tag: text-classification
---
# QueryComplexityRouter
A fast, lightweight 3-class classifier that decides **how much LLM power a query needs** before you spend tokens on it.
Built on DistilBERT (66M params), fine-tuned to classify any user message into one of three complexity tiers:
| Label | Meaning | Suggested Action |
|---|---|---|
| `no_llm` | Answerable with rules, lookup, or regex | Skip the LLM entirely |
| `small_llm` | A 1–3B model (Phi-3, Gemma-2B) is sufficient | Route to a cheap local model |
| `large_llm` | Requires 7B+ or frontier model (GPT-4, Claude) | Route to powerful model |
## Why This Exists
Running every query through a frontier LLM is expensive and slow. But you also don't want to under-serve complex queries with a tiny model.
**QueryComplexityRouter** sits at the top of your pipeline and makes this decision in **~10ms on CPU**, before any LLM call is made.
Pair it with [AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter) for a full 2-stage routing pipeline:
```
User Message
     │
     ▼
AgentIntentRouter      ← What does the user want? (code, search, chat, ...)
     │
     ▼
QueryComplexityRouter  ← How hard is it? (no_llm / small_llm / large_llm)
     │
     ▼
Route to the right tool/model
```
## Quick Start
```python
from transformers import pipeline

router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

# Single prediction
result = router("What is 15% of 4500?")
print(result)
# [{'label': 'no_llm', 'score': 0.98}]

# Batch
messages = [
    "What is the capital of France?",          # no_llm
    "Explain recursion in simple terms.",      # small_llm
    "Write a 1000-word blog post about AI.",   # large_llm
    "Design a distributed caching system.",    # large_llm
    "Fix this bug: def add(a,b): return a-b",  # small_llm
]
results = router(messages)
for msg, res in zip(messages, results):
    print(f"  {res['label']:>12} ({res['score']:.2f}) - {msg}")
```
## 2-Stage Routing Pipeline
```python
from transformers import pipeline

intent_router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")
complexity_router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

# handle_with_rules, call_small_model, and call_large_model are placeholders
# for your own handlers; they are not provided by this model.
def route(user_message: str):
    # Each pipeline returns a list with one {'label', 'score'} dict per input.
    intent = intent_router(user_message)[0]
    complexity = complexity_router(user_message)[0]

    print(f"Intent: {intent['label']} ({intent['score']:.2f})")
    print(f"Complexity: {complexity['label']} ({complexity['score']:.2f})")

    if complexity["label"] == "no_llm":
        return handle_with_rules(user_message, intent["label"])
    elif complexity["label"] == "small_llm":
        return call_small_model(user_message)
    else:
        return call_large_model(user_message)
```
## Complexity Labels
### `no_llm`: No LLM needed
- Simple math: *"What is 42 * 7?"*
- Unit conversion: *"Convert 100km to miles"*
- Factual lookup: *"What is the capital of Japan?"*
- Date/time: *"What day is March 15 2026?"*
- Simple commands: *"Set a timer for 5 minutes"*
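
To make the "rules, lookup, or regex" idea concrete, here is a minimal sketch of what a rule-based handler for `no_llm` queries might look like. The function body, the regex patterns, and the `_format_number` helper are illustrative assumptions, not part of this model or its training code; the name `handle_with_rules` only mirrors the placeholder used in the routing example.

```python
import re
from typing import Optional

KM_PER_MILE = 1.609344


def _format_number(value: float) -> str:
    """Render 675.0 as '675' and 2.5 as '2.5' (hypothetical helper)."""
    return str(int(value)) if value == int(value) else f"{value:.4f}".rstrip("0")


def handle_with_rules(message: str, intent: Optional[str] = None) -> Optional[str]:
    """Try a few regex rules; return an answer, or None if no rule matches.

    `intent` is accepted for compatibility with the routing example but is
    unused in this sketch.
    """
    text = message.strip().lower().rstrip("?")

    # Percentages, e.g. "what is 15% of 4500"
    m = re.search(r"(\d+(?:\.\d+)?)\s*%\s*of\s*(\d+(?:\.\d+)?)", text)
    if m:
        return _format_number(float(m.group(1)) * float(m.group(2)) / 100)

    # Unit conversion, e.g. "convert 100km to miles"
    m = re.search(r"(\d+(?:\.\d+)?)\s*km\s+to\s+miles", text)
    if m:
        return f"{float(m.group(1)) / KM_PER_MILE:.2f} miles"

    # Binary arithmetic, e.g. "what is 42 * 7"
    m = re.search(r"(-?\d+(?:\.\d+)?)\s*([+\-*/])\s*(-?\d+(?:\.\d+)?)", text)
    if m:
        left, op, right = float(m.group(1)), m.group(2), float(m.group(3))
        if op == "/" and right == 0:
            return None  # no sensible answer; let the caller fall back
        result = {"+": left + right, "-": left - right,
                  "*": left * right, "/": left / right if right else 0.0}[op]
        return _format_number(result)

    return None  # no rule matched; the caller can escalate to a model
```

Returning `None` when nothing matches gives the caller a natural point to fall back to `small_llm` if the classifier was wrong about the query.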
### `small_llm`: 1–3B model sufficient
- Short summarization: *"Summarize this paragraph..."*
- Basic explanation: *"Explain recursion to a 10-year-old"*
- Simple code: *"Write a Python function to reverse a string"*
- Short generation: *"Write a one-line bio for a software engineer"*
- Simple classification: *"Is this email spam?"*
### `large_llm`: 7B+ / frontier model required
- Deep reasoning: *"Analyze the ethical implications of AI replacing jobs"*
- Long-form writing: *"Write a 1000-word blog post about quantum computing"*
- Complex code: *"Build a REST API with auth, error handling, and tests"*
- Multi-doc synthesis: *"Given these 5 documents, synthesize an answer..."*
- System design: *"Design a distributed caching system with eventual consistency"*
## Performance
- **Inference speed**: ~10ms on CPU, ~2ms on GPU
- **Model size**: ~260MB (DistilBERT-base)
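
Latency depends heavily on hardware, batch size, and input length, so it is worth measuring on your own machine. The following is a minimal timing sketch, assuming any callable that takes a string (such as the `router` pipeline from Quick Start); the function name `measure_latency_ms` and its parameters are hypothetical.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def measure_latency_ms(
    predict: Callable[[str], object],
    queries: Sequence[str],
    warmup: int = 3,
    repeats: int = 20,
) -> Dict[str, float]:
    """Time single-query calls to `predict` and summarize them in milliseconds."""
    if not queries:
        raise ValueError("queries must not be empty")

    # Warm-up calls are excluded so one-time setup costs do not skew the result.
    for i in range(warmup):
        predict(queries[i % len(queries)])

    timings = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            predict(query)
            timings.append((time.perf_counter() - start) * 1000.0)

    timings.sort()
    p95_index = min(len(timings) - 1, int(0.95 * len(timings)))
    return {
        "mean_ms": statistics.fmean(timings),
        "median_ms": statistics.median(timings),
        "p95_ms": timings[p95_index],
    }

# Example (requires the model to be downloaded):
# stats = measure_latency_ms(router, ["What is 15% of 4500?"])
```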
### Evaluation Results
Results on the held-out test set:
| Metric | Score |
|---|---|
| Accuracy | ~0.99 |
| F1 (weighted) | ~0.99 |
Per-class performance:
| Class | Precision | Recall | F1 |
|---|---|---|---|
| no_llm | ~1.00 | ~1.00 | ~1.00 |
| small_llm | ~0.98 | ~0.98 | ~0.98 |
| large_llm | ~0.99 | ~0.99 | ~0.99 |
> Note: These results were measured on synthetic test data drawn from the same distribution as the training set, so real-world performance will vary. Apply a confidence-score threshold to handle ambiguous inputs gracefully.
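
This card does not state how the scores above were computed. If you want to check the model on your own labeled queries, the sketch below computes accuracy, per-class precision/recall/F1, and support-weighted F1 using the standard definitions. The function name `classification_metrics` is hypothetical, and this is not the author's evaluation script.

```python
from collections import Counter

LABELS = ["no_llm", "small_llm", "large_llm"]


def classification_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy, weighted F1, and per-class precision/recall/F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class

    support = Counter(y_true)
    per_class = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    total = len(y_true)
    accuracy = sum(tp.values()) / total if total else 0.0
    # Weighted F1 averages per-class F1 by each class's share of true labels.
    weighted_f1 = (
        sum(per_class[l]["f1"] * support[l] for l in labels) / total if total else 0.0
    )
    return {"accuracy": accuracy, "weighted_f1": weighted_f1, "per_class": per_class}
```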
## Training Details
- **Base model**: distilbert-base-uncased
- **Training data**: ~1,400 synthetic examples per class (~4,200 total), template-generated with natural language variation
- **Epochs**: 5 (with early stopping, patience=2)
- **Learning rate**: 2e-5
- **Batch size**: 32
- **Max sequence length**: 128
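
The actual templates used to build the training set are not published in this card. Purely as an illustration of what "template-generated" data can mean, the sketch below fills hypothetical templates with hypothetical slot values to produce labeled examples. Every template, slot list, and the `generate_examples` function are assumptions made for this example.

```python
import random

# Hypothetical templates; the real training templates are not documented here.
TEMPLATES = {
    "no_llm": [
        "What is {a} * {b}?",
        "Convert {a}km to miles",
        "What is the capital of {country}?",
    ],
    "small_llm": [
        "Explain {topic} in simple terms.",
        "Write a Python function to {task}.",
    ],
    "large_llm": [
        "Write a 1000-word blog post about {topic}.",
        "Design a {system} with eventual consistency.",
    ],
}

# Hypothetical slot values used to vary the templates.
SLOTS = {
    "country": ["France", "Japan", "Brazil"],
    "topic": ["recursion", "quantum computing", "AI"],
    "task": ["reverse a string", "sum a list"],
    "system": ["distributed caching system", "message queue"],
}


def generate_examples(per_class: int, seed: int = 0):
    """Return a shuffled list of {'text', 'label'} dicts, `per_class` per label."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    examples = []
    for label, templates in TEMPLATES.items():
        for _ in range(per_class):
            template = rng.choice(templates)
            values = {name: rng.choice(options) for name, options in SLOTS.items()}
            values.update(a=rng.randint(2, 99), b=rng.randint(2, 99))
            # str.format ignores slot values the template does not use.
            examples.append({"text": template.format(**values), "label": label})
    rng.shuffle(examples)
    return examples
```

Data built this way shares the surface patterns of its templates, which is one reason scores on a same-distribution test set can overstate real-world accuracy.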
## Use in Agent Pipelines
```python
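from transformers import pipeline

router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

# Minimum confidence required to trust each predicted label.
COMPLEXITY_THRESHOLDS = {
    "no_llm": 0.7,
    "small_llm": 0.6,
    "large_llm": 0.6,
}

def smart_route(message: str) -> str:
    result = router(message)[0]
    label, score = result["label"], result["score"]
    if score < COMPLEXITY_THRESHOLDS[label]:
        # Low confidence: default to large_llm for safety
        label = "large_llm"
    return label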
```
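
Sending every low-confidence query to `large_llm` is the safest fallback, but it can be costly. One possible variation, shown here only as a sketch and not as the author's recommended method, is to escalate by a single tier instead. The `TIERS` ordering and the `escalate_one_tier` function are assumptions introduced for this example; the threshold values are copied from the snippet above.

```python
# Tiers ordered from cheapest to most capable (assumed ordering).
TIERS = ["no_llm", "small_llm", "large_llm"]

COMPLEXITY_THRESHOLDS = {
    "no_llm": 0.7,
    "small_llm": 0.6,
    "large_llm": 0.6,
}


def escalate_one_tier(label: str, score: float, thresholds=COMPLEXITY_THRESHOLDS) -> str:
    """Keep a confident prediction; otherwise move up one tier, capped at large_llm."""
    if score >= thresholds[label]:
        return label
    next_index = min(TIERS.index(label) + 1, len(TIERS) - 1)
    return TIERS[next_index]

# Usage with a pipeline result such as {'label': 'no_llm', 'score': 0.65}:
# tier = escalate_one_tier(result["label"], result["score"])
```

The trade-off is that a badly misclassified hard query may land on the small model, so the all-the-way fallback remains the more conservative choice.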
## Limitations
- Trained on English text only
- Template-generated data may not cover all edge cases
- Borderline queries (e.g., *"explain quantum entanglement"*) may get lower confidence; use the threshold fallback
- Complexity is query-level only; does not account for context window length or domain expertise needed
## Related Models
- [tripathyShaswata/AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter): companion intent classifier (8 categories, ~10ms on CPU)
## License
Apache 2.0. Use it however you want, including commercially.
## Citation
If this helps you, a star is appreciated!