---
license: apache-2.0
language:
- en
tags:
- text-classification
- distilbert
- query-complexity
- agent-routing
- llm-routing
- ai-agents
- tool-use
pipeline_tag: text-classification
---
# QueryComplexityRouter
A fast, lightweight 3-class classifier that decides **how much LLM power a query needs** before you spend tokens on it.
Built on DistilBERT (66M params), fine-tuned to classify any user message into one of three complexity tiers:
| Label | Meaning | Suggested Action |
|---|---|---|
| `no_llm` | Answerable with rules, lookup, or regex | Skip the LLM entirely |
| `small_llm` | A 1-3B model (Phi-3, Gemma-2B) is sufficient | Route to a cheap local model |
| `large_llm` | Requires 7B+ or frontier model (GPT-4, Claude) | Route to powerful model |
## Why This Exists
Running every query through a frontier LLM is expensive and slow. But you also don't want to under-serve complex queries with a tiny model.
**QueryComplexityRouter** sits at the top of your pipeline and makes this decision in **~10ms on CPU**, before any LLM call is made.
Pair it with [AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter) for a full 2-stage routing pipeline:
```
User Message
     │
     ▼
AgentIntentRouter       ← What does the user want? (code, search, chat, ...)
     │
     ▼
QueryComplexityRouter   ← How hard is it? (no_llm / small_llm / large_llm)
     │
     ▼
Route to the right tool/model
```
## Quick Start
```python
from transformers import pipeline
router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")
# Single prediction
result = router("What is 15% of 4500?")
print(result)
# [{'label': 'no_llm', 'score': 0.98}]
# Batch
messages = [
    "What is the capital of France?",          # no_llm
    "Explain recursion in simple terms.",      # small_llm
    "Write a 1000-word blog post about AI.",   # large_llm
    "Design a distributed caching system.",    # large_llm
    "Fix this bug: def add(a,b): return a-b",  # small_llm
]
results = router(messages)
for msg, res in zip(messages, results):
    print(f"  {res['label']:>12} ({res['score']:.2f}) - {msg}")
```
## 2-Stage Routing Pipeline
```python
from transformers import pipeline
intent_router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")
complexity_router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")
def route(user_message: str):
    # Run both classifiers on the same message.
    intent = intent_router(user_message)[0]
    complexity = complexity_router(user_message)[0]
    print(f"Intent: {intent['label']} ({intent['score']:.2f})")
    print(f"Complexity: {complexity['label']} ({complexity['score']:.2f})")
    # handle_with_rules, call_small_model, and call_large_model are
    # placeholders for your own handlers; this model does not define them.
    if complexity["label"] == "no_llm":
        return handle_with_rules(user_message, intent["label"])
    elif complexity["label"] == "small_llm":
        return call_small_model(user_message)
    else:
        return call_large_model(user_message)
```
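
The `no_llm` branch above calls a rule handler that you supply yourself. As a rough illustration only, here is a minimal sketch of what such a handler could look like for a single rule ("X% of Y" arithmetic). The function body, the regular expression, and the convention of returning `None` when no rule matches are all assumptions for this example, not part of the model. The `intent_label` argument is accepted to match the call above but is not used here.

```python
import re
from typing import Optional

# Matches queries such as "What is 15% of 4500?"
_PERCENT_OF = re.compile(r"(\d+(?:\.\d+)?)\s*%\s*of\s*(\d+(?:\.\d+)?)", re.IGNORECASE)

def handle_with_rules(user_message: str, intent_label: str) -> Optional[str]:
    """Hypothetical rule handler: answers 'X% of Y' questions, else returns None."""
    match = _PERCENT_OF.search(user_message)
    if match is None:
        return None  # no rule matched; the caller may fall back to a model
    percent, base = float(match.group(1)), float(match.group(2))
    # The "g" format drops a trailing ".0" from whole-number results.
    return f"{percent * base / 100:g}"
```

Returning `None` rather than raising lets the caller decide whether to escalate to `small_llm` when the classifier says `no_llm` but no rule applies.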
## Complexity Labels
### `no_llm`: No LLM needed
- Simple math: *"What is 42 * 7?"*
- Unit conversion: *"Convert 100km to miles"*
- Factual lookup: *"What is the capital of Japan?"*
- Date/time: *"What day is March 15 2026?"*
- Simple commands: *"Set a timer for 5 minutes"*
### `small_llm`: 1-3B model sufficient
- Short summarization: *"Summarize this paragraph..."*
- Basic explanation: *"Explain recursion to a 10-year-old"*
- Simple code: *"Write a Python function to reverse a string"*
- Short generation: *"Write a one-line bio for a software engineer"*
- Simple classification: *"Is this email spam?"*
### `large_llm`: 7B+ / frontier model required
- Deep reasoning: *"Analyze the ethical implications of AI replacing jobs"*
- Long-form writing: *"Write a 1000-word blog post about quantum computing"*
- Complex code: *"Build a REST API with auth, error handling, and tests"*
- Multi-doc synthesis: *"Given these 5 documents, synthesize an answer..."*
- System design: *"Design a distributed caching system with eventual consistency"*
## Performance
- **Inference speed**: ~10ms on CPU, ~2ms on GPU
- **Model size**: ~260MB (DistilBERT-base)
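
Latency depends heavily on hardware and input length, so it is worth measuring on your own machine. The helper below is a minimal sketch that times any callable classifier; the function name and the `warmup` and `runs` defaults are arbitrary choices for this example.

```python
import statistics
import time
from typing import Callable, List

def measure_latency_ms(classify: Callable[[str], object], text: str,
                       warmup: int = 3, runs: int = 20) -> float:
    """Return the median wall-clock latency of classify(text) in milliseconds."""
    for _ in range(warmup):
        classify(text)  # warm-up calls are not timed
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        classify(text)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

With the `router` pipeline from Quick Start loaded, `measure_latency_ms(router, "What is 15% of 4500?")` gives a per-query figure to compare against the numbers above.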
### Evaluation Results
Results on the held-out test set:
| Metric | Score |
|---|---|
| Accuracy | ~0.99 |
| F1 (weighted) | ~0.99 |
Per-class performance:
| Class | Precision | Recall | F1 |
|---|---|---|---|
| no_llm | ~1.00 | ~1.00 | ~1.00 |
| small_llm | ~0.98 | ~0.98 | ~0.98 |
| large_llm | ~0.99 | ~0.99 | ~0.99 |
> Note: These results were measured on synthetic test data drawn from the same distribution as the training data. Real-world performance will vary. Use a confidence-score threshold to handle ambiguous inputs gracefully.
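
To check how the classifier behaves on your own labelled queries, per-class precision, recall, and F1 can be computed directly from gold and predicted labels. The following is a minimal pure-Python sketch; the function name is an assumption for this example, and it is not the evaluation script used to produce the tables above.

```python
from typing import Dict, List

LABELS = ["no_llm", "small_llm", "large_llm"]

def per_class_metrics(gold: List[str], pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, and F1 for each complexity label."""
    metrics: Dict[str, Dict[str, float]] = {}
    for label in LABELS:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        # Guard against division by zero when a class never appears.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics
```

Here `pred` would be the `label` field of each pipeline result, and `gold` your hand-assigned labels for the same queries.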
## Training Details
- **Base model**: distilbert-base-uncased
- **Training data**: ~1,400 synthetic examples per class (~4,200 total), template-generated with natural language variation
- **Epochs**: 5 (with early stopping, patience=2)
- **Learning rate**: 2e-5
- **Batch size**: 32
- **Max sequence length**: 128
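
The actual templates are not published in this card, so the following is only an illustrative sketch of what template-based generation might look like. Every template and slot value below is invented for the example (loosely echoing the sample queries listed under Complexity Labels), and `generate_examples` is a hypothetical helper.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical templates and slot fillers, invented for illustration only.
TEMPLATES: Dict[str, List[str]] = {
    "no_llm": ["What is {a} * {b}?", "Convert {a} km to miles"],
    "small_llm": ["Explain {topic} in simple terms.", "Write a one-line bio for a {job}."],
    "large_llm": ["Write a 1000-word blog post about {topic}.", "Design a {system} with tests."],
}
SLOTS: Dict[str, List[str]] = {
    "a": ["7", "42", "100"],
    "b": ["3", "9", "15"],
    "topic": ["recursion", "quantum computing"],
    "job": ["software engineer", "data analyst"],
    "system": ["REST API", "distributed cache"],
}

def generate_examples(per_class: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Return (text, label) pairs, per_class examples for each label."""
    rng = random.Random(seed)  # seeded for reproducible datasets
    examples: List[Tuple[str, str]] = []
    for label, templates in TEMPLATES.items():
        for _ in range(per_class):
            template = rng.choice(templates)
            # str.format ignores unused keyword arguments, so fill every slot.
            values = {name: rng.choice(options) for name, options in SLOTS.items()}
            examples.append((template.format(**values), label))
    return examples
```

Data built this way is highly regular, which is consistent with the caveat that test scores on same-distribution data may overstate real-world accuracy.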
## Use in Agent Pipelines
```python
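from transformers import pipeline

router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

# Minimum confidence required to trust each predicted label.
COMPLEXITY_THRESHOLDS = {
    "no_llm": 0.7,
    "small_llm": 0.6,
    "large_llm": 0.6,
}

def smart_route(message: str) -> str:
    result = router(message)[0]
    label, score = result["label"], result["score"]
    if score < COMPLEXITY_THRESHOLDS[label]:
        # Low confidence: default to large_llm for safety
        label = "large_llm"
    return label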
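
A possible variation on the threshold check is to look at the gap between the top two class scores instead of the top score alone. The sketch below assumes you already have a flat list of `{"label", "score"}` dicts covering all three classes (recent `transformers` versions can return every class score when the pipeline is called with `top_k=None`, though the exact nesting of the output may differ by version). The function name and the `min_margin` value of 0.2 are assumptions for illustration, not tuned settings.

```python
from typing import Any, Dict, List

def route_by_margin(scores: List[Dict[str, Any]], min_margin: float = 0.2) -> str:
    """Pick the top label unless the runner-up is within min_margin of it."""
    ranked = sorted(scores, key=lambda s: s["score"], reverse=True)
    top = ranked[0]
    runner_up_score = ranked[1]["score"] if len(ranked) > 1 else 0.0
    if top["score"] - runner_up_score < min_margin:
        return "large_llm"  # ambiguous: escalate, as in the threshold fallback
    return top["label"]
```

For example, scores of 0.50 for `small_llm` and 0.45 for `no_llm` would escalate to `large_llm`, because the classifier is nearly split between the two tiers.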
## Limitations
- Trained on English text only
- Template-generated data may not cover all edge cases
- Borderline queries (e.g., *"explain quantum entanglement"*) may get lower confidence; use a threshold fallback
- Complexity is query-level only; does not account for context window length or domain expertise needed
## Related Models
- [tripathyShaswata/AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter): companion intent classifier (8 categories, ~10ms on CPU)
## License
Apache 2.0. Use it however you want, including commercially.
## Citation
If this helps you, a star is appreciated!