+---
+license: apache-2.0
+language:
+- en
+tags:
+- text-classification
+- distilbert
+- query-complexity
+- agent-routing
+- llm-routing
+- ai-agents
+- tool-use
+pipeline_tag: text-classification
+---
+# QueryComplexityRouter
+A fast, lightweight 3-class classifier that decides **how much LLM power a query needs** — before you spend tokens on it.
+Built on DistilBERT (66M params), fine-tuned to classify any user message into one of three complexity tiers:
+| Label | Meaning | Suggested Action |
+|---|---|---|
+| `no_llm` | Answerable with rules, lookup, or regex | Skip the LLM entirely |
+| `small_llm` | A 1–3B-class model (e.g. Gemma-2B, Phi-3-mini) is sufficient | Route to a cheap local model |
+| `large_llm` | Requires a 7B+ or frontier model (GPT-4, Claude) | Route to a powerful model |
+## Why This Exists
+Running every query through a frontier LLM is expensive and slow. But you also don't want to under-serve complex queries with a tiny model.
+**QueryComplexityRouter** sits at the top of your pipeline and makes this decision in **~10ms on CPU** — before any LLM call is made.
+Pair it with [AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter) for a full 2-stage routing pipeline:
+```
+User Message
+    │
+    ▼
+AgentIntentRouter          ← What does the user want? (code, search, chat, ...)
+    │
+    ▼
+QueryComplexityRouter      ← How hard is it? (no_llm / small_llm / large_llm)
+    │
+    ▼
+Route to the right tool/model
+```
+## Quick Start
+```python
+from transformers import pipeline
+router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")
+# Single prediction
+result = router("What is 15% of 4500?")
+print(result)
+# Example output (score rounded): [{'label': 'no_llm', 'score': 0.98}]
+# Batch
+messages = [
+    "What is the capital of France?",           # no_llm
+    "Explain recursion in simple terms.",        # small_llm
+    "Write a 1000-word blog post about AI.",     # large_llm
+    "Design a distributed caching system.",      # large_llm
+    "Fix this bug: def add(a,b): return a-b",   # small_llm
+]
+results = router(messages)
+for msg, res in zip(messages, results):
+    print(f"  {res['label']:>12} ({res['score']:.2f}) — {msg}")
+```
+## 2-Stage Routing Pipeline
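+```python
+from transformers import pipeline
+intent_router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")
+complexity_router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")
+# Placeholder handlers (hypothetical): replace with your own rule engine and model clients.
+def handle_with_rules(message: str, intent: str) -> str:
+    return f"[rules:{intent}] {message}"
+def call_small_model(message: str) -> str:
+    return f"[small model] {message}"
+def call_large_model(message: str) -> str:
+    return f"[large model] {message}"
+def route(user_message: str) -> str:
+    intent = intent_router(user_message)[0]          # stage 1: what does the user want?
+    complexity = complexity_router(user_message)[0]  # stage 2: how hard is it?
+    print(f"Intent:     {intent['label']} ({intent['score']:.2f})")
+    print(f"Complexity: {complexity['label']} ({complexity['score']:.2f})")
+    if complexity["label"] == "no_llm":
+        return handle_with_rules(user_message, intent["label"])
+    elif complexity["label"] == "small_llm":
+        return call_small_model(user_message)
+    else:
+        return call_large_model(user_message)
+```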
+## Complexity Labels
+### `no_llm` — No LLM needed
+- Simple math: *"What is 42 * 7?"*
+- Unit conversion: *"Convert 100km to miles"*
+- Factual lookup: *"What is the capital of Japan?"*
+- Date/time: *"What day is March 15 2026?"*
+- Simple commands: *"Set a timer for 5 minutes"*
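+The `no_llm` tier assumes something other than a model answers the query. As a minimal sketch of what a rules path could look like, the hypothetical `answer_with_rules` below (not part of this model or any library) handles only the arithmetic, percentage, and km-to-miles patterns from the examples above, and returns `None` when no rule matches so the caller can escalate to a model.
+```python
+import ast
+import operator
+import re
+# Hypothetical rules handler for the `no_llm` tier; it covers only three query patterns.
+_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul, ast.Div: operator.truediv}
+_NUM = r"(\d+(?:\.\d+)?)"
+KM_TO_MILES = 0.621371
+def _safe_eval(expr: str) -> float:
+    """Evaluate +, -, *, / and parentheses over numbers without calling eval()."""
+    def walk(node):
+        if isinstance(node, ast.Expression):
+            return walk(node.body)
+        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
+            return node.value
+        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
+            return _OPS[type(node.op)](walk(node.left), walk(node.right))
+        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
+            return -walk(node.operand)
+        raise ValueError("unsupported expression")
+    return walk(ast.parse(expr.strip(), mode="eval"))
+def answer_with_rules(message: str):
+    text = message.strip().rstrip("?").strip().lower()
+    # "Convert 100km to miles"
+    m = re.fullmatch(r"convert\s+" + _NUM + r"\s*km\s+to\s+miles", text)
+    if m:
+        return round(float(m.group(1)) * KM_TO_MILES, 2)
+    # "What is 15% of 4500"
+    m = re.fullmatch(r"what is\s+" + _NUM + r"\s*%\s+of\s+" + _NUM, text)
+    if m:
+        return float(m.group(1)) * float(m.group(2)) / 100
+    # "What is 42 * 7"
+    m = re.fullmatch(r"what is\s+([\d\s.+\-*/()]+)", text)
+    if m:
+        try:
+            return _safe_eval(m.group(1))
+        except (ValueError, SyntaxError, ZeroDivisionError):
+            return None
+    return None  # no rule matched: escalate to a model
+```
+Usage: `answer_with_rules("What is 42 * 7?")` returns `294`, while `answer_with_rules("What is the capital of Japan?")` returns `None`, since factual lookups need a data source this sketch does not include.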
+### `small_llm` — 1–3B model sufficient
+- Short summarization: *"Summarize this paragraph..."*
+- Basic explanation: *"Explain recursion to a 10-year-old"*
+- Simple code: *"Write a Python function to reverse a string"*
+- Short generation: *"Write a one-line bio for a software engineer"*
+- Simple classification: *"Is this email spam?"*
+### `large_llm` — 7B+ / frontier model required
+- Deep reasoning: *"Analyze the ethical implications of AI replacing jobs"*
+- Long-form writing: *"Write a 1000-word blog post about quantum computing"*
+- Complex code: *"Build a REST API with auth, error handling, and tests"*
+- Multi-doc synthesis: *"Given these 5 documents, synthesize an answer..."*
+- System design: *"Design a distributed caching system with eventual consistency"*
+## Performance
+- **Inference speed**: ~10ms on CPU, ~2ms on GPU
+- **Model size**: ~260MB (DistilBERT-base)
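+The speed figures above depend on hardware, input length, and batching, so it is worth re-measuring on your own deployment target. A minimal sketch, assuming `classify` is any callable that takes one string (for example the `router` pipeline from Quick Start); the warm-up count and the p95 estimate are arbitrary choices, not how the numbers above were produced.
+```python
+import statistics
+import time
+from typing import Callable, Dict, List
+def measure_latency_ms(classify: Callable[[str], object], queries: List[str],
+                       warmup: int = 3, repeats: int = 5) -> Dict[str, float]:
+    """Return median and approximate p95 single-query latency in milliseconds."""
+    if not queries:
+        raise ValueError("need at least one query")
+    for q in queries[:warmup]:
+        classify(q)  # untimed warm-up calls absorb one-off setup costs
+    samples = []
+    for _ in range(repeats):
+        for q in queries:
+            start = time.perf_counter()
+            classify(q)
+            samples.append((time.perf_counter() - start) * 1000.0)
+    samples.sort()
+    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
+    return {"median_ms": statistics.median(samples), "p95_ms": p95, "n": len(samples)}
+```
+Usage: `measure_latency_ms(router, ["What is 42 * 7?", "Design a distributed caching system."])`.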
+### Evaluation Results
+Results on a held-out test set:
+| Metric | Score |
+|---|---|
+| Accuracy | ~0.99 |
+| F1 (weighted) | ~0.99 |
+Per-class performance:
+| Class | Precision | Recall | F1 |
+|---|---|---|---|
+| no_llm | ~1.00 | ~1.00 | ~1.00 |
+| small_llm | ~0.98 | ~0.98 | ~0.98 |
+| large_llm | ~0.99 | ~0.99 | ~0.99 |
+> Note: These results come from synthetic test data drawn from the same distribution as the training data. Real-world performance will vary; use a confidence-score threshold (see Use in Agent Pipelines) to handle ambiguous inputs gracefully.
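+Because these numbers come from in-distribution synthetic data, it is worth re-scoring the model on a hand-labeled sample of your own traffic. The sketch below computes the same metrics as the tables above (accuracy, per-class precision/recall/F1, and support-weighted F1) in plain Python; it assumes the three label strings used in this card, and scikit-learn's `classification_report` reports the same quantities if you already depend on it.
+```python
+from collections import Counter
+from typing import Dict, List
+LABELS = ["no_llm", "small_llm", "large_llm"]
+def evaluate(y_true: List[str], y_pred: List[str]) -> Dict[str, object]:
+    """Accuracy, per-class precision/recall/F1, and support-weighted F1."""
+    if not y_true or len(y_true) != len(y_pred):
+        raise ValueError("y_true and y_pred must be non-empty and the same length")
+    support = Counter(y_true)
+    per_class = {}
+    for label in LABELS:
+        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
+        predicted = sum(p == label for p in y_pred)
+        precision = tp / predicted if predicted else 0.0
+        recall = tp / support[label] if support[label] else 0.0
+        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
+        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
+    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
+    # Weight each class's F1 by how often it appears in the gold labels
+    weighted_f1 = sum(per_class[l]["f1"] * support[l] for l in LABELS) / len(y_true)
+    return {"accuracy": accuracy, "weighted_f1": weighted_f1, "per_class": per_class}
+```
+Usage: `evaluate(gold_labels, [r["label"] for r in router(queries)])`, where `queries` and `gold_labels` are your own labeled sample.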
+## Training Details
+- **Base model**: distilbert-base-uncased
+- **Training data**: ~1,400 synthetic examples per class (~4,200 total), template-generated with natural language variation
+- **Epochs**: up to 5 (early stopping with patience=2)
+- **Learning rate**: 2e-5
+- **Batch size**: 32
+- **Max sequence length**: 128
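+One way to read "5 epochs with early stopping, patience=2" is: train for at most 5 epochs and stop once the monitored validation metric has failed to improve for 2 consecutive evaluations. The sketch below replays that rule over a list of per-epoch validation losses; it assumes one evaluation per epoch and validation loss as the monitored metric, neither of which this card states. In `transformers`, the equivalent is typically `EarlyStoppingCallback(early_stopping_patience=2)` together with `load_best_model_at_end=True` in `TrainingArguments`.
+```python
+from typing import List
+def early_stopping_epochs(val_losses: List[float], max_epochs: int = 5, patience: int = 2) -> int:
+    """Return how many epochs run before training halts.
+    Training stops once `patience` consecutive epochs fail to beat the best
+    validation loss seen so far, or when `max_epochs` is reached.
+    """
+    best = float("inf")
+    epochs_without_improvement = 0
+    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
+        if loss < best:
+            best = loss
+            epochs_without_improvement = 0
+        else:
+            epochs_without_improvement += 1
+            if epochs_without_improvement >= patience:
+                return epoch
+    return min(len(val_losses), max_epochs)
+```
+For example, losses of `[0.5, 0.4, 0.41, 0.42, 0.3]` stop after epoch 4, so the improvement that would have come at epoch 5 is never seen.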
+## Use in Agent Pipelines
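+```python
+from transformers import pipeline
+router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")
+# Minimum confidence required to accept each predicted label
+COMPLEXITY_THRESHOLDS = {
+    "no_llm": 0.7,
+    "small_llm": 0.6,
+    "large_llm": 0.6,
+}
+def smart_route(message: str) -> str:
+    result = router(message)[0]
+    label, score = result["label"], result["score"]
+    if score < COMPLEXITY_THRESHOLDS[label]:
+        # Low confidence: default to large_llm for safety
+        label = "large_llm"
+    return label
+```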
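+The threshold values above are starting points. If you have a labeled validation sample, one option (an assumption, not how these values were chosen) is to pick, for each label, the lowest score cutoff at which the predictions that pass it reach a target precision, keeping a default where no cutoff qualifies. The hypothetical `calibrate_thresholds` below does that from `(predicted_label, score, true_label)` triples; its output can stand in for `COMPLEXITY_THRESHOLDS`, since `smart_route` accepts a prediction only when `score >= threshold`.
+```python
+from typing import Dict, List, Tuple
+def calibrate_thresholds(preds: List[Tuple[str, float, str]],
+                         target_precision: float = 0.95,
+                         default: float = 0.6) -> Dict[str, float]:
+    """preds holds (predicted_label, score, true_label) triples from a validation set."""
+    thresholds = {}
+    for label in ("no_llm", "small_llm", "large_llm"):
+        # (score, was_correct) for every prediction of this label, lowest score first
+        scored = sorted((s, t == label) for p, s, t in preds if p == label)
+        thresholds[label] = default
+        for i, (cutoff, _) in enumerate(scored):
+            if i and cutoff == scored[i - 1][0]:
+                continue  # tied score: this cutoff was already evaluated
+            kept = scored[i:]  # every prediction with score >= cutoff
+            if sum(ok for _, ok in kept) / len(kept) >= target_precision:
+                thresholds[label] = cutoff
+                break
+    return thresholds
+```
+Choosing the lowest qualifying cutoff keeps as many confident predictions as possible; raise `target_precision` if mis-routing to a smaller model is costly for you.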
+## Limitations
+- Trained on English text only
+- Template-generated data may not cover all edge cases
+- Borderline queries (e.g., *"explain quantum entanglement"*) may receive lower confidence scores; use the threshold fallback described in Use in Agent Pipelines
+- Complexity is judged from the query text alone; the model does not account for context-window length or the domain expertise a query requires
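+Top-1 confidence alone can hide cases where two tiers are nearly tied. A complementary check, sketched below under the assumption that you request every label's score with `router(message, top_k=None)` (which in recent `transformers` versions returns a list of `{'label', 'score'}` dicts for a single input), flags a query as borderline when the top two scores are within a margin; the 0.2 default is an arbitrary starting value, not something tuned for this model.
+```python
+from typing import Dict, List, Union
+def is_borderline(all_scores: List[Dict[str, Union[str, float]]], min_margin: float = 0.2) -> bool:
+    """True when the two highest-scoring labels are within `min_margin` of each other."""
+    ranked = sorted(all_scores, key=lambda r: r["score"], reverse=True)
+    if len(ranked) < 2:
+        return False
+    return ranked[0]["score"] - ranked[1]["score"] < min_margin
+```
+A borderline result can then be escalated to `large_llm`, the same conservative default `smart_route` uses.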
+## Related Models
+- [tripathyShaswata/AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter) — companion intent classifier (8 categories, ~10ms on CPU)
+## License
+Apache 2.0: commercial use is permitted, subject to the license's notice and attribution terms.
+## Citation
+If this model helps you, a like on the Hugging Face Hub is appreciated!