| ---
|
| license: apache-2.0
|
| language:
|
| - en
|
| tags:
|
| - text-classification
|
| - distilbert
|
| - query-complexity
|
| - agent-routing
|
| - llm-routing
|
| - ai-agents
|
| - tool-use
|
| pipeline_tag: text-classification
|
| ---
|
|
|
| # QueryComplexityRouter
|
|
|
| A fast, lightweight 3-class classifier that decides **how much LLM power a query needs** before you spend tokens on it.
|
|
|
| Built on DistilBERT (66M params), fine-tuned to classify any user message into one of three complexity tiers:
|
|
|
| | Label | Meaning | Suggested Action |
|
| |---|---|---|
|
| | `no_llm` | Answerable with rules, lookup, or regex | Skip the LLM entirely |
|
| | `small_llm` | A 1–3B model (Phi-3, Gemma-2B) is sufficient | Route to a cheap local model |
|
| | `large_llm` | Requires a 7B+ or frontier model (GPT-4, Claude) | Route to a powerful model |
|
|
|
| ## Why This Exists
|
|
|
| Running every query through a frontier LLM is expensive and slow. But you also don't want to under-serve complex queries with a tiny model.
|
|
|
| **QueryComplexityRouter** sits at the top of your pipeline and makes this decision in **~10ms on CPU**, before any LLM call is made.
|
|
|
| Pair it with [AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter) for a full 2-stage routing pipeline:
|
|
|
| ```
|
| User Message
|
|      │
|
|      ▼
|
| AgentIntentRouter      → What does the user want? (code, search, chat, ...)
|
|      │
|
|      ▼
|
| QueryComplexityRouter  → How hard is it? (no_llm / small_llm / large_llm)
|
|      │
|
|      ▼
|
| Route to the right tool/model
|
| ```
|
|
|
| ## Quick Start
|
|
|
| ```python
|
| from transformers import pipeline
|
|
|
| router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")
|
|
|
| # Single prediction
|
| result = router("What is 15% of 4500?")
|
| print(result)
|
| # e.g. [{'label': 'no_llm', 'score': 0.98}]
|
|
|
| # Batch
|
| messages = [
|
|     "What is the capital of France?",           # no_llm
|
|     "Explain recursion in simple terms.",       # small_llm
|
|     "Write a 1000-word blog post about AI.",    # large_llm
|
|     "Design a distributed caching system.",     # large_llm
|
|     "Fix this bug: def add(a,b): return a-b",   # small_llm
|
| ]
|
| results = router(messages)
|
| for msg, res in zip(messages, results):
|
|     print(f"  {res['label']:>12} ({res['score']:.2f}) → {msg}")
|
| ```
|
|
|
| ## 2-Stage Routing Pipeline
|
|
|
| ```python
|
| from transformers import pipeline
|
|
|
| intent_router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")
|
| complexity_router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")
|
|
|
| def route(user_message: str):
|
|     # handle_with_rules, call_small_model, and call_large_model are placeholders for your own handlers.
|
|     intent = intent_router(user_message)[0]
|
|     complexity = complexity_router(user_message)[0]
|
|
|
|     print(f"Intent: {intent['label']} ({intent['score']:.2f})")
|
|     print(f"Complexity: {complexity['label']} ({complexity['score']:.2f})")
|
|
|
|     if complexity["label"] == "no_llm":
|
|         return handle_with_rules(user_message, intent["label"])
|
|     elif complexity["label"] == "small_llm":
|
|         return call_small_model(user_message)
|
|     else:
|
|         return call_large_model(user_message)
|
| ```
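The `handle_with_rules`, `call_small_model`, and `call_large_model` functions above are left to the caller. As a rough illustration of the `no_llm` tier, here is a minimal sketch of what a rule-based handler could look like, assuming only percentage questions and simple two-operand arithmetic need to be covered. The function name matches the placeholder above, but the regex patterns and the `None` fallback are assumptions made for illustration; they are not shipped with the model.

```python
import operator
import re
from typing import Optional

# Hypothetical patterns for illustration only; they are not part of the model.
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%\s*of\s*(\d+(?:\.\d+)?)", re.IGNORECASE)
ARITH_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([+\-*/])\s*(\d+(?:\.\d+)?)")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def handle_with_rules(user_message: str, intent: str = "") -> Optional[str]:
    """Answer trivial math with regexes; return None when no rule applies."""
    match = PERCENT_RE.search(user_message)
    if match:
        pct, base = float(match.group(1)), float(match.group(2))
        return f"{pct * base / 100:g}"

    match = ARITH_RE.search(user_message)
    if match:
        left, op, right = float(match.group(1)), match.group(2), float(match.group(3))
        if op == "/" and right == 0:
            return None  # division by zero: let the caller fall back
        return f"{OPS[op](left, right):g}"

    return None  # no rule matched; the caller should fall back to an LLM tier
```

Returning `None` instead of raising lets `route` escalate to `call_small_model` when the classifier says `no_llm` but no rule actually matches.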
|
|
|
| ## Complexity Labels
|
|
|
| ### `no_llm`: No LLM needed
|
| - Simple math: *"What is 42 \* 7?"*
|
| - Unit conversion: *"Convert 100km to miles"*
|
| - Factual lookup: *"What is the capital of Japan?"*
|
| - Date/time: *"What day is March 15 2026?"*
|
| - Simple commands: *"Set a timer for 5 minutes"*
|
|
|
| ### `small_llm`: 1–3B model sufficient
|
| - Short summarization: *"Summarize this paragraph..."*
|
| - Basic explanation: *"Explain recursion to a 10-year-old"*
|
| - Simple code: *"Write a Python function to reverse a string"*
|
| - Short generation: *"Write a one-line bio for a software engineer"*
|
| - Simple classification: *"Is this email spam?"*
|
|
|
| ### `large_llm`: 7B+ / frontier model required
|
| - Deep reasoning: *"Analyze the ethical implications of AI replacing jobs"*
|
| - Long-form writing: *"Write a 1000-word blog post about quantum computing"*
|
| - Complex code: *"Build a REST API with auth, error handling, and tests"*
|
| - Multi-doc synthesis: *"Given these 5 documents, synthesize an answer..."*
|
| - System design: *"Design a distributed caching system with eventual consistency"*
|
|
|
| ## Performance
|
|
|
| - **Inference speed**: ~10ms on CPU, ~2ms on GPU
|
| - **Model size**: ~260MB (DistilBERT-base)
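The latency figures above depend on hardware, batch size, and input length, so it is worth measuring on your own machine. The following is a minimal timing sketch, assuming any callable that takes a single string (such as the `router` pipeline from Quick Start). The helper name `measure_latency_ms` and the warm-up count are assumptions for illustration.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def measure_latency_ms(
    classify: Callable[[str], object],
    queries: Sequence[str],
    warmup: int = 3,
) -> Dict[str, float]:
    """Time one call per query and report median and max latency in milliseconds."""
    if not queries:
        raise ValueError("queries must not be empty")

    # Warm-up calls are excluded so one-time setup costs do not skew the numbers.
    for query in queries[:warmup]:
        classify(query)

    timings = []
    for query in queries:
        start = time.perf_counter()
        classify(query)
        timings.append((time.perf_counter() - start) * 1000.0)

    return {"median_ms": statistics.median(timings), "max_ms": max(timings)}
```

For example, `measure_latency_ms(router, ["What is 15% of 4500?", "Design a distributed caching system."])` would time the classifier itself on two sample queries.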
|
|
|
| ### Evaluation Results
|
|
|
| Results on the held-out test set:
|
|
|
| | Metric | Score |
|
| |---|---|
|
| | Accuracy | ~0.99 |
|
| | F1 (weighted) | ~0.99 |
|
|
|
| Per-class performance:
|
|
|
| | Class | Precision | Recall | F1 |
|
| |---|---|---|---|
|
| | no_llm | ~1.00 | ~1.00 | ~1.00 |
|
| | small_llm | ~0.98 | ~0.98 | ~0.98 |
|
| | large_llm | ~0.99 | ~0.99 | ~0.99 |
|
|
|
| > Note: These results are on synthetic test data drawn from the same distribution as the training data. Real-world performance will vary. Use a confidence-score threshold to handle ambiguous inputs gracefully.
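Because the reported numbers come from synthetic data, checking the classifier against a small hand-labeled sample of your own traffic is a reasonable sanity check. Below is a minimal sketch of per-class precision, recall, and F1 in plain Python, assuming you already have parallel lists of gold labels and predicted labels. The function name `per_class_metrics` is hypothetical, and this is not the evaluation script used to produce the tables above.

```python
from typing import Dict, Sequence

LABELS = ("no_llm", "small_llm", "large_llm")


def per_class_metrics(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, and F1 for each complexity label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in LABELS:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against empty denominators when a class never appears.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```

The predicted labels can be collected with `[r["label"] for r in router(messages)]`, using the `router` pipeline from Quick Start.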
|
|
|
| ## Training Details
|
|
|
| - **Base model**: distilbert-base-uncased
|
| - **Training data**: ~1,400 synthetic examples per class (~4,200 total), template-generated with natural language variation
|
| - **Epochs**: 5 (with early stopping, patience=2)
|
| - **Learning rate**: 2e-5
|
| - **Batch size**: 32
|
| - **Max sequence length**: 128
|
|
|
| ## Use in Agent Pipelines
|
|
|
| ```python
|
| COMPLEXITY_THRESHOLDS = {
|
|     "no_llm": 0.7,
|
|     "small_llm": 0.6,
|
|     "large_llm": 0.6,
|
| }
|
|
|
| def smart_route(message: str):
|
|     # `router` is the text-classification pipeline created in Quick Start.
|
|     result = router(message)[0]
|
|     label, score = result["label"], result["score"]
|
|
|
|     if score < COMPLEXITY_THRESHOLDS[label]:
|
|         # Low confidence: default to large_llm for safety
|
|         label = "large_llm"
|
|
|
|     return label
|
| ```
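`smart_route` looks only at the top score. A possible variant also checks how far the winner is ahead of the runner-up, which can catch borderline queries where two tiers score similarly. The sketch below assumes you pass in the full list of per-class scores, in the `{"label": ..., "score": ...}` shape the text-classification pipeline returns when called with `top_k=None` (the exact nesting of that output can differ between `transformers` versions). The function name `route_with_margin` and the `min_margin` default are assumptions, not tuned values.

```python
from typing import Dict, List, Union

Score = Dict[str, Union[str, float]]


def route_with_margin(scores: List[Score], min_margin: float = 0.2) -> str:
    """Pick the top label unless the runner-up is too close, then escalate."""
    if not scores:
        raise ValueError("scores must not be empty")

    ranked = sorted(scores, key=lambda s: s["score"], reverse=True)
    top = ranked[0]
    runner_up_score = ranked[1]["score"] if len(ranked) > 1 else 0.0

    # A small gap means the classifier is torn between tiers: play it safe.
    if top["score"] - runner_up_score < min_margin:
        return "large_llm"
    return str(top["label"])
```

As with the thresholds above, escalating to `large_llm` trades some cost for a lower risk of under-serving a hard query.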
|
|
|
| ## Limitations
|
|
|
| - Trained on English text only
|
| - Template-generated data may not cover all edge cases
|
| - Borderline queries (e.g., *"explain quantum entanglement"*) may get lower confidence; use a threshold fallback
|
| - Complexity is query-level only; does not account for context window length or domain expertise needed
|
|
|
| ## Related Models
|
|
|
| - [tripathyShaswata/AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter): companion intent classifier (8 categories, ~10ms on CPU)
|
|
|
| ## License
|
|
|
| Apache 2.0. Use it however you want, commercial use included.
|
|
|
| ## Citation
|
|
|
| If this helps you, a star is appreciated!
|
|
|