---
license: apache-2.0
language:
- en
tags:
- text-classification
- distilbert
- query-complexity
- agent-routing
- llm-routing
- ai-agents
- tool-use
pipeline_tag: text-classification
---

# QueryComplexityRouter

A fast, lightweight 3-class classifier that decides **how much LLM power a query needs** before you spend tokens on it.

Built on DistilBERT (66M params) and fine-tuned to classify any user message into one of three complexity tiers:

| Label | Meaning | Suggested Action |
|---|---|---|
| `no_llm` | Answerable with rules, lookup, or regex | Skip the LLM entirely |
| `small_llm` | A 1–3B model (Phi-3, Gemma-2B) is sufficient | Route to a cheap local model |
| `large_llm` | Requires a 7B+ or frontier model (GPT-4, Claude) | Route to a powerful model |

## Why This Exists

Running every query through a frontier LLM is expensive and slow, but you also don't want to under-serve complex queries with a tiny model. **QueryComplexityRouter** sits at the top of your pipeline and makes this decision in **~10ms on CPU**, before any LLM call is made.

Pair it with [AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter) for a full 2-stage routing pipeline:

```
User Message
     │
     ▼
AgentIntentRouter        ← What does the user want? (code, search, chat, ...)
     │
     ▼
QueryComplexityRouter    ← How hard is it? (no_llm / small_llm / large_llm)
     │
     ▼
Route to the right tool/model
```

## Quick Start

```python
from transformers import pipeline

router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

# Single prediction
result = router("What is 15% of 4500?")
print(result)
# [{'label': 'no_llm', 'score': 0.98}]

# Batch (trailing comments show the expected label)
messages = [
    "What is the capital of France?",           # no_llm
    "Explain recursion in simple terms.",       # small_llm
    "Write a 1000-word blog post about AI.",    # large_llm
    "Design a distributed caching system.",     # large_llm
    "Fix this bug: def add(a,b): return a-b",   # small_llm
]
results = router(messages)
for msg, res in zip(messages, results):
    print(f"  {res['label']:>12} ({res['score']:.2f}) - {msg}")
```

## 2-Stage Routing Pipeline

```python
from transformers import pipeline

intent_router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")
complexity_router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

# handle_with_rules, call_small_model, and call_large_model are placeholders
# for your own handlers; they are not provided by this model.
def route(user_message: str):
    intent = intent_router(user_message)[0]
    complexity = complexity_router(user_message)[0]

    print(f"Intent: {intent['label']} ({intent['score']:.2f})")
    print(f"Complexity: {complexity['label']} ({complexity['score']:.2f})")

    if complexity["label"] == "no_llm":
        return handle_with_rules(user_message, intent["label"])
    elif complexity["label"] == "small_llm":
        return call_small_model(user_message)
    else:
        return call_large_model(user_message)
```

## Complexity Labels

### `no_llm`: No LLM needed

- Simple math: *"What is 42 * 7?"*
- Unit conversion: *"Convert 100km to miles"*
- Factual lookup: *"What is the capital of Japan?"*
- Date/time: *"What day is March 15, 2026?"*
- Simple commands: *"Set a timer for 5 minutes"*

### `small_llm`: 1–3B model sufficient

- Short summarization: *"Summarize this paragraph..."*
- Basic explanation: *"Explain recursion to a 10-year-old"*
- Simple code: *"Write a Python function to reverse a string"*
- Short generation: *"Write a one-line bio for a software engineer"*
- Simple classification: *"Is this email spam?"*

### `large_llm`: 7B+ / frontier model required

- Deep reasoning: *"Analyze the ethical implications of AI replacing jobs"*
- Long-form writing: *"Write a 1000-word blog post about quantum computing"*
- Complex code: *"Build a REST API with auth, error handling, and tests"*
- Multi-doc synthesis: *"Given these 5 documents, synthesize an answer..."*
- System design: *"Design a distributed caching system with eventual consistency"*

## Performance

- **Inference speed**: ~10ms on CPU, ~2ms on GPU
- **Model size**: ~260MB (DistilBERT-base)

### Evaluation Results

Results on the held-out test set:

| Metric | Score |
|---|---|
| Accuracy | ~0.99 |
| F1 (weighted) | ~0.99 |

Per-class performance:

| Class | Precision | Recall | F1 |
|---|---|---|---|
| no_llm | ~1.00 | ~1.00 | ~1.00 |
| small_llm | ~0.98 | ~0.98 | ~0.98 |
| large_llm | ~0.99 | ~0.99 | ~0.99 |

> Note: These results are on synthetic test data drawn from the same distribution as the training data. Real-world performance will vary. Use a confidence-score threshold to handle ambiguous inputs gracefully.

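Because the scores above come from synthetic data, it may be worth re-measuring on a small hand-labeled sample of your own traffic. The following is a minimal sketch in plain Python, assuming you already have gold and predicted labels as parallel lists. The `per_class_metrics` helper and the sample labels are hypothetical illustrations, not part of this model or its training code.

```python
from collections import Counter

LABELS = ["no_llm", "small_llm", "large_llm"]

def per_class_metrics(gold, predicted, labels=LABELS):
    """Return {label: (precision, recall, f1)} for two parallel label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # the true label g was missed
    metrics = {}
    for label in labels:
        pred_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / pred_total if pred_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics

# Hypothetical hand-labeled sample. In practice `predicted` could be built as
# [r["label"] for r in router(texts)] using the pipeline from Quick Start.
gold = ["no_llm", "no_llm", "small_llm", "small_llm", "large_llm", "large_llm"]
predicted = ["no_llm", "small_llm", "small_llm", "small_llm", "large_llm", "small_llm"]

for label, (p, r, f1) in per_class_metrics(gold, predicted).items():
    print(f"{label:>10}  P={p:.2f}  R={r:.2f}  F1={f1:.2f}")
```

Low recall on `large_llm` is usually the costliest failure for a router like this, since it means hard queries are being sent to a weaker tier.
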
## Training Details

- **Base model**: distilbert-base-uncased
- **Training data**: ~1,400 synthetic examples per class (~4,200 total), template-generated with natural language variation
- **Epochs**: 5 (with early stopping, patience=2)
- **Learning rate**: 2e-5
- **Batch size**: 32
- **Max sequence length**: 128

## Use in Agent Pipelines

```python
from transformers import pipeline

router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

# Minimum confidence required to trust each predicted label
COMPLEXITY_THRESHOLDS = {
    "no_llm": 0.7,
    "small_llm": 0.6,
    "large_llm": 0.6,
}

def smart_route(message: str) -> str:
    result = router(message)[0]
    label, score = result["label"], result["score"]

    if score < COMPLEXITY_THRESHOLDS[label]:
        # Low confidence: default to large_llm for safety
        label = "large_llm"

    return label
```

## Limitations

- Trained on English text only
- Template-generated data may not cover all edge cases
- Borderline queries (e.g., *"explain quantum entanglement"*) may get lower confidence; use the threshold fallback
- Complexity is assessed at the query level only; it does not account for context window length or the domain expertise needed

## Related Models

- [tripathyShaswata/AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter): companion intent classifier (8 categories, ~10ms on CPU)

## License

Apache 2.0. Use it however you want, commercial use included.

## Citation

If this helps you, a star is appreciated!