---

license: apache-2.0
language:
- en
tags:
- text-classification
- distilbert
- query-complexity
- agent-routing
- llm-routing
- ai-agents
- tool-use
pipeline_tag: text-classification
---


# QueryComplexityRouter

A fast, lightweight 3-class classifier that decides **how much LLM power a query needs** — before you spend tokens on it.

Built on DistilBERT (66M params), fine-tuned to classify any user message into one of three complexity tiers:

| Label | Meaning | Suggested Action |
|---|---|---|
| `no_llm` | Answerable with rules, lookup, or regex | Skip the LLM entirely |
| `small_llm` | A 1–3B model (Phi-3, Gemma-2B) is sufficient | Route to a cheap local model |
| `large_llm` | Requires a 7B+ or frontier model (GPT-4, Claude) | Route to a powerful model |

## Why This Exists

Running every query through a frontier LLM is expensive and slow. But you also don't want to under-serve complex queries with a tiny model.

**QueryComplexityRouter** sits at the top of your pipeline and makes this decision in **~10ms on CPU** — before any LLM call is made.

Pair it with [AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter) for a full 2-stage routing pipeline:

```
User Message
    │
    ▼
AgentIntentRouter          ← What does the user want? (code, search, chat, ...)
    │
    ▼
QueryComplexityRouter      ← How hard is it? (no_llm / small_llm / large_llm)
    │
    ▼
Route to the right tool/model
```

## Quick Start

```python
from transformers import pipeline

router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

# Single prediction
result = router("What is 15% of 4500?")
print(result)
# [{'label': 'no_llm', 'score': 0.98}]

# Batch
messages = [
    "What is the capital of France?",          # no_llm
    "Explain recursion in simple terms.",      # small_llm
    "Write a 1000-word blog post about AI.",   # large_llm
    "Design a distributed caching system.",    # large_llm
    "Fix this bug: def add(a,b): return a-b",  # small_llm
]
results = router(messages)
for msg, res in zip(messages, results):
    print(f"  {res['label']:>12} ({res['score']:.2f}) — {msg}")
```

## 2-Stage Routing Pipeline

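```python
from transformers import pipeline

intent_router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")
complexity_router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")


# Placeholder handlers: replace these with your own rule engine and model clients.
def handle_with_rules(message: str, intent: str) -> str:
    raise NotImplementedError

def call_small_model(message: str) -> str:
    raise NotImplementedError

def call_large_model(message: str) -> str:
    raise NotImplementedError


def route(user_message: str):
    # Each pipeline returns a list with one {'label', 'score'} dict per input
    intent = intent_router(user_message)[0]
    complexity = complexity_router(user_message)[0]

    print(f"Intent:     {intent['label']} ({intent['score']:.2f})")
    print(f"Complexity: {complexity['label']} ({complexity['score']:.2f})")

    if complexity["label"] == "no_llm":
        return handle_with_rules(user_message, intent["label"])
    elif complexity["label"] == "small_llm":
        return call_small_model(user_message)
    else:
        return call_large_model(user_message)
```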

## Complexity Labels

### `no_llm` — No LLM needed

- Simple math: *"What is 42 * 7?"*
- Unit conversion: *"Convert 100km to miles"*
- Factual lookup: *"What is the capital of Japan?"*
- Date/time: *"What day is March 15 2026?"*
- Simple commands: *"Set a timer for 5 minutes"*
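To make the `no_llm` tier concrete, here is a minimal sketch of a rule-based handler covering the percentage, arithmetic, and unit-conversion cases. The function name `answer_with_rules`, the regexes, and the output formatting are assumptions for illustration; they are not part of this model or of any companion library.

```python
import operator
import re
from typing import Optional

KM_PER_MILE = 1.609344
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
NUM = r"(\d+(?:\.\d+)?)"


def _fmt(value: float) -> str:
    # Print whole numbers without a trailing ".0", everything else with 2 decimals
    return str(int(value)) if value.is_integer() else f"{value:.2f}"


def answer_with_rules(message: str) -> Optional[str]:
    """Hypothetical helper: try a few regex rules, return None if none match."""
    text = message.strip().lower()

    # Percentages, e.g. "What is 15% of 4500?"
    m = re.search(NUM + r"\s*%\s*of\s+" + NUM, text)
    if m:
        return _fmt(float(m.group(1)) * float(m.group(2)) / 100)

    # Unit conversion, e.g. "Convert 100km to miles"
    m = re.search(NUM + r"\s*km\s+to\s+miles", text)
    if m:
        return _fmt(float(m.group(1)) / KM_PER_MILE)

    # Binary arithmetic, e.g. "What is 42 * 7?"
    m = re.search(NUM + r"\s*([-+*/])\s*" + NUM, text)
    if m:
        a, op, b = float(m.group(1)), m.group(2), float(m.group(3))
        if op == "/" and b == 0:
            return None  # leave division by zero to a later stage
        return _fmt(OPS[op](a, b))

    return None  # no rule matched; the caller can fall back to an LLM
```

Returning `None` instead of raising lets the caller escalate to `small_llm` whenever the classifier says `no_llm` but no rule actually applies.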



### `small_llm` — 1–3B model sufficient
- Short summarization: *"Summarize this paragraph..."*
- Basic explanation: *"Explain recursion to a 10-year-old"*
- Simple code: *"Write a Python function to reverse a string"*
- Short generation: *"Write a one-line bio for a software engineer"*
- Simple classification: *"Is this email spam?"*

### `large_llm` — 7B+ / frontier model required

- Deep reasoning: *"Analyze the ethical implications of AI replacing jobs"*
- Long-form writing: *"Write a 1000-word blog post about quantum computing"*
- Complex code: *"Build a REST API with auth, error handling, and tests"*
- Multi-doc synthesis: *"Given these 5 documents, synthesize an answer..."*
- System design: *"Design a distributed caching system with eventual consistency"*



## Performance



- **Inference speed**: ~10ms on CPU, ~2ms on GPU

- **Model size**: ~260MB (DistilBERT-base)
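Latency depends heavily on hardware, batch size, and input length, so it is worth measuring on your own machine. The sketch below times any callable classifier. The helper name `measure_latency_ms` and its parameters are assumptions for illustration.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def measure_latency_ms(
    classify: Callable[[str], object],
    messages: Sequence[str],
    warmup: int = 3,
    repeats: int = 20,
) -> Dict[str, float]:
    """Hypothetical helper: time single-message calls and report ms statistics."""
    if not messages:
        raise ValueError("messages must not be empty")

    # Warm-up calls are excluded so one-time setup cost does not skew the numbers
    for i in range(warmup):
        classify(messages[i % len(messages)])

    timings = []
    for _ in range(repeats):
        for msg in messages:
            start = time.perf_counter()
            classify(msg)
            timings.append((time.perf_counter() - start) * 1000.0)

    timings.sort()
    p95_index = min(len(timings) - 1, int(0.95 * len(timings)))
    return {
        "median_ms": statistics.median(timings),
        "p95_ms": timings[p95_index],
        "n": len(timings),
    }
```

With the `router` pipeline from Quick Start, a call such as `measure_latency_ms(router, ["What is 15% of 4500?"])` gives per-message numbers for your hardware.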



### Evaluation Results



Results on held-out test set:



| Metric | Score |
|---|---|
| Accuracy | ~0.99 |
| F1 (weighted) | ~0.99 |



Per-class performance:



| Class | Precision | Recall | F1 |
|---|---|---|---|
| no_llm | ~1.00 | ~1.00 | ~1.00 |
| small_llm | ~0.98 | ~0.98 | ~0.98 |
| large_llm | ~0.99 | ~0.99 | ~0.99 |

> Note: These results were measured on synthetic test data drawn from the same distribution as the training data, so real-world performance will vary and may be lower. Use a confidence-score threshold to handle ambiguous inputs gracefully.
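Because the reported numbers come from synthetic data, it helps to re-check them on a small hand-labelled sample of your own traffic. The following is a minimal sketch of per-class precision, recall, and F1 in plain Python. The function name `per_class_report` is an assumption, and you would supply the labelled sample yourself.

```python
from collections import Counter
from typing import Dict, Sequence

LABELS = ("no_llm", "small_llm", "large_llm")


def per_class_report(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    labels: Sequence[str] = LABELS,
) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: per-class precision/recall/F1 from two label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class

    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```

Pass your gold labels as `y_true` and the `label` field of each router prediction as `y_pred`.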

## Training Details

- **Base model**: distilbert-base-uncased
- **Training data**: ~1,400 synthetic examples per class (~4,200 total), template-generated with natural language variation
- **Epochs**: 5 (with early stopping, patience=2)
- **Learning rate**: 2e-5
- **Batch size**: 32
- **Max sequence length**: 128
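For readers unfamiliar with template-generated training data, the sketch below shows the general idea: fill slots in per-class templates to produce labelled examples. The templates, slot values, and function name here are invented for illustration and are not the actual data or script used to train this model.

```python
import random
from typing import List, Tuple

# Hypothetical templates, loosely based on the example queries in this card
TEMPLATES = {
    "no_llm": [
        "What is {a} * {b}?",
        "Convert {a}km to miles",
        "Set a timer for {a} minutes",
    ],
    "small_llm": [
        "Explain {topic} to a 10-year-old",
        "Write a Python function to {task}",
    ],
    "large_llm": [
        "Write a 1000-word blog post about {topic}",
        "Design a {system} with eventual consistency",
    ],
}

# Hypothetical slot fillers
SLOT_VALUES = {
    "topic": ["recursion", "quantum computing", "AI"],
    "task": ["reverse a string", "sum a list"],
    "system": ["distributed caching system", "message queue"],
}


def generate_examples(per_class: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Return shuffled (text, label) pairs, `per_class` examples for each label."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    examples = []
    for label, templates in TEMPLATES.items():
        for _ in range(per_class):
            slots = {name: rng.choice(values) for name, values in SLOT_VALUES.items()}
            slots["a"] = rng.randint(1, 999)
            slots["b"] = rng.randint(1, 999)
            # str.format ignores slots that a given template does not use
            examples.append((rng.choice(templates).format(**slots), label))
    rng.shuffle(examples)
    return examples
```

Data built this way is easy to balance across classes, but it only covers the phrasings the templates anticipate.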

## Use in Agent Pipelines

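```python
from transformers import pipeline

router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

# Minimum confidence required to trust each predicted label
COMPLEXITY_THRESHOLDS = {
    "no_llm": 0.7,
    "small_llm": 0.6,
    "large_llm": 0.6,
}

def smart_route(message: str) -> str:
    result = router(message)[0]
    label, score = result["label"], result["score"]

    if score < COMPLEXITY_THRESHOLDS[label]:
        # Low confidence: default to large_llm for safety
        label = "large_llm"

    return label
```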

## Limitations

- Trained on English text only
- Template-generated data may not cover all edge cases
- Borderline queries (e.g., *"explain quantum entanglement"*) may receive lower confidence scores; use the threshold fallback from "Use in Agent Pipelines" for these
- Complexity is judged from the query text alone; the model does not account for context length or the domain expertise required
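Since the classifier only sees the query, one way to compensate for long attached context is a post-hoc adjustment of the predicted tier. The sketch below escalates to `large_llm` when the context exceeds a word budget. The function name and the 1,500-word default are arbitrary assumptions that you would tune for your own models.

```python
TIERS = ("no_llm", "small_llm", "large_llm")


def adjust_for_context(label: str, context: str, word_budget: int = 1500) -> str:
    """Hypothetical helper: override the tier when the attached context is long.

    Word count is a crude stand-in for token count; swap in a real tokenizer
    if you need tighter control.
    """
    if label not in TIERS:
        raise ValueError(f"unknown label: {label!r}")
    if len(context.split()) > word_budget:
        return "large_llm"
    return label
```

For example, `adjust_for_context("small_llm", long_document)` returns `"large_llm"` once `long_document` exceeds the budget, and otherwise leaves the router's decision unchanged.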

## Related Models

- [tripathyShaswata/AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter) — companion intent classifier (8 categories, ~10ms on CPU)

## License

Apache 2.0 — use it however you want, commercial included.

## Citation

If this helps you, a star is appreciated!