---

license: apache-2.0
language:
- en
tags:
- text-classification
- distilbert
- query-complexity
- agent-routing
- llm-routing
- ai-agents
- tool-use
pipeline_tag: text-classification
---


# QueryComplexityRouter

A fast, lightweight 3-class classifier that decides **how much LLM power a query needs** before you spend tokens on it.

Built on DistilBERT (66M params), fine-tuned to classify any user message into one of three complexity tiers:

| Label | Meaning | Suggested Action |
|---|---|---|
| `no_llm` | Answerable with rules, lookup, or regex | Skip the LLM entirely |
| `small_llm` | A 1–3B model (Phi-3, Gemma-2B) is sufficient | Route to a cheap local model |
| `large_llm` | Requires 7B+ or frontier model (GPT-4, Claude) | Route to powerful model |

## Why This Exists

Running every query through a frontier LLM is expensive and slow. But you also don't want to under-serve complex queries with a tiny model.

**QueryComplexityRouter** sits at the top of your pipeline and makes this decision in **~10ms on CPU**, before any LLM call is made.

Pair it with [AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter) for a full 2-stage routing pipeline:

```
User Message
    │
    ▼
AgentIntentRouter          ← What does the user want? (code, search, chat, ...)
    │
    ▼
QueryComplexityRouter      ← How hard is it? (no_llm / small_llm / large_llm)
    │
    ▼
Route to the right tool/model
```

## Quick Start

```python
from transformers import pipeline

router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

# Single prediction
result = router("What is 15% of 4500?")
print(result)
# [{'label': 'no_llm', 'score': 0.98}]

# Batch
messages = [
    "What is the capital of France?",           # no_llm
    "Explain recursion in simple terms.",       # small_llm
    "Write a 1000-word blog post about AI.",    # large_llm
    "Design a distributed caching system.",     # large_llm
    "Fix this bug: def add(a,b): return a-b",   # small_llm
]
results = router(messages)
for msg, res in zip(messages, results):
    print(f"  {res['label']:>12} ({res['score']:.2f}) - {msg}")
```
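
To get a feel for how much traffic each tier would receive, the batch output can be tallied by label. The sketch below assumes only the list-of-dicts output format shown above; `tier_share` is a hypothetical helper name, and the sample results are made-up values for illustration, not real model output.

```python
from collections import Counter

def tier_share(results):
    """Return the fraction of predictions per label.

    `results` is a list of {"label": ..., "score": ...} dicts, the format
    returned by the text-classification pipeline for a batch.
    """
    counts = Counter(res["label"] for res in results)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}

# Illustrative values only, not real model output.
sample_results = [
    {"label": "no_llm", "score": 0.97},
    {"label": "small_llm", "score": 0.91},
    {"label": "large_llm", "score": 0.88},
    {"label": "large_llm", "score": 0.93},
]
shares = tier_share(sample_results)
print(shares)
```

Running this over a sample of real queries gives a rough estimate of how many LLM calls the router could skip or downgrade.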

## 2-Stage Routing Pipeline

```python
from transformers import pipeline

intent_router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")
complexity_router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

# handle_with_rules, call_small_model, and call_large_model are placeholders
# for your own handlers; they are not provided by either model.
def route(user_message: str):
    # Each pipeline returns a list with one {"label", "score"} dict per input.
    intent = intent_router(user_message)[0]
    complexity = complexity_router(user_message)[0]

    print(f"Intent:     {intent['label']} ({intent['score']:.2f})")
    print(f"Complexity: {complexity['label']} ({complexity['score']:.2f})")

    if complexity["label"] == "no_llm":
        return handle_with_rules(user_message, intent["label"])
    elif complexity["label"] == "small_llm":
        return call_small_model(user_message)
    else:
        return call_large_model(user_message)
```
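
`handle_with_rules` is not defined by either model. As one possible shape, here is a minimal sketch that covers two `no_llm` cases (percentages and km-to-miles conversion) with regular expressions. The patterns, the `None` fallback, and the unused `intent` argument are assumptions for illustration only.

```python
import re

KM_PER_MILE = 1.609344  # international mile, in kilometres

# Hypothetical patterns; extend with whatever rules your product needs.
PERCENT_RE = re.compile(r"what is (\d+(?:\.\d+)?)\s*% of (\d+(?:\.\d+)?)", re.I)
KM_TO_MILES_RE = re.compile(r"convert (\d+(?:\.\d+)?)\s*km to miles", re.I)

def handle_with_rules(message, intent=None):
    """Answer a few trivially rule-solvable queries; return None otherwise."""
    match = PERCENT_RE.search(message)
    if match:
        pct, base = float(match.group(1)), float(match.group(2))
        return f"{pct * base / 100:g}"
    match = KM_TO_MILES_RE.search(message)
    if match:
        km = float(match.group(1))
        return f"{km / KM_PER_MILE:.2f} miles"
    return None  # no rule matched; the caller should fall back to a model

print(handle_with_rules("What is 15% of 4500?"))
print(handle_with_rules("Convert 100km to miles"))
```

Returning `None` when no rule matches lets the caller escalate to a small or large model instead of failing.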

## Complexity Labels

### `no_llm`: No LLM needed

- Simple math: *"What is 42 * 7?"*
- Unit conversion: *"Convert 100km to miles"*
- Factual lookup: *"What is the capital of Japan?"*
- Date/time: *"What day is March 15, 2026?"*
- Simple commands: *"Set a timer for 5 minutes"*

### `small_llm`: 1–3B model sufficient
- Short summarization: *"Summarize this paragraph..."*
- Basic explanation: *"Explain recursion to a 10-year-old"*
- Simple code: *"Write a Python function to reverse a string"*
- Short generation: *"Write a one-line bio for a software engineer"*
- Simple classification: *"Is this email spam?"*

### `large_llm`: 7B+ / frontier model required

- Deep reasoning: *"Analyze the ethical implications of AI replacing jobs"*
- Long-form writing: *"Write a 1000-word blog post about quantum computing"*
- Complex code: *"Build a REST API with auth, error handling, and tests"*
- Multi-doc synthesis: *"Given these 5 documents, synthesize an answer..."*
- System design: *"Design a distributed caching system with eventual consistency"*

## Performance

- **Inference speed**: ~10ms on CPU, ~2ms on GPU
- **Model size**: ~260MB (DistilBERT-base)
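
Latency depends on hardware, input length, and batch size, so it is worth measuring on your own machine. A minimal timing sketch follows, assuming any callable classifier. The `fake_router` stub is a stand-in so the snippet runs without downloading the model; swap in the real `router` pipeline to get meaningful numbers.

```python
import statistics
import time

def measure_latency_ms(classify, messages, warmup=3):
    """Return the median per-call latency in milliseconds."""
    for msg in messages[:warmup]:
        classify(msg)  # warm-up calls are not timed
    timings = []
    for msg in messages:
        start = time.perf_counter()
        classify(msg)
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)

# Stand-in for the real pipeline so the snippet is self-contained.
def fake_router(message):
    return [{"label": "no_llm", "score": 1.0}]

median_ms = measure_latency_ms(fake_router, ["What is 42 * 7?"] * 20)
print(f"median latency: {median_ms:.3f} ms")
```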



### Evaluation Results

Results on held-out test set:

| Metric | Score |
|---|---|
| Accuracy | ~0.99 |
| F1 (weighted) | ~0.99 |

Per-class performance:

| Class | Precision | Recall | F1 |
|---|---|---|---|
| no_llm | ~1.00 | ~1.00 | ~1.00 |
| small_llm | ~0.98 | ~0.98 | ~0.98 |
| large_llm | ~0.99 | ~0.99 | ~0.99 |

> Note: These results are on synthetic test data drawn from the same distribution as the training data, so real-world performance will vary. Use a confidence-score threshold to handle ambiguous inputs gracefully.
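
Because the reported numbers come from synthetic data, it may be worth re-checking them on a small hand-labelled sample of your own traffic. The sketch below computes per-class precision, recall, and F1 in plain Python. The function name and the toy labels are illustrative assumptions; scikit-learn's `classification_report` would serve the same purpose.

```python
def per_class_metrics(y_true, y_pred, labels=("no_llm", "small_llm", "large_llm")):
    """Return {label: (precision, recall, f1)} for two parallel label lists."""
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics

# Toy labels for illustration only, not real evaluation data.
y_true = ["no_llm", "small_llm", "large_llm", "large_llm"]
y_pred = ["no_llm", "large_llm", "large_llm", "large_llm"]
metrics = per_class_metrics(y_true, y_pred)
for label, (p, r, f1) in metrics.items():
    print(f"{label:>10}  P={p:.2f}  R={r:.2f}  F1={f1:.2f}")
```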

## Training Details

- **Base model**: distilbert-base-uncased
- **Training data**: ~1,400 synthetic examples per class (~4,200 total), template-generated with natural language variation
- **Epochs**: 5 (with early stopping, patience=2)
- **Learning rate**: 2e-5
- **Batch size**: 32
- **Max sequence length**: 128
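
The actual training templates are not listed in this card. Purely to illustrate what "template-generated with natural language variation" can mean, here is a hypothetical sketch: every template, slot value, and function name below is invented for this example and should not be read as the real training data.

```python
import random

# Hypothetical templates and slot values, invented for illustration.
TEMPLATES = {
    "no_llm": ["What is {a} * {b}?", "Convert {a}km to miles"],
    "small_llm": ["Explain {topic} in simple terms.", "Write a Python function to {task}"],
    "large_llm": ["Write a 1000-word blog post about {topic}.", "Design a {system} with eventual consistency"],
}
SLOTS = {
    "a": ["7", "42", "100"],
    "b": ["3", "15"],
    "topic": ["recursion", "quantum computing"],
    "task": ["reverse a string", "sum a list"],
    "system": ["distributed caching system", "message queue"],
}

def generate_examples(per_class, seed=0):
    """Yield (text, label) pairs by filling random slot values into templates."""
    rng = random.Random(seed)
    for label, templates in TEMPLATES.items():
        for _ in range(per_class):
            template = rng.choice(templates)
            # str.format ignores slot values the chosen template does not use.
            values = {name: rng.choice(options) for name, options in SLOTS.items()}
            yield template.format(**values), label

dataset = list(generate_examples(per_class=3))
for text, label in dataset:
    print(f"{label:>10}  {text}")
```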

## Use in Agent Pipelines

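```python
from transformers import pipeline

router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

# Minimum confidence required to trust each predicted label
COMPLEXITY_THRESHOLDS = {
    "no_llm": 0.7,
    "small_llm": 0.6,
    "large_llm": 0.6,
}

def smart_route(message: str):
    result = router(message)[0]
    label, score = result["label"], result["score"]

    if score < COMPLEXITY_THRESHOLDS[label]:
        # Low confidence: default to large_llm for safety
        label = "large_llm"

    return label
```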

## Limitations

- Trained on English text only
- Template-generated data may not cover all edge cases
- Borderline queries (e.g., *"explain quantum entanglement"*) may get lower confidence; use a threshold fallback
- Complexity is query-level only; does not account for context window length or domain expertise needed
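
One way to soften the query-level limitation is to layer a simple guard on top of the predicted label. The sketch below escalates long inputs to `large_llm` regardless of the prediction. The 200-word cutoff and the `escalate_for_length` name are arbitrary assumptions to tune against your own traffic.

```python
MAX_WORDS_BEFORE_ESCALATION = 200  # arbitrary cutoff; tune for your workload

def escalate_for_length(label, message, max_words=MAX_WORDS_BEFORE_ESCALATION):
    """Return "large_llm" for long inputs, otherwise keep the predicted label."""
    if len(message.split()) > max_words:
        return "large_llm"
    return label

print(escalate_for_length("small_llm", "Summarize this paragraph"))
print(escalate_for_length("small_llm", "word " * 500))
```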

## Related Models

- [tripathyShaswata/AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter): companion intent classifier (8 categories, ~10ms on CPU)

## License

Apache 2.0: use it however you want, commercial use included.

## Citation

If this helps you, a star is appreciated!