Commit f3a3d75 (verified) by tripathyShaswata
1 Parent(s): 682d991

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +190 -0
README.md ADDED
@@ -0,0 +1,190 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ tags:
6
+ - text-classification
7
+ - distilbert
8
+ - query-complexity
9
+ - agent-routing
10
+ - llm-routing
11
+ - ai-agents
12
+ - tool-use
13
+ pipeline_tag: text-classification
14
+ ---
15
+
16
+ # QueryComplexityRouter
17
+
18
+ A fast, lightweight 3-class classifier that decides **how much LLM power a query needs** before you spend tokens on it.
19
+
20
+ Built on DistilBERT (66M params), fine-tuned to classify any user message into one of three complexity tiers:
21
+
22
+ | Label | Meaning | Suggested Action |
23
+ |---|---|---|
24
+ | `no_llm` | Answerable with rules, lookup, or regex | Skip the LLM entirely |
25
+ | `small_llm` | A 1–3B model (Phi-3, Gemma-2B) is sufficient | Route to a cheap local model |
26
+ | `large_llm` | Requires a 7B+ or frontier model (GPT-4, Claude) | Route to a powerful model |
27
+
28
+ ## Why This Exists
29
+
30
+ Running every query through a frontier LLM is expensive and slow. But you also don't want to under-serve complex queries with a tiny model.
31
+
32
+ **QueryComplexityRouter** sits at the top of your pipeline and makes this decision in **~10ms on CPU**, before any LLM call is made.
33
+
34
+ Pair it with [AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter) for a full 2-stage routing pipeline:
35
+
36
+ ```
37
+ User Message
38
+       │
39
+       ▼
40
+ AgentIntentRouter       ← What does the user want? (code, search, chat, ...)
41
+       │
42
+       ▼
43
+ QueryComplexityRouter   ← How hard is it? (no_llm / small_llm / large_llm)
44
+       │
45
+       ▼
46
+ Route to the right tool/model
47
+ ```
48
+
49
+ ## Quick Start
50
+
51
+ ```python
52
+ from transformers import pipeline
53
+
54
+ router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")
55
+
56
+ # Single prediction
57
+ result = router("What is 15% of 4500?")
58
+ print(result)
59
+ # [{'label': 'no_llm', 'score': 0.98}]
60
+
61
+ # Batch
62
+ messages = [
63
+ "What is the capital of France?", # no_llm
64
+ "Explain recursion in simple terms.", # small_llm
65
+ "Write a 1000-word blog post about AI.", # large_llm
66
+ "Design a distributed caching system.", # large_llm
67
+ "Fix this bug: def add(a,b): return a-b", # small_llm
68
+ ]
69
+ results = router(messages)
70
+ for msg, res in zip(messages, results):
71
+     print(f"  {res['label']:>12} ({res['score']:.2f}) - {msg}")
72
+ ```
73
+
74
+ ## 2-Stage Routing Pipeline
75
+
76
+ ```python
77
+ from transformers import pipeline
78
+
79
+ intent_router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")
80
+ complexity_router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")
81
+
82
+ def route(user_message: str):
83
+     intent = intent_router(user_message)[0]
84
+     complexity = complexity_router(user_message)[0]
85
+
86
+     print(f"Intent: {intent['label']} ({intent['score']:.2f})")
87
+     print(f"Complexity: {complexity['label']} ({complexity['score']:.2f})")
88
+     # handle_with_rules, call_small_model, and call_large_model are placeholders for your own handlers.
89
+     if complexity["label"] == "no_llm":
90
+         return handle_with_rules(user_message, intent["label"])
91
+     elif complexity["label"] == "small_llm":
92
+         return call_small_model(user_message)
93
+     else:
94
+         return call_large_model(user_message)
95
+ ```
96
+
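The `route` function above calls three handlers that this README does not define. A minimal sketch of how they might be stubbed out while wiring up the pipeline is shown here; the handler bodies, the `dispatch` helper, and the `"chat"` intent label used in the usage note are all hypothetical placeholders, not part of either model.

```python
# Hypothetical handlers: the README does not define these, so each one is a
# placeholder that only reports which backend would have been called.
def handle_with_rules(message: str, intent: str) -> str:
    return f"[rules:{intent}] {message}"


def call_small_model(message: str) -> str:
    return f"[small_llm] {message}"


def call_large_model(message: str) -> str:
    return f"[large_llm] {message}"


def dispatch(message: str, intent_label: str, complexity_label: str) -> str:
    """Pick a backend from the two predicted labels (assumed helper)."""
    if complexity_label == "no_llm":
        return handle_with_rules(message, intent_label)
    if complexity_label == "small_llm":
        return call_small_model(message)
    # Anything else, including an unexpected label, goes to the large model.
    return call_large_model(message)
```

Separating `dispatch` from the classifier calls keeps the routing logic testable without downloading either model: for example, `dispatch("hi", "chat", "no_llm")` exercises the rules branch directly.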
97
+ ## Complexity Labels
98
+
99
+ ### `no_llm`: No LLM needed
100
+ - Simple math: *"What is 42 * 7?"*
101
+ - Unit conversion: *"Convert 100km to miles"*
102
+ - Factual lookup: *"What is the capital of Japan?"*
103
+ - Date/time: *"What day is March 15 2026?"*
104
+ - Simple commands: *"Set a timer for 5 minutes"*
105
+
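As an illustration of what "answerable with rules" could look like for the examples above, here is a small regex-based sketch. The function name `answer_with_rules`, the patterns, and the conversion constant are assumptions made for this example; a real handler would need far broader coverage.

```python
import re
from typing import Optional

KM_TO_MILES = 0.621371  # approximate conversion factor


def answer_with_rules(message: str) -> Optional[str]:
    """Answer a few simple query shapes without an LLM; return None otherwise."""
    text = message.lower()

    # Percentages such as "What is 15% of 4500?"
    m = re.search(r"(\d+(?:\.\d+)?)\s*%\s*of\s*(\d+(?:\.\d+)?)", text)
    if m:
        pct, base = float(m.group(1)), float(m.group(2))
        return f"{pct * base / 100:g}"

    # Multiplication such as "What is 42 * 7?"
    m = re.search(r"(\d+(?:\.\d+)?)\s*\*\s*(\d+(?:\.\d+)?)", text)
    if m:
        return f"{float(m.group(1)) * float(m.group(2)):g}"

    # Unit conversion such as "Convert 100km to miles"
    m = re.search(r"convert\s+(\d+(?:\.\d+)?)\s*km\s+to\s+miles", text)
    if m:
        return f"{float(m.group(1)) * KM_TO_MILES:.2f} miles"

    # No rule matched: the caller should fall back to a model.
    return None
```

Returning `None` when nothing matches gives the caller a clean signal to escalate to `small_llm`, which guards against a `no_llm` prediction that the rules cannot actually satisfy.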
106
+ ### `small_llm`: 1–3B model sufficient
107
+ - Short summarization: *"Summarize this paragraph..."*
108
+ - Basic explanation: *"Explain recursion to a 10-year-old"*
109
+ - Simple code: *"Write a Python function to reverse a string"*
110
+ - Short generation: *"Write a one-line bio for a software engineer"*
111
+ - Simple classification: *"Is this email spam?"*
112
+
113
+ ### `large_llm`: 7B+ / frontier model required
114
+ - Deep reasoning: *"Analyze the ethical implications of AI replacing jobs"*
115
+ - Long-form writing: *"Write a 1000-word blog post about quantum computing"*
116
+ - Complex code: *"Build a REST API with auth, error handling, and tests"*
117
+ - Multi-doc synthesis: *"Given these 5 documents, synthesize an answer..."*
118
+ - System design: *"Design a distributed caching system with eventual consistency"*
119
+
120
+ ## Performance
121
+
122
+ - **Inference speed**: ~10ms on CPU, ~2ms on GPU
123
+ - **Model size**: ~260MB (DistilBERT-base)
124
+
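The two figures above can be sanity-checked on your own hardware. The sketch below assumes 32-bit floats for the size estimate and takes any callable as the predictor, so it works with the `router` pipeline from Quick Start or with a stand-in; the helper names `fp32_size_mb` and `median_latency_ms` are made up for this example.

```python
import statistics
import time
from typing import Callable, Sequence


def fp32_size_mb(num_params: int) -> float:
    """Approximate weight size in MB, assuming 4 bytes per parameter."""
    return num_params * 4 / 1_000_000


def median_latency_ms(predict: Callable[[str], object],
                      queries: Sequence[str],
                      warmup: int = 3) -> float:
    """Median per-query latency in milliseconds for a single-input predictor."""
    # Warm-up calls are discarded so one-time setup costs do not skew timing.
    for query in queries[:warmup]:
        predict(query)

    timings = []
    for query in queries:
        start = time.perf_counter()
        predict(query)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

With 66M parameters, `fp32_size_mb(66_000_000)` gives 264.0, which is in line with the ~260MB quoted above. Latency depends heavily on CPU, batch size, and sequence length, so treat ~10ms as a rough guide rather than a guarantee.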
125
+ ### Evaluation Results
126
+
127
+ Results on the held-out test set:
128
+
129
+ | Metric | Score |
130
+ |---|---|
131
+ | Accuracy | ~0.99 |
132
+ | F1 (weighted) | ~0.99 |
133
+
134
+ Per-class performance:
135
+
136
+ | Class | Precision | Recall | F1 |
137
+ |---|---|---|---|
138
+ | no_llm | ~1.00 | ~1.00 | ~1.00 |
139
+ | small_llm | ~0.98 | ~0.98 | ~0.98 |
140
+ | large_llm | ~0.99 | ~0.99 | ~0.99 |
141
+
142
+ > Note: These results are on synthetic test data drawn from the same distribution as the training data. Real-world performance will vary. Use a confidence-score threshold to handle ambiguous inputs gracefully.
143
+
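Because the reported numbers come from synthetic data, it may be worth recomputing per-class metrics on a small hand-labeled sample of your own traffic. The following is a dependency-free sketch of that calculation; `per_class_report` is a hypothetical helper, and the gold labels are something you would have to supply yourself.

```python
from typing import Dict, List

LABELS = ("no_llm", "small_llm", "large_llm")


def per_class_report(y_true: List[str], y_pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, and F1 for each complexity label."""
    report = {}
    for label in LABELS:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)

        # Guard against division by zero when a class never appears.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```

Low recall on `large_llm` is the costly failure mode here, since it means hard queries are being sent to a model that cannot handle them.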
144
+ ## Training Details
145
+
146
+ - **Base model**: distilbert-base-uncased
147
+ - **Training data**: ~1,400 synthetic examples per class (~4,200 total), template-generated with natural language variation
148
+ - **Epochs**: 5 (with early stopping, patience=2)
149
+ - **Learning rate**: 2e-5
150
+ - **Batch size**: 32
151
+ - **Max sequence length**: 128
152
+
153
+ ## Use in Agent Pipelines
154
+
155
+ ```python
156
+ COMPLEXITY_THRESHOLDS = {
157
+ "no_llm": 0.7,
158
+ "small_llm": 0.6,
159
+ "large_llm": 0.6,
160
+ }
161
+
162
+ def smart_route(message: str):
163
+     result = router(message)[0]  # `router` is the pipeline created in Quick Start
164
+     label, score = result["label"], result["score"]
165
+
166
+     if score < COMPLEXITY_THRESHOLDS[label]:
167
+         # Low confidence: default to large_llm for safety
168
+         label = "large_llm"
169
+
170
+     return label
171
+ ```
172
+
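To see the threshold fallback in action without loading the model, the same logic can be written against an injected classifier. This is only a sketch: `smart_route_with` and the stand-in lambdas in the usage note are assumptions for illustration, and the thresholds are copied from the snippet above.

```python
from typing import Callable, Dict, List

COMPLEXITY_THRESHOLDS = {
    "no_llm": 0.7,
    "small_llm": 0.6,
    "large_llm": 0.6,
}

Classifier = Callable[[str], List[Dict[str, object]]]


def smart_route_with(classifier: Classifier, message: str) -> str:
    """Route a message, falling back to large_llm on low confidence."""
    result = classifier(message)[0]
    label, score = result["label"], result["score"]

    # Unknown labels and low-confidence predictions both take the safe path.
    if label not in COMPLEXITY_THRESHOLDS or score < COMPLEXITY_THRESHOLDS[label]:
        return "large_llm"
    return label
```

For example, a stand-in such as `lambda m: [{"label": "no_llm", "score": 0.65}]` falls below the 0.7 threshold for `no_llm`, so the message is escalated to `large_llm`. Passing the real `router` pipeline as `classifier` should behave the same way, since the pipeline returns a list of label/score dictionaries.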
173
+ ## Limitations
174
+
175
+ - Trained on English text only
176
+ - Template-generated data may not cover all edge cases
177
+ - Borderline queries (e.g., *"explain quantum entanglement"*) may get lower confidence; use the threshold fallback
178
+ - Complexity is query-level only; does not account for context window length or domain expertise needed
179
+
180
+ ## Related Models
181
+
182
+ - [tripathyShaswata/AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter): companion intent classifier (8 categories, ~10ms on CPU)
183
+
184
+ ## License
185
+
186
+ Apache 2.0: use it however you want, including commercially.
187
+
188
+ ## Citation
189
+
190
+ If this helps you, a star is appreciated!