	---
	license: apache-2.0
	language:
	- en
	tags:
	- text-classification
	- distilbert
	- query-complexity
	- agent-routing
	- llm-routing
	- ai-agents
	- tool-use
	pipeline_tag: text-classification
	---

	# QueryComplexityRouter

	A fast, lightweight 3-class classifier that decides how much LLM power a query needs — before you spend tokens on it.

	Built on DistilBERT (66M params), fine-tuned to classify any user message into one of three complexity tiers:

	| Label | Meaning | Suggested Action |
	|---|---|---|
	| `no_llm` | Answerable with rules, lookup, or regex | Skip the LLM entirely |
	| `small_llm` | A 1–3B model (Phi-3, Gemma-2B) is sufficient | Route to a cheap local model |
	| `large_llm` | Requires 7B+ or frontier model (GPT-4, Claude) | Route to powerful model |

	## Why This Exists

	Running every query through a frontier LLM is expensive and slow. But you also don't want to under-serve complex queries with a tiny model.

	QueryComplexityRouter sits at the top of your pipeline and makes this decision in ~10ms on CPU — before any LLM call is made.

	Pair it with [AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter) for a full 2-stage routing pipeline:

	```
	User Message
	│
	▼
	AgentIntentRouter ← What does the user want? (code, search, chat, ...)
	│
	▼
	QueryComplexityRouter ← How hard is it? (no_llm / small_llm / large_llm)
	│
	▼
	Route to the right tool/model
	```

	## Quick Start

	```python
	from transformers import pipeline

	router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

	# Single prediction
	result = router("What is 15% of 4500?")
	print(result)
	# [{'label': 'no_llm', 'score': 0.98}]

	# Batch
	messages = [
	    "What is the capital of France?",          # no_llm
	    "Explain recursion in simple terms.",      # small_llm
	    "Write a 1000-word blog post about AI.",   # large_llm
	    "Design a distributed caching system.",    # large_llm
	    "Fix this bug: def add(a,b): return a-b",  # small_llm
	]
	results = router(messages)
	for msg, res in zip(messages, results):
	    print(f"  {res['label']:>12} ({res['score']:.2f}) - {msg}")
	```

	## 2-Stage Routing Pipeline

	```python
	from transformers import pipeline

	intent_router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")
	complexity_router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

	# handle_with_rules, call_small_model, and call_large_model are placeholders:
	# supply your own implementations for each tier.
	def route(user_message: str):
	    intent = intent_router(user_message)[0]
	    complexity = complexity_router(user_message)[0]

	    print(f"Intent: {intent['label']} ({intent['score']:.2f})")
	    print(f"Complexity: {complexity['label']} ({complexity['score']:.2f})")

	    if complexity["label"] == "no_llm":
	        return handle_with_rules(user_message, intent["label"])
	    elif complexity["label"] == "small_llm":
	        return call_small_model(user_message)
	    else:
	        return call_large_model(user_message)
	```

	## Complexity Labels

	### `no_llm` — No LLM needed
	- Simple math: "What is 42 * 7?"
	- Unit conversion: "Convert 100km to miles"
	- Factual lookup: "What is the capital of Japan?"
	- Date/time: "What day is March 15 2026?"
	- Simple commands: "Set a timer for 5 minutes"
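To illustrate what "skip the LLM" can look like for this tier, here is a minimal sketch of a rule-based handler covering two of the query shapes above. The function name `answer_with_rules`, the regex patterns, and the `None` fallback are assumptions made for illustration; they are not part of this model or of any library.

```python
import re
from typing import Optional

# Hypothetical patterns for two no_llm query shapes; extend as needed.
PERCENT_RE = re.compile(r"what is (\d+(?:\.\d+)?)% of (\d+(?:\.\d+)?)", re.IGNORECASE)
KM_TO_MILES_RE = re.compile(r"convert (\d+(?:\.\d+)?)\s*km to miles", re.IGNORECASE)

KM_PER_MILE = 1.609344


def answer_with_rules(message: str) -> Optional[str]:
    """Return a rule-based answer, or None if no rule matches."""
    match = PERCENT_RE.search(message)
    if match:
        percent, base = float(match.group(1)), float(match.group(2))
        return f"{percent * base / 100:g}"

    match = KM_TO_MILES_RE.search(message)
    if match:
        km = float(match.group(1))
        return f"{km / KM_PER_MILE:.2f} miles"

    # No rule matched: the caller should fall back to a model.
    return None
```

Returning `None` instead of raising lets the caller escalate to `small_llm` when the classifier says `no_llm` but no rule actually applies.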

	### `small_llm` — 1–3B model sufficient
	- Short summarization: "Summarize this paragraph..."
	- Basic explanation: "Explain recursion to a 10-year-old"
	- Simple code: "Write a Python function to reverse a string"
	- Short generation: "Write a one-line bio for a software engineer"
	- Simple classification: "Is this email spam?"

	### `large_llm` — 7B+ / frontier model required
	- Deep reasoning: "Analyze the ethical implications of AI replacing jobs"
	- Long-form writing: "Write a 1000-word blog post about quantum computing"
	- Complex code: "Build a REST API with auth, error handling, and tests"
	- Multi-doc synthesis: "Given these 5 documents, synthesize an answer..."
	- System design: "Design a distributed caching system with eventual consistency"

	## Performance

	- Inference speed: ~10ms on CPU, ~2ms on GPU
	- Model size: ~260MB (DistilBERT-base)
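Latency depends on hardware, input length, and library versions, so it is worth measuring on your own machine. The sketch below times any classifier callable; `measure_latency_ms` is a hypothetical helper, not part of `transformers`.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def measure_latency_ms(classify: Callable[[str], object],
                       messages: Sequence[str],
                       warmup: int = 3) -> Dict[str, float]:
    """Time one call per message and report median and max latency in ms."""
    if not messages:
        raise ValueError("messages must not be empty")

    # Warm-up calls so one-time setup cost is not counted.
    for msg in messages[:warmup]:
        classify(msg)

    timings = []
    for msg in messages:
        start = time.perf_counter()
        classify(msg)
        timings.append((time.perf_counter() - start) * 1000.0)

    return {
        "median_ms": statistics.median(timings),
        "max_ms": max(timings),
        "n": len(timings),
    }
```

With the `router` pipeline from Quick Start, `measure_latency_ms(router, messages)` gives a per-message figure you can compare against the numbers above.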

	### Evaluation Results

	Results on held-out test set:

	| Metric | Score |
	|---|---|
	| Accuracy | ~0.99 |
	| F1 (weighted) | ~0.99 |

	Per-class performance:

	| Class | Precision | Recall | F1 |
	|---|---|---|---|
	| no_llm | ~1.00 | ~1.00 | ~1.00 |
	| small_llm | ~0.98 | ~0.98 | ~0.98 |
	| large_llm | ~0.99 | ~0.99 | ~0.99 |

	> Note: These results were measured on synthetic test data drawn from the same distribution as the training data, so real-world performance will vary. Apply a confidence-score threshold to handle ambiguous inputs gracefully.
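Because the reported scores come from synthetic data, it may help to recompute the same per-class metrics on a small hand-labeled sample of your own traffic. The following is a dependency-free sketch; `per_class_metrics` is a hypothetical helper and the inputs are assumed to be parallel lists of label strings.

```python
from collections import Counter
from typing import Dict, Sequence

LABELS = ("no_llm", "small_llm", "large_llm")


def per_class_metrics(y_true: Sequence[str],
                      y_pred: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, and F1 for each complexity label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed

    report = {}
    for label in LABELS:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```

Here `y_pred` would be the `label` field of each pipeline result and `y_true` your own annotations.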

	## Training Details

	- Base model: distilbert-base-uncased
	- Training data: ~1,400 synthetic examples per class (~4,200 total), template-generated with natural language variation
	- Epochs: 5 (with early stopping, patience=2)
	- Learning rate: 2e-5
	- Batch size: 32
	- Max sequence length: 128
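The actual templates used for training are not listed here. As a rough picture of what template-based generation means in general, the sketch below fills slot values into per-class templates. Every template, slot value, and name in it (`TEMPLATES`, `SLOTS`, `generate_examples`) is hypothetical and should not be read as the data this model was trained on.

```python
import random
from typing import List, Tuple

# Hypothetical templates and slot fillers, NOT the ones used to train this model.
TEMPLATES = {
    "no_llm": ["What is the capital of {country}?", "Convert {n} km to miles"],
    "small_llm": ["Explain {topic} in simple terms.", "Write a Python function to {task}"],
    "large_llm": ["Write a 1000-word blog post about {topic}.",
                  "Design a {system} with eventual consistency"],
}
SLOTS = {
    "country": ["France", "Japan", "Brazil"],
    "n": ["5", "42", "100"],
    "topic": ["recursion", "quantum computing", "caching"],
    "task": ["reverse a string", "sum a list"],
    "system": ["distributed cache", "message queue"],
}


def generate_examples(per_class: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Return (text, label) pairs, per_class examples for each label."""
    rng = random.Random(seed)  # seeded for reproducible datasets
    examples = []
    for label, templates in TEMPLATES.items():
        for _ in range(per_class):
            template = rng.choice(templates)
            # Pick a value for every slot; str.format ignores unused keywords.
            values = {name: rng.choice(options) for name, options in SLOTS.items()}
            examples.append((template.format(**values), label))
    return examples
```

A small template pool like this produces many near-duplicates, which is one reason a model trained this way can score very high on a test set from the same generator.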

	## Use in Agent Pipelines

	```python
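	from transformers import pipeline

	router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

	# Minimum confidence required to trust each predicted label
	COMPLEXITY_THRESHOLDS = {
	    "no_llm": 0.7,
	    "small_llm": 0.6,
	    "large_llm": 0.6,
	}

	def smart_route(message: str) -> str:
	    result = router(message)[0]
	    label, score = result["label"], result["score"]

	    if score < COMPLEXITY_THRESHOLDS[label]:
	        # Low confidence: default to large_llm for safety
	        label = "large_llm"

	    return label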
	```

	## Limitations

	- Trained on English text only
	- Template-generated data may not cover all edge cases
	- Borderline queries (e.g., "explain quantum entanglement") may get lower confidence — use threshold fallback
	- Complexity is query-level only; does not account for context window length or domain expertise needed
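One way to partly compensate for the last limitation is a length guard applied after classification. The sketch below raises the tier when the input is long, whatever label was predicted. The function name `escalate_for_length` and both word-count limits are arbitrary assumptions that would need tuning for your models.

```python
TIER_ORDER = ["no_llm", "small_llm", "large_llm"]


def escalate_for_length(label: str, message: str,
                        small_limit: int = 50,
                        large_limit: int = 300) -> str:
    """Return the higher of the predicted tier and a length-based minimum tier."""
    word_count = len(message.split())  # crude proxy for token count
    if word_count > large_limit:
        minimum = "large_llm"
    elif word_count > small_limit:
        minimum = "small_llm"
    else:
        minimum = "no_llm"

    # Never downgrade: keep whichever tier ranks higher.
    return max(label, minimum, key=TIER_ORDER.index)
```

Word count is only a rough stand-in for tokens; swapping in a real tokenizer count is a natural refinement.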

	## Related Models

	- [tripathyShaswata/AgentIntentRouter](https://huggingface.co/tripathyShaswata/AgentIntentRouter) — companion intent classifier (8 categories, ~10ms on CPU)

	## License

	Apache 2.0. Use it however you want, commercial use included.

	## Citation

	If this helps you, a star is appreciated!