license: apache-2.0
language:
  - en
tags:
  - text-classification
  - distilbert
  - query-complexity
  - agent-routing
  - llm-routing
  - ai-agents
  - tool-use
pipeline_tag: text-classification

QueryComplexityRouter

A fast, lightweight 3-class classifier that decides how much LLM power a query needs — before you spend tokens on it.

Built on DistilBERT (66M params), fine-tuned to classify any user message into one of three complexity tiers:

Label	Meaning	Suggested Action
`no_llm`	Answerable with rules, lookup, or regex	Skip the LLM entirely
`small_llm`	A 1–3B model (Phi-3, Gemma-2B) is sufficient	Route to a cheap local model
`large_llm`	Requires 7B+ or frontier model (GPT-4, Claude)	Route to powerful model

Why This Exists

Running every query through a frontier LLM is expensive and slow. But you also don't want to under-serve complex queries with a tiny model.

QueryComplexityRouter sits at the top of your pipeline and makes this decision in ~10ms on CPU — before any LLM call is made.

Pair it with AgentIntentRouter for a full 2-stage routing pipeline:

User Message
    │
    ▼
AgentIntentRouter          ← What does the user want? (code, search, chat, ...)
    │
    ▼
QueryComplexityRouter      ← How hard is it? (no_llm / small_llm / large_llm)
    │
    ▼
Route to the right tool/model

Quick Start

from transformers import pipeline

router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

# Single prediction
result = router("What is 15% of 4500?")
print(result)
# Example output (the exact score will vary):
# [{'label': 'no_llm', 'score': 0.98}]

# Batch
messages = [
    "What is the capital of France?",           # no_llm
    "Explain recursion in simple terms.",        # small_llm
    "Write a 1000-word blog post about AI.",     # large_llm
    "Design a distributed caching system.",      # large_llm
    "Fix this bug: def add(a,b): return a-b",   # small_llm
]
results = router(messages)
for msg, res in zip(messages, results):
    print(f"  {res['label']:>12} ({res['score']:.2f}) — {msg}")
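Batch results come back as one dict per message, each with a `label` and a `score`. A minimal sketch of bucketing messages by predicted tier so each bucket can be sent to its backend in one go. The helper name `group_by_label` is hypothetical, and the predictions below are hand-written in the pipeline's output shape, not real model output.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_label(messages: Sequence[str],
                   results: Sequence[dict]) -> Dict[str, List[str]]:
    """Bucket messages by predicted label, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for msg, res in zip(messages, results):
        groups[res["label"]].append(msg)
    return dict(groups)


# Illustrative predictions in the pipeline's output shape (not real model output).
messages = [
    "What is the capital of France?",
    "Explain recursion in simple terms.",
    "Design a distributed caching system.",
]
results = [
    {"label": "no_llm", "score": 0.97},
    {"label": "small_llm", "score": 0.91},
    {"label": "large_llm", "score": 0.95},
]

for label, msgs in group_by_label(messages, results).items():
    print(f"{label}: {len(msgs)} message(s)")
```

In real use, `results` would be the list returned by `router(messages)`.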

2-Stage Routing Pipeline

from transformers import pipeline

intent_router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")
complexity_router = pipeline("text-classification", model="tripathyShaswata/QueryComplexityRouter")

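# handle_with_rules, call_small_model, and call_large_model are placeholders
# for your own handlers; they are not provided by this model.
def route(user_message: str):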
    intent = intent_router(user_message)[0]
    complexity = complexity_router(user_message)[0]

    print(f"Intent:     {intent['label']} ({intent['score']:.2f})")
    print(f"Complexity: {complexity['label']} ({complexity['score']:.2f})")

    if complexity["label"] == "no_llm":
        return handle_with_rules(user_message, intent["label"])
    elif complexity["label"] == "small_llm":
        return call_small_model(user_message)
    else:
        return call_large_model(user_message)
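The three handlers called in `route` are left to the reader. The sketch below shows one way to wire them up with stub handlers and injected classifiers so the dispatch logic can be exercised without downloading either model. Everything here (`make_route`, the stub return strings, and the `chat` intent label) is an assumption for illustration.

```python
from typing import Callable, List

# Any callable that mimics the pipeline: text -> [{"label": ..., "score": ...}]
Classifier = Callable[[str], List[dict]]


def handle_with_rules(message: str, intent: str) -> str:
    # Placeholder: a real handler would run a lookup, regex, or calculator.
    return f"[rules:{intent}] {message}"


def call_small_model(message: str) -> str:
    # Placeholder for a call to a small local model.
    return f"[small_llm] {message}"


def call_large_model(message: str) -> str:
    # Placeholder for a call to a large or frontier model.
    return f"[large_llm] {message}"


def make_route(intent_router: Classifier,
               complexity_router: Classifier) -> Callable[[str], str]:
    """Build a route() function from two classifier callables."""
    def route(user_message: str) -> str:
        intent = intent_router(user_message)[0]
        complexity = complexity_router(user_message)[0]
        if complexity["label"] == "no_llm":
            return handle_with_rules(user_message, intent["label"])
        if complexity["label"] == "small_llm":
            return call_small_model(user_message)
        return call_large_model(user_message)
    return route


# Stub classifiers standing in for the two pipelines.
route = make_route(
    lambda text: [{"label": "chat", "score": 0.90}],
    lambda text: [{"label": "no_llm", "score": 0.95}],
)
print(route("What is 42 * 7?"))
```

Passing the two `pipeline(...)` objects to `make_route` instead of the lambdas should give the same behavior as the `route` function above, minus the print statements.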

Complexity Labels

`no_llm` — No LLM needed

Simple math: "What is 42 * 7?"
Unit conversion: "Convert 100km to miles"
Factual lookup: "What is the capital of Japan?"
Date/time: "What day is March 15 2026?"
Simple commands: "Set a timer for 5 minutes"
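To make "answerable with rules" concrete, here is a minimal sketch of a rule-based handler that covers three of the patterns above with regular expressions. The function name `answer_with_rules`, the patterns, and the output formatting are all assumptions; a production handler would need far broader coverage.

```python
import operator
import re
from typing import Optional

KM_TO_MILES = 0.621371

_NUM = r"(-?\d+(?:\.\d+)?)"
_PERCENT = re.compile(rf"what is\s+{_NUM}\s*%\s*of\s+{_NUM}\s*\??$", re.I)
_ARITH = re.compile(rf"what is\s+{_NUM}\s*([+\-*/])\s*{_NUM}\s*\??$", re.I)
_KM = re.compile(rf"convert\s+{_NUM}\s*km\s+to\s+miles\s*$", re.I)

_OPS = {"+": operator.add, "-": operator.sub,
        "*": operator.mul, "/": operator.truediv}


def _fmt(value: float) -> str:
    """Print whole numbers without a decimal point, others with two decimals."""
    return str(int(value)) if value.is_integer() else f"{value:.2f}"


def answer_with_rules(message: str) -> Optional[str]:
    """Return an answer if a rule matches, else None so the caller can escalate."""
    text = message.strip()

    m = _PERCENT.match(text)
    if m:
        return _fmt(float(m.group(1)) * float(m.group(2)) / 100)

    m = _ARITH.match(text)
    if m:
        a, op, b = float(m.group(1)), m.group(2), float(m.group(3))
        if op == "/" and b == 0:
            return None  # leave division by zero to a smarter handler
        return _fmt(_OPS[op](a, b))

    m = _KM.match(text)
    if m:
        return f"{float(m.group(1)) * KM_TO_MILES:.2f} miles"

    return None


for query in ["What is 42 * 7?", "What is 15% of 4500?", "Convert 100km to miles"]:
    print(query, "->", answer_with_rules(query))
```

Returning `None` on no match lets the caller fall through to `small_llm` when the classifier says `no_llm` but no rule applies.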

`small_llm` — 1–3B model sufficient

Short summarization: "Summarize this paragraph..."
Basic explanation: "Explain recursion to a 10-year-old"
Simple code: "Write a Python function to reverse a string"
Short generation: "Write a one-line bio for a software engineer"
Simple classification: "Is this email spam?"

`large_llm` — 7B+ / frontier model required

Deep reasoning: "Analyze the ethical implications of AI replacing jobs"
Long-form writing: "Write a 1000-word blog post about quantum computing"
Complex code: "Build a REST API with auth, error handling, and tests"
Multi-doc synthesis: "Given these 5 documents, synthesize an answer..."
System design: "Design a distributed caching system with eventual consistency"

Performance

Inference speed: ~10ms per query on CPU, ~2ms on GPU (hardware-dependent)
Model size: ~260MB (DistilBERT-base)
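Latency depends on hardware, so it may be worth measuring on your own machine. A minimal sketch, assuming the classifier is any callable that takes a string; `measure_latency_ms` and `fake_router` are hypothetical names, and the stub exists only so the example runs without downloading the model.

```python
import statistics
import time
from typing import Callable, Sequence


def measure_latency_ms(classify: Callable[[str], object],
                       queries: Sequence[str],
                       warmup: int = 3) -> dict:
    """Time one call per query and report median and worst-case latency in ms."""
    # Warm-up calls so one-time costs (lazy init, caches) are not counted.
    for query in queries[:warmup]:
        classify(query)

    timings = []
    for query in queries:
        start = time.perf_counter()
        classify(query)
        timings.append((time.perf_counter() - start) * 1000.0)

    return {
        "n": len(timings),
        "median_ms": statistics.median(timings),
        "max_ms": max(timings),
    }


# Stand-in classifier; pass the `router` pipeline from Quick Start instead
# to measure the real model.
def fake_router(text: str):
    return [{"label": "no_llm", "score": 1.0}]


stats = measure_latency_ms(fake_router, ["What is 15% of 4500?"] * 20)
print(stats)
```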

Evaluation Results

Results on held-out test set:

Metric	Score
Accuracy	~0.99
F1 (weighted)	~0.99

Per-class performance:

Class	Precision	Recall	F1
no_llm	~1.00	~1.00	~1.00
small_llm	~0.98	~0.98	~0.98
large_llm	~0.99	~0.99	~0.99

Note: These results are on synthetic test data drawn from the same distribution as the training data, so real-world performance may be lower. Use a confidence threshold on the score to handle ambiguous inputs gracefully.
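Since the reported numbers come from synthetic data, it is worth recomputing them on a hand-labeled sample of your own traffic. The sketch below computes per-class precision, recall, and F1 in plain Python. The function name `per_class_metrics` is hypothetical, and the toy labels are invented to exercise the code; they are not this model's results.

```python
from typing import Dict, Sequence

LABELS = ("no_llm", "small_llm", "large_llm")


def per_class_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                      labels: Sequence[str] = LABELS) -> Dict[str, dict]:
    """Compute precision, recall, F1, and support for each label."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": tp + fn}
    return report


# Toy labels, invented for illustration only.
y_true = ["no_llm", "no_llm", "small_llm", "small_llm", "large_llm", "large_llm"]
y_pred = ["no_llm", "no_llm", "small_llm", "large_llm", "large_llm", "large_llm"]

for label, m in per_class_metrics(y_true, y_pred).items():
    print(f"{label:>10}  P={m['precision']:.2f}  R={m['recall']:.2f}  F1={m['f1']:.2f}")
```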

Training Details

Base model: distilbert-base-uncased
Training data: ~1,400 synthetic examples per class (~4,200 total), template-generated with natural language variation
Epochs: 5 (with early stopping, patience=2)
Learning rate: 2e-5
Batch size: 32
Max sequence length: 128
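The hyperparameters above can be collected into a small config object, and the "early stopping, patience=2" rule can be sketched in a few lines. This is not the author's training script: the monitored metric is assumed to be validation loss, the names `TrainConfig` and `epochs_run` are hypothetical, and the loss values are invented.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TrainConfig:
    base_model: str = "distilbert-base-uncased"
    max_epochs: int = 5
    patience: int = 2
    learning_rate: float = 2e-5
    batch_size: int = 32
    max_seq_length: int = 128


def epochs_run(val_losses: Sequence[float], cfg: TrainConfig) -> int:
    """Return how many epochs run before patience-based early stopping triggers.

    Assumes the monitored metric is validation loss (lower is better).
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses[: cfg.max_epochs], start=1):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= cfg.patience:
                return epoch  # stop after `patience` epochs without improvement
    return min(len(val_losses), cfg.max_epochs)


cfg = TrainConfig()
# Invented loss curves: one plateaus, one keeps improving.
print(epochs_run([0.50, 0.30, 0.31, 0.32, 0.20], cfg))
print(epochs_run([0.50, 0.40, 0.30, 0.20, 0.10], cfg))
```

With the first curve, training stops after epoch 4 because epochs 3 and 4 both fail to beat the best loss from epoch 2; the second curve runs all 5 epochs.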

Use in Agent Pipelines

COMPLEXITY_THRESHOLDS = {
    "no_llm": 0.7,
    "small_llm": 0.6,
    "large_llm": 0.6,
}

# `router` is the text-classification pipeline created in Quick Start.
def smart_route(message: str):
    result = router(message)[0]
    label, score = result["label"], result["score"]

    if score < COMPLEXITY_THRESHOLDS[label]:
        # Low confidence — default to large_llm for safety
        label = "large_llm"

    return label
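When tuning thresholds it helps to know how often the fallback fires. The variant below is a sketch that returns the original prediction alongside the final label, with the classifier passed in so it can be tested with a stub. `RouteDecision` and `smart_route_verbose` are hypothetical names; the threshold values are the ones shown above.

```python
from typing import Callable, Dict, List, NamedTuple

THRESHOLDS: Dict[str, float] = {"no_llm": 0.7, "small_llm": 0.6, "large_llm": 0.6}


class RouteDecision(NamedTuple):
    label: str       # tier to actually use
    predicted: str   # tier the classifier predicted
    score: float     # classifier confidence for `predicted`
    fell_back: bool  # True if the score was below the threshold


def smart_route_verbose(message: str,
                        classify: Callable[[str], List[dict]],
                        thresholds: Dict[str, float] = THRESHOLDS,
                        fallback: str = "large_llm") -> RouteDecision:
    """Apply the per-label threshold and report whether the fallback fired."""
    result = classify(message)[0]
    predicted, score = result["label"], result["score"]
    fell_back = score < thresholds[predicted]
    final = fallback if fell_back else predicted
    return RouteDecision(final, predicted, score, fell_back)


# Stub classifier returning a low-confidence small_llm prediction.
def unsure(text: str) -> List[dict]:
    return [{"label": "small_llm", "score": 0.55}]


decision = smart_route_verbose("explain quantum entanglement", unsure)
print(decision)
```

Logging `fell_back` over real traffic gives a fallback rate, which is one signal for whether the thresholds are too strict or too loose.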

Limitations

Trained on English text only
Template-generated data may not cover all edge cases
Borderline queries (e.g., "explain quantum entanglement") may get lower confidence — use threshold fallback
Complexity is query-level only; does not account for context window length or domain expertise needed
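One way to work around the query-level limitation is a post-classification guard that escalates when the attached context is long. This is a sketch under stated assumptions: `adjust_for_context` is a hypothetical helper, word count is a crude proxy for token count, and the 1,500-word budget is an arbitrary illustrative value, not a tuned one.

```python
def adjust_for_context(label: str, context: str,
                       max_context_words: int = 1500) -> str:
    """Escalate to large_llm when the attached context is long.

    max_context_words is an arbitrary illustrative budget, not a tuned value.
    """
    if len(context.split()) > max_context_words:
        return "large_llm"
    return label


short_doc = "A two-sentence note. Nothing more."
long_doc = "word " * 5000

print(adjust_for_context("small_llm", short_doc))
print(adjust_for_context("small_llm", long_doc))
```

The classifier still sees only the query; the guard runs afterwards on whatever documents or history will accompany it.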

Related Models

tripathyShaswata/AgentIntentRouter — companion intent classifier (8 categories, ~10ms on CPU)

License

Apache 2.0 — use it however you want, commercial included.

Citation

If this helps you, a star is appreciated!