 ---
 license: apache-2.0
+base_model: answerdotai/ModernBERT-base
+language:
+  - en
+tags:
+  - text-classification
+  - llm-routing
+  - query-complexity
+  - knowledge-distillation
+  - research-computing
+pipeline_tag: text-classification
 ---
+# LLM Query Complexity Classifier
+Fine-tuned [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) (149M parameters) for three-class query complexity classification: **LOW**, **MEDIUM**, or **HIGH**.
+Built for the [STREAM](https://github.com/uicacer/STREAM) project (Smart Tiered Routing Engine for AI Models), where it routes each query automatically to the most cost-effective inference tier (local CPU, HPC GPU, or cloud API). Classification takes ~15 ms per query (p50, on CPU) and needs no API call.
+## What It Does
+Given a user query, the model predicts how much reasoning depth is required to answer it:
+| Label | Definition | Example |
+|-------|------------|---------|
+| `LOW` | Single retrievable fact. Answer statable in one sentence, no reasoning chain. | "What is the capital of France?" |
+| `MEDIUM` | Apply an established procedure or assemble 2–4 concepts. Textbook-level reasoning. | "Explain quicksort and analyze its time complexity." |
+| `HIGH` | Construct a novel reasoning path or expert judgment. No standard procedure. | "Is P equal to NP? Present the current state of evidence." |
+**Key design principle**: complexity is defined by *reasoning depth*, not question format. "What is X?" can be LOW, MEDIUM, or HIGH depending on what reasoning is required to answer.
+## Usage
+```python
+from transformers import pipeline
+clf = pipeline(
+    "text-classification",
+    model="anasnassar/llm-query-complexity-classifier",
+    device=-1,      # CPU
+    top_k=None,     # return all class scores
+)
+result = clf("Explain the difference between TCP and UDP")
+# [{'label': 'MEDIUM', 'score': 0.82}, {'label': 'LOW', 'score': 0.11}, {'label': 'HIGH', 'score': 0.07}]
+complexity = max(result[0], key=lambda x: x["score"])["label"]
+# 'MEDIUM'
+```
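The predicted label can then drive tier selection. The sketch below is illustrative only: the `TIER_BY_LABEL` mapping, the tier names, and the `min_confidence` escalation rule are assumptions made for this example, not STREAM's actual routing policy.

```python
# Hypothetical label-to-tier mapping; STREAM's real policy may differ.
TIER_BY_LABEL = {"LOW": "local_cpu", "MEDIUM": "hpc_gpu", "HIGH": "cloud_api"}
ESCALATION_ORDER = ["LOW", "MEDIUM", "HIGH"]


def pick_tier(scores, min_confidence=0.5):
    """Choose a tier from a list of {'label', 'score'} dicts.

    If the top score falls below min_confidence, escalate one level.
    This fallback is an assumed safety policy, not part of the model.
    """
    best = max(scores, key=lambda s: s["score"])
    label = best["label"]
    if best["score"] < min_confidence and label != "HIGH":
        label = ESCALATION_ORDER[ESCALATION_ORDER.index(label) + 1]
    return TIER_BY_LABEL[label]


scores = [
    {"label": "MEDIUM", "score": 0.82},
    {"label": "LOW", "score": 0.11},
    {"label": "HIGH", "score": 0.07},
]
print(pick_tier(scores))  # hpc_gpu
```

Escalating on low confidence trades some cost for answer quality; whether that trade is worthwhile depends on the deployment.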
+## Training
+**Knowledge distillation approach**: Claude Sonnet 4.6 (with extended thinking) labeled 6,912 queries spanning 6 domains and 3 complexity classes, and ModernBERT-base was then fine-tuned on those labels. Strictly speaking, this is LLM-supervised fine-tuning: Claude supplies hard labels (one class per query, not a probability distribution) and ModernBERT learns from them. The resulting classifier runs at ~15 ms per query with no API dependency.
+**Training dataset**: [anasnassar/llm-query-complexity-benchmark](https://huggingface.co/datasets/anasnassar/llm-query-complexity-benchmark) — 6,912 queries, 6 domains, balanced across complexity classes.
+**Hyperparameters**:
+| Parameter | Value |
+|-----------|-------|
+| Base model | answerdotai/ModernBERT-base |
+| Epochs | 5 |
+| Batch size | 32 |
+| Learning rate | 2e-5 |
+| Max sequence length | 128 tokens |
+| Optimizer | AdamW, weight_decay=0.01 |
+| Warmup | 10% of steps |
+| Best model metric | macro-F1 |
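Macro-F1, the checkpoint-selection metric in the table above, is the unweighted mean of the per-class F1 scores, so each complexity class counts equally regardless of how many examples it has. The following is a minimal pure-Python sketch of that definition; the function name and the toy labels are illustrative and not taken from the training code.

```python
def macro_f1(y_true, y_pred, labels=("LOW", "MEDIUM", "HIGH")):
    """Unweighted mean of per-class F1 scores."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted examples contributes 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example (made-up labels, not real evaluation data).
y_true = ["LOW", "LOW", "MEDIUM", "MEDIUM", "HIGH", "HIGH"]
y_pred = ["LOW", "MEDIUM", "MEDIUM", "MEDIUM", "HIGH", "LOW"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.6556
```

In practice `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity.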
+## Evaluation
+Three evaluation strategies are used to address data leakage from LLM-generated near-duplicates:
+| Strategy | Description |
+|----------|-------------|
+| **Domain-held-out 6-fold CV** | Train on 5 domains, test on the held-out 6th; repeat for each domain. Primary reported metric. |
+| **Similarity-aware split** | Near-duplicate queries (cosine similarity > 0.90) are kept on the same side of the split. |
+| **Real-world (LMSYS Arena)** | Evaluated on real user prompts from Chatbot Arena — fully out-of-distribution. |
+*Note: A random train/test split on LLM-generated data yields inflated accuracy (~99%) because near-duplicate phrasings land on both sides of the split. The domain-held-out and real-world numbers are the more rigorous metrics.*
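The domain-held-out protocol can be sketched as a leave-one-domain-out loop. This is an illustration, not the project's evaluation script: the `domain` field name and the placeholder domain names are assumptions.

```python
from collections import defaultdict


def domain_held_out_folds(examples):
    """Yield (held_out_domain, train, test), holding out one domain per fold."""
    by_domain = defaultdict(list)
    for ex in examples:
        by_domain[ex["domain"]].append(ex)
    for held_out in sorted(by_domain):
        test = by_domain[held_out]
        train = [
            ex
            for domain, items in by_domain.items()
            if domain != held_out
            for ex in items
        ]
        yield held_out, train, test


# Placeholder data: six hypothetical domains with two queries each.
examples = [
    {"domain": f"domain_{d}", "query": f"query {i}", "label": "LOW"}
    for d in "abcdef"
    for i in range(2)
]
for held_out, train, test in domain_held_out_folds(examples):
    print(held_out, len(train), len(test))
```

Because no query from the test domain is seen during training, paraphrases of the same prompt cannot leak across the split.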
+Full evaluation code: [scripts/eval/](https://github.com/uicacer/STREAM/tree/main/scripts/eval)
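As a rough illustration of the similarity-aware split (not the code in the linked scripts), near-duplicates can be merged into groups with union-find and each group assigned to one side as a unit. The sketch assumes query embeddings are already available; the embedding model, the function names, and the `test_fraction` default are assumptions.

```python
import math
import random


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_groups(vectors, threshold=0.90):
    """Group indices whose vectors are linked by cosine similarity > threshold."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # O(n^2) pairwise comparison; fine for a sketch, slow for large corpora.
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def group_split(vectors, test_fraction=0.2, threshold=0.90, seed=0):
    """Assign whole groups to train or test so near-duplicates never straddle."""
    groups = similarity_groups(vectors, threshold)
    random.Random(seed).shuffle(groups)
    target = test_fraction * len(vectors)
    train, test = [], []
    for group in groups:
        (test if len(test) < target else train).extend(group)
    return train, test


# Toy 2-D "embeddings": indices 0/1 and 2/3 are near-duplicates.
vectors = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0), (0.05, 0.99), (1.0, 1.0)]
train_idx, test_idx = group_split(vectors)
```

Grouping is transitive: if A is close to B and B is close to C, all three stay together even when A and C fall below the threshold.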
+## Performance
+| Judge | Latency (p50) | Notes |
+|-------|--------------|-------|
+| ModernBERT (this model) | ~15ms | CPU, no API dependency |
+| Llama 3.2 3B (LLM judge) | ~390ms | Requires Ollama |
+That is roughly a 26× reduction in p50 latency (390 ms vs. 15 ms) relative to the LLM judge baseline.
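A p50 figure like the ones above is the median of per-query wall-clock timings. The helper below is a minimal sketch of one way to measure it; the function name and the `warmup` parameter are assumptions, and this is not the benchmark harness used for the table.

```python
import statistics
import time


def p50_latency_ms(fn, inputs, warmup=3):
    """Median wall-clock latency of fn over inputs, in milliseconds."""
    for x in inputs[:warmup]:
        fn(x)  # warm-up calls are not timed
    timings = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Ratio of the two reported p50 values from the table.
speedup = 390 / 15
print(speedup)  # 26.0
```

To time the classifier, pass the pipeline object as `fn` and a list of representative queries as `inputs`.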
+## Integration in STREAM
+```python
+from stream.middleware.core.complexity_judge import judge_complexity
+result = judge_complexity("Explain quantum entanglement", strategy="modernbert")
+# JudgmentResult(complexity='medium', method='classifier', strategy_used='modernbert',
+#                scores={'low': 0.08, 'medium': 0.79, 'high': 0.13})
+```
+## Citation
+```bibtex
+@inproceedings{nassar2026stream,
+  title     = {{STREAM}: Multi-Tier {LLM} Inference Middleware with Dual-Channel {HPC} Token Streaming},
+  author    = {Nassar, Anas and Mohr, Steve and Apanasevich, Leonard and Sharma, Himanshu},
+  booktitle = {Practice and Experience in Advanced Research Computing (PEARC '26)},
+  year      = {2026}
+}
+```
+## License
+Apache 2.0