---
library_name: transformers
pipeline_tag: text-classification
tags:
- regression
- prompt
- complexity-estimation
- semantic-routing
- llm-routing
base_model: microsoft/deberta-v3-base
license: apache-2.0
---

# PromptComplexityEstimator

A lightweight regressor that estimates the complexity of an LLM prompt on a scale from 0 to 1.

- **Input:** a string prompt
- **Output:** a scalar score in [0, 1] (higher = more complex)

The model is primarily designed as a core building block for semantic routing systems, especially LLM vs. SLM (small language model) routers.

Any router that aims to decide intelligently *which model should handle a request* needs a reliable signal for *how complex the request is*. This model aims to provide that signal.

---

## Intended use

### Primary use case: LLM vs. SLM routing

This model is intended to be used as part of a semantic router, where:

- *Simple* prompts are handled by a **small / fast / cheap model**
- *Complex* prompts are routed to a **large / capable / expensive model**

The complexity score provides a learned signal for this decision.

### Additional use cases

- Prompt analytics and monitoring
- Dataset stratification by difficulty
- Adaptive compute allocation
- Cost-aware or latency-aware inference pipelines

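For dataset stratification, one simple approach is to bucket prompts by their score. The sketch below is illustrative only: the tier names and cut points (0.33 and 0.66) are hypothetical, and the scores are assumed to come from running the estimator as shown in the Usage section.

```python
from collections import defaultdict

# Hypothetical tier boundaries (inclusive upper bounds); choose your own
# from the score distribution of your data.
TIERS = [(0.33, "easy"), (0.66, "medium"), (1.0, "hard")]


def tier_for(score: float) -> str:
    """Map a complexity score in [0, 1] to a difficulty tier name."""
    for upper, name in TIERS:
        if score <= upper:
            return name
    return TIERS[-1][1]  # fall back to the hardest tier for out-of-range scores


def stratify(scored_prompts):
    """Group (prompt, score) pairs into lists of prompts keyed by tier."""
    buckets = defaultdict(list)
    for prompt, score in scored_prompts:
        buckets[tier_for(score)].append(prompt)
    return dict(buckets)


# Example with made-up scores:
print(stratify([("What is 2 + 2?", 0.05), ("Prove that sqrt(2) is irrational.", 0.72)]))
```

The same `tier_for` mapping could also drive adaptive compute allocation, for example by choosing a generation budget per tier.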
### Not intended for

- Safety classification, toxicity detection, or policy enforcement
- Guaranteed difficulty estimation for a specific target model
- Multimodal inputs or tool-augmented workflows (RAG/tools)

---

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModel

repo_id = "ilya-kolchinsky/PromptComplexityEstimator"

tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
# trust_remote_code=True loads the custom model class shipped with the repository
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).eval()

prompt = "Design a distributed consensus protocol with Byzantine fault tolerance..."
# Truncate to the model's 512-token maximum input length
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512, padding=True)

with torch.no_grad():
    # logits has shape [batch, 1]; squeeze it to a single Python float in [0, 1]
    score = model(**inputs).logits.squeeze(-1).item()

print(score)
```

### Example: Simple LLM vs. SLM routing

```python
import torch

# Assumes `tokenizer` and `model` are loaded as in the snippet above.
THRESHOLD = 0.45  # chosen empirically

def route_prompt(prompt: str) -> str:
    """Route prompts scored above THRESHOLD to the LLM and the rest to the SLM."""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512, padding=True)
    with torch.no_grad():
        complexity = model(**inputs).logits.squeeze(-1).item()
    return "LLM" if complexity > THRESHOLD else "SLM"
```

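The threshold in the routing example is chosen empirically. One way to do that on your own traffic, sketched here under assumptions: score a validation set of prompts with the estimator and record whether the SLM's answer to each was acceptable (the `slm_ok` labels, the example data, and the accuracy target are all hypothetical). Picking the largest threshold whose SLM-routed prompts still meet the target sends as much traffic as possible to the small model.

```python
def pick_threshold(samples, min_slm_accuracy=0.9):
    """Return the largest threshold t such that the prompts routed to the SLM
    (score <= t, matching `route_prompt`) were answered acceptably by the SLM
    at least `min_slm_accuracy` of the time, or None if no t qualifies.

    samples: iterable of (complexity_score, slm_ok) pairs, where slm_ok is a
    hypothetical bool label collected offline for each validation prompt.
    """
    ordered = sorted(samples, key=lambda s: s[0])  # ascending complexity
    best = None
    ok = 0
    for i, (score, slm_ok) in enumerate(ordered, start=1):
        ok += int(slm_ok)
        # Only cut between distinct scores, so tied prompts are routed together.
        if i < len(ordered) and ordered[i][0] == score:
            continue
        if ok / i >= min_slm_accuracy:
            best = score
    return best


# Hypothetical validation data: (score, did the SLM answer acceptably?)
validation = [(0.10, True), (0.20, True), (0.30, True), (0.50, False), (0.60, True), (0.80, False)]
print(pick_threshold(validation, min_slm_accuracy=0.75))
```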
---

## Model and Training Details

### Datasets

- [Cross-Difficulty](https://huggingface.co/datasets/BatsResearch/Cross-Difficulty)
- [Easy2Hard-Bench](https://huggingface.co/datasets/furonghuang-lab/Easy2Hard-Bench)
- [MATH](https://huggingface.co/datasets/EleutherAI/hendrycks_math)
- [ARC](https://huggingface.co/datasets/allenai/ai2_arc)
- [RACE](https://huggingface.co/datasets/ehovy/race)
- [ANLI (R1/R2/R3)](https://huggingface.co/datasets/facebook/anli)

### Training Configuration

- **Epochs:** 3
- **Batch Size:** 16
- **Loss:** Huber
- **Regressor Learning Rate:** 7.5e-5
- **Encoder Learning Rate:** 1.0e-5
- **Encoder Weight Decay:** 0.01
- **Optimizer:** AdamW
- **Schedule:** Cosine (warmup_ratio=0.06)
- **Dropout:** 0.1

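The loss and schedule listed above can be written out in plain Python as below. This is a sketch, not the training code: the Huber `delta` of 1.0 (PyTorch's `HuberLoss` default) and rounding the warmup step count up with `math.ceil` are assumptions, and the schedule follows the common linear-warmup, half-cosine-decay formulation.

```python
import math

# Values from the training configuration above.
ENCODER_LR = 1.0e-5
REGRESSOR_LR = 7.5e-5
WARMUP_RATIO = 0.06


def huber_loss(pred: float, target: float, delta: float = 1.0) -> float:
    """Quadratic for |residual| <= delta, linear beyond it.

    delta=1.0 is an assumption (PyTorch's HuberLoss default), not a documented value.
    """
    r = abs(pred - target)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


def lr_multiplier(step: int, total_steps: int, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to 1, then half-cosine decay from 1 to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # rounding up is an assumption
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def learning_rates(step: int, total_steps: int) -> dict:
    """One schedule scales both peak rates (encoder and regression head)."""
    m = lr_multiplier(step, total_steps)
    return {"encoder": ENCODER_LR * m, "regressor": REGRESSOR_LR * m}


# total_steps depends on dataset size (epochs * batches per epoch); 1000 is hypothetical.
for step in (0, 30, 60, 500, 1000):
    print(step, learning_rates(step, total_steps=1000))
```

In PyTorch, the two learning rates would typically be expressed as two parameter groups passed to `torch.optim.AdamW`, each with its own `lr`, with a single `torch.optim.lr_scheduler.LambdaLR` applying the multiplier to both groups; the original training code may wire this up differently.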
### Model

- **Backbone encoder:** microsoft/deberta-v3-base
- Mask-aware **mean pooling** over token embeddings + **LayerNorm**
- **Regression head:** Linear → ReLU → Linear → Sigmoid
- **Max input length:** 512 tokens
- The model outputs a bounded score in [0, 1]. In the usage examples above, the score is read from `outputs.logits` (shape `[batch, 1]`).

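In equation form, the architecture above corresponds roughly to the following forward pass. This assumes LayerNorm is applied to the pooled vector and leaves out dropout, which is active only during training. Here $h_i$ is the encoder's final hidden state for token $i$ of $T$, $m_i \in \{0, 1\}$ is the attention mask, $\sigma$ is the sigmoid, and $W_1, b_1, W_2, b_2$ are the head's parameters (notation introduced here for illustration).

$$
\bar{h} = \frac{\sum_{i=1}^{T} m_i \, h_i}{\sum_{i=1}^{T} m_i}, \qquad
z = \mathrm{LayerNorm}(\bar{h}), \qquad
\hat{y} = \sigma\big(W_2 \, \mathrm{ReLU}(W_1 z + b_1) + b_2\big)
$$

The mask weights $m_i$ keep padding tokens out of the average, and the final sigmoid is what bounds $\hat{y}$ to [0, 1].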

Full training code and configuration are available at https://github.com/ilya-kolchinsky/ComplexityEstimator.

---

## Performance

On the held-out evaluation set used during development, the released checkpoint achieved:

- **MAE:** 0.0855
- **Spearman correlation:** 0.735

---

## Citation

```bibtex
@misc{kolchinsky_promptcomplexityestimator_2026,
  title        = {PromptComplexityEstimator},
  author       = {Ilya Kolchinsky},
  year         = {2026},
  howpublished = {Hugging Face Hub model: ilya-kolchinsky/PromptComplexityEstimator}
}
```