---
library_name: transformers
pipeline_tag: text-classification
tags:
- regression
- prompt
- complexity-estimation
- semantic-routing
- llm-routing
base_model: microsoft/deberta-v3-base
license: apache-2.0
---
# PromptComplexityEstimator
A lightweight regressor that estimates the complexity of an LLM prompt on a scale between 0 and 1.
- **Input:** a string prompt
- **Output:** a scalar score in [0, 1] (higher = more complex)
The model is designed primarily to be used as a core building block for semantic routing systems, especially LLM vs. SLM (Small Language Model) routers.
Any router that aims to intelligently decide *which model should handle a request* needs a reliable signal for *how complex the request is*. This is the gap this model aims to close.
---
## Intended use
### Primary use case: LLM vs. SLM routing
This model is intended to be used as part of a semantic router, where:
- *Simple* prompts are handled by a **small / fast / cheap model**
- *Complex* prompts are routed to a **large / capable / expensive model**
The complexity score provides a learned signal for this decision.
### Additional use cases
- Prompt analytics and monitoring
- Dataset stratification by difficulty
- Adaptive compute allocation
- Cost-aware or latency-aware inference pipelines
### Not intended for
- Safety classification, toxicity detection, or policy enforcement
- Guaranteed difficulty estimation for a specific target model
- Multimodal inputs or tool-augmented workflows (RAG/tools)
---
## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModel

repo_id = "ilya-kolchinsky/PromptComplexityEstimator"

tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).eval()

prompt = "Design a distributed consensus protocol with Byzantine fault tolerance..."
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    score = model(**inputs).logits.squeeze(-1).item()

print(score)  # a float in [0, 1]; higher = more complex
```
### Example: Simple LLM vs. SLM routing
```python
THRESHOLD = 0.45  # chosen empirically

def route_prompt(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        complexity = model(**inputs).logits.squeeze(-1).item()
    return "LLM" if complexity > THRESHOLD else "SLM"
```
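The "chosen empirically" comment hides the interesting step: picking a threshold from logged traffic. Below is a minimal, self-contained sketch of one way to do it, sweeping candidate thresholds to minimize an expected routing cost. The scores, outcomes, and cost values are made-up placeholders, not data from this model, and `pick_threshold` is a hypothetical helper.

```python
# Sketch: tuning THRESHOLD from logged (complexity_score, slm_succeeded)
# pairs. All numbers below are illustrative placeholders.

def pick_threshold(samples, slm_cost=1.0, llm_cost=5.0, failure_cost=20.0):
    """Sweep candidate thresholds; return the one minimizing expected cost.

    samples: list of (complexity_score, slm_succeeded) tuples, where
    slm_succeeded records whether the small model handled that prompt well.
    """
    candidates = sorted({round(s, 2) for s, _ in samples})
    best_t, best_cost = 0.5, float("inf")
    for t in candidates:
        cost = 0.0
        for score, slm_ok in samples:
            if score > t:  # routed to the LLM: expensive but assumed reliable
                cost += llm_cost
            else:          # routed to the SLM: cheap, but failures cost extra
                cost += slm_cost if slm_ok else slm_cost + failure_cost
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

# Illustrative traffic: easy prompts the SLM handles, hard ones it fails.
log = [(0.1, True), (0.2, True), (0.3, True), (0.6, False), (0.8, False)]
print(pick_threshold(log))  # → 0.3
```

In practice the cost constants encode your latency/price/quality trade-off, and the log should come from real routed traffic.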
---
## Model and Training Details
### Datasets
- [Cross-Difficulty](https://huggingface.co/datasets/BatsResearch/Cross-Difficulty)
- [Easy2Hard-Bench](https://huggingface.co/datasets/furonghuang-lab/Easy2Hard-Bench)
- [MATH](https://huggingface.co/datasets/EleutherAI/hendrycks_math)
- [ARC](https://huggingface.co/datasets/allenai/ai2_arc)
- [RACE](https://huggingface.co/datasets/ehovy/race)
- [ANLI (R1/R2/R3)](https://huggingface.co/datasets/facebook/anli)
### Training Configuration
- **Epochs:** 3
- **Batch Size:** 16
- **Loss:** huber
- **Regressor Learning Rate:** 7.5e-5
- **Encoder Learning Rate:** 1.0e-5
- **Encoder Weight Decay:** 0.01
- **Optimizer**: AdamW
- **Schedule**: Cosine (warmup_ratio=0.06)
- **Dropout**: 0.1
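The split learning rates above map naturally onto AdamW parameter groups. The sketch below shows that wiring with stand-in modules; the real encoder is the DeBERTa backbone, and applying zero weight decay to the head is an assumption, not stated in the configuration.

```python
import torch
from torch.optim import AdamW

# Stand-in modules: `encoder` and `head` represent the DeBERTa backbone and
# the regression head, not the actual model definition.
encoder = torch.nn.Linear(768, 768)
head = torch.nn.Linear(768, 1)

# Two parameter groups reproduce the split learning rates listed above.
optimizer = AdamW([
    {"params": encoder.parameters(), "lr": 1.0e-5, "weight_decay": 0.01},
    {"params": head.parameters(), "lr": 7.5e-5, "weight_decay": 0.0},  # assumed
])

loss_fn = torch.nn.HuberLoss()  # the "huber" loss listed above
```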
### Model
- **Backbone encoder:** microsoft/deberta-v3-base
- Mask-aware **mean pooling** over token embeddings + **LayerNorm**
- **Regression head:** Linear → ReLU → Linear → Sigmoid
- **Max input length:** 512 tokens
- The model outputs a bounded score in [0, 1]. In the usage examples above, the score is read from `outputs.logits` (shape `[batch, 1]`).
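The pooling-and-head architecture described above can be sketched roughly as follows. This is a paraphrase of the description, not the repository's actual modeling code, and the MLP hidden size is an illustrative guess.

```python
import torch
import torch.nn as nn

class ComplexityHead(nn.Module):
    """Mask-aware mean pooling + LayerNorm + Linear/ReLU/Linear/Sigmoid,
    mirroring the description above. Hidden size 256 is a guess."""

    def __init__(self, hidden_size=768, mlp_size=256):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size)
        self.head = nn.Sequential(
            nn.Linear(hidden_size, mlp_size),
            nn.ReLU(),
            nn.Linear(mlp_size, 1),
            nn.Sigmoid(),  # bounds the output to [0, 1]
        )

    def forward(self, token_embeddings, attention_mask):
        # Mask-aware mean pooling: padding positions contribute nothing.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        return self.head(self.norm(pooled))  # shape [batch, 1]

# Smoke test with a fake encoder output of shape [batch, seq, hidden].
emb = torch.randn(2, 16, 768)
mask = torch.ones(2, 16, dtype=torch.long)
print(ComplexityHead()(emb, mask).shape)  # torch.Size([2, 1])
```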
Full training code and configuration are available at https://github.com/ilya-kolchinsky/ComplexityEstimator.
---
## Performance
On the held-out evaluation set used during development, the released checkpoint achieved:
- **MAE:** **0.0855**
- **Spearman correlation:** **0.735**
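For readers reproducing these numbers on their own data, here are minimal dependency-free versions of both metrics. The Spearman formula below assumes no tied scores, and the sample values are illustrative, not from the evaluation set.

```python
def mae(preds, golds):
    """Mean absolute error between predicted and gold complexity scores."""
    return sum(abs(p - g) for p, g in zip(preds, golds)) / len(preds)

def spearman(preds, golds):
    """Spearman correlation via the tie-free rank-difference formula."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0.0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rp, rg = ranks(preds), ranks(golds)
    n = len(preds)
    d2 = sum((a - b) ** 2 for a, b in zip(rp, rg))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Illustrative predictions vs. gold scores (same ranking, small errors).
preds = [0.12, 0.48, 0.33, 0.91]
golds = [0.10, 0.55, 0.30, 0.85]
print(round(mae(preds, golds), 4), round(spearman(preds, golds), 4))  # 0.045 1.0
```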
---
## Citation
```bibtex
@misc{kolchinsky_promptcomplexityestimator_2026,
  title        = {PromptComplexityEstimator},
  author       = {Ilya Kolchinsky},
  year         = {2026},
  howpublished = {Hugging Face Hub model: ilya-kolchinsky/PromptComplexityEstimator}
}
```