
	---
	library_name: transformers
	pipeline_tag: text-classification
	tags:
	- regression
	- prompt
	- complexity-estimation
	- semantic-routing
	- llm-routing
	base_model: microsoft/deberta-v3-base
	license: apache-2.0
	---

	# PromptComplexityEstimator

	A lightweight regressor that estimates the complexity of an LLM prompt on a scale between 0 and 1.

	- Input: a string prompt
	- Output: a scalar score in [0, 1] (higher = more complex)

The model is designed primarily as a core building block for semantic routing systems, especially LLM vs. SLM (Small Language Model) routers.
Any router that decides which model should handle a request needs a reliable signal of how complex that request is; this model aims to provide that signal.

	---

	## Intended use

	### Primary use case: LLM vs. SLM routing

	This model is intended to be used as part of a semantic router, where:
	- Simple prompts are handled by a small / fast / cheap model
	- Complex prompts are routed to a large / capable / expensive model

	The complexity score provides a learned signal for this decision.

	### Additional use cases
	- Prompt analytics and monitoring
	- Dataset stratification by difficulty
	- Adaptive compute allocation
	- Cost-aware or latency-aware inference pipelines
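For dataset stratification or adaptive compute allocation, the scalar score can be bucketed into a few tiers. Below is a minimal sketch, assuming the scores were already computed with the model. The tier names and cut points (`0.33`, `0.66`) and the helpers `tier_for` and `stratify` are hypothetical; they are not part of this repository, and you should tune them for your data.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical tier boundaries and names; not defined by the released model.
TIER_EDGES = (0.33, 0.66)
TIER_NAMES = ("easy", "medium", "hard")


def tier_for(score: float, edges=TIER_EDGES, names=TIER_NAMES) -> str:
    """Map a complexity score in [0, 1] to a named tier."""
    # bisect_right counts how many edges are <= score, which is the tier index.
    return names[bisect_right(edges, score)]


def stratify(prompts_with_scores):
    """Group (prompt, score) pairs by tier for per-difficulty analysis."""
    buckets = defaultdict(list)
    for prompt, score in prompts_with_scores:
        buckets[tier_for(score)].append(prompt)
    return dict(buckets)


if __name__ == "__main__":
    # Placeholder scores for illustration only, not real model outputs.
    data = [
        ("What is 2 + 2?", 0.05),
        ("Summarize this email.", 0.40),
        ("Prove the four color theorem.", 0.90),
    ]
    print(stratify(data))
```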

	### Not intended for
	- Safety classification, toxicity detection, or policy enforcement
	- Guaranteed difficulty estimation for a specific target model
	- Multimodal inputs or tool-augmented workflows (RAG/tools)

	---

	## Usage

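```python
import torch
from transformers import AutoTokenizer, AutoModel

repo_id = "ilya-kolchinsky/PromptComplexityEstimator"

tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
# trust_remote_code=True is required to load the repo's custom model code (pooling + regression head).
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).eval()

prompt = "Design a distributed consensus protocol with Byzantine fault tolerance..."
# Truncate to the model's 512-token maximum input length.
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512, padding=True)

with torch.no_grad():
    # logits has shape [batch, 1]; squeeze it and take the single value for one prompt.
    score = model(**inputs).logits.squeeze(-1).item()

print(score)  # complexity in [0, 1]
```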


	### Example: Simple LLM vs. SLM routing

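```python
import torch

# Reuses `tokenizer` and `model` loaded in the Usage snippet above.
THRESHOLD = 0.45  # chosen empirically

def route_prompt(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512, padding=True)
    with torch.no_grad():
        complexity = model(**inputs).logits.squeeze(-1).item()

    # Prompts scoring above the threshold go to the large model, the rest to the small one.
    return "LLM" if complexity > THRESHOLD else "SLM"
```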
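The `0.45` threshold in the routing example was chosen empirically. If your cost budget is expressed as the fraction of prompts allowed to reach the LLM, one way to pick a threshold for your own traffic is to take the matching quantile of scores collected on a representative sample of real prompts. The sketch below does this. `threshold_for_budget` is a hypothetical helper, not part of this repository, and a quality-based criterion (for example, checking where SLM answers stop being acceptable) may suit some workloads better.

```python
import math


def threshold_for_budget(scores: list[float], llm_fraction: float) -> float:
    """Return a threshold so that about `llm_fraction` of `scores` are strictly above it.

    Matches the `complexity > THRESHOLD` rule used in `route_prompt`.
    Ties at the threshold can shift the realized fraction slightly.
    """
    if not scores:
        raise ValueError("need at least one score")
    if not 0.0 <= llm_fraction <= 1.0:
        raise ValueError("llm_fraction must be in [0, 1]")
    ordered = sorted(scores)
    # Number of prompts that should stay on the SLM (score <= threshold).
    n_slm = len(ordered) - math.floor(llm_fraction * len(ordered))
    if n_slm == 0:
        # Everything goes to the LLM: every score is > -inf.
        return float("-inf")
    return ordered[n_slm - 1]


if __name__ == "__main__":
    # Placeholder scores for illustration only.
    sample = [0.12, 0.31, 0.47, 0.52, 0.08, 0.66, 0.29, 0.81, 0.44, 0.95]
    print(threshold_for_budget(sample, llm_fraction=0.3))
```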

	---


	## Model and Training Details

	### Datasets
- [Cross-Difficulty](https://huggingface.co/datasets/BatsResearch/Cross-Difficulty)
	- [Easy2Hard-Bench](https://huggingface.co/datasets/furonghuang-lab/Easy2Hard-Bench)
	- [MATH](https://huggingface.co/datasets/EleutherAI/hendrycks_math)
	- [ARC](https://huggingface.co/datasets/allenai/ai2_arc)
	- [RACE](https://huggingface.co/datasets/ehovy/race)
	- [ANLI (R1/R2/R3)](https://huggingface.co/datasets/facebook/anli)

	### Training Configuration
	- Epochs: 3
	- Batch Size: 16
- Loss: Huber
	- Regressor Learning Rate: 7.5e-5
	- Encoder Learning Rate: 1.0e-5
	- Encoder Weight Decay: 0.01
	- Optimizer: AdamW
	- Schedule: Cosine (warmup_ratio=0.06)
	- Dropout: 0.1
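The configuration combines two learning rates, a cosine schedule with `warmup_ratio=0.06`, and a Huber loss. The sketch below shows one common way these pieces fit together. It assumes linear warmup followed by a half-cycle cosine decay to zero, which is the form used by `transformers.get_cosine_schedule_with_warmup`, and warmup steps rounded up from the ratio. It also assumes a Huber `delta` of 1.0 (the PyTorch default); the value actually used in training is not stated here. The total step count is a hypothetical parameter because it depends on the size of the combined training set.

```python
import math

ENCODER_LR = 1.0e-5    # from the training configuration
REGRESSOR_LR = 7.5e-5  # from the training configuration
WARMUP_RATIO = 0.06    # from the training configuration


def lr_multiplier(step: int, total_steps: int, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to 1, then cosine decay from 1 to 0 (assumed schedule shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def huber(pred: float, target: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for residuals up to `delta`, linear beyond it."""
    r = abs(pred - target)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    for step in (0, 30, 60, 530, 1000):
        m = lr_multiplier(step, total)
        # Both parameter groups share the multiplier but keep their own base LR.
        print(step, ENCODER_LR * m, REGRESSOR_LR * m)
```

Both predictions and targets lie in [0, 1], so no residual exceeds 1. With `delta=1.0`, the loss would therefore stay entirely in its quadratic region. The linear part would only matter with a smaller `delta`.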

	### Model
	- Backbone encoder: microsoft/deberta-v3-base
	- Mask-aware mean pooling over token embeddings + LayerNorm
	- Regression head: Linear → ReLU → Linear → Sigmoid
	- Max input length: 512 tokens
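- The model outputs a bounded score in [0, 1]. In the usage examples above, the score is read from `outputs.logits` (shape `[batch, 1]`); despite the field name, these values are the sigmoid outputs of the regression head, not raw logits.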
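Mask-aware mean pooling averages the encoder's token embeddings over real tokens only, so padding does not dilute the prompt representation. The pooled vector is then normalized before it reaches the regression head. Below is a dependency-free sketch of these two steps on plain lists. The function names are illustrative, not taken from the repository, and the released model presumably performs the same computation on tensors. The LayerNorm here omits the learned scale and shift, and `eps=1e-5` is an assumed value.

```python
import math


def masked_mean_pool(token_embeddings: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average token vectors where attention_mask == 1, ignoring padding positions.

    token_embeddings: one hidden-state vector per token (seq_len x hidden_dim).
    attention_mask:   1 for real tokens, 0 for padding (length seq_len).
    """
    hidden_dim = len(token_embeddings[0])
    summed = [0.0] * hidden_dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    # Avoid dividing by zero for an all-padding sequence.
    count = max(count, 1)
    return [s / count for s in summed]


def layer_norm(vector: list[float], eps: float = 1e-5) -> list[float]:
    """Normalize to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(vector) / len(vector)
    var = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(var + eps) for v in vector]


if __name__ == "__main__":
    # Two real tokens and one padding token; the padding vector is ignored.
    pooled = masked_mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0])
    print(pooled, layer_norm(pooled))
```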


	Full training code and configuration are available at https://github.com/ilya-kolchinsky/ComplexityEstimator.

	---

	## Performance

	On the held-out evaluation set used during development, the released checkpoint achieved:

	- MAE: 0.0855
	- Spearman correlation: 0.735
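To check the model on your own labeled prompts, you can compute the two reported metrics with the stdlib-only sketch below. This is not the repository's evaluation script. Here, Spearman correlation is the Pearson correlation of average ranks, where tied values share the mean of the positions they occupy. If SciPy is available, `scipy.stats.spearmanr` computes the same statistic.

```python
import math


def mae(preds: list[float], targets: list[float]) -> float:
    """Mean absolute error between predicted and reference complexity scores."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / (sx * sy)
```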

	---

	## Citation

```bibtex
@misc{kolchinsky_promptcomplexityestimator_2026,
  title        = {PromptComplexityEstimator},
  author       = {Ilya Kolchinsky},
  year         = {2026},
  howpublished = {Hugging Face Hub model: ilya-kolchinsky/PromptComplexityEstimator}
}
```