Update README.md
README.md CHANGED
@@ -7,161 +7,122 @@ base_model:

base_model:
- answerdotai/ModernBERT-large
pipeline_tag: text-classification
author: Shreyan C (@thethinkmachine)
---

# Maxwell Task Complexity Scorer-v0.2

## Model Details

- **Model type:** Bidirectional Encoder Representations from Transformers, based on the ModernBERT-Large architecture.
- **Language(s) (NLP):** English (en)
- **License:** Apache License, Version 2.0
- **Finetuned from model:** ModernBERT-Large

## Applications

- **Prompt Complexity Scoring:** Maxwell can be used to predict the complexity of a given instruction or prompt.
- **Dataset Annotation:** Maxwell can be used to annotate the complexity of instructions in a dataset.
- **Reward Model:** Maxwell can be used as a reward model for reinforcement learning tasks.

### Recommendations

#### To reproduce Evol-Complexity scores

To reproduce the original Evol-Complexity scores, apply the following transformation to the predicted scores:

$$S_{\text{predicted}} \times (6 - 1) + 1$$

This converts normalized scores back to the continuous range [1, 6]; rounding the transformed scores to the nearest integer then recovers the original discrete scoring scale.
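
For instance, a normalized prediction of 0.62 (an illustrative value, not a model output) maps to $0.62 \times (6 - 1) + 1 = 4.1$, which rounds to a discrete score of 4.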

#### To use a different scaling factor

To rescale predictions to an arbitrary range [min, max], use:

$$S_{\text{predicted}} \times (\text{max} - \text{min}) + \text{min}$$

**Note:** Scaling factors are arbitrary and can be adjusted as needed. Min-max scaling preserves the ordering of scores, so the model can still rank the relative complexity of instructions accurately regardless of the scaling factor used.
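
As a quick check with illustrative values: normalized scores of 0.4 and 0.8 map to 3.0 and 5.0 on [1, 6], and to 40 and 80 on [0, 100]; the ordering is identical under both scalings.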

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "thethinkmachine/Maxwell-Task-Complexity-Scorer-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def get_complexity_score(question: str) -> float:
    """Predict an Evol-Complexity score on the original discrete [1, 6] scale."""
    inputs = tokenizer(question, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    normalized_pred = outputs.logits.squeeze()
    # Denormalize from [0, 1] back to [1, 6], then round to the discrete scale.
    final_score = normalized_pred * (6 - 1) + 1
    final_score = torch.clamp(final_score, min=1.0, max=6.0)
    final_score = torch.round(final_score)
    return final_score.item()

def get_scaled_complexity_score(question: str, min_score: float = 1.0, max_score: float = 6.0) -> float:
    """Predict a complexity score rescaled to an arbitrary [min_score, max_score] range."""
    inputs = tokenizer(question, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    normalized_pred = outputs.logits.squeeze()
    final_score = normalized_pred * (max_score - min_score) + min_score
    final_score = torch.clamp(final_score, min=min_score, max=max_score)
    return round(final_score.item(), 2)
```

## Training Details

### Data Distribution

| Score | Count | % |
| ----- | ----- | ----- |
| 1 | 8729 | 13.3% |
| 2 | 5399 | 8.2% |
| 3 | 10937 | 16.7% |
| 4 | 9801 | 15.0% |
| 5 | 24485 | 37.4% |
| 6 | 6123 | 9.3% |
| 7 | 6 | < 0.1% |
| 8 | 5 | < 0.1% |
| 9 | 5 | < 0.1% |

### Training Hyperparameters

- **Training tokens:** 50.3 million
- **Tokens-per-parameter ratio:** ~3.5 (on 14.4 million trainable parameters)
- **Batch size:** 8
- **Max epochs:** 3
- **Learning rate:** 5e-5
- **Optimizer:** AdamW
- **Warmup steps ratio:** 0.1
- **Weight decay:** 0.01
- **Max sequence length:** 512

### LoRA Hyperparameters

- **LoRA rank:** 32
- **LoRA alpha:** 64
- **LoRA dropout:** 0.1
- **LoRA initialization:** PiSSA

## Environmental Impact

- **Hardware type:** 1× NVIDIA L4 GPU
- **Hours used:** 16 hours
- **Cloud provider:** Google Cloud Platform
- **Compute region:** South Asia
- **Carbon emitted:** 0.87 kg CO₂eq (fully offset by provider)

## Author

**Shreyan C** ([thethinkmachine](https://huggingface.co/thethinkmachine))

## References

- [PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models (arXiv:2404.02948)](https://arxiv.org/abs/2404.02948)
- [DEITA-Complexity](https://huggingface.co/datasets/BhabhaAI/DEITA-Complexity)
- [ModernBERT-Large](https://huggingface.co/answerdotai/ModernBERT-large)

---
base_model:
- answerdotai/ModernBERT-large
pipeline_tag: text-classification
author: Shreyan C (@thethinkmachine)
datasets:
- BhabhaAI/DEITA-Complexity
---

# Maxwell Instruction Complexity Estimator (MICE)

A fast, efficient, and accurate instruction complexity scorer powered by ModernBERT-Large. MICE predicts normalized task difficulty scores (0–1) for English instructions, with an easy option to rescale to custom ranges.

---

## 🚀 Features

* **Lightweight & Fast:** Leverages a compact backbone (ModernBERT-Large + LoRA) with only 14.4M trainable parameters.
* **Data-Driven:** Trained on 66.5K English instruction–score pairs from the DEITA-Complexity dataset.
* **High Fidelity:** Matches the performance of models 34× larger on standard complexity benchmarks.
* **Flexible Scoring:** Outputs normalized scores (0–1) by default, with optional denormalization to any range (e.g., [1, 6], [0, 100]).

---

## 🔧 Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "thethinkmachine/Maxwell-Task-Complexity-Scorer-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# 1. Get normalized complexity (0–1)
def get_normalized_score(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits.squeeze()
    return float(logits)

# 2. Denormalize to [min_score, max_score]
def get_denormalized_score(text: str, min_score: float = 1, max_score: float = 6) -> float:
    norm = get_normalized_score(text)
    raw = norm * (max_score - min_score) + min_score
    return float(round(raw, 2))

# Example
query = "Is learning equivalent to decreasing local entropy?"
print("Normalized:", get_normalized_score(query))
print("Evol-Complexity [1–6]:", get_denormalized_score(query))
```
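
For dataset annotation, the same model can score instructions in batches. A minimal sketch reusing the `tokenizer` and `model` objects loaded above; the helper name and batch size are illustrative, not part of the card's API:

```python
from typing import List

def get_batch_scores(texts: List[str], batch_size: int = 32) -> List[float]:
    """Score a list of instructions; returns normalized (0-1) scores."""
    scores = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=512)
        with torch.no_grad():
            logits = model(**inputs).logits.squeeze(-1)  # shape: (batch,)
        scores.extend(float(x) for x in logits)
    return scores
```

Padding and truncation follow the model's 512-token training limit, so longer instructions are truncated rather than rejected.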

---

## 📖 Model Details

* **Architecture:** ModernBERT-Large backbone with LoRA adapters (rank 32, alpha 64, dropout 0.1).
* **Task:** Sequence classification.
* **Languages:** English.
* **Training Data:** 66,500 instruction–score pairs from [BhabhaAI/DEITA-Complexity](https://huggingface.co/datasets/BhabhaAI/DEITA-Complexity).
* **Normalization:** Labels min–max scaled to [0, 1]; denormalize via `score * (max - min) + min`, as in the Usage section above (worked map below).
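
For reference, the forward map applied to the original 1–6 labels is the standard min–max normalization $s_{\text{norm}} = \frac{s - \text{min}}{\text{max} - \text{min}}$; for example, a raw label of 5 becomes $(5 - 1)/(6 - 1) = 0.8$.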

### Data Distribution

| Original Score | Count  | %     |
| -------------- | ------ | ----- |
| 1              | 8,729  | 13.3% |
| 2              | 5,399  | 8.2%  |
| 3              | 10,937 | 16.7% |
| 4              | 9,801  | 15.0% |
| 5              | 24,485 | 37.4% |
| 6              | 6,123  | 9.3%  |

Outlier scores (0 and 7–9, together <1% of the data) were pruned.

---

## ⚙️ Training Configuration

* **Optimizer:** AdamW (lr 5e-5, weight decay 0.01)
* **Batch Size:** 8
* **Epochs:** 3
* **Max Seq. Length:** 512
* **Warmup:** 10% of total steps
* **Compute:** 50.3M tokens, tokens-per-parameter ratio ≈ 3.5
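
This configuration corresponds roughly to the following `peft` + `transformers` setup; a minimal sketch under the hyperparameters listed in this card (PiSSA initialization per the LoRA details above), with dataset loading and the `Trainer` call omitted, and `output_dir` as a placeholder:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification, TrainingArguments

# Start from the base model named in this card, with a single-logit regression head.
base = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-large", num_labels=1
)

# LoRA adapters as listed in this card: rank 32, alpha 64, dropout 0.1, PiSSA init.
lora_config = LoraConfig(
    task_type="SEQ_CLS",
    r=32,
    lora_alpha=64,
    lora_dropout=0.1,
    init_lora_weights="pissa",
)
peft_model = get_peft_model(base, lora_config)

# AdamW is the Trainer default optimizer; the remaining settings match the list above.
training_args = TrainingArguments(
    output_dir="mice-finetune",  # placeholder path
    learning_rate=5e-5,
    weight_decay=0.01,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    warmup_ratio=0.1,
)
```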

---

## 🌱 Environmental Impact

* **Compute Used:** 16h on 1× NVIDIA L4 GPU (72W TDP) in GCP asia-south1.
* **CO₂ Emissions:** 0.87 kg CO₂eq (fully offset).
* **Estimator:** ML CO₂ Impact Calculator.

---

## 🔍 Bias & Limitations

* **Domain Bias:** Trained primarily on general English instructions; may underperform on technical, coding, or math tasks.
* **Language:** English-only.
* **Scaling Caution:** Denormalization preserves ordering, but absolute values depend on the chosen range.

---

## 📚 Citation

If you use MICE in your research, please cite:

> Chaubey, S. (2024). Maxwell Instruction Complexity Estimator (MICE). https://huggingface.co/thethinkmachine/MICE

---

## 🙋‍♂️ Author & Contact

**Shreyan C** ([thethinkmachine](https://huggingface.co/thethinkmachine))
Email: [shreyan.chaubey@gmail.com](mailto:shreyan.chaubey@gmail.com)

*This project is licensed under the Apache 2.0 License.*