Text Classification
Transformers
Safetensors
modernbert
multi-label
capability-classifier
routing
text-embeddings-inference
upload trained ModernBERT capability classifier
- README.md +91 -0
- config.json +101 -0
- model.safetensors +3 -0
- tokenizer.json +0 -0
- tokenizer_config.json +17 -0
README.md
ADDED
@@ -0,0 +1,91 @@
---
base_model: answerdotai/ModernBERT-base
library_name: transformers
license: apache-2.0
datasets:
- massaindustries/dataset-B-modernbert-train
tags:
- text-classification
- multi-label
- modernbert
- capability-classifier
- routing
---

# ModernBERT capability classifier (6 dimensions)

Fine-tuned on [`massaindustries/dataset-B-modernbert-train`](https://huggingface.co/datasets/massaindustries/dataset-B-modernbert-train).
Outputs sigmoid scores in [0,1] over 6 capability dimensions:

1. `instruction_following`
2. `coding`
3. `math_reasoning`
4. `world_knowledge`
5. `planning_agentic`
6. `creative_synthesis`

Designed for downstream routing in the Brick semantic router as a drop-in replacement for the domain classifier.
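
As a rough illustration of how six independent scores could drive a routing decision, here is a minimal sketch. Only the dimension names come from this model card; the `ROUTES` table, the backend names, and the 0.5 threshold are hypothetical and do not describe how the Brick router actually works.

```python
# Hypothetical routing sketch: ROUTES, DEFAULT_ROUTE and the threshold are
# assumptions made for illustration, not part of the Brick router.
DIMENSIONS = [
    "instruction_following", "coding", "math_reasoning",
    "world_knowledge", "planning_agentic", "creative_synthesis",
]

# Made-up backend names.
ROUTES = {
    "coding": "code-model",
    "math_reasoning": "reasoning-model",
}
DEFAULT_ROUTE = "general-model"


def active_capabilities(scores, threshold=0.5):
    """Return the dimensions whose score reaches the threshold, highest first."""
    pairs = [(name, s) for name, s in zip(DIMENSIONS, scores) if s >= threshold]
    return [name for name, _ in sorted(pairs, key=lambda p: p[1], reverse=True)]


def pick_route(scores, threshold=0.5):
    """Route on the strongest active capability that has a dedicated backend."""
    for name in active_capabilities(scores, threshold):
        if name in ROUTES:
            return ROUTES[name]
    return DEFAULT_ROUTE


example_scores = [0.62, 0.91, 0.08, 0.12, 0.30, 0.05]  # invented values
print(active_capabilities(example_scores))
print(pick_route(example_scores))
```

Because the model is multi-label, several dimensions can be active for one prompt, which is why the sketch ranks them instead of taking a single argmax.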

## Training
- Architecture: ModernBERT + Linear(hidden→6) + sigmoid
- Loss: BCEWithLogitsLoss on soft float labels (judge mean)
- Precision: bf16 + FlashAttention-2
- HF problem_type: `multi_label_classification`
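
The loss line above can be made concrete with a small, dependency-free sketch of binary cross-entropy on logits with soft targets. It follows the same formula that `torch.nn.BCEWithLogitsLoss` uses with its default mean reduction; the function name and the toy inputs are illustrative and are not taken from the training code.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all (example, dimension) pairs.

    Targets may be any float in [0, 1], e.g. the mean of several judge votes.
    """
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits, targets):
        for x, y in zip(row_logits, row_targets):
            # Numerically stable form of
            # -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))].
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count


# Toy batch: one example, six dimensions, soft labels of 0.5 everywhere.
soft_labels = [[0.5] * 6]
print(bce_with_logits([[0.0] * 6], soft_labels))  # logit 0 -> sigmoid 0.5
print(bce_with_logits([[1.0] * 6], soft_labels))  # further from the target
```

With a soft target `y`, the per-element loss is smallest when `sigmoid(x)` equals `y`, so the model is pushed to reproduce the judges' mean score and not a hard 0/1 decision.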

## Inference example
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

repo = 'massaindustries/modernbert-capability-classifier'
m = AutoModelForSequenceClassification.from_pretrained(repo)
t = AutoTokenizer.from_pretrained(repo)
inp = t('write a python sort function', return_tensors='pt')
# Inference only, so skip gradient tracking.
with torch.no_grad():
    scores = torch.sigmoid(m(**inp).logits)[0]
# id2label maps class index -> capability name.
for i, d in m.config.id2label.items():
    print(f'{d}: {scores[int(i)].item():.3f}')
```

## Evaluation (human_eval split, 200 Claude-annotated)
```json
{
  "eval_loss": 0.42123839259147644,
  "eval_model_preparation_time": 0.0022,
  "eval_mae_instruction_following": 0.24792593717575073,
  "eval_rmse_instruction_following": 0.30881765484809875,
  "eval_brier_instruction_following": 0.09536834806203842,
  "eval_pearson_instruction_following": 0.8270609378814697,
  "eval_spearman_instruction_following": 0.8144904545331433,
  "eval_mae_coding": 0.07370934635400772,
  "eval_rmse_coding": 0.18934082984924316,
  "eval_brier_coding": 0.03584995120763779,
  "eval_pearson_coding": 0.9140766263008118,
  "eval_spearman_coding": 0.8615511297152596,
  "eval_mae_math_reasoning": 0.10867060720920563,
  "eval_rmse_math_reasoning": 0.1694405972957611,
  "eval_brier_math_reasoning": 0.02871011756360531,
  "eval_pearson_math_reasoning": 0.9191069602966309,
  "eval_spearman_math_reasoning": 0.8252107128077218,
  "eval_mae_world_knowledge": 0.13477517664432526,
  "eval_rmse_world_knowledge": 0.1875971555709839,
  "eval_brier_world_knowledge": 0.03519269451498985,
  "eval_pearson_world_knowledge": 0.8357715606689453,
  "eval_spearman_world_knowledge": 0.8138721105892404,
  "eval_mae_planning_agentic": 0.19774200022220612,
  "eval_rmse_planning_agentic": 0.2537391781806946,
  "eval_brier_planning_agentic": 0.06438356637954712,
  "eval_pearson_planning_agentic": 0.8233083486557007,
  "eval_spearman_planning_agentic": 0.7674644757779185,
  "eval_mae_creative_synthesis": 0.08937528729438782,
  "eval_rmse_creative_synthesis": 0.16472801566123962,
  "eval_brier_creative_synthesis": 0.027135320007801056,
  "eval_pearson_creative_synthesis": 0.9154033660888672,
  "eval_spearman_creative_synthesis": 0.8138763391203128,
  "eval_pearson_macro": 0.8724546333154043,
  "eval_mae_macro": 0.14203305914998055,
  "eval_spearman_macro": 0.8160775370905994,
  "eval_f1_macro_t3": 0.8775192561604114,
  "eval_f1_macro_t5": 0.8368971405647821,
  "eval_f1_macro_t7": 0.8287502804667367,
  "eval_runtime": 1.384,
  "eval_samples_per_second": 144.51,
  "eval_steps_p
```
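
To help read the metric names, the following sketch checks two relationships that the reported numbers appear to satisfy: the macro Pearson value is the unweighted mean of the six per-dimension values, and each Brier score matches the square of the corresponding RMSE (that is, the mean squared error). All numbers are copied from the evaluation JSON; the interpretation is inferred from the values and is not stated in the README.

```python
# Per-dimension values copied from the evaluation JSON.
pearson = {
    "instruction_following": 0.8270609378814697,
    "coding": 0.9140766263008118,
    "math_reasoning": 0.9191069602966309,
    "world_knowledge": 0.8357715606689453,
    "planning_agentic": 0.8233083486557007,
    "creative_synthesis": 0.9154033660888672,
}
rmse = {
    "instruction_following": 0.30881765484809875,
    "coding": 0.18934082984924316,
    "math_reasoning": 0.1694405972957611,
    "world_knowledge": 0.1875971555709839,
    "planning_agentic": 0.2537391781806946,
    "creative_synthesis": 0.16472801566123962,
}
brier = {
    "instruction_following": 0.09536834806203842,
    "coding": 0.03584995120763779,
    "math_reasoning": 0.02871011756360531,
    "world_knowledge": 0.03519269451498985,
    "planning_agentic": 0.06438356637954712,
    "creative_synthesis": 0.027135320007801056,
}

# Unweighted mean over the six dimensions; compare with eval_pearson_macro.
macro_pearson = sum(pearson.values()) / len(pearson)
print(f"macro Pearson: {macro_pearson:.6f}")

# Brier score vs. squared RMSE, per dimension.
brier_matches_mse = {
    name: abs(rmse[name] ** 2 - brier[name]) < 1e-6 for name in rmse
}
print(brier_matches_mse)
```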
config.json
ADDED
@@ -0,0 +1,101 @@
{
  "architectures": [
    "ModernBertForSequenceClassification"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": null,
  "classifier_activation": "gelu",
  "classifier_bias": false,
  "classifier_dropout": 0.0,
  "classifier_pooling": "mean",
  "cls_token_id": 50281,
  "decoder_bias": true,
  "deterministic_flash_attn": false,
  "dtype": "bfloat16",
  "embedding_dropout": 0.0,
  "eos_token_id": null,
  "global_attn_every_n_layers": 3,
  "gradient_checkpointing": false,
  "hidden_activation": "gelu",
  "hidden_size": 1024,
  "id2label": {
    "0": "instruction_following",
    "1": "coding",
    "2": "math_reasoning",
    "3": "world_knowledge",
    "4": "planning_agentic",
    "5": "creative_synthesis"
  },
  "initializer_cutoff_factor": 2.0,
  "initializer_range": 0.02,
  "intermediate_size": 2624,
  "label2id": {
    "instruction_following": 0,
    "coding": 1,
    "math_reasoning": 2,
    "world_knowledge": 3,
    "planning_agentic": 4,
    "creative_synthesis": 5
  },
  "layer_norm_eps": 1e-05,
  "layer_types": [
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention"
  ],
  "local_attention": 128,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "mlp_dropout": 0.0,
  "model_type": "modernbert",
  "norm_bias": false,
  "norm_eps": 1e-05,
  "num_attention_heads": 16,
  "num_hidden_layers": 28,
  "pad_token_id": 50283,
  "position_embedding_type": "absolute",
  "problem_type": "multi_label_classification",
  "rope_parameters": {
    "full_attention": {
      "rope_theta": 160000.0,
      "rope_type": "default"
    },
    "sliding_attention": {
      "rope_theta": 10000.0,
      "rope_type": "default"
    }
  },
  "sep_token_id": 50282,
  "sparse_pred_ignore_index": -100,
  "sparse_prediction": false,
  "tie_word_embeddings": true,
  "transformers_version": "5.7.0",
  "vocab_size": 50368,
  "num_labels": 6
}
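
The `layer_types` list in config.json looks like it follows directly from `num_hidden_layers` (28) and `global_attn_every_n_layers` (3). The sketch below reproduces the pattern under the assumption that layer `i` uses full attention when `i % 3 == 0` and sliding-window attention otherwise; that rule is inferred from the listed values, and the helper name is made up for illustration.

```python
def layer_types(num_hidden_layers, global_attn_every_n_layers):
    """Rebuild the attention pattern: every n-th layer (starting at 0) is global."""
    return [
        "full_attention" if i % global_attn_every_n_layers == 0 else "sliding_attention"
        for i in range(num_hidden_layers)
    ]


# Values taken from config.json.
types = layer_types(num_hidden_layers=28, global_attn_every_n_layers=3)
print(types[:4])
print(types.count("full_attention"), types.count("sliding_attention"))
```

With these settings the sketch yields 10 full-attention layers and 18 sliding-attention layers, which matches the counts in the config listing.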
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8d94439d1ee1ad4370c1dc3ba905b5ef71720e1ab85d5e9ee335db3ffa44cced
size 791693180
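
The LFS pointer only records the file size, but together with `"dtype": "bfloat16"` from config.json it allows a back-of-the-envelope parameter count. The sketch assumes that essentially the whole file consists of bf16 tensors (2 bytes each) and ignores the small safetensors header, so the result is an approximation.

```python
# Size in bytes from the Git LFS pointer for model.safetensors.
size_bytes = 791_693_180

# Assumption: all weights are stored as bfloat16, i.e. 2 bytes per parameter.
bytes_per_param = 2

approx_params = size_bytes / bytes_per_param
print(f"~{approx_params / 1e6:.1f}M parameters")
```

An estimate in this range, together with `hidden_size` 1024 and 28 hidden layers in config.json, would normally point to a large-sized ModernBERT checkpoint, so the `base_model` entry in the README front matter may be worth double-checking.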
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,17 @@
{
  "backend": "tokenizers",
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "is_local": true,
  "local_files_only": false,
  "mask_token": "[MASK]",
  "model_input_names": [
    "input_ids",
    "attention_mask"
  ],
  "model_max_length": 8192,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "tokenizer_class": "TokenizersBackend",
  "unk_token": "[UNK]"
}