---
base_model: answerdotai/ModernBERT-base
library_name: transformers
license: apache-2.0
datasets:
- massaindustries/dataset-B-modernbert-train
tags:
- text-classification
- multi-label
- modernbert
- capability-classifier
- routing
---
# ModernBERT capability classifier (6 dimensions)

Fine-tuned on [`massaindustries/dataset-B-modernbert-train`](https://huggingface.co/datasets/massaindustries/dataset-B-modernbert-train).

Outputs sigmoid scores in [0, 1] over 6 capability dimensions:

1. `instruction_following`
2. `coding`
3. `math_reasoning`
4. `world_knowledge`
5. `planning_agentic`
6. `creative_synthesis`

Designed for downstream routing in the Brick semantic router as a drop-in replacement for the domain classifier.
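As a sketch of how the six scores might drive a routing decision: pick the highest-scoring dimension if it clears a cutoff, otherwise fall back to a default route. The backend names and the 0.5 threshold below are illustrative assumptions, not part of the model or the Brick router.

```python
# Hypothetical routing sketch: map the 6 capability scores to a route.
# Dimension names come from the model card; the threshold and the
# "general" fallback are illustrative assumptions.
DIMENSIONS = [
    "instruction_following", "coding", "math_reasoning",
    "world_knowledge", "planning_agentic", "creative_synthesis",
]

def route(scores, threshold=0.5, default="general"):
    """Return the highest-scoring dimension if it clears the threshold."""
    best_dim, best_score = max(zip(DIMENSIONS, scores), key=lambda p: p[1])
    return best_dim if best_score >= threshold else default

print(route([0.2, 0.9, 0.1, 0.3, 0.2, 0.1]))  # prints "coding"
```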
## Training

- Architecture: ModernBERT + Linear(hidden→6) + sigmoid
- Loss: BCEWithLogitsLoss on soft float labels (judge mean)
- Precision: bf16 + FlashAttention-2
- HF `problem_type`: `multi_label_classification`
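The loss setup above can be sketched in a few lines: `BCEWithLogitsLoss` accepts float targets, so the judge-mean scores in [0, 1] are used directly as soft labels. The linear head and random batch below are stand-ins, not the actual training code.

```python
import torch
import torch.nn as nn

# Minimal sketch of the stated loss setup: BCEWithLogitsLoss on soft
# float targets over 6 dimensions. Hidden size and batch are stand-ins.
hidden, num_dims = 768, 6
head = nn.Linear(hidden, num_dims)      # pooled encoder output -> 6 logits
loss_fn = nn.BCEWithLogitsLoss()

pooled = torch.randn(4, hidden)         # stand-in for ModernBERT output
soft_labels = torch.rand(4, num_dims)   # judge-mean scores, not 0/1
loss = loss_fn(head(pooled), soft_labels)
loss.backward()                         # gradients flow into the head
```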
## Inference example

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

repo = "massaindustries/modernbert-capability-classifier"
model = AutoModelForSequenceClassification.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

inputs = tokenizer("write a python sort function", return_tensors="pt")
scores = torch.sigmoid(model(**inputs).logits)[0]  # 6 scores in [0, 1]
for i, dim in enumerate(model.config.id2label.values()):
    print(f"{dim}: {scores[i].item():.3f}")
```
## Evaluation (human_eval split, 200 Claude-annotated examples)

```json
{
  "eval_loss": 0.42123839259147644,
  "eval_model_preparation_time": 0.0022,
  "eval_mae_instruction_following": 0.24792593717575073,
  "eval_rmse_instruction_following": 0.30881765484809875,
  "eval_brier_instruction_following": 0.09536834806203842,
  "eval_pearson_instruction_following": 0.8270609378814697,
  "eval_spearman_instruction_following": 0.8144904545331433,
  "eval_mae_coding": 0.07370934635400772,
  "eval_rmse_coding": 0.18934082984924316,
  "eval_brier_coding": 0.03584995120763779,
  "eval_pearson_coding": 0.9140766263008118,
  "eval_spearman_coding": 0.8615511297152596,
  "eval_mae_math_reasoning": 0.10867060720920563,
  "eval_rmse_math_reasoning": 0.1694405972957611,
  "eval_brier_math_reasoning": 0.02871011756360531,
  "eval_pearson_math_reasoning": 0.9191069602966309,
  "eval_spearman_math_reasoning": 0.8252107128077218,
  "eval_mae_world_knowledge": 0.13477517664432526,
  "eval_rmse_world_knowledge": 0.1875971555709839,
  "eval_brier_world_knowledge": 0.03519269451498985,
  "eval_pearson_world_knowledge": 0.8357715606689453,
  "eval_spearman_world_knowledge": 0.8138721105892404,
  "eval_mae_planning_agentic": 0.19774200022220612,
  "eval_rmse_planning_agentic": 0.2537391781806946,
  "eval_brier_planning_agentic": 0.06438356637954712,
  "eval_pearson_planning_agentic": 0.8233083486557007,
  "eval_spearman_planning_agentic": 0.7674644757779185,
  "eval_mae_creative_synthesis": 0.08937528729438782,
  "eval_rmse_creative_synthesis": 0.16472801566123962,
  "eval_brier_creative_synthesis": 0.027135320007801056,
  "eval_pearson_creative_synthesis": 0.9154033660888672,
  "eval_spearman_creative_synthesis": 0.8138763391203128,
  "eval_pearson_macro": 0.8724546333154043,
  "eval_mae_macro": 0.14203305914998055,
  "eval_spearman_macro": 0.8160775370905994,
  "eval_f1_macro_t3": 0.8775192561604114,
  "eval_f1_macro_t5": 0.8368971405647821,
  "eval_f1_macro_t7": 0.8287502804667367,
  "eval_runtime": 1.384,
  "eval_samples_per_second": 144.51
}
```
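The `eval_f1_macro_t3/t5/t7` entries are presumably computed by binarizing both the soft labels and the sigmoid scores at a cutoff t (0.3, 0.5, 0.7), then averaging per-dimension F1 ("macro"). A minimal sketch of that computation, with illustrative arrays rather than the real eval data:

```python
import numpy as np

# Sketch of thresholded macro-F1, assuming soft labels and scores are
# both binarized at the same cutoff t. Arrays below are illustrative.
def f1_macro(labels, scores, t):
    y_true = labels >= t
    y_pred = scores >= t
    f1s = []
    for d in range(y_true.shape[1]):  # one F1 per capability dimension
        tp = np.sum(y_true[:, d] & y_pred[:, d])
        fp = np.sum(~y_true[:, d] & y_pred[:, d])
        fn = np.sum(y_true[:, d] & ~y_pred[:, d])
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return float(np.mean(f1s))

labels = np.array([[0.8, 0.2], [0.5, 0.7], [0.4, 0.9]])  # judge-mean labels
scores = np.array([[0.9, 0.1], [0.4, 0.8], [0.2, 0.6]])  # model outputs
print(f1_macro(labels, scores, t=0.3))
```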