---
base_model: answerdotai/ModernBERT-base
library_name: transformers
license: apache-2.0
datasets:
- massaindustries/dataset-B-modernbert-train
tags:
- text-classification
- multi-label
- modernbert
- capability-classifier
- routing
---

# ModernBERT capability classifier (6 dimensions)

Fine-tuned from `answerdotai/ModernBERT-base` on [`massaindustries/dataset-B-modernbert-train`](https://huggingface.co/datasets/massaindustries/dataset-B-modernbert-train).

The model outputs an independent sigmoid score in [0, 1] for each of 6 capability dimensions:

1. `instruction_following`
2. `coding`
3. `math_reasoning`
4. `world_knowledge`
5. `planning_agentic`
6. `creative_synthesis`

It is designed for downstream routing in the Brick semantic router, as a drop-in replacement for the domain classifier.

## Training

- Architecture: ModernBERT encoder + Linear(hidden→6) head, with a sigmoid applied to the logits
- Loss: BCEWithLogitsLoss on soft float labels (mean of judge scores)
- Precision: bf16 + FlashAttention-2
- HF problem_type: `multi_label_classification`

## Inference example

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = 'massaindustries/modernbert-capability-classifier'
model = AutoModelForSequenceClassification.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)
model.eval()  # disable dropout for inference

inputs = tokenizer('write a python sort function', return_tensors='pt', truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
# Multi-label head: apply an independent sigmoid to each of the 6 logits
scores = torch.sigmoid(logits)[0]

for i, score in enumerate(scores.tolist()):
    print(f'{model.config.id2label[i]}: {score:.3f}')
```

## Evaluation (`human_eval` split, 200 Claude-annotated examples)

```json
{
  "eval_loss": 0.42123839259147644,
  "eval_model_preparation_time": 0.0022,
  "eval_mae_instruction_following": 0.24792593717575073,
  "eval_rmse_instruction_following": 0.30881765484809875,
  "eval_brier_instruction_following": 0.09536834806203842,
  "eval_pearson_instruction_following": 0.8270609378814697,
  "eval_spearman_instruction_following": 0.8144904545331433,
  "eval_mae_coding": 0.07370934635400772,
  "eval_rmse_coding": 0.18934082984924316,
  "eval_brier_coding": 0.03584995120763779,
  "eval_pearson_coding": 0.9140766263008118,
  "eval_spearman_coding": 0.8615511297152596,
  "eval_mae_math_reasoning": 0.10867060720920563,
  "eval_rmse_math_reasoning": 0.1694405972957611,
  "eval_brier_math_reasoning": 0.02871011756360531,
  "eval_pearson_math_reasoning": 0.9191069602966309,
  "eval_spearman_math_reasoning": 0.8252107128077218,
  "eval_mae_world_knowledge": 0.13477517664432526,
  "eval_rmse_world_knowledge": 0.1875971555709839,
  "eval_brier_world_knowledge": 0.03519269451498985,
  "eval_pearson_world_knowledge": 0.8357715606689453,
  "eval_spearman_world_knowledge": 0.8138721105892404,
  "eval_mae_planning_agentic": 0.19774200022220612,
  "eval_rmse_planning_agentic": 0.2537391781806946,
  "eval_brier_planning_agentic": 0.06438356637954712,
  "eval_pearson_planning_agentic": 0.8233083486557007,
  "eval_spearman_planning_agentic": 0.7674644757779185,
  "eval_mae_creative_synthesis": 0.08937528729438782,
  "eval_rmse_creative_synthesis": 0.16472801566123962,
  "eval_brier_creative_synthesis": 0.027135320007801056,
  "eval_pearson_creative_synthesis": 0.9154033660888672,
  "eval_spearman_creative_synthesis": 0.8138763391203128,
  "eval_pearson_macro": 0.8724546333154043,
  "eval_mae_macro": 0.14203305914998055,
  "eval_spearman_macro": 0.8160775370905994,
  "eval_f1_macro_t3": 0.8775192561604114,
  "eval_f1_macro_t5": 0.8368971405647821,
  "eval_f1_macro_t7": 0.8287502804667367,
  "eval_runtime": 1.384,
  "eval_samples_per_second": 144.51,
  "eval_steps_p
```
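The card does not say how the per-dimension metrics were computed. Below is a minimal plain-Python sketch of one plausible computation, resting on two assumptions. First, Brier on soft labels is taken to be the mean squared error between sigmoid scores and labels. The reported numbers fit this: 0.30882² ≈ 0.09537 for `instruction_following`, and 0.18934² ≈ 0.03585 for `coding`. Second, `f1_macro_t3`/`_t5`/`_t7` are read as binarizing both scores and labels at 0.3/0.5/0.7 before averaging per-dimension F1. That second reading is a guess. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def dimension_metrics(preds: Sequence[float], labels: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE, Brier and Pearson for one capability dimension with soft labels."""
    n = len(preds)
    errors = [p - y for p, y in zip(preds, labels)]
    mse = sum(e * e for e in errors) / n
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(mse),
        # Assumption: Brier on soft labels = mean squared error of the scores,
        # so it equals rmse ** 2.
        "brier": mse,
        "pearson": pearson(preds, labels),
    }


def binary_f1(preds: Sequence[float], labels: Sequence[float], threshold: float) -> float:
    """F1 after binarizing both predictions and soft labels at `threshold`."""
    tp = fp = fn = 0
    for p, y in zip(preds, labels):
        pred_pos, label_pos = p >= threshold, y >= threshold
        tp += int(pred_pos and label_pos)
        fp += int(pred_pos and not label_pos)
        fn += int(label_pos and not pred_pos)
    denom = 2 * tp + fp + fn
    # No positives on either side: report 0.0 rather than dividing by zero.
    return 2 * tp / denom if denom else 0.0


def f1_macro(pred_rows: List[List[float]], label_rows: List[List[float]], threshold: float) -> float:
    """Unweighted mean of per-dimension F1 (hypothetical reading of f1_macro_tN)."""
    n_dims = len(pred_rows[0])
    scores = [
        binary_f1([r[d] for r in pred_rows], [r[d] for r in label_rows], threshold)
        for d in range(n_dims)
    ]
    return sum(scores) / n_dims


# Toy data: 3 prompts x 2 dimensions (illustrative only, not from the eval set).
preds = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
dim0 = dimension_metrics([r[0] for r in preds], [r[0] for r in labels])
print(dim0)
for t in (0.3, 0.5, 0.7):
    print(t, f1_macro(preds, labels, t))
```

The reported macro values are consistent with unweighted means over the six dimensions. Averaging the six `eval_pearson_*` values gives about 0.8725, which matches `eval_pearson_macro`. The same holds for `eval_mae_macro` (about 0.1420).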