---
library_name: transformers
base_model: DeepPavlov/rubert-base-cased
language:
- ru
tags:
- text-classification
- bert
- safetensors
- multilabel-classification
- requirements-engineering
- generated_from_trainer
model-index:
- name: rubert_level2_v2
  results:
  - task:
      type: text-classification
    metrics:
    - type: f1
      value: 0.9110
      name: F1 Micro
    - type: f1
      value: 0.9110
      name: F1 Macro
    - type: f1
      value: 0.9120
      name: F1 Weighted
---

# rubert_level2_v2

This model is a fine-tuned version of [DeepPavlov/rubert-base-cased](https://huggingface.co/DeepPavlov/rubert-base-cased) for multilabel classification of non-functional software requirements in Russian (Level 2).

It achieves the following results on the evaluation set:

* F1 Micro: 0.9110
* F1 Macro: 0.9110
* F1 Weighted: 0.9120

## Model description

Level 2 classifier in a cascaded requirements classification pipeline. It is applied only to fragments classified as `IsNonFunctional` by Level 1 and classifies them into 11 non-functional requirement subcategories:

| Label | Description |
|---|---|
| `Availability (A)` | Uptime, SLA, availability percentage |
| `Fault Tolerance (FT)` | Failover, recovery, redundancy |
| `Legal (L)` | Regulatory compliance, standards, licenses |
| `Look & Feel (LF)` | Visual style, UI design |
| `Maintainability (MN)` | Code quality, documentation, tech debt |
| `Operability (O)` | Monitoring, administration, observability |
| `Performance (PE)` | Response time, throughput, latency |
| `Portability (PO)` | Platform and OS compatibility |
| `Scalability (SC)` | Load scaling, growth capacity |
| `Security (SE)` | Authentication, authorization, encryption |
| `Usability (US)` | UX, ease of use, learnability |

The model is part of a cascaded pipeline:

`Audio → GigaAM-v3 (ASR) → rubert_level1_v2 (L1) → rubert_level2_v2 (L2) → Report`

Per-class thresholds are stored in `thresholds.json` in the `eternalGenius/rubert_level1_v2` repository.
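Because the classifier is multilabel, predictions come from comparing each class's sigmoid probability against its own threshold rather than taking an argmax. A minimal sketch of that thresholding step (pure Python; the logits and the two threshold values below are illustrative only — the real cutoffs come from `thresholds.json`):

```python
import math

def apply_thresholds(logits, thresholds):
    """Turn raw per-class logits into a set of predicted labels.

    A label is assigned when sigmoid(logit) meets or exceeds its
    per-class threshold, so one fragment can receive several labels.
    """
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    return {
        label
        for label, logit in logits.items()
        if sigmoid(logit) >= thresholds.get(label, 0.5)  # 0.5 fallback
    }

# Illustrative values only -- real thresholds live in thresholds.json.
thresholds = {"Performance (PE)": 0.40, "Security (SE)": 0.55}
logits = {"Performance (PE)": 1.2, "Security (SE)": -0.1}

labels = apply_thresholds(logits, thresholds)
# "Performance (PE)" passes its threshold; "Security (SE)" does not.
```

The same function works unchanged on the 11-dimensional logit vector produced by the fine-tuned model once the logits are mapped to their label names.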
## Intended uses & limitations

Intended for subclassification of non-functional requirements in Russian extracted from meeting audio recordings. It should only be applied to fragments already classified as `IsNonFunctional` by Level 1.

## Training and evaluation data

Same dataset as Level 1, filtered to `IsNonFunctional=1` rows only (11 classes, ~500 examples per class). Train: 772 examples | Test: 191 examples.

## Training procedure

### Training hyperparameters

* learning_rate: 5e-06
* train_batch_size: 16
* eval_batch_size: 16
* seed: 42
* optimizer: AdamW with betas=(0.9, 0.999), epsilon=1e-08
* lr_scheduler_type: linear
* lr_scheduler_warmup_ratio: 0.06
* num_epochs: 15 (early stopping, patience=3)
* max_length: 96

### Per-class results (test set)

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Availability (A) | 1.000 | 0.939 | 0.968 | 98 |
| Fault Tolerance (FT) | 0.981 | 0.920 | 0.949 | 112 |
| Legal (L) | 0.860 | 0.925 | 0.891 | 106 |
| Look & Feel (LF) | 0.957 | 0.918 | 0.938 | 98 |
| Maintainability (MN) | 0.816 | 0.853 | 0.834 | 109 |
| Operability (O) | 0.976 | 0.883 | 0.927 | 94 |
| Performance (PE) | 0.883 | 0.958 | 0.919 | 118 |
| Portability (PO) | 0.911 | 0.944 | 0.927 | 108 |
| Scalability (SC) | 0.971 | 0.952 | 0.962 | 105 |
| Security (SE) | 0.858 | 0.875 | 0.867 | 104 |
| Usability (US) | 0.831 | 0.841 | 0.836 | 82 |
| **micro avg** | **0.910** | **0.912** | **0.911** | **1134** |

### Framework versions

* Transformers 4.57.1
* PyTorch 2.8.0+cu128
* Datasets 4.0.0
* Tokenizers 0.22.2
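For reference, the warmup length implied by `lr_scheduler_warmup_ratio` can be worked out from the training-set size and batch size above. This is a sketch assuming no gradient accumulation and single-device training, not a dump of the trainer's internals:

```python
import math

train_examples = 772   # training-set size from the model card
batch_size = 16        # train_batch_size
num_epochs = 15        # upper bound; early stopping may end sooner
warmup_ratio = 0.06    # lr_scheduler_warmup_ratio

# Optimizer steps per epoch (last, partial batch still counts as a step).
steps_per_epoch = math.ceil(train_examples / batch_size)   # 49
total_steps = steps_per_epoch * num_epochs                 # 735
warmup_steps = int(total_steps * warmup_ratio)             # 44
```

So the learning rate ramps up linearly over roughly the first 44 optimizer steps and then decays linearly toward zero over the remainder of training.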