| --- |
| library_name: transformers |
| base_model: DeepPavlov/rubert-base-cased |
| language: |
| - ru |
| tags: |
| - text-classification |
| - bert |
| - safetensors |
| - multilabel-classification |
| - requirements-engineering |
| - generated_from_trainer |
| model-index: |
| - name: rubert_level2_v2 |
| results: |
| - task: |
| type: text-classification |
| metrics: |
| - type: f1 |
| value: 0.9110 |
| name: F1 Micro |
| - type: f1 |
| value: 0.9110 |
| name: F1 Macro |
| - type: f1 |
| value: 0.9120 |
| name: F1 Weighted |
| --- |
| |
| # rubert_level2_v2 |
|
|
| This model is a fine-tuned version of [DeepPavlov/rubert-base-cased](https://huggingface.co/DeepPavlov/rubert-base-cased) for multilabel classification of non-functional software requirements in Russian (Level 2). |
|
|
| It achieves the following results on the evaluation set: |
| * F1 Micro: 0.9110 |
| * F1 Macro: 0.9110 |
| * F1 Weighted: 0.9120 |
|
|
| ## Model description |
|
|
| This is the Level 2 classifier in a cascaded requirements-classification pipeline. It is applied only to fragments that Level 1 classified as `IsNonFunctional`, and it labels each fragment with any of 11 non-functional requirement subcategories (multilabel): |
|
|
| | Label | Description | |
| |---|---| |
| | `Availability (A)` | Uptime, SLA, availability percentage | |
| | `Fault Tolerance (FT)` | Failover, recovery, redundancy | |
| | `Legal (L)` | Regulatory compliance, standards, licenses | |
| | `Look & Feel (LF)` | Visual style, UI design | |
| | `Maintainability (MN)` | Code quality, documentation, tech debt | |
| | `Operability (O)` | Monitoring, administration, observability | |
| | `Performance (PE)` | Response time, throughput, latency | |
| | `Portability (PO)` | Platform and OS compatibility | |
| | `Scalability (SC)` | Load scaling, growth capacity | |
| | `Security (SE)` | Authentication, authorization, encryption | |
| | `Usability (US)` | UX, ease of use, learnability | |
|
|
| The model is part of a cascaded pipeline: |
| `Audio → GigaAM-v3 (ASR) → rubert_level1_v2 (L1) → rubert_level2_v2 (L2) → Report` |
|
|
| Per-class thresholds are stored in `thresholds.json` in the `eternalGenius/rubert_level1_v2` repository. |
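
A minimal sketch of how per-class thresholds might be applied to the model's raw logits. It assumes `thresholds.json` maps each label name to a float cutoff (the file's actual schema is not documented here) and falls back to a hypothetical default of 0.5 for any label missing from the mapping. `apply_thresholds`, the short label codes, and the example values are illustrative and are not part of the repository.

```python
import math

# Short codes for the 11 Level 2 labels, in the order of the table above.
# Assumption: the model's real output order is defined by its config (id2label).
LABELS = ["A", "FT", "L", "LF", "MN", "O", "PE", "PO", "SC", "SE", "US"]


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent per-label probability."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_thresholds(logits, thresholds, labels=LABELS, default=0.5):
    """Return the labels whose sigmoid probability meets its per-class threshold."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    selected = []
    for label, logit in zip(labels, logits):
        # Labels absent from the mapping use the (assumed) default cutoff.
        if sigmoid(logit) >= thresholds.get(label, default):
            selected.append(label)
    return selected


# Hypothetical thresholds standing in for the contents of thresholds.json.
example_thresholds = {"PE": 0.40, "SE": 0.65}
# Hypothetical logits for one fragment, one value per label.
example_logits = [-3.0, -2.0, -4.0, -3.5, -1.0, -2.5, 0.2, -3.0, -0.1, 0.5, -2.0]
predicted = apply_thresholds(example_logits, example_thresholds)
```

Because the task is multilabel, each label is thresholded independently, so a fragment can receive several labels or none.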
|
|
| ## Intended uses & limitations |
|
|
| Intended for subclassifying Russian-language non-functional requirements extracted from meeting audio recordings. The model should be applied only to fragments that Level 1 has already classified as `IsNonFunctional`. |
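
The gating described above can be sketched as a small wrapper that runs Level 2 only when Level 1 flags a fragment. The function name `classify_fragment` and the two stand-in classifiers are hypothetical; in a real setup they would wrap the Level 1 and Level 2 models.

```python
from typing import Callable, List


def classify_fragment(
    text: str,
    is_non_functional: Callable[[str], bool],
    level2_labels: Callable[[str], List[str]],
) -> List[str]:
    """Run Level 2 only when Level 1 flags the fragment as non-functional."""
    if not is_non_functional(text):
        return []  # Level 1 did not flag the fragment, so Level 2 is skipped
    return level2_labels(text)


# Stand-in classifiers so the sketch runs without downloading any model.
def fake_level1(text: str) -> bool:
    return "секунд" in text  # toy keyword rule, not the real Level 1 model


def fake_level2(text: str) -> List[str]:
    return ["Performance (PE)"]  # fixed answer, not the real Level 2 model


# "The response must arrive within 2 seconds"
nfr = classify_fragment("Ответ должен приходить за 2 секунды", fake_level1, fake_level2)
# "The user can create an order"
other = classify_fragment("Пользователь может создать заказ", fake_level1, fake_level2)
```

Passing the classifiers as callables keeps the gate independent of how each model is loaded or served.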
|
|
| ## Training and evaluation data |
|
|
| Same dataset as Level 1, filtered to `IsNonFunctional=1` rows only. |
|
|
| Train: 772 examples; Test: 191 examples per class (11 classes, ~500 examples each). |
|
|
| ## Training procedure |
|
|
| ### Training hyperparameters |
|
|
| * learning_rate: 5e-06 |
| * train_batch_size: 16 |
| * eval_batch_size: 16 |
| * seed: 42 |
| * optimizer: AdamW with betas=(0.9, 0.999), epsilon=1e-08 |
| * lr_scheduler_type: linear |
| * lr_scheduler_warmup_ratio: 0.06 |
| * num_epochs: 15 (early stopping patience=3) |
| * max_length: 96 |
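
To make the scheduler settings concrete, here is a sketch of the learning-rate curve implied by a linear schedule with a 0.06 warmup ratio and a peak of 5e-06. It follows the usual linear warmup and linear decay shape. The total step count (`total_steps = 1000`) is a hypothetical value, since the real number of optimizer steps depends on the dataset size and on when early stopping triggers; rounding the warmup length up is also an assumption.

```python
import math


def linear_lr(step: int, total_steps: int, peak_lr: float = 5e-6,
              warmup_ratio: float = 0.06) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from peak_lr to 0 at the final step.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


total_steps = 1000  # hypothetical; not taken from the training run
warmup_steps = math.ceil(total_steps * 0.06)
lr_start = linear_lr(0, total_steps)
lr_peak = linear_lr(warmup_steps, total_steps)
lr_end = linear_lr(total_steps, total_steps)
```

The rate starts at zero, reaches the configured 5e-06 at the end of warmup, and returns to zero at the last step.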
|
|
| ### Per-class results (test set) |
|
|
| | Class | Precision | Recall | F1 | Support | |
| |---|---|---|---|---| |
| | Availability (A) | 1.000 | 0.939 | 0.968 | 98 | |
| | Fault Tolerance (FT) | 0.981 | 0.920 | 0.949 | 112 | |
| | Legal (L) | 0.860 | 0.925 | 0.891 | 106 | |
| | Look & Feel (LF) | 0.957 | 0.918 | 0.938 | 98 | |
| | Maintainability (MN) | 0.816 | 0.853 | 0.834 | 109 | |
| | Operability (O) | 0.976 | 0.883 | 0.927 | 94 | |
| | Performance (PE) | 0.883 | 0.958 | 0.919 | 118 | |
| | Portability (PO) | 0.911 | 0.944 | 0.927 | 108 | |
| | Scalability (SC) | 0.971 | 0.952 | 0.962 | 105 | |
| | Security (SE) | 0.858 | 0.875 | 0.867 | 104 | |
| | Usability (US) | 0.831 | 0.841 | 0.836 | 82 | |
| | **micro avg** | **0.910** | **0.912** | **0.911** | 1134 | |
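
The headline averages can be cross-checked against the table. The sketch below recomputes macro and weighted F1 from the per-class rows and micro F1 from the micro-averaged precision and recall. It uses the rounded table values, so agreement is only expected to about three decimal places.

```python
# (class, F1, support) copied from the per-class results table.
ROWS = [
    ("A", 0.968, 98),
    ("FT", 0.949, 112),
    ("L", 0.891, 106),
    ("LF", 0.938, 98),
    ("MN", 0.834, 109),
    ("O", 0.927, 94),
    ("PE", 0.919, 118),
    ("PO", 0.927, 108),
    ("SC", 0.962, 105),
    ("SE", 0.867, 104),
    ("US", 0.836, 82),
]

total_support = sum(support for _, _, support in ROWS)

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for _, f1, _ in ROWS) / len(ROWS)

# Weighted F1: per-class F1 weighted by each class's support.
weighted_f1 = sum(f1 * support for _, f1, support in ROWS) / total_support

# Micro F1: harmonic mean of the micro-averaged precision and recall (last row).
micro_p, micro_r = 0.910, 0.912
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

print(f"support={total_support} macro={macro_f1:.3f} "
      f"weighted={weighted_f1:.3f} micro={micro_f1:.3f}")
```

With the rounded inputs, the recomputed values land within rounding distance of the reported 0.9110 (micro), 0.9110 (macro), and 0.9120 (weighted), and the supports sum to the 1134 shown in the micro-average row.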
|
|
| ### Framework versions |
|
|
| * Transformers 4.57.1 |
| * PyTorch 2.8.0+cu128 |
| * Datasets 4.0.0 |
| * Tokenizers 0.22.2 |