---
library_name: transformers
base_model: DeepPavlov/rubert-base-cased
language:
- ru
tags:
- text-classification
- bert
- safetensors
- multilabel-classification
- requirements-engineering
- generated_from_trainer
model-index:
- name: rubert_level2_v2
  results:
  - task:
      type: text-classification
    metrics:
    - type: f1
      value: 0.9110
      name: F1 Micro
    - type: f1
      value: 0.9110
      name: F1 Macro
    - type: f1
      value: 0.9120
      name: F1 Weighted
---
# rubert_level2_v2
This model is a fine-tuned version of [DeepPavlov/rubert-base-cased](https://huggingface.co/DeepPavlov/rubert-base-cased) for multilabel classification of non-functional software requirements in Russian (Level 2).
It achieves the following results on the evaluation set:
* F1 Micro: 0.9110
* F1 Macro: 0.9110
* F1 Weighted: 0.9120
## Model description
This is the Level 2 classifier in a cascaded requirements classification pipeline. It is applied only to fragments that Level 1 classified as `IsNonFunctional`, and it assigns each fragment to one or more of 11 non-functional requirement subcategories:
| Label | Description |
|---|---|
| `Availability (A)` | Uptime, SLA, availability percentage |
| `Fault Tolerance (FT)` | Failover, recovery, redundancy |
| `Legal (L)` | Regulatory compliance, standards, licenses |
| `Look & Feel (LF)` | Visual style, UI design |
| `Maintainability (MN)` | Code quality, documentation, tech debt |
| `Operability (O)` | Monitoring, administration, observability |
| `Performance (PE)` | Response time, throughput, latency |
| `Portability (PO)` | Platform and OS compatibility |
| `Scalability (SC)` | Load scaling, growth capacity |
| `Security (SE)` | Authentication, authorization, encryption |
| `Usability (US)` | UX, ease of use, learnability |
The model is part of a cascaded pipeline:
`Audio → GigaAM-v3 (ASR) → rubert_level1_v2 (L1) → rubert_level2_v2 (L2) → Report`
Per-class thresholds are stored in `thresholds.json` in the `eternalGenius/rubert_level1_v2` repository.
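A minimal inference sketch follows, showing how per-class thresholds could be applied to the model's sigmoid outputs. Several things here are assumptions rather than documented behavior: the repository id `eternalGenius/rubert_level2_v2` is inferred from the model name, the thresholds are passed in as a flat `{label: threshold}` dict (the layout of `thresholds.json` is not described in this card), and labels without an entry fall back to 0.5. The helper names `sigmoid`, `select_labels`, and `predict` are illustrative.

```python
import math
from typing import Dict, List, Optional, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def select_labels(
    logits: Sequence[float],
    labels: Sequence[str],
    thresholds: Dict[str, float],
    default: float = 0.5,  # assumed fallback, not taken from the model card
) -> List[str]:
    """Keep every label whose sigmoid probability reaches its threshold."""
    return [
        label
        for label, logit in zip(labels, logits)
        if sigmoid(logit) >= thresholds.get(label, default)
    ]


def predict(
    text: str,
    thresholds: Optional[Dict[str, float]] = None,
    model_id: str = "eternalGenius/rubert_level2_v2",  # assumed repo id
) -> List[str]:
    """Score one fragment with the fine-tuned model and apply thresholds."""
    # Heavy dependencies are imported lazily so the helpers above stay usable
    # without torch/transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # max_length=96 matches the value listed under the training hyperparameters.
    inputs = tokenizer(text, truncation=True, max_length=96, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    labels = [model.config.id2label[i] for i in range(len(logits))]
    return select_labels(logits, labels, thresholds or {})
```

Calling `predict("...", thresholds={...})` reloads the model on every call, which is fine for a quick check; for batch use, load the tokenizer and model once and reuse them.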
## Intended uses & limitations
Intended for subclassification of Russian-language non-functional requirements extracted from meeting audio recordings. It should be applied only to fragments that Level 1 has already classified as `IsNonFunctional`.
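The gating rule above can be sketched as a small wrapper that calls Level 2 only when Level 1 flags a fragment. Both callables are hypothetical stand-ins for the two classifiers; their names and signatures are assumptions for illustration, not part of either model's API.

```python
from typing import Callable, List


def classify_fragment(
    text: str,
    is_non_functional: Callable[[str], bool],   # hypothetical Level 1 wrapper
    level2_labels: Callable[[str], List[str]],  # hypothetical Level 2 wrapper
) -> List[str]:
    """Return Level 2 subcategories, or [] if Level 1 rejects the fragment."""
    if not is_non_functional(text):
        # Level 2 was trained only on IsNonFunctional rows, so it is skipped here.
        return []
    return level2_labels(text)


if __name__ == "__main__":
    # Stub classifiers used purely to demonstrate the control flow.
    def fake_l1(text: str) -> bool:
        return "секунд" in text

    def fake_l2(text: str) -> List[str]:
        return ["Performance (PE)"]

    print(classify_fragment("Ответ за 2 секунды", fake_l1, fake_l2))
    print(classify_fragment("Добавить кнопку экспорта", fake_l1, fake_l2))
```

Keeping the gate outside the Level 2 model makes it harder to score fragments the model never saw during training.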
## Training and evaluation data
Same dataset as Level 1, filtered to `IsNonFunctional=1` rows only.
Train: 772 examples | Test: 191 examples per class (11 classes, ~500 examples each).
## Training procedure
### Training hyperparameters
* learning_rate: 5e-06
* train_batch_size: 16
* eval_batch_size: 16
* seed: 42
* optimizer: AdamW with betas=(0.9, 0.999), epsilon=1e-08
* lr_scheduler_type: linear
* lr_scheduler_warmup_ratio: 0.06
* num_epochs: 15 (early stopping patience=3)
* max_length: 96
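For readers who want to reproduce a similar setup, the hyperparameters above map onto `transformers.TrainingArguments` roughly as sketched below. This is a reconstruction under assumptions, not the author's training script: `output_dir`, the per-epoch evaluation and save strategy, and the metric name `f1_micro` are not stated in this card and are included only because `EarlyStoppingCallback` needs an evaluation schedule, `load_best_model_at_end=True`, and a metric to monitor. The batch sizes are assumed to be per-device values.

```python
from typing import Any, Dict

# Values taken from the hyperparameter list above.
TRAINING_KWARGS: Dict[str, Any] = {
    "learning_rate": 5e-6,
    "per_device_train_batch_size": 16,  # assumed to be per-device
    "per_device_eval_batch_size": 16,   # assumed to be per-device
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.06,
    "num_train_epochs": 15,
}

# Assumptions required for early stopping; none of these appear in the card.
EARLY_STOPPING_KWARGS: Dict[str, Any] = {
    "output_dir": "rubert_level2_v2-checkpoints",  # hypothetical path
    "eval_strategy": "epoch",
    "save_strategy": "epoch",
    "load_best_model_at_end": True,
    "metric_for_best_model": "f1_micro",  # hypothetical metric name
}

# max_length is a tokenizer setting, not a TrainingArguments field.
TOKENIZER_KWARGS: Dict[str, Any] = {"truncation": True, "max_length": 96}

EARLY_STOPPING_PATIENCE = 3


def build_training_setup():
    """Create TrainingArguments and the early-stopping callback."""
    # Imported lazily so the plain dicts above can be inspected without
    # transformers installed.
    from transformers import EarlyStoppingCallback, TrainingArguments

    args = TrainingArguments(**TRAINING_KWARGS, **EARLY_STOPPING_KWARGS)
    callbacks = [
        EarlyStoppingCallback(early_stopping_patience=EARLY_STOPPING_PATIENCE)
    ]
    return args, callbacks
```

The returned `args` and `callbacks` would be passed to `Trainer(args=..., callbacks=...)` together with a model, datasets, and a `compute_metrics` function that reports the monitored metric.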
### Per-class results (test set)
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Availability (A) | 1.000 | 0.939 | 0.968 | 98 |
| Fault Tolerance (FT) | 0.981 | 0.920 | 0.949 | 112 |
| Legal (L) | 0.860 | 0.925 | 0.891 | 106 |
| Look & Feel (LF) | 0.957 | 0.918 | 0.938 | 98 |
| Maintainability (MN) | 0.816 | 0.853 | 0.834 | 109 |
| Operability (O) | 0.976 | 0.883 | 0.927 | 94 |
| Performance (PE) | 0.883 | 0.958 | 0.919 | 118 |
| Portability (PO) | 0.911 | 0.944 | 0.927 | 108 |
| Scalability (SC) | 0.971 | 0.952 | 0.962 | 105 |
| Security (SE) | 0.858 | 0.875 | 0.867 | 104 |
| Usability (US) | 0.831 | 0.841 | 0.836 | 82 |
| **micro avg** | **0.910** | **0.912** | **0.911** | 1134 |
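As a sanity check, the headline F1 scores can be recomputed from the rows of this table. The sketch below uses only the rounded values shown above, so it can agree with the reported metrics only to about three decimal places; the function names are illustrative.

```python
from typing import List, Tuple

# (class, F1, support) copied from the per-class table above.
PER_CLASS: List[Tuple[str, float, int]] = [
    ("Availability (A)", 0.968, 98),
    ("Fault Tolerance (FT)", 0.949, 112),
    ("Legal (L)", 0.891, 106),
    ("Look & Feel (LF)", 0.938, 98),
    ("Maintainability (MN)", 0.834, 109),
    ("Operability (O)", 0.927, 94),
    ("Performance (PE)", 0.919, 118),
    ("Portability (PO)", 0.927, 108),
    ("Scalability (SC)", 0.962, 105),
    ("Security (SE)", 0.867, 104),
    ("Usability (US)", 0.836, 82),
]


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_f1(rows: List[Tuple[str, float, int]]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for _, f1, _ in rows) / len(rows)


def weighted_f1(rows: List[Tuple[str, float, int]]) -> float:
    """Mean of the per-class F1 scores weighted by support."""
    total = sum(support for _, _, support in rows)
    return sum(f1 * support for _, f1, support in rows) / total


print(round(f1_from_pr(0.910, 0.912), 3))  # micro F1 from the micro avg row: 0.911
print(round(macro_f1(PER_CLASS), 3))       # 0.911
print(round(weighted_f1(PER_CLASS), 3))    # 0.912
```

The three results match the reported F1 Micro, F1 Macro, and F1 Weighted, and the supports sum to the 1134 shown in the micro avg row.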
### Framework versions
* Transformers 4.57.1
* PyTorch 2.8.0+cu128
* Datasets 4.0.0
* Tokenizers 0.22.2