---
library_name: transformers
base_model: DeepPavlov/rubert-base-cased
language:
  - ru
tags:
  - text-classification
  - bert
  - safetensors
  - multilabel-classification
  - requirements-engineering
  - generated_from_trainer
model-index:
  - name: rubert_level2_v2
    results:
      - task:
          type: text-classification
        metrics:
          - type: f1
            value: 0.9110
            name: F1 Micro
          - type: f1
            value: 0.9110
            name: F1 Macro
          - type: f1
            value: 0.9120
            name: F1 Weighted
---

# rubert_level2_v2

This model is a fine-tuned version of [DeepPavlov/rubert-base-cased](https://huggingface.co/DeepPavlov/rubert-base-cased) for multilabel classification of non-functional software requirements in Russian (Level 2).

It achieves the following results on the evaluation set:
* F1 Micro: 0.9110
* F1 Macro: 0.9110
* F1 Weighted: 0.9120

## Model description

This is the Level 2 classifier in a cascaded requirements classification pipeline. It is applied only to fragments that Level 1 classified as `IsNonFunctional`, and it assigns each fragment to one or more of 11 non-functional requirement subcategories:

| Label | Description |
|---|---|
| `Availability (A)` | Uptime, SLA, availability percentage |
| `Fault Tolerance (FT)` | Failover, recovery, redundancy |
| `Legal (L)` | Regulatory compliance, standards, licenses |
| `Look & Feel (LF)` | Visual style, UI design |
| `Maintainability (MN)` | Code quality, documentation, tech debt |
| `Operability (O)` | Monitoring, administration, observability |
| `Performance (PE)` | Response time, throughput, latency |
| `Portability (PO)` | Platform and OS compatibility |
| `Scalability (SC)` | Load scaling, growth capacity |
| `Security (SE)` | Authentication, authorization, encryption |
| `Usability (US)` | UX, ease of use, learnability |

The model is part of a cascaded pipeline:
`Audio → GigaAM-v3 (ASR) → rubert_level1_v2 (L1) → rubert_level2_v2 (L2) → Report`

Per-class thresholds are stored in `thresholds.json` in the `eternalGenius/rubert_level1_v2` repository.
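As a minimal sketch of the decoding step, assuming raw logits have already been produced by the model (e.g. via `AutoModelForSequenceClassification` from `transformers`), multilabel prediction reduces to a sigmoid followed by per-class thresholds. The threshold values below are placeholders, not the real ones from `thresholds.json`:

```python
import math

# Class abbreviations from the table above.
LABELS = ["A", "FT", "L", "LF", "MN", "O", "PE", "PO", "SC", "SE", "US"]

# Placeholder thresholds -- the actual per-class values are stored in
# thresholds.json in the eternalGenius/rubert_level1_v2 repository.
THRESHOLDS = {label: 0.5 for label in LABELS}

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def decode_multilabel(logits):
    """Return every subcategory whose sigmoid probability clears its threshold."""
    return [label for label, z in zip(LABELS, logits)
            if sigmoid(z) >= THRESHOLDS[label]]

# Dummy logits for one fragment (11 classes); a real call would instead use
# model(**tokenizer(text, truncation=True, max_length=96, return_tensors="pt")).logits
logits = [2.0, -3.0, -1.5, -2.0, -4.0, -3.0, 1.2, -2.5, -1.0, -3.5, -2.0]
print(decode_multilabel(logits))  # → ['A', 'PE']
```

Because the task is multilabel, each class is decided independently, so a fragment can legitimately receive several subcategories (e.g. both Performance and Scalability).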

## Intended uses & limitations

Intended for subclassifying Russian-language non-functional requirements extracted from meeting audio recordings. It should only be applied to fragments already classified as `IsNonFunctional` by Level 1.

## Training and evaluation data

Same dataset as Level 1, filtered to `IsNonFunctional=1` rows only.

Train: 772 examples | Test: 191 examples. Multilabel over 11 classes, with roughly 500 labeled examples per class across both splits.

## Training procedure

### Training hyperparameters

* learning_rate: 5e-06
* train_batch_size: 16
* eval_batch_size: 16
* seed: 42
* optimizer: AdamW with betas=(0.9, 0.999), epsilon=1e-08
* lr_scheduler_type: linear
* lr_scheduler_warmup_ratio: 0.06
* num_epochs: 15 (early stopping patience=3)
* max_length: 96
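The linear schedule with warmup can be sketched as follows. This is an illustrative reimplementation of the config above (not the exact `transformers` scheduler code), and `total_steps` is a hypothetical value:

```python
BASE_LR = 5e-6        # learning_rate
WARMUP_RATIO = 0.06   # lr_scheduler_warmup_ratio

def lr_at_step(step: int, total_steps: int) -> float:
    """Linear warmup from 0 to BASE_LR, then linear decay back to 0."""
    warmup_steps = int(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    return BASE_LR * max(0, total_steps - step) / max(1, total_steps - warmup_steps)

# With a hypothetical 1000 optimizer steps, the peak is reached at step 60:
print(lr_at_step(0, 1000), lr_at_step(60, 1000), lr_at_step(1000, 1000))
```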

### Per-class results (test set)

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Availability (A) | 1.000 | 0.939 | 0.968 | 98 |
| Fault Tolerance (FT) | 0.981 | 0.920 | 0.949 | 112 |
| Legal (L) | 0.860 | 0.925 | 0.891 | 106 |
| Look & Feel (LF) | 0.957 | 0.918 | 0.938 | 98 |
| Maintainability (MN) | 0.816 | 0.853 | 0.834 | 109 |
| Operability (O) | 0.976 | 0.883 | 0.927 | 94 |
| Performance (PE) | 0.883 | 0.958 | 0.919 | 118 |
| Portability (PO) | 0.911 | 0.944 | 0.927 | 108 |
| Scalability (SC) | 0.971 | 0.952 | 0.962 | 105 |
| Security (SE) | 0.858 | 0.875 | 0.867 | 104 |
| Usability (US) | 0.831 | 0.841 | 0.836 | 82 |
| **micro avg** | **0.910** | **0.912** | **0.911** | 1134 |

### Framework versions

* Transformers 4.57.1
* PyTorch 2.8.0+cu128
* Datasets 4.0.0
* Tokenizers 0.22.2