---
library_name: transformers
base_model: DeepPavlov/rubert-base-cased
language:
  - ru
tags:
  - text-classification
  - bert
  - safetensors
  - multilabel-classification
  - requirements-engineering
  - generated_from_trainer
model-index:
  - name: rubert_level1_v2
    results:
      - task:
          type: text-classification
        metrics:
          - type: loss
            value: 0.0727
            name: Validation Loss
          - type: f1
            value: 0.9749
            name: F1 Micro
          - type: f1
            value: 0.9750
            name: F1 Macro
          - type: f1
            value: 0.9750
            name: F1 Weighted
---

# rubert_level1_v2

This model is a fine-tuned version of [DeepPavlov/rubert-base-cased](https://huggingface.co/DeepPavlov/rubert-base-cased) for multilabel classification of software requirements in Russian (Level 1).

It achieves the following results on the evaluation set:
* Loss: 0.0727
* F1 Micro: 0.9749
* F1 Macro: 0.9750
* F1 Weighted: 0.9750

## Model description

This is the Level 1 classifier in a cascaded requirements classification pipeline. It classifies Russian-language text fragments transcribed from meeting recordings into three categories:

| Label | Description |
|---|---|
| `IsFunctional` | Functional requirements — what the system must do |
| `IsBusiness` | Business requirements — budgets, KPIs, deadlines, regulations |
| `Other (OT)` | Non-requirements — organizational remarks, transition phrases, context |

`IsNonFunctional` is not predicted by this model directly; it is derived downstream as the logical OR of the Level 2 predictions.
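
A minimal sketch of that derivation, assuming the Level 2 model yields one thresholded boolean per non-functional subclass for each fragment. The function name `derive_is_nonfunctional` and the subclass names in the example (`Performance`, `Security`, `Usability`) are hypothetical placeholders, not the actual label set of `rubert_level2_v2`.

```python
from typing import Mapping


def derive_is_nonfunctional(level2_preds: Mapping[str, bool]) -> bool:
    """Return True if any Level 2 (non-functional subclass) label is predicted.

    `level2_preds` maps Level 2 label names (hypothetical here) to the
    thresholded boolean predictions for a single text fragment.
    """
    # Logical OR over all Level 2 predictions; an empty mapping yields False.
    return any(level2_preds.values())


# Hypothetical subclass names, for illustration only.
example = {"Performance": False, "Security": True, "Usability": False}
print(derive_is_nonfunctional(example))  # True
```

Because the OR is computed outside this model, `IsNonFunctional` has no logit or threshold of its own at Level 1.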

The model is part of a cascaded pipeline:
`Audio → GigaAM-v3 (ASR) → rubert_level1_v2 (L1) → rubert_level2_v2 (L2) → Report`

Per-class classification thresholds are stored in `thresholds.json` in this repository.
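
For a multilabel head, inference usually applies a sigmoid to each logit and compares the resulting probability with that class's threshold. The sketch below assumes `thresholds.json` is a flat JSON object mapping label names to floats and that the label order matches the model's `id2label`; both are assumptions, so check the actual file and `config.json`. The label strings are taken from the table above and may differ from the ones stored in the config. Producing the logits (loading the model and tokenizing with truncation to the training `max_length` of 96) is left out so the sketch runs without downloading weights; `apply_thresholds` is a hypothetical helper name.

```python
import json
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def apply_thresholds(
    logits: List[float], labels: List[str], thresholds: Dict[str, float]
) -> Dict[str, bool]:
    """Turn raw per-label logits into boolean predictions via per-class thresholds.

    Assumes labels[i] names the label whose logit is logits[i].
    Labels missing from `thresholds` fall back to 0.5 (an assumption).
    """
    return {
        label: sigmoid(logit) >= thresholds.get(label, 0.5)
        for label, logit in zip(labels, logits)
    }


# Placeholder values, NOT the tuned thresholds shipped in thresholds.json.
raw = '{"IsFunctional": 0.45, "IsBusiness": 0.55, "Other (OT)": 0.5}'
thresholds = json.loads(raw)
labels = ["IsFunctional", "IsBusiness", "Other (OT)"]
print(apply_thresholds([1.2, -0.3, -4.0], labels, thresholds))
# {'IsFunctional': True, 'IsBusiness': False, 'Other (OT)': False}
```

Per-class thresholds let a class with lower precision (here `IsFunctional`) trade recall for precision independently of the others, which a single global 0.5 cutoff cannot do.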

## Intended uses & limitations

Intended for classification of Russian-language software requirements extracted from meeting audio recordings. Not suitable for general-purpose text classification or non-Russian languages.

## Training and evaluation data

Custom Russian-language requirements dataset compiled from:
- PROMISE dataset (translated to Russian)
- PURE dataset (parsed from XML, translated to Russian)
- Synthetically generated examples (Grok, Claude Sonnet) across 14 domain areas

Total: ~9800 labeled examples. Train/test split: 80/20, stratified, seed=42.
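
The card does not say how stratification was done for multilabel targets. One simple option, shown here as an assumption rather than the authors' procedure, is to stratify on each example's label combination. The function `stratified_split` is hypothetical and shuffles with Python's `random.Random(seed)`, so its partition will not match scikit-learn's `train_test_split` even with `seed=42`.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    label_sets: Sequence[Tuple[str, ...]], test_size: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Split example indices so each label combination keeps roughly its share.

    Stratifies on the sorted tuple of labels per example, a simple proxy for
    multilabel stratification. Returns (train_indices, test_indices).
    """
    rng = random.Random(seed)
    groups: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for idx, labels in enumerate(label_sets):
        groups[tuple(sorted(labels))].append(idx)

    train: List[int] = []
    test: List[int] = []
    for key in sorted(groups):  # deterministic group order
        idxs = groups[key]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_size)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


# Toy data: 100 single-label examples.
data = [("IsFunctional",)] * 50 + [("IsBusiness",)] * 30 + [("Other (OT)",)] * 20
train_idx, test_idx = stratified_split(data)
print(len(train_idx), len(test_idx))  # 80 20
```

Stratifying on full label combinations keeps rare combinations represented in both splits, at the cost of very small groups when many combinations occur.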

## Training procedure

### Training hyperparameters

* learning_rate: 2e-05
* train_batch_size: 16
* eval_batch_size: 16
* seed: 42
* optimizer: AdamW with betas=(0.9, 0.999), epsilon=1e-08
* lr_scheduler_type: linear
* lr_scheduler_warmup_ratio: 0.06
* num_epochs: 15 (early stopping patience=3)
* max_length: 96
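
The `linear` scheduler with `warmup_ratio: 0.06` ramps the learning rate from 0 to 2e-05 over the first 6% of optimizer steps, then decays it linearly to 0 at the last scheduled step. The sketch below reproduces that shape; computing warmup steps as `ceil(total_steps * warmup_ratio)` follows what the Hugging Face `Trainer` does when only a ratio is given. If the schedule was sized for all 15 epochs, stopping after epoch 6 means the learning rate never reached 0. The steps-per-epoch figure in the usage lines is hypothetical (about 7,840 training examples at batch size 16), since the card only gives an approximate dataset size.

```python
import math


def linear_warmup_lr(step: int, total_steps: int, base_lr: float = 2e-5,
                     warmup_ratio: float = 0.06) -> float:
    """Learning rate at `step`: linear warmup to base_lr, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Hypothetical: 490 optimizer steps per epoch, schedule sized for 15 epochs.
steps_per_epoch = 490
total = 15 * steps_per_epoch
for epoch in (1, 3, 6, 15):
    print(epoch, linear_warmup_lr(epoch * steps_per_epoch, total))
```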

### Training results

| Training Loss | Epoch | Validation Loss | F1 Micro | F1 Macro | F1 Weighted |
|---|---|---|---|---|---|
| 0.1007 | 1 | 0.1046 | 0.9030 | 0.8907 | 0.8906 |
| 0.0462 | 2 | 0.0471 | 0.9669 | 0.9671 | 0.9671 |
| 0.0215 | 3 | 0.0467 | 0.9698 | 0.9697 | 0.9697 |
| 0.0170 | 4 | 0.0556 | 0.9689 | 0.9689 | 0.9689 |
| 0.0072 | 5 | 0.0784 | 0.9607 | 0.9604 | 0.9605 |
| 0.0055 | 6 | 0.0608 | 0.9724 | 0.9727 | 0.9724 |

Early stopping triggered after epoch 6.
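
The card does not name the monitored metric, but the table is consistent with patience counted on validation loss: the minimum is at epoch 3 (0.0467), and epochs 4 to 6 do not improve on it. The same rule applied to F1 Micro would not stop at epoch 6, because epoch 6 sets a new best there. The sketch below is a simplified replay of patience-based early stopping over the validation losses from the table; `early_stop_epoch` is a hypothetical helper, not the `Trainer` callback itself.

```python
from typing import Optional, Sequence, Tuple


def early_stop_epoch(val_losses: Sequence[float], patience: int = 3) -> Tuple[int, Optional[int]]:
    """Replay patience-based early stopping on per-epoch validation losses.

    Returns (best_epoch, stop_epoch), both 1-indexed; stop_epoch is None if
    training would have run through every epoch supplied.
    """
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, None


# Validation losses from the training-results table above.
losses = [0.1046, 0.0471, 0.0467, 0.0556, 0.0784, 0.0608]
print(early_stop_epoch(losses))  # (3, 6)
```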

### Per-class results (test set)

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| IsFunctional | 0.934 | 0.948 | 0.941 | 420 |
| IsBusiness | 0.993 | 0.978 | 0.985 | 416 |
| Other (OT) | 1.000 | 1.000 | 1.000 | 421 |
| **micro avg** | **0.975** | **0.975** | **0.975** | 1257 |
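
As a consistency check, the per-class F1 values can be recomputed from precision and recall, and the macro and support-weighted averages from the per-class F1 and support. From the three-decimal figures in the table both averages come out at about 0.9753, which matches the headline 0.9750 up to rounding of the per-class values. Micro F1 cannot be recomputed this way, since it needs the pooled true-positive, false-positive, and false-negative counts, which the card does not report.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, float, float, float, int]

# (class, precision, recall, F1, support) copied from the per-class table above.
ROWS: List[Row] = [
    ("IsFunctional", 0.934, 0.948, 0.941, 420),
    ("IsBusiness", 0.993, 0.978, 0.985, 416),
    ("Other (OT)", 1.000, 1.000, 1.000, 421),
]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(rows: Sequence[Row]) -> float:
    """Unweighted mean of per-class F1."""
    return sum(r[3] for r in rows) / len(rows)


def weighted_f1(rows: Sequence[Row]) -> float:
    """Support-weighted mean of per-class F1."""
    total = sum(r[4] for r in rows)
    return sum(r[3] * r[4] for r in rows) / total


for name, p, r, f, _ in ROWS:
    print(f"{name}: recomputed F1 = {f1(p, r):.3f} (table: {f:.3f})")
print(f"macro F1 = {macro_f1(ROWS):.4f}, weighted F1 = {weighted_f1(ROWS):.4f}")
```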

### Framework versions

* Transformers 4.57.1
* PyTorch 2.8.0+cu128
* Datasets 4.0.0
* Tokenizers 0.22.2