---
library_name: transformers
base_model: DeepPavlov/rubert-base-cased
language:
- ru
tags:
- text-classification
- bert
- safetensors
- multilabel-classification
- requirements-engineering
- generated_from_trainer
model-index:
- name: rubert_level1_v2
  results:
  - task:
      type: text-classification
    metrics:
    - type: loss
      value: 0.0727
      name: Validation Loss
    - type: f1
      value: 0.9749
      name: F1 Micro
    - type: f1
      value: 0.9750
      name: F1 Macro
    - type: f1
      value: 0.9750
      name: F1 Weighted
---

# rubert_level1_v2

This model is a fine-tuned version of [DeepPavlov/rubert-base-cased](https://huggingface.co/DeepPavlov/rubert-base-cased) for multilabel classification of software requirements in Russian (Level 1).

It achieves the following results on the evaluation set:
* Loss: 0.0727
* F1 Micro: 0.9749
* F1 Macro: 0.9750
* F1 Weighted: 0.9750

## Model description

This is the Level 1 classifier in a cascaded requirements-classification pipeline. It assigns Russian-language text fragments from meeting recordings to three categories:

| Label | Description |
|---|---|
| `IsFunctional` | Functional requirements — what the system must do |
| `IsBusiness` | Business requirements — budgets, KPIs, deadlines, regulations |
| `Other (OT)` | Non-requirements — organizational remarks, transition phrases, context |

`IsNonFunctional` is derived automatically as a logical OR over the Level 2 predictions and is not predicted by this model directly.

The model is part of a cascaded pipeline:
`Audio → GigaAM-v3 (ASR) → rubert_level1_v2 (L1) → rubert_level2_v2 (L2) → Report`

Per-class classification thresholds are stored in `thresholds.json` in this repository.
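
A minimal sketch of the decision rule at inference time: one independent sigmoid per class, each probability compared against that label's own threshold rather than a shared 0.5 cut-off. The schema assumed for `thresholds.json` (label name to float) and the threshold and logit values below are illustrative assumptions, not values from this repository:

```python
import math

# Assumed thresholds.json schema: {"label": float}; values are placeholders.
thresholds = {"IsFunctional": 0.50, "IsBusiness": 0.45, "OT": 0.55}
labels = list(thresholds)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, thresholds, labels):
    """Multilabel decision: apply a sigmoid per class, then compare each
    probability against its own per-class threshold."""
    probs = [sigmoid(z) for z in logits]
    return [lab for lab, p in zip(labels, probs) if p >= thresholds[lab]]

# Raw logits for one text fragment (hypothetical values).
print(predict_labels([2.1, -1.3, 0.4], thresholds, labels))  # ['IsFunctional', 'OT']
```

In practice the logits would come from the fine-tuned model's classification head, and zero, one, or several labels can fire for the same fragment.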

## Intended uses & limitations

The model is intended for classifying Russian-language software requirements extracted from meeting audio recordings. It is not suitable for general-purpose text classification or for languages other than Russian.

## Training and evaluation data

Custom Russian-language requirements dataset compiled from:
- PROMISE dataset (translated into Russian)
- PURE dataset (parsed from XML, translated into Russian)
- Synthetically generated examples (Grok, Claude Sonnet) across 14 domain areas

Total: ~9800 labeled examples. Train/test split: 80/20, stratified, seed=42.
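
A stratified 80/20 split keeps each label combination's share equal in the train and test sets. The following pure-Python sketch illustrates the idea on toy data; it is not the actual preprocessing code of this repository:

```python
import random

def stratified_split(examples, key, test_size=0.2, seed=42):
    """Group examples by label combination, then split each group
    separately so class proportions are preserved in both halves."""
    random.seed(seed)
    groups = {}
    for ex in examples:
        groups.setdefault(key(ex), []).append(ex)
    train, test = [], []
    for items in groups.values():
        random.shuffle(items)
        n_test = round(len(items) * test_size)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test

# Toy data: (text, label) pairs for two classes.
data = [(f"t{i}", "IsFunctional") for i in range(10)] + \
       [(f"b{i}", "IsBusiness") for i in range(10)]
train, test = stratified_split(data, key=lambda ex: ex[1])
print(len(train), len(test))  # 16 4 — an 80/20 split within each class
```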

## Training procedure

### Training hyperparameters

* learning_rate: 2e-05
* train_batch_size: 16
* eval_batch_size: 16
* seed: 42
* optimizer: AdamW with betas=(0.9, 0.999), epsilon=1e-08
* lr_scheduler_type: linear
* lr_scheduler_warmup_ratio: 0.06
* num_epochs: 15 (early stopping patience=3)
* max_length: 96
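
With `lr_scheduler_type: linear` and `warmup_ratio: 0.06`, the learning rate ramps linearly from 0 to 2e-05 over the first 6% of optimizer steps, then decays linearly to 0 by the last step. A plain-Python sketch of that rule (the total step count is illustrative; in practice it is epochs × steps per epoch):

```python
def lr_at_step(step, total_steps, base_lr=2e-5, warmup_ratio=0.06):
    """Linear warmup to base_lr over warmup_ratio * total_steps,
    then linear decay to zero at total_steps."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total = 1000  # illustrative; depends on dataset size, batch size, and epochs
print(lr_at_step(30, total))    # mid-warmup: 1e-05 (half of base_lr)
print(lr_at_step(60, total))    # end of warmup: 2e-05
print(lr_at_step(1000, total))  # end of training: 0.0
```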

### Training results

| Training Loss | Epoch | Validation Loss | F1 Micro | F1 Macro | F1 Weighted |
|---|---|---|---|---|---|
| 0.1007 | 1 | 0.1046 | 0.9030 | 0.8907 | 0.8906 |
| 0.0462 | 2 | 0.0471 | 0.9669 | 0.9671 | 0.9671 |
| 0.0215 | 3 | 0.0467 | 0.9698 | 0.9697 | 0.9697 |
| 0.0170 | 4 | 0.0556 | 0.9689 | 0.9689 | 0.9689 |
| 0.0072 | 5 | 0.0784 | 0.9607 | 0.9604 | 0.9605 |
| 0.0055 | 6 | 0.0608 | 0.9724 | 0.9727 | 0.9724 |

Early stopping was triggered after epoch 6.

### Per-class results (test set)

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| IsFunctional | 0.934 | 0.948 | 0.941 | 420 |
| IsBusiness | 0.993 | 0.978 | 0.985 | 416 |
| Other (OT) | 1.000 | 1.000 | 1.000 | 421 |
| **micro avg** | **0.975** | **0.975** | **0.975** | 1257 |
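
For reference, the micro average pools true positives, false positives, and false negatives across all classes before computing precision and recall, so larger classes weigh more than in a macro average. A minimal sketch with toy per-class counts (illustrative values, not the model's actual confusion data):

```python
def micro_f1(per_class_counts):
    """per_class_counts: list of (tp, fp, fn) tuples, one per class.
    Micro averaging sums the counts across classes first, then computes
    precision, recall, and F1 from the pooled totals."""
    tp = sum(c[0] for c in per_class_counts)
    fp = sum(c[1] for c in per_class_counts)
    fn = sum(c[2] for c in per_class_counts)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy counts for three classes (hypothetical, chosen to total 1257 positives).
print(round(micro_f1([(398, 28, 22), (407, 3, 9), (421, 0, 0)]), 3))  # 0.975
```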

### Framework versions

* Transformers 4.57.1
* PyTorch 2.8.0+cu128
* Datasets 4.0.0
* Tokenizers 0.22.2