---
library_name: transformers
base_model: DeepPavlov/rubert-base-cased
language:
- ru
tags:
- text-classification
- bert
- safetensors
- multilabel-classification
- requirements-engineering
- generated_from_trainer
model-index:
- name: rubert_level2_v2
  results:
  - task:
      type: text-classification
    metrics:
    - type: f1
      value: 0.9110
      name: F1 Micro
    - type: f1
      value: 0.9110
      name: F1 Macro
    - type: f1
      value: 0.9120
      name: F1 Weighted
---

# rubert_level2_v2

This model is a fine-tuned version of [DeepPavlov/rubert-base-cased](https://huggingface.co/DeepPavlov/rubert-base-cased) for multilabel classification of non-functional software requirements in Russian (Level 2).

It achieves the following results on the evaluation set:
* F1 Micro: 0.9110
* F1 Macro: 0.9110
* F1 Weighted: 0.9120

## Model description

This is the Level 2 classifier in a cascaded requirements classification pipeline. It is applied only to fragments that Level 1 classified as `IsNonFunctional`, and it labels each fragment with any of 11 non-functional requirement subcategories:

| Label | Description |
|---|---|
| `Availability (A)` | Uptime, SLA, availability percentage |
| `Fault Tolerance (FT)` | Failover, recovery, redundancy |
| `Legal (L)` | Regulatory compliance, standards, licenses |
| `Look & Feel (LF)` | Visual style, UI design |
| `Maintainability (MN)` | Code quality, documentation, tech debt |
| `Operability (O)` | Monitoring, administration, observability |
| `Performance (PE)` | Response time, throughput, latency |
| `Portability (PO)` | Platform and OS compatibility |
| `Scalability (SC)` | Load scaling, growth capacity |
| `Security (SE)` | Authentication, authorization, encryption |
| `Usability (US)` | UX, ease of use, learnability |

The model is part of a cascaded pipeline:
`Audio → GigaAM-v3 (ASR) → rubert_level1_v2 (L1) → rubert_level2_v2 (L2) → Report`

Per-class thresholds are stored in `thresholds.json` in the `eternalGenius/rubert_level1_v2` repository.
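
As a minimal sketch of how per-class thresholds might be applied to the model's raw outputs: the snippet below assumes that `thresholds.json` maps each label name to a float cutoff (the file's actual schema is not documented in this card) and that an independent sigmoid is applied to each logit. The helper names, the fallback cutoff of 0.5, and the sample values are all hypothetical.

```python
import json
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def apply_thresholds(logits, labels, thresholds, default=0.5):
    """Return the labels whose sigmoid probability meets their own threshold."""
    selected = []
    for label, logit in zip(labels, logits):
        # Multilabel setup: each class is scored independently of the others.
        if sigmoid(logit) >= thresholds.get(label, default):
            selected.append(label)
    return selected

# Hypothetical thresholds; in practice they would be read from thresholds.json,
# assuming that file is a flat {label: cutoff} mapping.
thresholds = json.loads('{"Performance (PE)": 0.6, "Security (SE)": 0.4}')
labels = ["Performance (PE)", "Security (SE)", "Usability (US)"]
logits = [2.0, -1.0, 0.1]  # illustrative values, not real model output

predicted = apply_thresholds(logits, labels, thresholds)
print(predicted)
```

With these toy values, `Performance (PE)` and `Usability (US)` pass their cutoffs and `Security (SE)` does not.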

## Intended uses & limitations

The model is intended for subclassifying Russian-language non-functional requirements extracted from meeting audio recordings. It should only be applied to fragments already classified as `IsNonFunctional` by Level 1.
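
The gating rule above can be sketched as a small wrapper. This is an illustration only: `classify_fragment` and the two toy stand-in functions are hypothetical, and in a real pipeline the callables would wrap `rubert_level1_v2` and `rubert_level2_v2`.

```python
from typing import Callable, List, Optional

def classify_fragment(
    text: str,
    level1: Callable[[str], bool],
    level2: Callable[[str], List[str]],
) -> Optional[List[str]]:
    """Run Level 2 only when Level 1 flags the fragment as non-functional."""
    if not level1(text):
        return None  # Level 2 is skipped for fragments Level 1 rejected
    return level2(text)

# Toy stand-ins for the two models (hypothetical keyword rules, not the real classifiers).
def toy_level1(text: str) -> bool:
    return "секунд" in text

def toy_level2(text: str) -> List[str]:
    return ["Performance (PE)"]

nfr = classify_fragment("Система должна отвечать за 2 секунды", toy_level1, toy_level2)
other = classify_fragment("Пользователь может создать заказ", toy_level1, toy_level2)
print(nfr, other)
```

Returning `None` rather than an empty list keeps "Level 2 was not run" distinguishable from "Level 2 ran and no label passed its threshold".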

## Training and evaluation data

Same dataset as Level 1, filtered to `IsNonFunctional=1` rows only.

Train: 772 examples | Test: 191 examples per class (11 classes, ~500 examples each).

## Training procedure

### Training hyperparameters

* learning_rate: 5e-06
* train_batch_size: 16
* eval_batch_size: 16
* seed: 42
* optimizer: AdamW with betas=(0.9, 0.999), epsilon=1e-08
* lr_scheduler_type: linear
* lr_scheduler_warmup_ratio: 0.06
* num_epochs: 15 (early stopping patience=3)
* max_length: 96
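
To make the `linear` scheduler and the 0.06 warmup ratio concrete, here is a plain-Python sketch of a linear warmup followed by linear decay, using the learning rate listed above. The function name, the illustrative `total_steps` value, and rounding the warmup length up to a whole step are assumptions; the actual step count depends on dataset size and on where early stopping ends the run.

```python
import math

def linear_warmup_lr(step, total_steps, warmup_steps, base_lr=5e-6):
    """Learning rate at `step`: linear ramp up to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

total_steps = 1000  # illustrative value, not taken from the training run
warmup_steps = math.ceil(total_steps * 0.06)  # assumed rounding of the warmup ratio

start_lr = linear_warmup_lr(0, total_steps, warmup_steps)
peak_lr = linear_warmup_lr(warmup_steps, total_steps, warmup_steps)
end_lr = linear_warmup_lr(total_steps, total_steps, warmup_steps)
print(start_lr, peak_lr, end_lr)
```

The rate starts at zero, reaches 5e-06 at the end of warmup, and decays back to zero at the final step.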

### Per-class results (test set)

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Availability (A) | 1.000 | 0.939 | 0.968 | 98 |
| Fault Tolerance (FT) | 0.981 | 0.920 | 0.949 | 112 |
| Legal (L) | 0.860 | 0.925 | 0.891 | 106 |
| Look & Feel (LF) | 0.957 | 0.918 | 0.938 | 98 |
| Maintainability (MN) | 0.816 | 0.853 | 0.834 | 109 |
| Operability (O) | 0.976 | 0.883 | 0.927 | 94 |
| Performance (PE) | 0.883 | 0.958 | 0.919 | 118 |
| Portability (PO) | 0.911 | 0.944 | 0.927 | 108 |
| Scalability (SC) | 0.971 | 0.952 | 0.962 | 105 |
| Security (SE) | 0.858 | 0.875 | 0.867 | 104 |
| Usability (US) | 0.831 | 0.841 | 0.836 | 82 |
| micro avg | 0.910 | 0.912 | 0.911 | 1134 |
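
The macro and weighted F1 scores reported at the top of this card can be cross-checked against the per-class rows. The sketch below copies the F1 and support columns from the table above and recomputes both averages; the variable names are illustrative, and because the inputs are already rounded to three decimals the result only matches the headline numbers to that precision.

```python
# (class, F1, support) copied from the per-class results table
rows = [
    ("Availability (A)", 0.968, 98),
    ("Fault Tolerance (FT)", 0.949, 112),
    ("Legal (L)", 0.891, 106),
    ("Look & Feel (LF)", 0.938, 98),
    ("Maintainability (MN)", 0.834, 109),
    ("Operability (O)", 0.927, 94),
    ("Performance (PE)", 0.919, 118),
    ("Portability (PO)", 0.927, 108),
    ("Scalability (SC)", 0.962, 105),
    ("Security (SE)", 0.867, 104),
    ("Usability (US)", 0.836, 82),
]

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for _, f1, _ in rows) / len(rows)

# Weighted F1: per-class F1 averaged with support as the weight.
total_support = sum(support for _, _, support in rows)
weighted_f1 = sum(f1 * support for _, f1, support in rows) / total_support

print(round(macro_f1, 3), round(weighted_f1, 3), total_support)
# prints: 0.911 0.912 1134
```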

### Framework versions

* Transformers 4.57.1
* PyTorch 2.8.0+cu128
* Datasets 4.0.0
* Tokenizers 0.22.2