	---
	library_name: transformers
	base_model: DeepPavlov/rubert-base-cased
	language:
	- ru
	tags:
	- text-classification
	- bert
	- safetensors
	- multilabel-classification
	- requirements-engineering
	- generated_from_trainer
	model-index:
	- name: rubert_level1_v2
	  results:
	  - task:
	      type: text-classification
	    metrics:
	    - type: loss
	      value: 0.0727
	      name: Validation Loss
	    - type: f1
	      value: 0.9749
	      name: F1 Micro
	    - type: f1
	      value: 0.9750
	      name: F1 Macro
	    - type: f1
	      value: 0.9750
	      name: F1 Weighted
	---

	# rubert_level1_v2

	This model is a fine-tuned version of [DeepPavlov/rubert-base-cased](https://huggingface.co/DeepPavlov/rubert-base-cased) for multilabel classification of software requirements in Russian (Level 1).

	It achieves the following results on the evaluation set:
	* Loss: 0.0727
	* F1 Micro: 0.9749
	* F1 Macro: 0.9750
	* F1 Weighted: 0.9750

	## Model description

	Level 1 classifier in a cascaded requirements classification pipeline. It classifies Russian-language text fragments transcribed from meeting recordings into three categories:

	| Label | Description |
	|---|---|
	| `IsFunctional` | Functional requirements: what the system must do |
	| `IsBusiness` | Business requirements: budgets, KPIs, deadlines, regulations |
	| `Other (OT)` | Non-requirements: organizational remarks, transition phrases, context |

	`IsNonFunctional` is not predicted by this model directly. It is derived downstream as the logical OR of the Level 2 predictions.
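
The OR rule for `IsNonFunctional` can be sketched in a few lines. This is only an illustration: the Level 2 label names (`Performance`, `Security`, `Usability`) and the dictionary layout are hypothetical, since this card does not list the Level 2 classes.

```python
def derive_is_non_functional(level2_predictions: dict[str, bool]) -> bool:
    """Return True if at least one Level 2 label fired for the fragment."""
    return any(level2_predictions.values())


# Hypothetical Level 2 output for one text fragment (label names are assumed).
level2_predictions = {"Performance": False, "Security": True, "Usability": False}
is_non_functional = derive_is_non_functional(level2_predictions)
print(is_non_functional)  # True
```

A fragment with no positive Level 2 label yields `False`, so the flag never needs its own output head in the Level 1 model.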

	The model is part of a cascaded pipeline:
	`Audio → GigaAM-v3 (ASR) → rubert_level1_v2 (L1) → rubert_level2_v2 (L2) → Report`

	Per-class classification thresholds are stored in `thresholds.json` in this repository.
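
A minimal sketch of how per-class thresholds might be applied to the model's raw logits. The layout of `thresholds.json` (a flat label-to-threshold mapping), the threshold values, the label order, and the example logits are all assumptions made for illustration. In practice the label order would be read from the model configuration (`id2label`) and the thresholds from the file in this repository.

```python
import json
import math

# Assumed label order; in practice take it from the model config (id2label).
LABELS = ["IsFunctional", "IsBusiness", "Other (OT)"]

# Assumed file layout {"<label>": <threshold>}; the values are placeholders.
thresholds_json = '{"IsFunctional": 0.5, "IsBusiness": 0.4, "Other (OT)": 0.6}'
thresholds = json.loads(thresholds_json)


def sigmoid(x: float) -> float:
    """Logistic function used to turn a logit into an independent probability."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_thresholds(logits, thresholds, labels=LABELS):
    """Map one row of raw logits to a {label: bool} decision per class."""
    return {
        label: sigmoid(logit) >= thresholds[label]
        for label, logit in zip(labels, logits)
    }


# Placeholder logits for a single fragment, in LABELS order.
decisions = apply_thresholds([2.1, -0.2, -3.0], thresholds)
print(decisions)
```

Because each class is compared against its own threshold, a fragment can receive several labels at once or none, which is what distinguishes this multilabel setup from a softmax/argmax decision.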

	## Intended uses & limitations

	Intended for classification of Russian-language software requirements extracted from meeting audio recordings. Not suitable for general-purpose text classification or non-Russian languages.

	## Training and evaluation data

	Custom Russian-language requirements dataset compiled from:
	- PROMISE dataset (translated to Russian)
	- PURE dataset (parsed from XML, translated to Russian)
	- Synthetically generated examples (Grok, Claude Sonnet) across 14 domain areas

	Total: ~9800 labeled examples. Train/test split: 80/20, stratified, seed=42.
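
The 80/20 stratified split with a fixed seed can be illustrated with a small standard-library sketch. This is not the author's actual splitting code (the card does not say which tool was used), and the toy labels below are placeholders, not the real dataset.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so each label keeps its share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)  # deterministic for a fixed seed
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy data: 50 examples per class (placeholder sizes).
labels = ["IsFunctional"] * 50 + ["IsBusiness"] * 50 + ["Other (OT)"] * 50
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 120 30
```

For rows that carry more than one label, one common workaround is to stratify on the label combination (for example a tuple of 0/1 flags) instead of a single class name.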

	## Training procedure

	### Training hyperparameters

	* learning_rate: 2e-05
	* train_batch_size: 16
	* eval_batch_size: 16
	* seed: 42
	* optimizer: AdamW with betas=(0.9, 0.999), epsilon=1e-08
	* lr_scheduler_type: linear
	* lr_scheduler_warmup_ratio: 0.06
	* num_epochs: 15 (early stopping patience=3)
	* max_length: 96
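
To make the scheduler settings concrete, the sketch below computes the learning rate under linear warmup followed by linear decay, the same shape produced by `get_linear_schedule_with_warmup` in Transformers. The step counts are assumptions derived from the approximate dataset size (~9800 examples, 80% for training), a single device, and no gradient accumulation. Rounding the warmup length up is also an assumption.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.06):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed sizes: ~9800 examples, 80% for training, batch size 16, 15 epochs.
train_examples = 9800 * 80 // 100                  # 7840
steps_per_epoch = math.ceil(train_examples / 16)   # 490
total_steps = steps_per_epoch * 15                 # 7350 planned steps

# Print the rate at the start, mid-warmup, near the peak, at epoch 6, and at the end.
for step in (0, 221, 441, 6 * steps_per_epoch, total_steps):
    print(step, f"{linear_schedule_lr(step, total_steps):.2e}")
```

Under these assumptions the schedule is laid out for all 15 epochs, so a run that stops early ends while the learning rate is still well above zero.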

	### Training results

	| Training Loss | Epoch | Validation Loss | F1 Micro | F1 Macro | F1 Weighted |
	|---|---|---|---|---|---|
	| 0.1007 | 1 | 0.1046 | 0.9030 | 0.8907 | 0.8906 |
	| 0.0462 | 2 | 0.0471 | 0.9669 | 0.9671 | 0.9671 |
	| 0.0215 | 3 | 0.0467 | 0.9698 | 0.9697 | 0.9697 |
	| 0.0170 | 4 | 0.0556 | 0.9689 | 0.9689 | 0.9689 |
	| 0.0072 | 5 | 0.0784 | 0.9607 | 0.9604 | 0.9605 |
	| 0.0055 | 6 | 0.0608 | 0.9724 | 0.9727 | 0.9724 |

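	Early stopping (patience=3) triggered after epoch 6, consistent with validation loss not improving on its epoch 3 minimum of 0.0467 for three consecutive epochs.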
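
The stopping point can be reproduced from the validation losses in the table above. The sketch assumes that validation loss is the monitored metric and that any decrease counts as an improvement. The card does not state either, but it is the reading under which patience=3 stops at epoch 6.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), 1-based; stop_epoch is None if never triggered."""
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Validation losses for epochs 1-6, copied from the training results table.
val_losses = [0.1046, 0.0471, 0.0467, 0.0556, 0.0784, 0.0608]
print(early_stopping_epoch(val_losses))  # (6, 3)
```

Epochs 4, 5, and 6 all fail to beat the epoch 3 loss, which exhausts the patience budget.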

	### Per-class results (test set)

	| Class | Precision | Recall | F1 | Support |
	|---|---|---|---|---|
	| IsFunctional | 0.934 | 0.948 | 0.941 | 420 |
	| IsBusiness | 0.993 | 0.978 | 0.985 | 416 |
	| Other (OT) | 1.000 | 1.000 | 1.000 | 421 |
	| micro avg | 0.975 | 0.975 | 0.975 | 1257 |
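
As a quick consistency check, the per-class F1 values can be recomputed from the precision and recall columns, and the macro F1 as their unweighted mean. The inputs are the rounded figures from the table, so the last digit can differ slightly from metrics computed on raw predictions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per class, copied from the table above.
per_class = {
    "IsFunctional": (0.934, 0.948),
    "IsBusiness": (0.993, 0.978),
    "Other (OT)": (1.000, 1.000),
}

f1_by_class = {name: f1_score(p, r) for name, (p, r) in per_class.items()}
macro_f1 = sum(f1_by_class.values()) / len(f1_by_class)

for name, value in f1_by_class.items():
    print(f"{name}: {value:.3f}")
print(f"macro F1: {macro_f1:.3f}")
```

The recomputed values round to 0.941, 0.985, and 1.000, with a macro average of about 0.975, matching the reported figures to three decimals.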

	### Framework versions

	* Transformers 4.57.1
	* PyTorch 2.8.0+cu128
	* Datasets 4.0.0
	* Tokenizers 0.22.2