herman3996/comments_classifier

Text Classification

	---
	language: ru
	license: mit
	tags:
	- text-classification
	- pytorch
	---

	# Comments Classifier (RuBERT fine-tune)

	A Russian-language comment classification model fine-tuned on top of RuBERT. Developed as part of the Lubarsky Comments Model project.

	## Overview

	The model was fine-tuned on a labeled dataset of Russian-language comments and predicts the category of a given comment.

	The repository contains three standalone Windows applications built with PyInstaller, which need no Python installation or extra dependencies, plus a quality assurance dataset:

	| File | Size | Description |
	|---|---|---|
	| `trainer.zip` | ~2.6 GB | Application for fine-tuning the model |
	| `prediction.zip` | ~2.5 GB | Application for running predictions |
	| `classifier.zip` | ~60 MB | Application for manual comment classification |
	| `QA_dataset.csv` | ~75 kB | Quality assurance dataset |

	---

	## Quick Start

	> ⚠️ No Python installation is required: all three programs are self-contained Windows `.exe` applications.

	### 1. Download the ZIP archive

	Download the archive for each application you need (`trainer.zip`, `prediction.zip`, or `classifier.zip`) from this repository's file list.
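
For scripted downloads, the step can be sketched in Python roughly as follows. This is a sketch under assumptions, not part of the project: it takes the repository ID `herman3996/comments_classifier` from this page, assumes the files sit on the default `main` branch, and relies on Hugging Face's usual `resolve` URL layout. The helper names `archive_url` and `download_archive` are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumptions: repository ID as shown on this page, files on the default "main" branch.
REPO_ID = "herman3996/comments_classifier"
REVISION = "main"


def archive_url(filename: str) -> str:
    """Build the direct-download URL for a file stored in the repository."""
    return f"https://huggingface.co/{REPO_ID}/resolve/{REVISION}/{filename}"


def download_archive(filename: str, dest_dir: str = ".") -> Path:
    """Stream one archive into dest_dir and return the local path."""
    target = Path(dest_dir) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(archive_url(filename)) as response, open(target, "wb") as out:
        # copyfileobj copies in chunks, so multi-GB archives are not held in memory
        shutil.copyfileobj(response, out)
    return target


# Example (needs network access): download_archive("classifier.zip", "downloads")
```

`classifier.zip` is the smallest archive (~60 MB), so it is a reasonable first file to try before fetching the multi-gigabyte ones.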

	### 2. Extract the archive

	Extract the downloaded archive to a convenient location. For `classifier.zip`, the extracted folder looks like this:

	```
	classifier/
	├── _internal/ # internal dependencies (do not modify)
	└── run_classifier.exe # executable file
	```
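
If you prefer to script the extraction, a minimal sketch using only the Python standard library might look like this. The function name `extract_app` is hypothetical. The sketch assumes the archive follows the layout shown above, with the launcher outside `_internal/`.

```python
import zipfile
from pathlib import Path


def extract_app(archive: str, dest_dir: str) -> list[Path]:
    """Extract a downloaded archive and return the launcher .exe files inside it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    # Search recursively, but skip _internal/, which holds bundled dependencies
    return sorted(
        p for p in dest.rglob("*.exe")
        if "_internal" not in p.relative_to(dest).parts
    )
```

Calling `extract_app("classifier.zip", "apps")` would then return the path to `run_classifier.exe`, provided the archive matches the structure shown above.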

	### 3. Run the `.exe`

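	Double-click the `.exe` file, or launch it from a terminal opened in the extracted folder. Each archive contains its own executable:

	```powershell
	# Run each command from the folder extracted from the matching archive
	.\run_classifier.exe   # from classifier.zip
	.\run_prediction.exe   # from prediction.zip
	.\run_trainer.exe      # from trainer.zip
	```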

	---

	## Application Descriptions

	`run_classifier` — a tool for manual or batch comment classification. Useful for quick review and labeling.

	`run_prediction` — the main inference application. Takes comments as input and returns predicted classes.

	`run_trainer` — fine-tunes the model on new data. Allows you to retrain the classifier on your own dataset.

	---

	## Environment Configuration

	The repository includes a `.env` file with environment variables (e.g., file paths, parameters). Edit it as needed before running the applications.
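
This README does not list the variables, so it can help to inspect the file before launching an application. The sketch below reads simple `KEY=VALUE` lines with the standard library. It assumes the `.env` file uses that common format. The function name `read_env` and the keys in the usage note (`MODEL_DIR`, `BATCH_SIZE`) are hypothetical, not taken from the project.

```python
from pathlib import Path


def read_env(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # not a setting
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip("\"'")
    return values
```

For a file containing the hypothetical lines `MODEL_DIR=C:/models/rubert` and `BATCH_SIZE = "16"`, `read_env(".env")` would return both entries as strings, which makes it easy to check paths before a long training or prediction run.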

	---

	## Source Code

	The full source code (training, data labeling, scripts) is available on GitHub:
	👉 [gerageragera39/Lubarsky_Comments_Model](https://github.com/gerageragera39/Lubarsky_Comments_Model)

	Source repository structure:
	- `data_hand_classifier/` — tools for manual data labeling
	- `rubert_trainer/` — RuBERT fine-tuning scripts
	- `dataset.csv` — main training dataset
	- `test_comments.csv` — test set
	- `result.png` — training results visualization

	---

	## Technical Details

	- Base model: RuBERT (DeepPavlov)
	- Framework: PyTorch + HuggingFace Transformers
	- Build: PyInstaller (standalone Windows executables)
	- Data language: Russian
	- Task: Text Classification
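
With the stack listed above, classification typically means turning the model's per-class scores (logits) into a predicted label. The sketch below illustrates that step. It is not the project's actual code: `model_dir` is a hypothetical path to a fine-tuned checkpoint saved in Transformers format (this README does not say whether the archives expose one), `classify` needs `torch` and `transformers` installed, and the label names come from whatever the checkpoint's config defines.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(logits: list[float], id2label: dict[int, str]) -> tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify(texts: list[str], model_dir: str) -> list[tuple[str, float]]:
    """Classify comments with a fine-tuned checkpoint (model_dir is hypothetical)."""
    # Imported lazily so the helpers above work without the heavy dependencies
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**batch).logits
    return [pick_label(row, model.config.id2label) for row in logits.tolist()]
```

The pure-Python helpers keep the scoring step easy to check on its own, while `classify` shows where the tokenizer and model would fit in.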

	---

	## License

	This project is released under the [MIT License](https://opensource.org/licenses/MIT).
	You are free to use, modify, and distribute this software for both personal and commercial purposes, provided that the original copyright notice is retained.

	> Note: Since the applications are packaged with PyInstaller, they may be flagged by antivirus software as suspicious. This is a known false positive common to PyInstaller-built executables. You may need to add an exception in your antivirus or temporarily disable it to run the applications.

	The software is provided as is, without warranty of any kind. The author takes no responsibility for any issues, damages, or data loss that may arise from its use.