Comments Classifier (RuBERT fine-tune)

A Russian-language comment classification model fine-tuned on top of RuBERT. Developed as part of the Lubarsky Comments Model project.

Overview

The model was fine-tuned on a labeled dataset of Russian-language comments. It automatically assigns a category to a given comment.

The repository contains three ready-to-use standalone applications built with PyInstaller, plus a quality assurance dataset. The applications require no Python installation or other dependencies:

File	Size	Description
`trainer.zip`	~2.6 GB	Application for fine-tuning the model
`prediction.zip`	~2.5 GB	Application for running predictions
`classifier.zip`	~60 MB	Application for manual comment classification
`QA_dataset.csv`	~75 kB	Quality assurance dataset

Quick Start

⚠️ No Python installation is required: all three programs are self-contained Windows .exe applications.

1. Download the ZIP archive

Download one or more archives from this page.

2. Extract the archive

Extract the downloaded archive to a convenient location. For `classifier.zip`, for example, the extracted folder looks like this:

classifier/
├── _internal/          # internal dependencies (do not modify)
└── run_classifier.exe  # executable file
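
If an application fails to start, it may help to confirm that the archive was extracted completely. The following is a minimal Python sketch, not part of the project, that checks a folder for the two entries shown in the layout above. The helper name `missing_entries` is hypothetical, and the sketch assumes the other archives follow the same layout as `classifier/`.

```python
from pathlib import Path
from typing import List


def missing_entries(app_dir: str, exe_name: str) -> List[str]:
    """Return the expected entries that are absent from an extracted app folder."""
    root = Path(app_dir)
    missing = []
    # PyInstaller keeps the bundled libraries in _internal/ next to the executable.
    if not (root / "_internal").is_dir():
        missing.append("_internal/")
    if not (root / exe_name).is_file():
        missing.append(exe_name)
    return missing


# An empty list means both entries were found.
print(missing_entries("classifier", "run_classifier.exe"))
```

The executable depends on the `_internal/` folder beside it, so moving the `.exe` elsewhere on its own would produce the same kind of failure this check is meant to catch.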

3. Run the `.exe`

Double-click the .exe file, or launch it from a terminal opened in the extracted folder. Each archive provides its own executable:

.\run_classifier.exe
.\run_prediction.exe
.\run_trainer.exe

Application Descriptions

run_classifier — a tool for manual or batch comment classification. Useful for quick review and labeling.

run_prediction — the main inference application. Takes comments as input and returns predicted classes.

run_trainer — fine-tunes the model on new data. Allows you to retrain the classifier on your own dataset.

Environment Configuration

The repository includes a .env file with environment variables (e.g., file paths, parameters). Edit it as needed before running the applications.
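
Before editing the file, it can be useful to see which values it currently defines. The sketch below parses the common `KEY=VALUE` format using only the Python standard library. It is an illustration under assumptions: the README does not state how the applications read the file, and the variable names `MODEL_DIR` and `INPUT_FILE` are hypothetical placeholders, not the project's real settings.

```python
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env(path: str = ".env") -> Dict[str, str]:
    """Read and parse a .env file from disk."""
    return parse_env(Path(path).read_text(encoding="utf-8"))


# Hypothetical contents; the real variable names are those in the repository's .env file.
sample = """
# paths
MODEL_DIR=./model
INPUT_FILE="comments.csv"
"""
settings = parse_env(sample)
print(settings)
```

Calling `load_env()` from the folder that contains the real `.env` file would list its actual settings in the same way.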

Source Code

The full source code (training, data labeling, scripts) is available on GitHub: 👉 gerageragera39/Lubarsky_Comments_Model

Source repository structure:

data_hand_classifier/ — tools for manual data labeling
rubert_trainer/ — RuBERT fine-tuning scripts
dataset.csv — main training dataset
test_comments.csv — test set
result.png — training results visualization

Technical Details

Base model: RuBERT (DeepPavlov)
Framework: PyTorch + HuggingFace Transformers
Build: PyInstaller (standalone Windows executables)
Data language: Russian
Task: Text Classification
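
For readers who want to see how this stack fits together, here is a rough sketch of a RuBERT sequence classifier in HuggingFace Transformers. It is not the author's training or inference code, which lives in the GitHub repository. Several details are assumptions: the README does not name the exact checkpoint, so `DeepPavlov/rubert-base-cased` is used as one publicly available RuBERT model, and the label names passed in are placeholders chosen by the caller.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: the exact RuBERT checkpoint is not stated in this README.
ASSUMED_CHECKPOINT = "DeepPavlov/rubert-base-cased"


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(logits: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def classify(text: str, labels: Sequence[str],
             checkpoint: str = ASSUMED_CHECKPOINT) -> Tuple[str, float]:
    """Run one comment through a RuBERT sequence classifier.

    torch and transformers are imported here so the helpers above
    can be used without installing them.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=len(labels)
    )
    model.eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return pick_label(logits, labels)
```

Note that loading the base checkpoint this way attaches a randomly initialized classification head, so `classify` only returns meaningful labels once `checkpoint` points to a fine-tuned model. The first call also downloads the model weights.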

License

This project is released under the MIT License.
You are free to use, modify, and distribute this software for both personal and commercial purposes, provided that the original copyright notice is retained.

Note: Since the applications are packaged with PyInstaller, they may be flagged by antivirus software as suspicious. This is a known false positive common to PyInstaller-built executables. You may need to add an exception in your antivirus or temporarily disable it to run the applications.

The software is provided as is, without warranty of any kind. The author takes no responsibility for any issues, damages, or data loss that may arise from its use.
