Comments Classifier (RuBERT fine-tune)

A Russian-language comment classification model fine-tuned on top of RuBERT. Developed as part of the Lubarsky Comments Model project.

Overview

The model was fine-tuned on a labeled dataset of Russian-language comments. Its goal is to automatically determine the category/type of a given comment.

The repository contains three ready-to-use standalone applications built with PyInstaller (no Python installation or dependencies required):

File            Size      Description
trainer.zip     ~2.6 GB   Application for fine-tuning the model
prediction.zip  ~2.5 GB   Application for running predictions
classifier.zip  ~60 MB    Application for manual comment classification
QA_dataset.csv  ~75 kB    Quality assurance dataset

Quick Start

⚠️ No Python installation required: all three programs are self-contained .exe applications.

1. Download the ZIP archive

Download one or more archives from this page.

2. Extract the archive

Extract the downloaded archive to a convenient location. The folder structure will look like this:

classifier/
├── _internal/          # internal dependencies (do not modify)
└── run_classifier.exe  # executable file
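
If Python happens to be available and the setup needs to be scripted, the extraction step could look roughly like the sketch below. It uses only the standard library. The archive name `classifier.zip` comes from the file table above; the function name `extract_archive` and the destination folder are illustrative assumptions, not part of the project.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path: str, dest_dir: str) -> list[str]:
    """Extract a ZIP archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        return zf.namelist()


# Hypothetical usage (paths are assumptions):
# names = extract_archive("classifier.zip", "apps")
# print(names[:5])
```

Extracting with the system's built-in ZIP tool gives the same result; the script is only a convenience for repeated setups.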

3. Run the .exe

Simply double-click the .exe file or launch it from the terminal:

.\run_classifier.exe
.\run_prediction.exe
.\run_trainer.exe
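
When an application is started from a script rather than by hand, it can help to confirm first that the extracted folder is complete. The sketch below checks for the executable and the `_internal/` folder shown in the layout above, then launches the program. It assumes that the prediction and trainer archives follow the same layout as `classifier/`, which this page does not state explicitly; the helper names `check_app_folder` and `launch` are hypothetical.

```python
import subprocess
from pathlib import Path


def check_app_folder(folder: str, exe_name: str) -> Path:
    """Return the path to exe_name if the extracted folder looks complete."""
    root = Path(folder)
    exe = root / exe_name
    internal = root / "_internal"  # bundled dependencies next to the .exe
    if not exe.is_file():
        raise FileNotFoundError(f"missing executable: {exe}")
    if not internal.is_dir():
        raise FileNotFoundError(f"missing dependency folder: {internal}")
    return exe


def launch(folder: str, exe_name: str) -> int:
    """Run the application from its own folder and return its exit code."""
    exe = check_app_folder(folder, exe_name)
    # Use an absolute path so the working-directory change cannot break lookup.
    completed = subprocess.run([str(exe.resolve())], cwd=folder)
    return completed.returncode


# Hypothetical usage on Windows (folder name is an assumption):
# exit_code = launch("classifier", "run_classifier.exe")
```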

Application Descriptions

run_classifier: a tool for manual or batch comment classification. Useful for quick review and labeling.

run_prediction: the main inference application. Takes comments as input and returns predicted classes.

run_trainer: fine-tunes the model on new data. Allows you to retrain the classifier on your own dataset.


Environment Configuration

The repository includes a .env file with environment variables (e.g., file paths, parameters). Edit it as needed before running the applications.
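
To review the current settings before editing them, a `.env` file can be read with a few lines of Python. The sketch below assumes the common `KEY=VALUE` format with `#` comments; the variable names in the usage comment (`MODEL_DIR`, `BATCH_SIZE`) are made up for illustration, so check the actual file for the real keys.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical usage (the keys shown are assumptions, not the project's real ones):
# with open(".env", encoding="utf-8") as fh:
#     settings = parse_env(fh.read())
# print(settings.get("MODEL_DIR"), settings.get("BATCH_SIZE"))
```

This parser is deliberately minimal: it does not handle multi-line values, `export` prefixes, or variable interpolation.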


Source Code

The full source code (training, data labeling, scripts) is available on GitHub: 👉 gerageragera39/Lubarsky_Comments_Model

Source repository structure:

  • data_hand_classifier/: tools for manual data labeling
  • rubert_trainer/: RuBERT fine-tuning scripts
  • dataset.csv: main training dataset
  • test_comments.csv: test set
  • result.png: training results visualization

Technical Details

  • Base model: RuBERT (DeepPavlov)
  • Framework: PyTorch + HuggingFace Transformers
  • Build: PyInstaller (standalone Windows executables)
  • Data language: Russian
  • Task: Text Classification
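
For readers working from the source code rather than the packaged executables, the stack listed above is typically used along the following lines. This is a generic sketch, not the project's own inference code: it assumes a fine-tuned checkpoint saved in a local directory (`model_dir`, hypothetical) in the standard Transformers format, and the label names are placeholders because this page does not list the actual categories. `torch` and `transformers` are imported lazily, so the pure-Python helpers work without them.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(logits: list[float], labels: list[str]) -> tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def classify(texts: list[str], model_dir: str, labels: list[str]):
    """Classify texts with a checkpoint in model_dir (assumed to exist)."""
    import torch  # lazy import: only needed for real inference
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return [top_class(row.tolist(), labels) for row in logits]


# Hypothetical usage (path and label names are assumptions):
# print(classify(["Отличное видео!"], "./checkpoint", ["class_0", "class_1"]))
```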

License

This project is released under the MIT License.
You are free to use, modify, and distribute this software for both personal and commercial purposes, provided that the original copyright notice is retained.

Note: Since the applications are packaged with PyInstaller, they may be flagged by antivirus software as suspicious. This is a known false positive common to PyInstaller-built executables. You may need to add an exception in your antivirus or temporarily disable it to run the applications.

The software is provided as is, without warranty of any kind. The author takes no responsibility for any issues, damages, or data loss that may arise from its use.
