Comments Classifier (RuBERT fine-tune)
A Russian-language comment classification model fine-tuned on top of RuBERT. Developed as part of the Lubarsky Comments Model project.
Overview
The model was fine-tuned on a labeled dataset of Russian-language comments. Its goal is to automatically determine the category/type of a given comment.
The repository contains three ready-to-use standalone applications built with PyInstaller (no Python installation or dependencies required), plus a quality assurance dataset:
| File | Size | Description |
|---|---|---|
| trainer.zip | ~2.6 GB | Application for fine-tuning the model |
| prediction.zip | ~2.5 GB | Application for running predictions |
| classifier.zip | ~60 MB | Application for manual comment classification |
| QA_dataset.csv | ~75 kB | Quality assurance dataset |
Quick Start
⚠️ No Python installation required: all three programs are self-contained .exe applications.
1. Download the ZIP archive
Download one or more archives from this page.
2. Extract the archive
Extract the downloaded archive to a convenient location. The folder structure will look like this:
classifier/
├── _internal/ # internal dependencies (do not modify)
└── run_classifier.exe # executable file
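For anyone who prefers to script the extraction step, a minimal Python sketch follows. It assumes only what is shown above: each archive is a ZIP file, and the extracted folder holds a `run_*.exe` next to an `_internal/` directory. The helper names (`extract_app`, `find_executables`, `has_internal_dir`) are hypothetical and not part of the project.

```python
import zipfile
from pathlib import Path


def extract_app(archive: Path, dest: Path) -> Path:
    """Extract a downloaded archive into dest and return dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest


def find_executables(root: Path) -> list[Path]:
    """Return every run_*.exe found anywhere under root, sorted by path."""
    return sorted(root.rglob("run_*.exe"))


def has_internal_dir(exe: Path) -> bool:
    """Check that the _internal/ folder sits next to the executable."""
    return (exe.parent / "_internal").is_dir()


# Example usage (paths are placeholders):
# out = extract_app(Path("classifier.zip"), Path("apps"))
# for exe in find_executables(out):
#     print(exe, "ok" if has_internal_dir(exe) else "missing _internal/")
```

Searching recursively means the check works whether or not the archive wraps its contents in a top-level folder.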
3. Run the .exe
Simply double-click the .exe file or launch it from the terminal:
.\run_classifier.exe
.\run_prediction.exe
.\run_trainer.exe
Application Descriptions
- run_classifier: a tool for manual or batch comment classification. Useful for quick review and labeling.
- run_prediction: the main inference application. Takes comments as input and returns predicted classes.
- run_trainer: fine-tunes the model on new data. Allows you to retrain the classifier on your own dataset.
Environment Configuration
The repository includes a .env file with environment variables (e.g., file paths, parameters). Edit it as needed before running the applications.
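Before editing the .env file, it can help to inspect what it currently sets. The sketch below assumes the file uses the common `KEY=VALUE` layout with optional `#` comments; the actual variable names are not documented here, and how the packaged applications read the file is not stated, so `read_env_file` is a hypothetical helper for inspection only.

```python
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Example usage (the path is a placeholder):
# for name, value in read_env_file(Path(".env")).items():
#     print(f"{name} = {value}")
```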
Source Code
The full source code (training, data labeling, scripts) is available on GitHub: gerageragera39/Lubarsky_Comments_Model
Source repository structure:
- data_hand_classifier/: tools for manual data labeling
- rubert_trainer/: RuBERT fine-tuning scripts
- dataset.csv: main training dataset
- test_comments.csv: test set
- result.png: training results visualization
Technical Details
- Base model: RuBERT (DeepPavlov)
- Framework: PyTorch + HuggingFace Transformers
- Build: PyInstaller (standalone Windows executables)
- Data language: Russian
- Task: Text Classification
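For readers working from source rather than the packaged executables, the stack listed above suggests an inference path along the following lines. This is a sketch under stated assumptions, not the project's own code: `classify`, `top_class`, and `softmax` are hypothetical names, and `model_path` and `num_labels` must be supplied by the caller because the checkpoint location and label set are not given here. Pointing `model_path` at the public base checkpoint `DeepPavlov/rubert-base-cased` would load an untrained classification head, so meaningful predictions require fine-tuned weights.

```python
import math
from typing import Optional


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(logits: list[float]) -> tuple[int, float]:
    """Return (class index, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def classify(
    texts: list[str], model_path: str, num_labels: Optional[int] = None
) -> list[tuple[int, float]]:
    """Score comments with a sequence-classification checkpoint.

    model_path and num_labels are assumptions supplied by the caller.
    """
    # Imported lazily so the helpers above work without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # Only override the label count when the caller asks for it.
    kwargs = {} if num_labels is None else {"num_labels": num_labels}
    model = AutoModelForSequenceClassification.from_pretrained(model_path, **kwargs)
    model.eval()

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return [top_class(row) for row in logits.tolist()]
```

The returned indices are raw class ids; mapping them to category names depends on the labels used during fine-tuning.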
License
This project is released under the MIT License.
You are free to use, modify, and distribute this software for both personal and commercial purposes, provided that the original copyright notice is retained.
Note: Since the applications are packaged with PyInstaller, they may be flagged by antivirus software as suspicious. This is a known false positive common to PyInstaller-built executables. You may need to add an exception in your antivirus or temporarily disable it to run the applications.
The software is provided as is, without warranty of any kind. The author takes no responsibility for any issues, damages, or data loss that may arise from its use.