Croatian Hate Speech Detection Model (BERTić Fine-tuned)
This model is a fine-tuned version of classla/bcms-bertic for binary hate speech classification in Croatian.
Model Description
- Base Model: classla/bcms-bertic (ELECTRA-based transformer pre-trained on 8B tokens of Bosnian, Croatian, Montenegrin, and Serbian text)
- Task: Binary classification (Acceptable vs Offensive)
- Language: Croatian
- Dataset: FRENK Croatian hate speech dataset (10,971 comments)
Performance
| Metric | Score |
|---|---|
| Accuracy | 81.3% |
| F1-Macro | 0.810 |
| F1-Weighted | 0.813 |
| MCC | 0.621 |
Per-Class Performance
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| ACC (Acceptable) | 0.777 | 0.803 | 0.790 |
| OFF (Offensive) | 0.842 | 0.820 | 0.831 |
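As a sanity check, the per-class F1 scores and the macro average can be recomputed from the precision and recall values in the table above. This is a minimal sketch in plain Python. The helper name `f1_from_pr` is illustrative and not part of the project code.

```python
# Recompute F1 from the precision/recall values reported in the per-class table.

def f1_from_pr(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) per class, copied from the table above.
reported = {
    "ACC": (0.777, 0.803),
    "OFF": (0.842, 0.820),
}

f1_scores = {label: f1_from_pr(p, r) for label, (p, r) in reported.items()}

# Macro F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

for label, score in f1_scores.items():
    print(f"{label}: F1 = {score:.3f}")
print(f"Macro F1 = {macro_f1:.3f}")
```

Rounded to three decimals, the recomputed values match the reported F1 scores (0.790 and 0.831) and the reported F1-Macro (0.810).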
Training Configuration
- Learning rate: 2e-5
- Batch size: 16
- Epochs: 5
- Max sequence length: 256 tokens
- Optimizer: AdamW
- Warmup ratio: 0.1
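A warmup ratio of 0.1 means the learning rate ramps up over the first 10% of optimizer steps before reaching its peak of 2e-5. The sketch below shows how the settings above translate into step counts. It assumes linear warmup and one optimizer step per batch (no gradient accumulation), neither of which is stated in this card. `TrainConfig`, `total_steps`, `warmup_steps`, `lr_during_warmup`, and the training-set size of 8,000 are hypothetical, chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values taken from the training configuration listed above.
    learning_rate: float = 2e-5
    batch_size: int = 16
    epochs: int = 5
    max_seq_length: int = 256
    warmup_ratio: float = 0.1


def total_steps(cfg: TrainConfig, num_train_examples: int) -> int:
    """Optimizer steps over all epochs, assuming one step per batch."""
    steps_per_epoch = math.ceil(num_train_examples / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


def warmup_steps(cfg: TrainConfig, num_train_examples: int) -> int:
    """Number of steps spent ramping the learning rate up."""
    return int(cfg.warmup_ratio * total_steps(cfg, num_train_examples))


def lr_during_warmup(cfg: TrainConfig, step: int, num_train_examples: int) -> float:
    """Linear warmup to the peak rate; post-warmup decay is not modeled here."""
    n_warmup = warmup_steps(cfg, num_train_examples)
    if n_warmup > 0 and step < n_warmup:
        return cfg.learning_rate * step / n_warmup
    return cfg.learning_rate


cfg = TrainConfig()
num_train_examples = 8000  # hypothetical; the actual train split size is not given here

print("total steps: ", total_steps(cfg, num_train_examples))
print("warmup steps:", warmup_steps(cfg, num_train_examples))
```

With the hypothetical 8,000 training examples, this gives 2,500 optimizer steps in total, of which the first 250 are warmup.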
Usage
from src.models.bertic import BERTicTrainer
# Load model
trainer = BERTicTrainer()
trainer.load("path/to/model")
# Predict
texts = ["Ovo je normalan komentar.", "Svi su oni lopovi!"]
predictions = trainer.predict(texts)
print(predictions) # ['ACC', 'OFF']
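Because `predict` returns one label string per input text, as in the snippet above, downstream filtering can be done in plain Python. The following is a minimal sketch that assumes the predictions are aligned with the inputs. `split_by_label` is a hypothetical helper and not part of `BERTicTrainer`.

```python
from collections import defaultdict


def split_by_label(texts, predictions):
    """Group input texts by their predicted label ('ACC' or 'OFF')."""
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")
    grouped = defaultdict(list)
    for text, label in zip(texts, predictions):
        grouped[label].append(text)
    return dict(grouped)


# Example using the label format shown in the usage snippet.
texts = ["Ovo je normalan komentar.", "Svi su oni lopovi!"]
predictions = ["ACC", "OFF"]  # stand-in for trainer.predict(texts)

grouped = split_by_label(texts, predictions)
offensive = grouped.get("OFF", [])
print(f"{len(offensive)} of {len(texts)} comments flagged as offensive")
```

Using `grouped.get("OFF", [])` keeps the code safe when no comment in a batch is predicted as offensive.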
Labels
- ACC (Acceptable): No offensive content
- OFF (Offensive): Contains hate speech, insults, or inappropriate content
Citation
@misc{croatian-hate-speech-2026,
author = {Jurić, Duje and Matošević, Teo and Radolović, Teo},
title = {Detection of Hate Speech on Croatian Online Portals Using NLP Methods},
year = {2026},
publisher = {University of Zagreb, FER},
url = {https://github.com/TeoMatosevic/slur-analysis-model}
}
Authors
- Duje Jurić
- Teo Matošević
- Teo Radolović
University of Zagreb, Faculty of Electrical Engineering and Computing
License
MIT License