Croatian Hate Speech Detection Model (BERTić Fine-tuned)

This model is a fine-tuned version of classla/bcms-bertic for binary hate speech classification in Croatian.

Model Description

Base Model: classla/bcms-bertic (BERTić, an ELECTRA-based transformer pre-trained on 8B tokens of Bosnian, Croatian, Montenegrin, and Serbian text)
Task: Binary classification (Acceptable vs Offensive)
Language: Croatian
Dataset: FRENK Croatian hate speech dataset (10,971 comments)

Performance

Metric	Score
Accuracy	81.3%
F1-Macro	0.810
F1-Weighted	0.813
MCC	0.621
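
For reference, the Matthews correlation coefficient (MCC) reported above is conventionally defined from the binary confusion matrix as shown below. The card does not publish the confusion matrix itself, so the counts TP, TN, FP, and FN are symbols only, not reported values.

```latex
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}
{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which makes it less sensitive to class imbalance than accuracy.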

Per-Class Performance

Class	Precision	Recall	F1-Score
ACC (Acceptable)	0.777	0.803	0.790
OFF (Offensive)	0.842	0.820	0.831
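
As a quick consistency check, the per-class F1 scores and the macro F1 can be recomputed from the precision and recall values in the table above. This is a minimal sketch using only the rounded figures reported here, so small rounding differences against the authors' unrounded results are possible.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the per-class table.
per_class = {
    "ACC": (0.777, 0.803),
    "OFF": (0.842, 0.820),
}

f1_scores = {label: f1(p, r) for label, (p, r) in per_class.items()}

# Macro F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

for label, score in f1_scores.items():
    print(f"{label}: {score:.3f}")
print(f"macro: {macro_f1:.3f}")
```

The recomputed values agree with the reported 0.790, 0.831, and 0.810 to three decimal places.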

Training Configuration

Learning rate: 2e-5
Batch size: 16
Epochs: 5
Max sequence length: 256 tokens
Optimizer: AdamW
Warmup ratio: 0.1
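
To illustrate what a warmup ratio of 0.1 means in practice, the sketch below derives the number of optimizer steps and warmup steps from the batch size and epoch count listed above. The training-set size is not stated in this card (10,971 is the size of the full dataset), so `num_train_examples=8000` is a purely hypothetical value, and the helper `schedule_steps` is illustrative rather than part of the authors' code. It also assumes no gradient accumulation.

```python
import math


def schedule_steps(num_train_examples: int,
                   batch_size: int = 16,
                   epochs: int = 5,
                   warmup_ratio: float = 0.1) -> tuple[int, int]:
    """Return (total optimizer steps, warmup steps) for a linear warmup schedule."""
    steps_per_epoch = math.ceil(num_train_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    # Warmup covers the first `warmup_ratio` fraction of all steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return total_steps, warmup_steps


# Hypothetical training-set size; the actual split is not given in this card.
total, warmup = schedule_steps(num_train_examples=8000)
print(total, warmup)
```

With the hypothetical 8,000 examples this gives 2,500 total steps, of which the first 250 ramp the learning rate up toward 2e-5.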

Usage

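# BERTicTrainer comes from the project's own code (presumably the repository
# linked under Citation), not from an installed package; run this from the
# project root so that `src` is importable.
from src.models.bertic import BERTicTrainer

# Load the fine-tuned model from a local directory
trainer = BERTicTrainer()
trainer.load("path/to/model")

# Predict labels for a batch of comments
# ("This is a normal comment.", "They are all thieves!")
texts = ["Ovo je normalan komentar.", "Svi su oni lopovi!"]
predictions = trainer.predict(texts)
print(predictions)  # e.g. ['ACC', 'OFF']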

Labels

ACC - Acceptable: No offensive content
OFF - Offensive: Contains hate speech, insults, or inappropriate content

Citation

@misc{croatian-hate-speech-2026,
  author = {Jurić, Duje and Matošević, Teo and Radolović, Teo},
  title = {Detection of Hate Speech on Croatian Online Portals Using NLP Methods},
  year = {2026},
  publisher = {University of Zagreb, FER},
  url = {https://github.com/TeoMatosevic/slur-analysis-model}
}

Authors

Duje Jurić
Teo Matošević
Teo Radolović

University of Zagreb, Faculty of Electrical Engineering and Computing

License

MIT License
