Croatian Hate Speech Detection Model (BERTić Fine-tuned)

This model is a fine-tuned version of classla/bcms-bertic for binary hate speech classification in Croatian.

Model Description

  • Base Model: classla/bcms-bertic (an ELECTRA-based transformer pre-trained on more than 8 billion tokens of Bosnian, Croatian, Montenegrin, and Serbian text)
  • Task: Binary classification (Acceptable vs Offensive)
  • Language: Croatian
  • Dataset: FRENK Croatian hate speech dataset (10,971 comments)

Performance

Metric        Score
Accuracy      81.3%
F1-Macro      0.810
F1-Weighted   0.813
MCC           0.621

Per-Class Performance

Class              Precision   Recall   F1-Score
ACC (Acceptable)   0.777       0.803    0.790
OFF (Offensive)    0.842       0.820    0.831
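
As a quick consistency check, each F1 score in the table is the harmonic mean of that class's precision and recall, and the macro F1 is the unweighted mean of the per-class F1 scores. The sketch below recomputes both in plain Python from the reported (rounded) figures only; the helper name `f1_from_pr` is illustrative and not part of the project code.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the per-class table.
reported = {
    "ACC": (0.777, 0.803),
    "OFF": (0.842, 0.820),
}

per_class_f1 = {label: f1_from_pr(p, r) for label, (p, r) in reported.items()}

# Macro F1 averages the classes equally, regardless of how many examples each has.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for label, f1 in per_class_f1.items():
    print(f"{label}: F1 = {f1:.3f}")
print(f"Macro F1 = {macro_f1:.3f}")
```

This prints 0.790 for ACC, 0.831 for OFF, and 0.810 for the macro average, in agreement with the reported scores. The weighted F1 cannot be recomputed the same way because it depends on the number of test examples per class, which is not listed here.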

Training Configuration

  • Learning rate: 2e-5
  • Batch size: 16
  • Epochs: 5
  • Max sequence length: 256 tokens
  • Optimizer: AdamW
  • Warmup ratio: 0.1
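
To make the warmup ratio concrete, the following sketch derives the number of optimizer steps and warmup steps from the values listed above. It is an illustration under stated assumptions, not the project's training script: the training-set size of 8,000 is hypothetical (the split sizes are not given here), gradient accumulation is assumed to be off, and the linear decay after warmup is an assumed schedule shape. The names `TrainingConfig`, `schedule_steps`, and `lr_at_step` are made up for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values taken from the configuration list above.
    learning_rate: float = 2e-5
    batch_size: int = 16
    epochs: int = 5
    max_seq_length: int = 256
    warmup_ratio: float = 0.1


def schedule_steps(config: TrainingConfig, num_train_examples: int) -> tuple[int, int]:
    """Return (total optimizer steps, warmup steps), assuming no gradient accumulation."""
    steps_per_epoch = math.ceil(num_train_examples / config.batch_size)
    total_steps = steps_per_epoch * config.epochs
    warmup_steps = round(total_steps * config.warmup_ratio)
    return total_steps, warmup_steps


def lr_at_step(step: int, config: TrainingConfig, total_steps: int, warmup_steps: int) -> float:
    """Linear warmup to the peak rate, then linear decay to zero (decay shape is an assumption)."""
    if step < warmup_steps:
        return config.learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return config.learning_rate * remaining / max(1, total_steps - warmup_steps)


config = TrainingConfig()
# Hypothetical training-set size, chosen only to give round numbers.
total, warmup = schedule_steps(config, num_train_examples=8000)
print(total, warmup)  # 2500 250
```

With these assumed numbers, the learning rate would climb from 0 to 2e-5 over the first 250 steps and then fall back toward 0 by step 2,500.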

Usage

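# BERTicTrainer is the project's own wrapper, so src.models.bertic must be importable
# (for example, by running this from a checkout of the project repository).
from src.models.bertic import BERTicTrainer

# Load the fine-tuned weights from a local directory
trainer = BERTicTrainer()
trainer.load("path/to/model")

# Predict one label per input text
texts = ["Ovo je normalan komentar.", "Svi su oni lopovi!"]
predictions = trainer.predict(texts)
print(predictions)  # ['ACC', 'OFF']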
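
The snippet above depends on the project's own `BERTicTrainer` wrapper. For readers who only have the Hugging Face `transformers` library, the sketch below shows one way inference might look. It rests on assumptions this card does not confirm: that the checkpoint is available on the Hub as `TeoMatosevic/croatian-hate-speech-bertic` in a format that `AutoModelForSequenceClassification` can load, and that class index 0 corresponds to ACC and index 1 to OFF. The helper names `logits_to_labels` and `classify` are illustrative.

```python
from typing import Dict, List, Sequence

# Assumed label order; check the checkpoint's config.id2label before relying on it.
ASSUMED_ID2LABEL: Dict[int, str] = {0: "ACC", 1: "OFF"}


def logits_to_labels(
    logits: Sequence[Sequence[float]],
    id2label: Dict[int, str] = ASSUMED_ID2LABEL,
) -> List[str]:
    """Pick the highest-scoring class for each row of logits."""
    labels = []
    for row in logits:
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(id2label[best])
    return labels


def classify(
    texts: List[str],
    model_id: str = "TeoMatosevic/croatian-hate-speech-bertic",  # assumed Hub ID
) -> List[str]:
    """Tokenize, run one forward pass, and map logits to labels.

    Requires torch and transformers; they are imported lazily so the
    helper above stays usable without them.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Truncate to 256 tokens to match the training configuration.
    batch = tokenizer(
        texts, padding=True, truncation=True, max_length=256, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**batch).logits
    return logits_to_labels(logits.tolist())
```

Calling `classify(["Ovo je normalan komentar.", "Svi su oni lopovi!"])` would download the checkpoint on first use. If the label order in the checkpoint's configuration differs from the assumed mapping, pass the correct `id2label` dictionary to `logits_to_labels` instead.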

Labels

  • ACC - Acceptable: No offensive content
  • OFF - Offensive: Contains hate speech, insults, or inappropriate content

Citation

@misc{croatian-hate-speech-2026,
  author = {Jurić, Duje and Matošević, Teo and Radolović, Teo},
  title = {Detection of Hate Speech on Croatian Online Portals Using NLP Methods},
  year = {2026},
  publisher = {University of Zagreb, FER},
  url = {https://github.com/TeoMatosevic/slur-analysis-model}
}

Authors

  • Duje Jurić
  • Teo Matošević
  • Teo Radolović

University of Zagreb, Faculty of Electrical Engineering and Computing

License

MIT License
