Upload folder using huggingface_hub

Files changed (5)

README.md +97 -0
config.json +9 -0
model.pt +3 -0
tokenizer/tokenizer.json +0 -0
tokenizer/tokenizer_config.json +15 -0

README.md ADDED

	@@ -0,0 +1,97 @@

+---
+language:
+- hr
+license: mit
+tags:
+- text-classification
+- hate-speech-detection
+- croatian
+- bertic
+datasets:
+- classla/FRENK-hate-hr
+metrics:
+- f1
+- accuracy
+base_model: classla/bcms-bertic
+pipeline_tag: text-classification
+---
+# Croatian Hate Speech Detection Model (BERTić Fine-tuned)
+This model is a fine-tuned version of [classla/bcms-bertic](https://huggingface.co/classla/bcms-bertic) for binary hate speech classification in Croatian.
+## Model Description
+- **Base Model:** classla/bcms-bertic (BERT pre-trained on 8B tokens of South Slavic text)
+- **Task:** Binary classification (Acceptable vs Offensive)
+- **Language:** Croatian
+- **Dataset:** FRENK Croatian hate speech dataset (10,971 comments)
+## Performance
+| Metric | Score |
+|--------|-------|
+| Accuracy | 81.3% |
+| F1-Macro | 0.810 |
+| F1-Weighted | 0.813 |
+| MCC | 0.621 |
+### Per-Class Performance
+| Class | Precision | Recall | F1-Score |
+|-------|-----------|--------|----------|
+| ACC (Acceptable) | 0.777 | 0.803 | 0.790 |
+| OFF (Offensive) | 0.842 | 0.820 | 0.831 |
+## Training Configuration
+- Learning rate: 2e-5
+- Batch size: 16
+- Epochs: 5
+- Max sequence length: 256 tokens
+- Optimizer: AdamW
+- Warmup ratio: 0.1
+## Usage
+```python
+from src.models.bertic import BERTicTrainer
+# Load model
+trainer = BERTicTrainer()
+trainer.load("path/to/model")
+# Predict
+texts = ["Ovo je normalan komentar.", "Svi su oni lopovi!"]
+predictions = trainer.predict(texts)
+print(predictions)  # ['ACC', 'OFF']
+```
+## Labels
+- `ACC` - Acceptable: No offensive content
+- `OFF` - Offensive: Contains hate speech, insults, or inappropriate content
+## Citation
+```bibtex
+@misc{croatian-hate-speech-2026,
+  author = {Jurić, Duje and Matošević, Teo and Radolović, Teo},
+  title = {Detection of Hate Speech on Croatian Online Portals Using NLP Methods},
+  year = {2026},
+  publisher = {University of Zagreb, FER},
+  url = {https://github.com/TeoMatosevic/slur-analysis-model}
+}
+```
+## Authors
+- Duje Jurić
+- Teo Matošević
+- Teo Radolović
+University of Zagreb, Faculty of Electrical Engineering and Computing
+## License
+MIT License
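
Note on the README metrics: the per-class F1 scores and the macro F1 can be cross-checked from the precision and recall values in the table. The following is a minimal sketch of that arithmetic using only the numbers reported above; it is not part of the commit and does not reproduce the authors' evaluation code.

```python
# Per-class precision and recall copied from the README's per-class table.
per_class = {
    "ACC": {"precision": 0.777, "recall": 0.803},
    "OFF": {"precision": 0.842, "recall": 0.820},
}


def f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute each class's F1, then average them without weighting (macro F1).
f1_scores = {label: f1(m["precision"], m["recall"]) for label, m in per_class.items()}
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

for label, score in f1_scores.items():
    print(f"{label}: F1 = {score:.3f}")
print(f"Macro F1 = {macro_f1:.3f}")
```

The recomputed values round to 0.790 (ACC), 0.831 (OFF), and 0.810 (macro), which agrees with the reported table. The weighted F1, accuracy, and MCC cannot be checked this way because the README does not give per-class support counts.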

config.json ADDED

	@@ -0,0 +1,9 @@

+{
+  "model_name": "classla/bcms-bertic",
+  "num_labels": 2,
+  "max_length": 256,
+  "label_names": [
+    "ACC",
+    "OFF"
+  ]
+}
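
Note on config.json: `label_names` presumably fixes the mapping from classifier output index to label. The sketch below shows one way to read it, assuming index 0 corresponds to `ACC` and index 1 to `OFF` (the list order) and that the model emits one logit per label. The `logits_to_label` helper and the example logits are hypothetical and are not taken from the repository.

```python
import json
import math

# Contents of config.json as added in this commit.
CONFIG_JSON = """
{
  "model_name": "classla/bcms-bertic",
  "num_labels": 2,
  "max_length": 256,
  "label_names": ["ACC", "OFF"]
}
"""

config = json.loads(CONFIG_JSON)

# Assumption: the position in label_names is the classifier's output index.
id2label = dict(enumerate(config["label_names"]))
assert len(id2label) == config["num_labels"]


def logits_to_label(logits):
    """Hypothetical helper: softmax over raw logits, then pick the top label."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up logits for illustration only.
label, confidence = logits_to_label([0.2, 1.5])
print(label, round(confidence, 3))
```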

model.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:38b5cb34fb80c8d06fcde325a154cdf0896b396f35e8295eb1cfe21b70d80bb6
+size 663589656
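
Note on model.pt: the three added lines are a Git LFS pointer, not the weights themselves. The pointer records the SHA-256 digest and byte size of the real file, so a downloaded copy can be verified against it. A minimal sketch follows; `parse_lfs_pointer` and `matches_pointer` are illustrative helper names, and the local path passed to `matches_pointer` is up to the caller.

```python
import hashlib
import os

# The pointer text exactly as added in this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:38b5cb34fb80c8d06fcde325a154cdf0896b396f35e8295eb1cfe21b70d80bb6
size 663589656
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Check a local file's size and SHA-256 digest against the pointer."""
    if os.path.getsize(path) != pointer["size_bytes"]:
        return False
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large checkpoints are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], pointer["size_bytes"])
```

After downloading the checkpoint, `matches_pointer("model.pt", pointer)` should return `True` for an intact file.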

tokenizer/tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer/tokenizer_config.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "backend": "tokenizers",
+  "cls_token": "[CLS]",
+  "do_lower_case": false,
+  "is_local": false,
+  "mask_token": "[MASK]",
+  "max_len": 512,
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": false,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}
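
Note on sequence length: the tokenizer allows up to 512 tokens (`model_max_length`), while config.json and the README set a 256-token limit for this model. The sketch below illustrates how a 256-token budget is typically applied once the `[CLS]` and `[SEP]` special tokens are counted. It uses plain strings as stand-ins for WordPiece tokens and a hypothetical `build_input` helper; in practice the tokenizer performs this truncation itself.

```python
MODEL_MAX_LENGTH = 512  # from tokenizer/tokenizer_config.json
MAX_LENGTH = 256        # from config.json and the README

# The fine-tuning limit must fit inside what the tokenizer allows.
assert MAX_LENGTH <= MODEL_MAX_LENGTH


def build_input(tokens, max_length=MAX_LENGTH, cls_token="[CLS]", sep_token="[SEP]"):
    """Hypothetical helper: truncate so that CLS + tokens + SEP fits max_length."""
    budget = max_length - 2  # reserve two slots for the special tokens
    return [cls_token] + list(tokens[:budget]) + [sep_token]


# Placeholder tokens; a real input would come from the BERTić tokenizer.
long_comment = ["tok"] * 300
encoded = build_input(long_comment)
print(len(encoded), encoded[0], encoded[-1])
```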