---
license: apache-2.0
language: sw
tags:
- hate-speech
- swahili
- text-classification
- bert
- offensive-language
- political-hate-speech
datasets:
- custom
pipeline_tag: text-classification
---

# Swahili Hate Speech Classification Model

This is a fine-tuned BERT model for **multi-class text classification** in Swahili. It assigns each input text to one of three classes:

- **Non-hate speech**
- **Political hate speech**
- **Offensive language**

## 🧠 Model Details

- **Architecture**: BERT (base)
- **Language**: Swahili
- **Classes**: 3
- **Model size**: 178M parameters
- **Framework**: PyTorch
- **Training data**: A custom labeled dataset of Swahili social media or online comments (non-public)

## 🏷️ Labels

| Label ID  | Class Name            |
|-----------|-----------------------|
| `LABEL_0` | Non-hate speech       |
| `LABEL_1` | Political hate speech |
| `LABEL_2` | Offensive language    |
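
The model emits the generic `LABEL_<id>` names rather than class names, so a small lookup is needed to report readable labels. The sketch below is a hypothetical helper (`to_class_names` and `ID_TO_CLASS` are illustrative names, not part of the model repository). It assumes the list-of-dicts output format returned by the `transformers` text-classification pipeline and uses the mapping from the table above:

```python
from typing import Dict, List

# Mapping copied from the Labels table; the model itself only emits LABEL_<id>.
ID_TO_CLASS: Dict[str, str] = {
    "LABEL_0": "Non-hate speech",
    "LABEL_1": "Political hate speech",
    "LABEL_2": "Offensive language",
}


def to_class_names(predictions: List[dict]) -> List[dict]:
    """Hypothetical helper: replace generic label IDs with readable class names.

    Expects pipeline-style output such as [{'label': 'LABEL_0', 'score': 0.98}].
    Labels missing from ID_TO_CLASS are passed through unchanged.
    """
    return [
        {"label": ID_TO_CLASS.get(p["label"], p["label"]), "score": p["score"]}
        for p in predictions
    ]


# Example using the sample output shown in the Usage section.
sample = [{"label": "LABEL_0", "score": 0.98}]
print(to_class_names(sample))  # [{'label': 'Non-hate speech', 'score': 0.98}]
```

If you control the checkpoint, an alternative is to set `id2label` and `label2id` in the model config so the pipeline returns class names directly.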

## Usage

You can load and test the model with the `transformers` library:

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="sandbox338/hatespeech")

# Input text: "This is an ordinary message without insults."
result = classifier("Hii ni ujumbe wa kawaida bila matusi.")
print(result)  # e.g. [{'label': 'LABEL_0', 'score': 0.98}]