This is a fine-tuned BERT model (bert-base-uncased) for hate speech severity prediction, developed as part of an MSc research project at the University of Moratuwa, Sri Lanka.
The model predicts hate speech severity across three levels: Level 0 (Non-hate), Level 1 (Mild), and Level 2 (Severe).
It also produces a continuous severity score $S \in [0, 1]$:

$$S = 0.0 \cdot P(\text{Level 0}) + 0.5 \cdot P(\text{Level 1}) + 1.0 \cdot P(\text{Level 2})$$
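As a worked illustration of the severity score, the sketch below applies the weights 0.0, 0.5, and 1.0 to a probability vector. The function name `severity_score` and the example probabilities are hypothetical and chosen only for illustration; they are not outputs of the model.

```python
def severity_score(probs):
    """Weighted severity S for [P(Level 0), P(Level 1), P(Level 2)]."""
    weights = (0.0, 0.5, 1.0)  # weights from the score definition
    if len(probs) != len(weights):
        raise ValueError("expected exactly three class probabilities")
    return sum(w * p for w, p in zip(weights, probs))


# Hypothetical probabilities (not model output): mostly Level 1.
example_probs = [0.25, 0.5, 0.25]
print(round(severity_score(example_probs), 3))
```

Because the weights are 0.0, 0.5, and 1.0 and the probabilities sum to 1, the score is 0 when all mass is on Level 0 and 1 when all mass is on Level 2.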
Fine-tuned on HateXplain (Mathew et al., 2021):
| Metric | SVM | BERT |
|---|---|---|
| Accuracy | 0.629 | 0.684 |
| Macro F1 | 0.615 | 0.679 |
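For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how accuracy and macro F1 can be computed for three-class labels. The helper names and the toy labels are hypothetical and are not taken from the model's evaluation. Scoring a class with no true or predicted examples as 0 is an assumption of this sketch.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 scores."""
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # Assumption: a class that never appears contributes an F1 of 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels for illustration only (0 = Non-hate, 1 = Mild, 2 = Severe).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Macro F1 weights each class equally regardless of its frequency, so it can differ from accuracy when classes are imbalanced.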
Severity Prediction Metrics:
```python
from transformers import BertForSequenceClassification, BertTokenizer
import torch
import torch.nn.functional as F
import numpy as np

# Tokenizer of the base model (bert-base-uncased) that was fine-tuned
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('UdaniSJ/hate-speech-severity-bert')
model.eval()  # inference mode: disables dropout

def predict_severity(text):
    inputs = tokenizer(text, return_tensors='pt',
                       truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    # Class probabilities for Level 0, Level 1, Level 2
    probs = F.softmax(outputs.logits, dim=1).numpy()[0]
    # Continuous severity score S in [0, 1]
    score = 0.0*probs[0] + 0.5*probs[1] + 1.0*probs[2]
    level = int(np.argmax(probs))
    names = {0:'Non-hate', 1:'Mild', 2:'Severe'}
    return {'level': names[level], 'score': round(float(score),3)}

print(predict_severity("I love all people regardless of background"))
```
Demo (Hugging Face Space): https://huggingface.co/spaces/UdaniSJ/hate-speech-severity-predictor