---
license: mit
datasets:
- haipradana/indonesian-twitter-hate-speech-cleaned
language:
- id
tags:
- bert
- RoBERTa
- tweet
- hate
- twitter
base_model:
- cardiffnlp/twitter-roberta-base-sentiment-latest
---

# Fine-tuned RoBERTa model for classifying Indonesian hate tweets

The full code and a Google Colab notebook are available on GitHub: https://github.com/haipradana/RoBERTa-Indonesian-Hate-Tweet-Classification/tree/main

This project fine-tunes a RoBERTa model from [cardiffnlp/twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) to classify Indonesian tweets as either **neutral** or **hate speech**.

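For reference, the fine-tune can be reproduced along these lines. This is a minimal sketch, not the exact training code from the repository; the `text`/`label` column names, the `train` split, and the hyperparameters are assumptions:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

base = "cardiffnlp/twitter-roberta-base-sentiment-latest"
tokenizer = AutoTokenizer.from_pretrained(base)
# Re-initialise the classification head for 2 labels (0 = neutral, 1 = hate);
# the base model ships with 3 sentiment labels, hence ignore_mismatched_sizes
model = AutoModelForSequenceClassification.from_pretrained(
    base, num_labels=2, ignore_mismatched_sizes=True
)

# Column names and split are assumptions about the dataset layout
dataset = load_dataset("haipradana/indonesian-twitter-hate-speech-cleaned")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="./model", num_train_epochs=3,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"],
                  tokenizer=tokenizer)  # passing the tokenizer enables dynamic padding
trainer.train()
trainer.save_model("./model")
```
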
## How to use this model?

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned model (replace './model' with your local path or Hub model ID)
tokenizer = AutoTokenizer.from_pretrained('./model')
model = AutoModelForSequenceClassification.from_pretrained('./model')

# Predict a single tweet: class 1 -> hate, class 0 -> neutral
def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    return 'hate' if prediction == 1 else 'neutral'

# Example ("Are your lungs made of stone? You're this sick and still smoking!")
result = predict("Paru-parumu terbuat dari batu ya? udah sakit gini masih aja merokok!")
print(result)  # Output: hate
```
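
Alternatively, the Transformers `text-classification` pipeline wraps the same steps. This is a minimal sketch assuming the fine-tuned weights live in `./model`; the returned label names depend on the saved config:

```python
from transformers import pipeline

# Load tokenizer + model from the local directory (or a Hub model ID)
classifier = pipeline("text-classification", model="./model")

print(classifier("Paru-parumu terbuat dari batu ya? udah sakit gini masih aja merokok!"))
# e.g. [{'label': 'LABEL_1', 'score': ...}] -- label names and scores depend on the saved config
```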

### Or use the prediction script from the GitHub repo

```bash
cd scripts
python predict.py
```

## Performance Metrics

| Metric    | Value  |
|-----------|--------|
| Accuracy  | 82.01% |
| Precision | 82.68% |
| Recall    | 81.72% |
| F1-score  | 82.19% |
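
For reference, metrics like these can be computed on a held-out test split with scikit-learn. This is a sketch, not the repository's evaluation code; `y_true` and `y_pred` below are placeholder arrays of 0 (neutral) / 1 (hate) labels:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder labels -- in practice these come from the test split and model predictions
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(f"Accuracy: {accuracy:.2%}  Precision: {precision:.2%}  Recall: {recall:.2%}  F1: {f1:.2%}")
```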