---
license: mit
datasets:
- haipradana/indonesian-twitter-hate-speech-cleaned
language:
- id
tags:
- bert
- RoBERTa
- tweet
- hate
- twitter
base_model:
- cardiffnlp/twitter-roberta-base-sentiment-latest
---
# Fine-tuned RoBERTa model for classifying Indonesian hate tweets
Full code and a Google Colab notebook are available on GitHub: https://github.com/haipradana/RoBERTa-Indonesian-Hate-Tweet-Classification/tree/main
This project fine-tunes a RoBERTa model from [cardiffnlp/twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) to classify Indonesian tweets as either **neutral** or **hate speech**.
## How to use this model?
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned model and tokenizer from a local directory
tokenizer = AutoTokenizer.from_pretrained('./model')
model = AutoModelForSequenceClassification.from_pretrained('./model')

def predict(text):
    """Classify a tweet as 'hate' or 'neutral'."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    return 'hate' if prediction == 1 else 'neutral'

# Example ("Are your lungs made of stone? You're this sick and you still smoke!")
result = predict("Paru-parumu terbuat dari batu ya? udah sakit gini masih aja merokok!")
print(result)  # Output: hate
```
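`predict` returns only the argmax label; if you also want a confidence score, you can apply a softmax to the logits. A minimal sketch with dummy logits (the tensor values below are illustrative, not real model output):

```python
import torch
import torch.nn.functional as F

# Dummy logits for one tweet: [neutral_score, hate_score]
logits = torch.tensor([[0.2, 2.3]])

# Softmax normalizes logits into probabilities that sum to 1
probs = F.softmax(logits, dim=1)
label_id = torch.argmax(probs, dim=1).item()
label = 'hate' if label_id == 1 else 'neutral'
confidence = probs[0, label_id].item()

print(label, round(confidence, 2))
```

The same two lines (`F.softmax` plus indexing) can be dropped into `predict` to return `(label, confidence)` instead of the bare label.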
### Or run the prediction script from the GitHub repo
```bash
cd scripts
python predict.py
```
## Performance Metrics
```
Accuracy: 82.01%
Precision: 82.68%
Recall: 81.72%
F1-Score: 82.19%
```
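These are the standard binary-classification metrics, with "hate" as the positive class. For reference, they can be computed from raw predictions as follows (a minimal pure-Python sketch with toy data, not the project's evaluation code):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 for the positive class (1 = hate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1

# Toy example: 5 tweets, 1 = hate, 0 = neutral
acc, prec, rec, f1 = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

`sklearn.metrics.precision_recall_fscore_support` gives the same numbers and is the usual choice in practice.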