---
license: mit
datasets:
- haipradana/indonesian-twitter-hate-speech-cleaned
language:
- id
tags:
- bert
- RoBERTa
- tweet
- hate
- twitter
base_model:
- cardiffnlp/twitter-roberta-base-sentiment-latest
---

# Fine-tuned RoBERTa model for classifying Indonesian hate tweets

The full code and a Google Colab notebook are available on GitHub: https://github.com/haipradana/RoBERTa-Indonesian-Hate-Tweet-Classification/tree/main

This project fine-tunes a RoBERTa model from [cardiffnlp/twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) to classify Indonesian tweets as either **neutral** or **hate speech**.

## How to use this model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned tokenizer and model from a local directory ('./model')
tokenizer = AutoTokenizer.from_pretrained('./model')
model = AutoModelForSequenceClassification.from_pretrained('./model')

# Classify a single text: class index 1 is 'hate', index 0 is 'neutral'
def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=511)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    return 'hate' if prediction == 1 else 'neutral'

# Example
result = predict("Paru-parumu terbuat dari batu ya? udah sakit gini masih aja merokok!")
print(result)  # Output: hate
```
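The `predict` function above handles one tweet at a time and returns only the label. When scoring many tweets, it can help to batch them into a single forward pass and keep the softmax probability as a rough confidence score. The sketch below is an illustration, not code from the repository: `predict_with_scores` is a hypothetical helper, and it assumes, as `predict` does, that class index 0 is neutral and index 1 is hate.

```python
from typing import List, Sequence, Tuple

import torch


def predict_with_scores(
    texts: Sequence[str],
    tokenizer,
    model,
    labels: Sequence[str] = ("neutral", "hate"),  # assumed mapping: 0 = neutral, 1 = hate
) -> List[Tuple[str, float]]:
    """Classify several texts in one batch and return (label, probability) pairs."""
    # Same tokenizer settings as predict(); padding aligns texts of different lengths
    inputs = tokenizer(
        list(texts), return_tensors="pt", truncation=True, padding=True, max_length=511
    )
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch_size, num_labels)
    probs = torch.softmax(logits, dim=-1)  # turn logits into per-class probabilities
    scores, indices = probs.max(dim=-1)  # most likely class and its probability
    return [(labels[i], s) for i, s in zip(indices.tolist(), scores.tolist())]
```

With `tokenizer` and `model` loaded as in the snippet above, `predict_with_scores(["tweet 1", "tweet 2"], tokenizer, model)` returns one `(label, probability)` pair per tweet. Treat the probability as a relative signal rather than a calibrated likelihood.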

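### Alternatively, use the script in the GitHub repository

```bash
git clone https://github.com/haipradana/RoBERTa-Indonesian-Hate-Tweet-Classification.git
cd RoBERTa-Indonesian-Hate-Tweet-Classification/scripts
python predict.py
```

## Performance Metrics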

```
Accuracy:  82.01%
Precision: 82.68%
Recall:    81.72%
F1-Score:  82.19%
```
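These are standard classification metrics. The card does not state the evaluation split or the averaging scheme, so the sketch below assumes the plain binary setting with hate (label 1) as the positive class; `binary_metrics` is a hypothetical helper, not code from the repository. Under that reading, F1 is the harmonic mean of precision and recall, and 2 × 0.8268 × 0.8172 / (0.8268 + 0.8172) ≈ 0.822, which agrees with the reported F1 up to rounding.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating `positive` (assumed 1 = hate) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # hate predicted as hate
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # neutral predicted as hate
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # hate predicted as neutral
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every division so empty inputs or missing classes return 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, `binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])` gives an accuracy of 0.6 and a precision, recall and F1 of 2/3 each.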