---
license: mit
---

# Model Card for BERT hate offensive tweets

BERT base uncased fine-tuned on the dataset available here: [MartynaKopyta/hate_offensive_tweets](https://huggingface.co/datasets/MartynaKopyta/hate_offensive_tweets), to classify tweets into three classes: 0 - hate, 1 - offensive, or 2 - neither.

You can find the notebook used for training in my GitHub repo: [MartynaKopyta/BERT_FINE-TUNING](https://github.com/MartynaKopyta/BERT_FINE-TUNING/blob/main/BERT_hate_offensive_speech.ipynb).

## Model Details

- **Fine-tuned from model:** [bert-base-uncased](https://huggingface.co/bert-base-uncased)


## Bias, Risks, and Limitations

The dataset was not large enough for BERT to learn to separate the three classes reliably; the model is correct roughly three times out of four (about 78% accuracy on the test set).


## How to Get Started with the Model
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained('MartynaKopyta/BERT_hate_offensive_tweets')
tokenizer = AutoTokenizer.from_pretrained('MartynaKopyta/BERT_hate_offensive_tweets')
```
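Once the model and tokenizer are loaded, classification is a single forward pass. A minimal inference sketch (the label names follow the mapping stated above; `torch` is assumed to be installed, and the input tweet is illustrative):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained('MartynaKopyta/BERT_hate_offensive_tweets')
tokenizer = AutoTokenizer.from_pretrained('MartynaKopyta/BERT_hate_offensive_tweets')

# Label mapping from the dataset: 0 - hate, 1 - offensive, 2 - neither
labels = {0: 'hate', 1: 'offensive', 2: 'neither'}

text = "Have a great day everyone!"  # example input, not from the dataset
inputs = tokenizer(text, return_tensors='pt', truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()
print(pred, labels[pred])
```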

#### Training Hyperparameters

- **batch size:** 16
- **learning rate:** 2e-5
- **epochs:** 3

## Evaluation
```
Accuracy: 0.779373368146214

Classification Report:

              precision    recall  f1-score   support

           0       0.74      0.68      0.71      1532
           1       0.85      0.88      0.87      1532
           2       0.74      0.78      0.76      1532

    accuracy                           0.78      4596
   macro avg       0.78      0.78      0.78      4596
weighted avg       0.78      0.78      0.78      4596

Confusion Matrix:
[[1043   96  393]
 [ 169 1343   20]
 [ 204  132 1196]]

MCC: 0.670
```
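The reported figures can be re-derived from the confusion matrix alone. The sketch below reconstructs per-example labels from the matrix and checks accuracy and MCC with scikit-learn (assumed to be installed):

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

# Confusion matrix from the evaluation above (rows: true class, cols: predicted)
cm = np.array([[1043,   96,  393],
               [ 169, 1343,   20],
               [ 204,  132, 1196]])

# Accuracy = correct predictions (diagonal) / all predictions
accuracy = cm.trace() / cm.sum()
print(round(accuracy, 4))  # -> 0.7794

# Rebuild (y_true, y_pred) pairs from the matrix counts to compute MCC
y_true = np.repeat([0, 1, 2], cm.sum(axis=1))
y_pred = np.concatenate([np.repeat([0, 1, 2], row) for row in cm])
mcc = matthews_corrcoef(y_true, y_pred)
print(round(mcc, 3))  # -> 0.67
```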