---
license: mit
---

# Model Card for BERT hate offensive tweets

BERT base uncased fine-tuned on the [MartynaKopyta/hate_offensive_tweets](https://huggingface.co/datasets/MartynaKopyta/hate_offensive_tweets) dataset to classify tweets into three classes: 0 (hate), 1 (offensive), or 2 (neither).

You can find the notebook used for training in my GitHub repo: [MartynaKopyta/BERT_FINE-TUNING](https://github.com/MartynaKopyta/BERT_FINE-TUNING/blob/main/BERT_hate_offensive_speech.ipynb).

## Model Details

- **Finetuned from model:** [bert-base-uncased](https://huggingface.co/bert-base-uncased)

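## Bias, Risks, and Limitations

The dataset was not large enough for BERT to learn to separate the three classes reliably; the model classifies about three out of four tweets correctly (accuracy 0.78 on the evaluation set). The hate class is the weakest (recall 0.68) and is most often confused with the neither class.
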
## How to Get Started with the Model

```
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned classifier and its matching tokenizer from the Hub.
model = AutoModelForSequenceClassification.from_pretrained('MartynaKopyta/BERT_hate_offensive_tweets')
tokenizer = AutoTokenizer.from_pretrained('MartynaKopyta/BERT_hate_offensive_tweets')
```

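Once the model and tokenizer are loaded, a prediction is the argmax over the three output logits. The following is a minimal sketch of that step, not code taken from the training notebook. The names `ID2LABEL`, `label_from_logits`, and `classify` are illustrative, the label names come from the class description at the top of this card (the checkpoint's own config may name them differently), and `torch` is assumed to be installed.

```python
# Class ids as described in this card: 0 = hate, 1 = offensive, 2 = neither.
# Assumption: the mapping is taken from the card text, not from the model config.
ID2LABEL = {0: "hate", 1: "offensive", 2: "neither"}


def label_from_logits(logits):
    """Return the label whose logit is the largest (argmax over the scores)."""
    best_id = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best_id]


def classify(texts, model_name="MartynaKopyta/BERT_hate_offensive_tweets"):
    """Tokenize a list of strings, run the model, and return one label per string."""
    # Imported lazily so label_from_logits works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()  # disable dropout for inference

    # Pad and truncate so tweets of different lengths fit in one batch.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return [label_from_logits(row) for row in logits.tolist()]
```

Calling `classify(["some tweet text"])` downloads the checkpoint on first use and returns a list with one of `"hate"`, `"offensive"`, or `"neither"` per input string.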
## Training Hyperparameters

- **Batch size:** 16
- **Learning rate:** 2e-5
- **Epochs:** 3

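For readers who want to reuse these values with the Hugging Face `Trainer`, the sketch below shows one way to express them as `TrainingArguments`. This is an assumption about how the settings could be applied; the linked notebook may use a different training loop. `HYPERPARAMS`, `build_training_args`, and the output directory name are hypothetical.

```python
# The three values listed in this card, keyed by TrainingArguments field names.
HYPERPARAMS = {
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
}


def build_training_args(output_dir="bert-hate-offensive-out"):
    """Build TrainingArguments from HYPERPARAMS.

    output_dir is a hypothetical path; every other setting keeps its
    library default, which may differ from what the notebook used.
    """
    # Imported lazily so the dictionary can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```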
## Evaluation

```
Accuracy: 0.779373368146214

Classification Report:
               precision    recall  f1-score   support

           0       0.74      0.68      0.71      1532
           1       0.85      0.88      0.87      1532
           2       0.74      0.78      0.76      1532

    accuracy                           0.78      4596
   macro avg       0.78      0.78      0.78      4596
weighted avg       0.78      0.78      0.78      4596

Confusion Matrix:
[[1043   96  393]
 [ 169 1343   20]
 [ 204  132 1196]]

MCC: 0.670
```
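The reported accuracy, per-class precision and recall, and MCC can all be recomputed from the confusion matrix alone. The sketch below does this in plain Python, assuming rows are true classes and columns are predicted classes (the scikit-learn layout), which is the reading that reproduces the recall column above. `CONFUSION` and `summarize` are illustrative names, and the MCC uses the standard multiclass generalization.

```python
import math

# Confusion matrix copied from the evaluation output in this card.
# Assumption: rows are true classes, columns are predicted classes.
CONFUSION = [
    [1043, 96, 393],
    [169, 1343, 20],
    [204, 132, 1196],
]


def summarize(cm):
    """Return (accuracy, per-class precision, per-class recall, MCC)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    true_counts = [sum(cm[k]) for k in range(n)]                       # row sums
    pred_counts = [sum(cm[i][k] for i in range(n)) for k in range(n)]  # column sums

    accuracy = correct / total
    precision = [cm[k][k] / pred_counts[k] for k in range(n)]
    recall = [cm[k][k] / true_counts[k] for k in range(n)]

    # Multiclass Matthews correlation coefficient computed from the counts.
    numerator = correct * total - sum(p * t for p, t in zip(pred_counts, true_counts))
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    return accuracy, precision, recall, numerator / denominator


accuracy, precision, recall, mcc = summarize(CONFUSION)
print(f"accuracy={accuracy:.3f}  mcc={mcc:.3f}")
print("precision:", [round(p, 2) for p in precision])
print("recall:   ", [round(r, 2) for r in recall])
```

The first row of the matrix also shows where the hate class loses recall: 393 of its 1532 tweets are predicted as class 2 (neither), compared with 96 predicted as class 1 (offensive).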