---
tags:
- text
- stance
language:
- en
metrics:
- f1
- accuracy
pipeline_tag: text-classification
widget:
- text: user Bolsonaro is the president of Brazil. He speaks for all brazilians. Greta is a climate activist. Their opinions do create a balance that the world needs now
  example_title: example 1
- text: user The fact is that she still doesn’t change her ways and still stays non environmental friendly
  example_title: example 2
- text: user The criteria for these awards dont seem to be very high.
  example_title: example 3
model-index:
- name: StanceBERTa
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: social media
      name: unpublished
    metrics:
    - type: f1
      value: 77.8
    - type: accuracy
      value: 78.5
---

# eevvgg/StanceBERTa

This model is a fine-tuned version of **distilroberta-base** that predicts three categories of stance (negative, positive, neutral) toward an entity mentioned in the text.

It was fine-tuned on a larger and more balanced data sample than the previous version, [eevvgg/Stance-Tw](https://huggingface.co/eevvgg/Stance-Tw).

- **Developed by:** Ewelina Gajewska
- **Model type:** RoBERTa for stance classification
- **Language(s) (NLP):** English social media data from Twitter and Reddit
- **Finetuned from model:** [distilroberta-base](https://huggingface.co/distilroberta-base)

## Uses

```
from transformers import pipeline

model_path = "eevvgg/StanceBERTa"
cls_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)  # add device=0 to run on GPU

sequence = ["user The fact is that she still doesn’t change her ways and still stays non environmental friendly",
            "user The criteria for these awards dont seem to be very high."]

result = cls_task(sequence)  # one {'label': ..., 'score': ...} dict per input text
```

The model is suited for stance classification in short texts. It was fine-tuned on a balanced corpus of 5.6k examples, partially semi-annotated.

*Also suitable for further fine-tuning on hate/offensive language detection.

## Model Sources

- **Repository:** training procedure available in a [Colab notebook](https://colab.research.google.com/drive/1-C47Ei7vgYtcfLLBB_Vkm3nblE5zH-aL?usp=sharing)
- **Paper:** tba

## Training Details

### Preprocessing

Normalization of user mentions and hyperlinks to "@user" and "http" tokens, respectively.

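This normalization step can be sketched with two regular expressions; the exact patterns below are an assumption for illustration, not the authors' original preprocessing code:

```python
import re

def normalize(text: str) -> str:
    # Replace @-mentions with the literal token "@user"
    text = re.sub(r"@\w+", "@user", text)
    # Replace hyperlinks with the literal token "http"
    text = re.sub(r"https?://\S+", "http", text)
    return text

print(normalize("@GretaThunberg read this https://example.com/article"))
# → @user read this http
```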
### Training Hyperparameters

- trained for 3 epochs with a mini-batch size of 8
- learning_rate: 5e-5; weight_decay: 1e-2
- loss: 0.509

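For reference, these settings map onto Hugging Face `TrainingArguments` roughly as follows. This is a sketch, not the authors' exact training script (see the linked Colab notebook); the output directory name is an arbitrary placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="stanceberta-out",   # placeholder name (assumption)
    num_train_epochs=3,             # trained for 3 epochs
    per_device_train_batch_size=8,  # mini-batch size of 8
    learning_rate=5e-5,
    weight_decay=1e-2,
)
```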
## Evaluation

### Results

Evaluation on a held-out 15% of the data:

- accuracy: 0.785
- macro avg:
  - f1: 0.778
  - precision: 0.779
  - recall: 0.778
- weighted avg:
  - f1: 0.786
  - precision: 0.786
  - recall: 0.785

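The macro and weighted averages above match the standard scikit-learn aggregations. A minimal sketch with toy labels (illustrative only, not the actual test set) showing where such numbers come from:

```python
from sklearn.metrics import classification_report

# Toy gold/predicted stance labels (0=negative, 1=neutral, 2=positive) -- illustrative only
y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 1, 1, 0, 2]

report = classification_report(y_true, y_pred, output_dict=True)
print(report["accuracy"])                  # overall accuracy
print(report["macro avg"]["f1-score"])     # unweighted mean of per-class F1
print(report["weighted avg"]["f1-score"])  # support-weighted mean of per-class F1
```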
## Citation

**BibTeX:** tba