---
tags:
  - text
  - stance
language:
  - en
metrics:
  - f1
  - accuracy
pipeline_tag: text-classification
widget:
  - text: >-
      user Bolsonaro is the president of Brazil. He speaks for all brazilians.
      Greta is a climate activist. Their opinions do create a balance that the
      world needs now
    example_title: example 1
  - text: >-
      user The fact is that she still doesn’t change her ways and still stays
      non environmental friendly
    example_title: example 2
  - text: user The criteria for these awards dont seem to be very high.
    example_title: example 3
model-index:
  - name: StanceBERTa
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          type: social media
          name: unpublished
        metrics:
          - type: f1
            value: 77.8
          - type: accuracy
            value: 78.5
---

# eevvgg/StanceBERTa

This model is a fine-tuned version of [distilroberta-base](https://huggingface.co/distilroberta-base) that predicts three categories of stance (negative, positive, neutral) towards an entity mentioned in the text. It was fine-tuned on a larger and more balanced data sample than the previous version, [eevvgg/Stance-Tw](https://huggingface.co/eevvgg/Stance-Tw).

- **Developed by:** Ewelina Gajewska
- **Model type:** DistilRoBERTa for stance classification
- **Language(s) (NLP):** English social media data from Twitter and Reddit
- **Finetuned from model:** distilroberta-base

## Uses

```python
from transformers import pipeline

model_path = "eevvgg/StanceBERTa"
cls_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)  # add device=0 to run on GPU

sequence = ["user The fact is that she still doesn’t change her ways and still stays non environmental friendly",
            "user The criteria for these awards dont seem to be very high."]

result = cls_task(sequence)
print(result)
```
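
The pipeline returns one dictionary per input with the predicted label and its score, e.g. `[{'label': 'negative', 'score': 0.97}, ...]` (the scores here are illustrative; the exact label strings follow the model's `id2label` mapping).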
                                        

The model is suited for stance classification in short texts. It was fine-tuned on a balanced corpus of 5.6k examples, partially semi-annotated. It is also a suitable starting point for fine-tuning on hate/offensive language detection.

## Model Sources

- **Repository:** training procedure available in a Colab notebook
- **Paper:** TBA

## Training Details

### Preprocessing

User mentions and hyperlinks are normalized to `@user` and `http` tokens, respectively.
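
A minimal sketch of this normalization step (the regular expressions below are assumptions, not taken from the original preprocessing code):

```python
import re

def normalize(text: str) -> str:
    """Replace hyperlinks and user mentions with placeholder tokens."""
    text = re.sub(r"https?://\S+", "http", text)  # hyperlinks -> "http"
    text = re.sub(r"@\w+", "@user", text)         # mentions -> "@user"
    return text

print(normalize("@GretaThunberg read this: https://example.com/article"))
# -> '@user read this: http'
```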

### Training Hyperparameters

- trained for 3 epochs with a mini-batch size of 8
- loss: 0.509
- learning_rate: 5e-5; weight_decay: 1e-2
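
For reference, a sketch of these settings expressed as Hugging Face `TrainingArguments` (the output directory is a hypothetical placeholder; all other settings are left at library defaults, which is an assumption):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="stanceberta-output",  # hypothetical path
    num_train_epochs=3,               # trained for 3 epochs
    per_device_train_batch_size=8,    # mini-batch size of 8
    learning_rate=5e-5,
    weight_decay=1e-2,
)
```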

## Evaluation

### Results

- evaluation on a held-out 15% of the data
- accuracy: 0.785

|              | precision | recall | f1    |
|--------------|-----------|--------|-------|
| macro avg    | 0.779     | 0.778  | 0.778 |
| weighted avg | 0.786     | 0.785  | 0.786 |
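
These figures follow the layout of scikit-learn's `classification_report`; a minimal sketch of how such scores are computed (the labels below are illustrative placeholders, not the actual evaluation data):

```python
from sklearn.metrics import accuracy_score, classification_report

# Placeholder labels for illustration; in practice y_true comes from the
# held-out 15% split and y_pred from the model's predictions on it.
y_true = ["negative", "positive", "neutral", "negative", "positive"]
y_pred = ["negative", "positive", "negative", "negative", "positive"]

print("accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred, digits=3))  # includes macro and weighted avg rows
```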

## Citation

BibTeX: TBA