---
tags:
- text
- stance
language:
- en
metrics:
- f1
- accuracy
pipeline_tag: text-classification

widget:
- text: user Bolsonaro is the president of Brazil. He speaks for all brazilians. Greta is a climate activist. Their opinions do create a balance that the world needs now
  example_title: example 1
- text: user The fact is that she still doesn’t change her ways and still stays non environmental friendly
  example_title: example 2
- text: user The criteria for these awards dont seem to be very high.
  example_title: example 3

model-index:
- name: StanceBERTa
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: social media
      name: unpublished
    metrics:
      - type: f1          
        value: 77.8
      - type: accuracy          
        value: 78.5   
---

# eevvgg/StanceBERTa

<!-- Provide a quick summary of what the model is/does. -->

This model is a fine-tuned version of the **distilroberta-base** model that predicts three categories of stance (negative, positive, neutral) towards an entity mentioned in the text.
It was fine-tuned on a larger and more balanced data sample than the previous version, [eevvgg/Stance-Tw](https://huggingface.co/eevvgg/Stance-Tw).


- **Developed by:** Ewelina Gajewska
- **Model type:** RoBERTa for stance classification
- **Language(s) (NLP):** English social media data from Twitter and Reddit
- **Finetuned from model:** [distilroberta-base](https://huggingface.co/distilroberta-base)


## Uses

```python
from transformers import pipeline

model_path = "eevvgg/StanceBERTa"
# Pass device=0 to run on GPU.
cls_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)

sequence = ["user The fact is that she still doesn’t change her ways and still stays non environmental friendly",
            "user The criteria for these awards dont seem to be very high."]

result = cls_task(sequence)
```

The model is suited for stance classification in short texts. It was fine-tuned on a balanced corpus of 5.6k examples, part of which was semi-annotated.
It is also a suitable starting point for further fine-tuning on hate/offensive language detection.

## Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** training procedure available in [Colab notebook](https://colab.research.google.com/drive/1-C47Ei7vgYtcfLLBB_Vkm3nblE5zH-aL?usp=sharing)
- **Paper:** tba


## Training Details

### Preprocessing 

Normalization of user mentions and hyperlinks to "@user" and "http" tokens, respectively.
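The card does not publish the exact normalization code; a minimal sketch using Python's `re` module (the regex patterns are assumptions, not the original implementation) could look like this:

```python
import re

def normalize(text: str) -> str:
    """Replace hyperlinks and user mentions with placeholder tokens.

    The patterns below are assumed; the original preprocessing
    for StanceBERTa is not published.
    """
    text = re.sub(r"https?://\S+", "http", text)  # hyperlinks -> "http"
    text = re.sub(r"@\w+", "@user", text)         # mentions -> "@user"
    return text

print(normalize("@greta check https://example.com/award now"))
# -> "@user check http now"
```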

### Training Hyperparameters

- epochs: 3
- mini-batch size: 8
- learning_rate: 5e-5
- weight_decay: 1e-2
- loss: 0.509

## Evaluation

### Results

Evaluated on a held-out 15% split of the data.

- accuracy: 0.785
- macro avg:
  - f1: 0.778
  - precision: 0.779
  - recall: 0.778
- weighted avg:
  - f1: 0.786
  - precision: 0.786
  - recall: 0.785
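The macro average weights each class equally, while the weighted average weights each class by its support (number of examples). A small illustration of the difference, using made-up per-class F1 scores and supports (not the model's actual per-class results, which the card does not publish):

```python
# Hypothetical per-class F1 scores and supports -- illustrative only.
f1_per_class = {"negative": 0.80, "neutral": 0.72, "positive": 0.81}
support = {"negative": 400, "neutral": 200, "positive": 240}

# Macro average: every class counts equally.
macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)

# Weighted average: each class counts in proportion to its support.
total = sum(support.values())
weighted_f1 = sum(f1_per_class[c] * support[c] / total for c in f1_per_class)

print(f"macro f1:    {macro_f1:.3f}")     # -> 0.777
print(f"weighted f1: {weighted_f1:.3f}")  # -> 0.784
```

With imbalanced data the two can diverge noticeably; here they stay close, mirroring the near-identical macro and weighted scores reported above.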


## Citation 

**BibTeX:** tba