---
library_name: peft
base_model: cardiffnlp/twitter-roberta-base-sentiment-latest
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- NHL
- Hockey
- Sports
- roberta
- sentiment analysis
---

# Chelberta

Chelberta is a fine-tuned version of [cardiffnlp/twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest), trained on 5,168 sentiment-labelled Reddit comments collected from NHL team subreddits in December 2023. The model is intended for English-language text.

**Labels:**

- 0 -> Negative
- 1 -> Neutral
- 2 -> Positive
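
The pipeline reports the label with the highest softmax probability over the model's three output logits. A minimal sketch of that decoding step, assuming you already have the raw logits for one comment; the logit values below are made up for illustration, and the lowercase label names follow the pipeline output shown in the example further down:

```python
import math

# Label ids as listed above (lowercase, matching the pipeline's output strings)
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Turn a 3-element logit vector into a pipeline-style {'label', 'score'} dict."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


# Hypothetical logits for a clearly positive comment
print(decode([-2.1, 0.3, 4.0])["label"])  # positive
```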

This model was used for the sentiment analysis behind the [NHL Positivity Index](https://uais.dev/projects/nhl-positivity-index/).

The full dataset can be found [here](https://www.kaggle.com/datasets/jacobwinch/nhl-reddit-comments).

## Example Pipeline
```python
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel
import torch

model_id = 'cardiffnlp/twitter-roberta-base-sentiment-latest'
peft_model_id = 'UAlbertaUAIS/Chelberta'

# Load the base model and tokenizer, then apply the Chelberta PEFT adapter
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = PeftModel.from_pretrained(model, peft_model_id)
# Merge the adapter weights into the base model for plain inference
model = model.merge_and_unload()

# Use the first GPU if one is available, otherwise fall back to CPU (-1)
device = 0 if torch.cuda.is_available() else -1
# Truncate inputs to RoBERTa's 512-token limit
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer,
                      max_length=512, truncation=True, device=device)
classifier("Connor McDavid is good at hockey!")
```
```
[{'label': 'positive', 'score': 0.9888942837715149}]
```

- **Developed by:** The University of Alberta Undergraduate Artificial Intelligence Society student group
- **Model type:** RoBERTa-based text classifier, distributed as a PEFT adapter
- **Language(s) (NLP):** English
- **License:** MIT
- **Fine-tuned from model:** [cardiffnlp/twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest)
- **Repository:** https://github.com/UndergraduateArtificialIntelligenceClub/NHL-Positivity-Index

## Uses

Chelberta is intended for analyzing the sentiment of sports fans' comments on social media.

## Evaluation

Chelberta was evaluated on a test set of 1,000 human-labelled NHL Reddit comments from December 2023, available [here](https://github.com/UndergraduateArtificialIntelligenceClub/NHL-Positivity-Index/blob/main/data/training_data/NHL-SentiComments-1K-TEST.json).
The model achieved 81.4% accuracy on this test set.
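
Accuracy here is the share of test comments whose predicted label matches the human label. A minimal sketch of that calculation over pipeline outputs, assuming the gold labels are the integer ids listed above; the `accuracy` helper and the sample predictions and labels are hypothetical, and the test file's actual JSON schema is not assumed:

```python
# Maps the pipeline's label strings back to the integer ids used for gold labels
LABEL2ID = {"negative": 0, "neutral": 1, "positive": 2}


def accuracy(predictions, gold_labels):
    """Fraction of pipeline predictions whose label matches the gold integer label."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must have the same length")
    if not gold_labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(LABEL2ID[p["label"]] == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)


# Made-up example: 3 of 4 predictions match the gold labels
preds = [{"label": "positive", "score": 0.98}, {"label": "neutral", "score": 0.61},
         {"label": "negative", "score": 0.77}, {"label": "positive", "score": 0.55}]
gold = [2, 1, 0, 0]
print(accuracy(preds, gold))  # 0.75
```

On 1,000 test comments, the reported 81.4% accuracy corresponds to 814 correctly labelled comments.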

### References 
```
@inproceedings{camacho-collados-etal-2022-tweetnlp,
    title = "{T}weet{NLP}: Cutting-Edge Natural Language Processing for Social Media",
    author = "Camacho-collados, Jose  and
      Rezaee, Kiamehr  and
      Riahi, Talayeh  and
      Ushio, Asahi  and
      Loureiro, Daniel  and
      Antypas, Dimosthenis  and
      Boisson, Joanne  and
      Espinosa Anke, Luis  and
      Liu, Fangyu  and
      Mart{\'\i}nez C{\'a}mara, Eugenio  and others",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-demos.5",
    pages = "38--49"
}
```

```
@inproceedings{loureiro-etal-2022-timelms,
    title = "{T}ime{LM}s: Diachronic Language Models from {T}witter",
    author = "Loureiro, Daniel  and
      Barbieri, Francesco  and
      Neves, Leonardo  and
      Espinosa Anke, Luis  and
      Camacho-collados, Jose",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-demo.25",
    doi = "10.18653/v1/2022.acl-demo.25",
    pages = "251--260"
}
```


## Citation 
**APA:**

Winch, J., Munjal, T., Lau, H., Bradley, A., Monaghan, A., & Subedi, Y. (2023). NHL Positivity Index. Undergraduate Artificial Intelligence Society. https://uais.dev/projects/nhl-positivity-index/



## Framework versions

- PEFT 0.9.0