---
license: openrail
---

# StackOverflow-RoBERTa-base for Sentiment Analysis on Software Engineering Texts

This is a RoBERTa-base model for sentiment analysis on software engineering texts. It is re-finetuned from [cardiffnlp/twitter-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) on the [StackOverflow4423](https://arxiv.org/abs/1709.02984) dataset. You can try the demo [here](https://huggingface.co/spaces/Cloudy1225/stackoverflow-sentiment-analysis).

## Example of Pipeline

```python
from transformers import pipeline

MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
sentiment_task = pipeline(task="sentiment-analysis", model=MODEL)
sentiment_task(["Excellent, happy to help!",
                "This can probably be done using JavaScript.",
                "Yes, but it's tricky, since datetime parsing in SQL is a pain in the neck."])
```

```
[{'label': 'positive', 'score': 0.9997847676277161},
 {'label': 'neutral', 'score': 0.999783456325531},
 {'label': 'negative', 'score': 0.9996368885040283}]
```
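The pipeline returns one `{'label', 'score'}` dict per input sentence. If only the predicted labels are needed, they can be pulled out directly; a minimal sketch, reusing the output values shown above:

```python
# Pipeline output: one {'label', 'score'} dict per input sentence
# (values copied from the output shown above).
results = [{'label': 'positive', 'score': 0.9997847676277161},
           {'label': 'neutral', 'score': 0.999783456325531},
           {'label': 'negative', 'score': 0.9996368885040283}]

# Keep only the predicted label for each input.
labels = [r['label'] for r in results]
print(labels)  # ['positive', 'neutral', 'negative']
```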
## Example of Classification

```python
from scipy.special import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification


def preprocess(text):
    """Preprocess text (username and link placeholders)."""
    new_text = []
    for t in text.split(' '):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return ' '.join(new_text).strip()


MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

text = "Excellent, happy to help!"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()  # logits for the single input
scores = softmax(scores)
print("negative", scores[0])
print("neutral", scores[1])
print("positive", scores[2])
```

```
negative 0.00015578205
neutral 5.9470447e-05
positive 0.99978495
```
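To turn the three softmax scores into a single prediction, take the class with the highest probability. A minimal sketch, assuming the label order negative/neutral/positive shown in the printout above:

```python
# Scores in the order printed above: negative, neutral, positive.
LABELS = ['negative', 'neutral', 'positive']
scores = [0.00015578205, 5.9470447e-05, 0.99978495]

# Index of the highest-probability class (argmax).
best = max(range(len(scores)), key=lambda i: scores[i])
print(LABELS[best], scores[best])  # positive 0.99978495
```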