Updated README with note about model update and new performance numbers
README.md (changed)
The research paper can be found here: *ELECTRA and GPT-4o: Cost-Effective Partners for Sentiment Analysis*.
This is an [ELECTRA large discriminator](https://huggingface.co/google/electra-large-discriminator) fine-tuned for sentiment analysis of reviews. It has a mean pooling layer and a classifier head (2 layers of 1024 dimension) with SwishGLU activation and dropout (0.3). It classifies text into three sentiment categories: 'negative' (0), 'neutral' (1), and 'positive' (2). It was fine-tuned on the [Sentiment Merged](https://huggingface.co/datasets/jbeno/sentiment_merged) dataset, a merge of the Stanford Sentiment Treebank (SST-3) and DynaSent Rounds 1 and 2.
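The pooling layer and classifier head described above can be sketched in PyTorch. This is a minimal sketch, not the model's actual code: the module names, the layer ordering, and this particular SwishGLU formulation (a GLU gated by Swish/SiLU) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwishGLU(nn.Module):
    """Assumed SwishGLU: a gated linear unit with a Swish (SiLU) gate."""
    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        # One projection producing both the gate and the value halves
        self.proj = nn.Linear(dim_in, dim_out * 2)

    def forward(self, x):
        gate, value = self.proj(x).chunk(2, dim=-1)
        return F.silu(gate) * value

class ClassifierHead(nn.Module):
    """Mean-pooled ELECTRA hidden states -> 2 x 1024 SwishGLU layers -> 3 logits."""
    def __init__(self, hidden: int = 1024, num_labels: int = 3, dropout: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            SwishGLU(hidden, 1024),
            nn.Dropout(dropout),
            SwishGLU(1024, 1024),
            nn.Dropout(dropout),
            nn.Linear(1024, num_labels),
        )

    def forward(self, hidden_states, attention_mask):
        # Mean pooling over non-padding tokens only
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.net(pooled)
```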
## Updates

- **2025-Mar-25**: Uploaded a better-performing model fine-tuned with a different random seed (123 vs. 42) and from an earlier training checkpoint (epoch 10 vs. 13).

## Labels
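The description above fixes the integer encoding of the three classes. A minimal sketch of that mapping (the variable names follow the usual Transformers `id2label`/`label2id` convention and are assumptions, not taken from the repo):

```python
# Label mapping from the model description: 'negative' (0), 'neutral' (1), 'positive' (2)
id2label = {0: "negative", 1: "neutral", 2: "positive"}
label2id = {name: idx for idx, name in id2label.items()}

print(id2label[2])            # positive
print(label2id["neutral"])    # 1
```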
### Performance Summary

- **Merged Dataset**
  - Macro Average F1: **83.16** (was 82.36)
  - Accuracy: **83.71** (was 82.96)
- **DynaSent R1**
  - Macro Average F1: **86.53** (was 85.91)
  - Accuracy: **86.44** (was 85.83)
- **DynaSent R2**
  - Macro Average F1: **78.36** (was 76.29)
  - Accuracy: **78.61** (was 76.53)
- **SST-3**
  - Macro Average F1: **72.63** (was 70.90)
  - Accuracy: **80.91** (was 80.36)

## Model Architecture
The model's configuration (config.json) includes custom parameters:

- `dropout_rate`: Dropout rate used in the classifier.
- `pooling`: Pooling strategy used ('mean').
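These custom keys can be read straight out of the configuration file. The snippet below parses a hypothetical excerpt of config.json: only the two keys documented above come from this card; the remaining fields are illustrative assumptions.

```python
import json

# Hypothetical excerpt of this model's config.json. "pooling" and
# "dropout_rate" are the custom keys described above; other fields are assumed.
config_text = """
{
  "model_type": "electra",
  "num_labels": 3,
  "dropout_rate": 0.3,
  "pooling": "mean"
}
"""
config = json.loads(config_text)
print(config["pooling"], config["dropout_rate"])  # mean 0.3
```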
## Updated Performance by Dataset

### Merged Dataset
```
Merged Dataset Classification Report

              precision    recall  f1-score   support

    negative   0.874178  0.847789  0.860781      2352
     neutral   0.741715  0.770913  0.756032      1829
    positive   0.878194  0.877820  0.878007      2349

    accuracy                       0.837060      6530
   macro avg   0.831362  0.832174  0.831607      6530
weighted avg   0.838521  0.837060  0.837639      6530

ROC AUC: 0.947808

Predicted  negative  neutral  positive
Actual
negative       1994      268        90
neutral         223     1410       196
positive         64      223      2062

Macro F1 Score: 0.83
```
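The summary figures in the report can be rederived from the confusion matrix alone. The sketch below recomputes accuracy and macro F1 for the Merged Dataset in plain Python (no sklearn assumed); the same arithmetic applies to the other three reports.

```python
# Confusion matrix from the Merged Dataset report above
# (rows = actual, cols = predicted; order: negative, neutral, positive)
cm = [
    [1994,  268,   90],
    [ 223, 1410,  196],
    [  64,  223, 2062],
]
n = len(cm)
support = [sum(row) for row in cm]                              # per-class actual counts
pred_totals = [sum(cm[r][c] for r in range(n)) for c in range(n)]  # per-class predicted counts

precision = [cm[i][i] / pred_totals[i] for i in range(n)]
recall = [cm[i][i] / support[i] for i in range(n)]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

accuracy = sum(cm[i][i] for i in range(n)) / sum(support)
macro_f1 = sum(f1) / n

print(round(accuracy, 6))   # 0.83706
print(round(macro_f1, 6))   # 0.831607
```

These match the report: accuracy 0.837060 and macro-average F1 0.831607 (the 83.71 / 83.16 in the summary, as percentages).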
### DynaSent Round 1

```
DynaSent Round 1 Classification Report

              precision    recall  f1-score   support

    negative   0.925512  0.828333  0.874230      1200
     neutral   0.781536  0.924167  0.846888      1200
    positive   0.911472  0.840833  0.874729      1200

    accuracy                       0.864444      3600
   macro avg   0.872840  0.864444  0.865283      3600
weighted avg   0.872840  0.864444  0.865283      3600

ROC AUC: 0.962647

Predicted  negative  neutral  positive
Actual
negative        994      159        47
neutral          40     1109        51
positive         40      151      1009

Macro F1 Score: 0.87
```
### DynaSent Round 2

```
DynaSent Round 2 Classification Report

              precision    recall  f1-score   support

    negative   0.791339  0.837500  0.813765       240
     neutral   0.803030  0.662500  0.726027       240
    positive   0.768657  0.858333  0.811024       240

    accuracy                       0.786111       720
   macro avg   0.787675  0.786111  0.783605       720
weighted avg   0.787675  0.786111  0.783605       720

ROC AUC: 0.932089

Predicted  negative  neutral  positive
Actual
negative        201       18        21
neutral          40      159        41
positive         13       21       206

Macro F1 Score: 0.78
```
### Stanford Sentiment Treebank (SST-3)

```
SST-3 Classification Report

              precision    recall  f1-score   support

    negative   0.838405  0.876096  0.856836       912
     neutral   0.500000  0.365039  0.421991       389
    positive   0.870504  0.931793  0.900106       909

    accuracy                       0.809050      2210
   macro avg   0.736303  0.724309  0.726311      2210
weighted avg   0.792042  0.809050  0.798093      2210

ROC AUC: 0.905255

Predicted  negative  neutral  positive
Actual
negative        799       91        22
neutral         143      142       104
positive         11       51       847

Macro F1 Score: 0.73
```
## Old Performance by Dataset

### Merged Dataset