Updated README with note about model update and new performance numbers
README.md (changed)
The research paper can be found here: *ELECTRA and GPT-4o: Cost-Effective Partners for Sentiment Analysis*.
This is an [ELECTRA large discriminator](https://huggingface.co/google/electra-large-discriminator) fine-tuned for sentiment analysis of reviews. It has a mean pooling layer and a classifier head (2 layers of 1024 dimension) with SwishGLU activation and dropout (0.3). It classifies text into three sentiment categories: 'negative' (0), 'neutral' (1), and 'positive' (2). It was fine-tuned on the [Sentiment Merged](https://huggingface.co/datasets/jbeno/sentiment_merged) dataset, a merge of the Stanford Sentiment Treebank (SST-3) and DynaSent Rounds 1 and 2.
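The pooling layer and classifier head described above can be sketched in PyTorch. This is a minimal sketch, not the model's actual code: the module names, the layer ordering, and this particular SwishGLU formulation (a GLU gated by Swish/SiLU) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwishGLU(nn.Module):
    """Assumed SwishGLU: a gated linear unit with a Swish (SiLU) gate."""
    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        # One projection producing both the gate and the value halves
        self.proj = nn.Linear(dim_in, dim_out * 2)

    def forward(self, x):
        gate, value = self.proj(x).chunk(2, dim=-1)
        return F.silu(gate) * value

class ClassifierHead(nn.Module):
    """Mean-pooled ELECTRA hidden states -> 2 x 1024 SwishGLU layers -> 3 logits."""
    def __init__(self, hidden: int = 1024, num_labels: int = 3, dropout: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            SwishGLU(hidden, 1024),
            nn.Dropout(dropout),
            SwishGLU(1024, 1024),
            nn.Dropout(dropout),
            nn.Linear(1024, num_labels),
        )

    def forward(self, hidden_states, attention_mask):
        # Mean pooling over non-padding tokens only
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.net(pooled)
```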
## Updates

- **2025-Mar-25**: Uploaded a better-performing model fine-tuned with a different random seed (123 vs. 42) and from an earlier training checkpoint (epoch 10 vs. 13).

## Labels
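The description above fixes the integer encoding of the three classes. A minimal sketch of that mapping (the variable names follow the usual Transformers `id2label`/`label2id` convention and are assumptions, not taken from the repo):

```python
# Label mapping from the model description: 'negative' (0), 'neutral' (1), 'positive' (2)
id2label = {0: "negative", 1: "neutral", 2: "positive"}
label2id = {name: idx for idx, name in id2label.items()}

print(id2label[2])            # positive
print(label2id["neutral"])    # 1
```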
### Performance Summary

- **Merged Dataset**
  - Macro Average F1: **83.16** (was 82.36)
  - Accuracy: **83.71** (was 82.96)
- **DynaSent R1**
  - Macro Average F1: **86.53** (was 85.91)
  - Accuracy: **86.44** (was 85.83)
- **DynaSent R2**
  - Macro Average F1: **78.36** (was 76.29)
  - Accuracy: **78.61** (was 76.53)
- **SST-3**
  - Macro Average F1: **72.63** (was 70.90)
  - Accuracy: **80.91** (was 80.36)

## Model Architecture
The model's configuration (config.json) includes custom parameters:

- `dropout_rate`: Dropout rate used in the classifier.
- `pooling`: Pooling strategy used ('mean').
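These custom keys can be read straight out of the configuration file. The snippet below parses a hypothetical excerpt of config.json: only the two keys documented above come from this card; the remaining fields are illustrative assumptions.

```python
import json

# Hypothetical excerpt of this model's config.json. "pooling" and
# "dropout_rate" are the custom keys described above; other fields are assumed.
config_text = """
{
  "model_type": "electra",
  "num_labels": 3,
  "dropout_rate": 0.3,
  "pooling": "mean"
}
"""
config = json.loads(config_text)
print(config["pooling"], config["dropout_rate"])  # mean 0.3
```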
## Updated Performance by Dataset

### Merged Dataset
```
Merged Dataset Classification Report

              precision    recall  f1-score   support

    negative   0.874178  0.847789  0.860781      2352
     neutral   0.741715  0.770913  0.756032      1829
    positive   0.878194  0.877820  0.878007      2349

    accuracy                       0.837060      6530
   macro avg   0.831362  0.832174  0.831607      6530
weighted avg   0.838521  0.837060  0.837639      6530

ROC AUC: 0.947808

Predicted  negative  neutral  positive
Actual
negative       1994      268        90
neutral         223     1410       196
positive         64      223      2062

Macro F1 Score: 0.83
```
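The summary figures in the report can be rederived from the confusion matrix alone. The sketch below recomputes accuracy and macro F1 for the Merged Dataset in plain Python (no sklearn assumed); the same arithmetic applies to the other three reports.

```python
# Confusion matrix from the Merged Dataset report above
# (rows = actual, cols = predicted; order: negative, neutral, positive)
cm = [
    [1994,  268,   90],
    [ 223, 1410,  196],
    [  64,  223, 2062],
]
n = len(cm)
support = [sum(row) for row in cm]                              # per-class actual counts
pred_totals = [sum(cm[r][c] for r in range(n)) for c in range(n)]  # per-class predicted counts

precision = [cm[i][i] / pred_totals[i] for i in range(n)]
recall = [cm[i][i] / support[i] for i in range(n)]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

accuracy = sum(cm[i][i] for i in range(n)) / sum(support)
macro_f1 = sum(f1) / n

print(round(accuracy, 6))   # 0.83706
print(round(macro_f1, 6))   # 0.831607
```

These match the report: accuracy 0.837060 and macro-average F1 0.831607 (the 83.71 / 83.16 in the summary, as percentages).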
### DynaSent Round 1

```
DynaSent Round 1 Classification Report

              precision    recall  f1-score   support

    negative   0.925512  0.828333  0.874230      1200
     neutral   0.781536  0.924167  0.846888      1200
    positive   0.911472  0.840833  0.874729      1200

    accuracy                       0.864444      3600
   macro avg   0.872840  0.864444  0.865283      3600
weighted avg   0.872840  0.864444  0.865283      3600

ROC AUC: 0.962647

Predicted  negative  neutral  positive
Actual
negative        994      159        47
neutral          40     1109        51
positive         40      151      1009

Macro F1 Score: 0.87
```
### DynaSent Round 2

```
DynaSent Round 2 Classification Report

              precision    recall  f1-score   support

    negative   0.791339  0.837500  0.813765       240
     neutral   0.803030  0.662500  0.726027       240
    positive   0.768657  0.858333  0.811024       240

    accuracy                       0.786111       720
   macro avg   0.787675  0.786111  0.783605       720
weighted avg   0.787675  0.786111  0.783605       720

ROC AUC: 0.932089

Predicted  negative  neutral  positive
Actual
negative        201       18        21
neutral          40      159        41
positive         13       21       206

Macro F1 Score: 0.78
```
### Stanford Sentiment Treebank (SST-3)

```
SST-3 Classification Report

              precision    recall  f1-score   support

    negative   0.838405  0.876096  0.856836       912
     neutral   0.500000  0.365039  0.421991       389
    positive   0.870504  0.931793  0.900106       909

    accuracy                       0.809050      2210
   macro avg   0.736303  0.724309  0.726311      2210
weighted avg   0.792042  0.809050  0.798093      2210

ROC AUC: 0.905255

Predicted  negative  neutral  positive
Actual
negative        799       91        22
neutral         143      142       104
positive         11       51       847

Macro F1 Score: 0.73
```
## Old Performance by Dataset

### Merged Dataset