Upload BERT-RTE-LinearClassifier for EEE 486/586 Assignment
Files changed: README.md (+19 −1), config.json (+5 −5)

README.md:
@@ -23,11 +23,29 @@ Unlike the standard BERT classification approach, this model implements a custom
- Uses the BERT base model as the encoder for feature extraction
- Replaces the standard single linear classification head with **multiple linear layers**:
  - First expansion layer: hidden_size → hidden_size
  - Intermediate layer with ReLU activation and dropout
  - Final classification layer
- Uses label smoothing of 0.1 in the loss function for better generalization
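The head structure listed above can be sketched in PyTorch; this is a minimal sketch, not the repository's actual code, and the class and attribute names are assumptions (the real model attaches this head to the BERT encoder's pooled `[CLS]` output):

```python
import torch
import torch.nn as nn

class MultiLayerClassifierHead(nn.Module):
    """Sketch of the custom head: expansion linear -> ReLU + dropout -> final linear."""

    def __init__(self, hidden_size=768, multiplier=1, dropout=0.1, num_labels=2):
        super().__init__()
        # First expansion layer: hidden_size -> hidden_size * multiplier
        self.expand = nn.Linear(hidden_size, hidden_size * multiplier)
        # Intermediate nonlinearity and regularization
        self.act = nn.ReLU()
        self.drop = nn.Dropout(dropout)
        # Final classification layer (RTE is binary: entailment / not entailment)
        self.out = nn.Linear(hidden_size * multiplier, num_labels)

    def forward(self, pooled):
        # pooled: (batch, hidden_size) tensor from the BERT encoder
        return self.out(self.drop(self.act(self.expand(pooled))))

# Label smoothing of 0.1 in the loss, as described above
loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)

head = MultiLayerClassifierHead()
logits = head(torch.randn(4, 768))          # stand-in for encoder output
loss = loss_fn(logits, torch.tensor([0, 1, 1, 0]))
```

With `hidden_size_multiplier = 1` (the value the search selected, per the table below in the README), the expansion layer keeps the width at `hidden_size`.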

## Performance

The model achieves **69.31%** accuracy on the RTE validation set, with the following training dynamics:

- Best validation accuracy: 69.31% (epoch 4)
- Final validation accuracy: 69.31% (with early stopping)

## Hyperparameters

The model was optimized using an Optuna hyperparameter search:

| Hyperparameter | Value |
|----------------|-------|
| Learning rate | 1.304e-05 |
| Max sequence length | 128 |
| Dropout rate | 0.1 |
| Hidden size multiplier | 1 |
| Batch size | 16 |
| Training epochs | 4 (early stopping) |
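An Optuna search over these hyperparameters might look like the sketch below. The search ranges are assumptions (only the winning values are known from the table), and the dummy score stands in for the real train-and-evaluate loop, which would return validation accuracy:

```python
import optuna

def objective(trial):
    # Assumed search space; the real ranges used for this model are not recorded here.
    lr = trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True)
    dropout = trial.suggest_float("dropout_rate", 0.1, 0.5)
    multiplier = trial.suggest_categorical("hidden_size_multiplier", [1, 2, 4])
    batch_size = trial.suggest_categorical("batch_size", [16, 32])
    # Placeholder: the real objective trains the model with these values
    # and returns validation accuracy on RTE.
    return 1.0 - dropout + lr

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=3)
print(study.best_params)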

## Usage

config.json:

@@ -3,13 +3,13 @@
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": 0.1,
  "custom_params": {
    "batch_size": 16,
    "dropout_rate": 0.1,
    "hidden_size_multiplier": 1,
    "learning_rate": 1.304261063040958e-05,
    "max_sequence_length": 128
  },
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
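Since `custom_params` is a non-standard block, downstream code has to read it out of the config explicitly. A minimal sketch, using only the values shown above embedded as a literal for self-containment:

```python
import json

# Fragment mirroring the custom_params block from the updated config.json above.
config = json.loads("""
{
  "classifier_dropout": 0.1,
  "custom_params": {
    "batch_size": 16,
    "dropout_rate": 0.1,
    "hidden_size_multiplier": 1,
    "learning_rate": 1.304261063040958e-05,
    "max_sequence_length": 128
  }
}
""")

params = config["custom_params"]
```

In practice these values would be read from the downloaded `config.json` rather than a literal, then fed to the tokenizer (`max_sequence_length`) and the training loop (`batch_size`, `learning_rate`).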