gal-lardo/BERT-RTE-LinearClassifier-v2

@@ -28,25 +28,6 @@ Unlike the standard BERT classification approach, this model implements a custom
   - Final classification layer
 - Uses label smoothing of 0.1 in the loss function for better generalization
-## Performance
-The model achieves **70.40%** accuracy on the RTE validation set, with the following training dynamics:
-- Best validation accuracy: 70.40% (epoch 3)
-- Final validation accuracy: 69.68% (with early stopping)
-## Hyperparameters
-The model was optimized using Optuna hyperparameter search:
-| Hyperparameter | Value |
-|----------------|-------|
-| Learning rate | 1.72e-05 |
-| Max sequence length | 128 |
-| Dropout rate | 0.2 |
-| Hidden size multiplier | 2 |
-| Weight decay | 0.04 |
-| Batch size | 16 |
-| Training epochs | 6 (+2 for final model) |
 ## Usage
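The context lines above say the loss uses label smoothing of 0.1. The sketch below shows what that does for a single example. It assumes the common uniform-mixture formulation: the one-hot target is blended with a uniform distribution over the K classes. The true class therefore gets (1 - 0.1) + 0.1/K of the target mass, and every class gets at least 0.1/K. The card does not include the training code, so the helper names `log_softmax` and `label_smoothed_ce` are hypothetical, and the exact formulation is an assumption.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def label_smoothed_ce(logits: List[float], target: int, epsilon: float = 0.1) -> float:
    """Cross-entropy against a smoothed target (hypothetical helper).

    Assumed target distribution: (1 - epsilon) * one_hot(target) + epsilon / K,
    i.e. epsilon of the probability mass is spread uniformly over all K classes.
    """
    k = len(logits)
    log_probs = log_softmax(logits)
    smoothed = [epsilon / k + (1.0 - epsilon) * (1.0 if i == target else 0.0) for i in range(k)]
    return -sum(q * lp for q, lp in zip(smoothed, log_probs))


# RTE is a 2-class task (entailment / not_entailment).
logits = [3.0, -1.0]
print(label_smoothed_ce(logits, target=0, epsilon=0.0))  # plain cross-entropy
print(label_smoothed_ce(logits, target=0, epsilon=0.1))  # larger: the confident prediction is penalized
```

If training used PyTorch, `torch.nn.CrossEntropyLoss(label_smoothing=0.1)` (PyTorch 1.10+) computes the same quantity, averaged over the batch. The card does not say which framework or loss class this model actually used.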
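The removed Performance section reports 70.40% (best, epoch 3) and 69.68% (final) validation accuracy. This assumes evaluation on the standard GLUE RTE validation split of 277 labelled pairs, which the card does not state explicitly. Under that assumption, the two figures correspond to 195 and 193 correct predictions, a gap of two examples between the best epoch and the final checkpoint. The helper `implied_correct` below is hypothetical and only reproduces that arithmetic.

```python
# Assumption: the GLUE RTE validation split, which has 277 labelled pairs.
RTE_VALIDATION_SIZE = 277


def implied_correct(accuracy_percent: float, n: int = RTE_VALIDATION_SIZE) -> int:
    """Number of correct predictions that rounds to the reported accuracy."""
    return round(accuracy_percent / 100 * n)


for label, acc in [("best (epoch 3)", 70.40), ("final", 69.68)]:
    correct = implied_correct(acc)
    # Recompute the percentage to confirm it rounds back to the reported value.
    print(f"{label}: {correct}/{RTE_VALIDATION_SIZE} = {100 * correct / RTE_VALIDATION_SIZE:.2f}%")
# best (epoch 3): 195/277 = 70.40%
# final: 193/277 = 69.68%
```

With n = 277, each example moves accuracy by about 0.36 percentage points, so only one integer count rounds to each reported two-decimal figure.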