---
language: en
license: mit
datasets:
- glue/rte
tags:
- text-classification
- glue
- bert
- recognizing-textual-entailment
- assignment
- mean-pooling
metrics:
- accuracy
---

# BERT + Mean Pooling + MLP for RTE (EEE 486/586 Assignment - Part 2)
|
|
| This model is a fine-tuned version of `bert-base-uncased` on the RTE (Recognizing Textual Entailment) task from the GLUE benchmark. It was developed as part of the EEE 486/586 Statistical Foundations of Natural Language Processing course assignment (Part 2). |
|
|
| ## Model Architecture |
|
|
| This model explores an alternative to the standard `BertForSequenceClassification` architecture: |
|
|
| - Uses the standard `bert-base-uncased` model to obtain token embeddings (`last_hidden_state`). |
| - **Mean Pooling:** Instead of using the [CLS] token's pooler output, it calculates the mean of the `last_hidden_state` across all non-padding tokens (using the attention mask) to get a single sequence representation vector. |
| - **MLP Classifier Head:** The mean-pooled representation is passed through dropout and then a multi-layer perceptron (MLP) head for classification. The MLP structure was determined by the hyperparameter search (`hidden_size_multiplier=4`). |
| - The final layer outputs logits for the 2 classes (entailment/not\_entailment). |

**Note:** Because this model uses a custom architecture (`BertMeanPoolClassifier`), it cannot be loaded directly with `AutoModelForSequenceClassification.from_pretrained()`. You need the model's class definition (provided in the assignment code/report) and can then load the `state_dict` (`pytorch_model.bin`) into an instance of that class.
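
A minimal sketch of what `BertMeanPoolClassifier` might look like, reconstructed from the description above. The layer names, MLP depth (one hidden layer with ReLU), and dropout placement are assumptions; they must match the assignment's actual class definition for the saved `state_dict` to load. The tiny randomly initialised encoder at the bottom is only a shape check; for the real model, pass `BertModel.from_pretrained("bert-base-uncased")` and then call `model.load_state_dict(torch.load("pytorch_model.bin"))`:

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel


class BertMeanPoolClassifier(nn.Module):
    """BERT encoder + masked mean pooling + MLP classifier head."""

    def __init__(self, encoder, num_labels=2, dropout_rate=0.4,
                 hidden_size_multiplier=4):
        super().__init__()
        self.bert = encoder
        hidden = encoder.config.hidden_size
        self.dropout = nn.Dropout(dropout_rate)
        # MLP head: hidden -> hidden * multiplier -> num_labels
        self.classifier = nn.Sequential(
            nn.Linear(hidden, hidden * hidden_size_multiplier),
            nn.ReLU(),
            nn.Linear(hidden * hidden_size_multiplier, num_labels),
        )

    def forward(self, input_ids, attention_mask):
        last_hidden = self.bert(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                       # (batch, seq, hidden)
        mask = attention_mask.unsqueeze(-1).float()
        # Mean over non-padding tokens only, guided by the attention mask
        pooled = (last_hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        return self.classifier(self.dropout(pooled))


# Tiny randomly initialised encoder for a quick shape check (no download).
tiny = BertModel(BertConfig(hidden_size=32, num_hidden_layers=2,
                            num_attention_heads=2, intermediate_size=64,
                            vocab_size=100))
model = BertMeanPoolClassifier(tiny).eval()
with torch.no_grad():
    logits = model(torch.randint(0, 100, (2, 8)),
                   torch.ones(2, 8, dtype=torch.long))
print(logits.shape)  # torch.Size([2, 2])
```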
|
|
| ## Performance |
|
|
| The model was trained using hyperparameters found via Optuna. The final training run (5 epochs with early stopping based on validation accuracy) achieved the following: |
|
|
| - **Best Validation Accuracy:** **0.6931** (achieved at Epoch 3) |
| - Final Validation Accuracy (Epoch 5): 0.6823 |
| - Final Validation Loss (Epoch 5): 1.4258 |
| - Final Training Loss (Epoch 5): 0.0797 |
|
|
The model fit the training data well but showed clear signs of overfitting after epoch 3: training loss kept falling while validation loss rose. The checkpoint with the best validation accuracy was therefore saved.
|
|
| ## Best Hyperparameters (from Optuna) |
|
|
| | Hyperparameter | Value | |
| |--------------------------|-----------------------| |
| | Learning Rate | 3.518e-05 | |
| | Max Sequence Length | 128 | |
| | Dropout Rate (Classifier)| 0.4 | |
| | Batch Size | 16 | |
| | Hidden Size Multiplier | 4 | |
| | Epochs (Optuna Best Trial) | 3 | |
|
|
| ## Intended Use & Limitations |
|
|
| This model is intended for the RTE task as part of the specific course assignment. Due to its custom architecture, direct loading via `AutoModelForSequenceClassification` is not supported. |
| ``` |
| |