---
language: en
license: mit
datasets:
- glue/rte
tags:
- text-classification
- glue
- bert
- recognizing textual entailment
- assignment
metrics:
- accuracy
---

# BERT-RTE Linear Classifier for EEE 486/586 Assignment

This model is a fine-tuned version of `bert-base-uncased` on the RTE (Recognizing Textual Entailment) task from the GLUE benchmark. It was developed as part of the EEE 486/586 Statistical Foundations of Natural Language Processing course assignment.

## Model Architecture

Unlike the standard BERT classification approach, this model implements a custom architecture (a sketch follows the list):

- Uses the BERT base model as the encoder for feature extraction
- Replaces the standard single linear classification head with **multiple linear layers**:
  - First expansion layer: `hidden_size → hidden_size * 2`
  - Intermediate layer with ReLU activation and dropout
  - Final classification layer
- Uses label smoothing of 0.1 in the loss function for better generalization
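
A minimal PyTorch sketch of this head, for illustration only: the class name, the use of BERT's pooled output, and the forward signature are assumptions; the layer shapes, ReLU, dropout, and label smoothing come from the description above.

```python
import torch.nn as nn
from transformers import BertModel

class BertMultiLinearClassifier(nn.Module):
    """Hypothetical reconstruction of the multi-layer head described above."""

    def __init__(self, num_labels: int = 2, dropout: float = 0.2, multiplier: int = 2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size  # 768 for bert-base-uncased
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden * multiplier),      # expansion: hidden_size -> hidden_size*2
            nn.ReLU(),                                   # intermediate activation
            nn.Dropout(dropout),                         # tuned dropout rate (0.2)
            nn.Linear(hidden * multiplier, num_labels),  # final classification layer
        )
        # Label smoothing of 0.1 in the loss, as stated above
        self.loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)

    def forward(self, input_ids, attention_mask, labels=None):
        pooled = self.bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        logits = self.head(pooled)
        loss = self.loss_fn(logits, labels) if labels is not None else None
        return loss, logits
```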

## Performance

The model achieves **70.40%** accuracy on the RTE validation set, with the following training dynamics:

- Best validation accuracy: 70.40% (epoch 3)
- Final validation accuracy: 69.68% (with early stopping)
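
As a sketch, the validation figure could be checked along these lines, assuming the checkpoint loads through `AutoModelForSequenceClassification` as in the Usage section below:

```python
import torch
import evaluate
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("gal-lardo/BERT-RTE-LinearClassifier")
model = AutoModelForSequenceClassification.from_pretrained("gal-lardo/BERT-RTE-LinearClassifier")
model.eval()

rte = load_dataset("glue", "rte", split="validation")
metric = evaluate.load("glue", "rte")

for i in range(0, len(rte), 32):
    batch = rte[i : i + 32]  # dict of lists: sentence1, sentence2, label, idx
    inputs = tokenizer(batch["sentence1"], batch["sentence2"], return_tensors="pt",
                       padding=True, truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    metric.add_batch(predictions=logits.argmax(-1), references=batch["label"])

print(metric.compute())  # expected to be close to {'accuracy': 0.704}
```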

## Hyperparameters

The model was tuned with an Optuna hyperparameter search (a sketch of such a setup follows the table):

| Hyperparameter | Value |
|----------------|-------|
| Learning rate | 1.72e-05 |
| Max sequence length | 128 |
| Dropout rate | 0.2 |
| Hidden size multiplier | 2 |
| Weight decay | 0.04 |
| Batch size | 16 |
| Training epochs | 6 (+2 for final model) |
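
A minimal sketch of how such a search could be wired up with Optuna; the search spaces and the `train_and_evaluate` helper are illustrative assumptions, not the actual assignment code:

```python
import optuna

def objective(trial):
    # Hypothetical search spaces; the real tuning ranges are not documented here
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "dropout": trial.suggest_float("dropout", 0.1, 0.3),
        "hidden_multiplier": trial.suggest_categorical("hidden_multiplier", [1, 2, 4]),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "batch_size": trial.suggest_categorical("batch_size", [16, 32]),
    }
    # train_and_evaluate is a hypothetical helper that fine-tunes the model
    # with these hyperparameters and returns RTE validation accuracy
    return train_and_evaluate(**params)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```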

## Usage

This model can be used for textual entailment classification (determining whether one text logically follows from another):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("gal-lardo/BERT-RTE-LinearClassifier")
model = AutoModelForSequenceClassification.from_pretrained("gal-lardo/BERT-RTE-LinearClassifier")
model.eval()

# Prepare input texts
premise = "The woman is sleeping on the couch."
hypothesis = "There is a woman resting."

# Tokenize the premise-hypothesis pair and predict
inputs = tokenizer(premise, hypothesis, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
prediction = outputs.logits.argmax(-1).item()

# Map the index to a label (GLUE RTE convention: 0 = entailment, 1 = not_entailment)
label = "entailment" if prediction == 0 else "not_entailment"
print(f"Prediction: {label}")
```