---
language: en
license: mit
datasets:
- glue/rte
tags:
- text-classification
- glue
- bert
- recognizing textual entailment
- assignment
metrics:
- accuracy
---

# BERT-RTE Linear Classifier for EEE 486/586 Assignment

This model is a fine-tuned version of `bert-base-uncased` on the RTE (Recognizing Textual Entailment) task from the GLUE benchmark. It was developed as part of the EEE 486/586 Statistical Foundations of Natural Language Processing course assignment.

## Model Architecture

Unlike the standard BERT classification approach, this model implements a custom architecture (a sketch follows the list):

- Uses the BERT base model as the encoder for feature extraction
- Replaces the standard single linear classification head with **multiple linear layers**:
  - First expansion layer: `hidden_size → hidden_size * 2`
  - Intermediate layer with ReLU activation and dropout
  - Final classification layer
- Uses label smoothing of 0.1 in the loss function for better generalization
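
A minimal PyTorch sketch of this head, for illustration only: the class name, the use of BERT's pooled output, and the forward signature are assumptions; the layer shapes, ReLU, dropout, and label smoothing come from the description above.

```python
import torch.nn as nn
from transformers import BertModel

class BertMultiLinearClassifier(nn.Module):
    """Hypothetical reconstruction of the multi-layer head described above."""

    def __init__(self, num_labels: int = 2, dropout: float = 0.2, multiplier: int = 2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size  # 768 for bert-base-uncased
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden * multiplier),      # expansion: hidden_size -> hidden_size*2
            nn.ReLU(),                                   # intermediate activation
            nn.Dropout(dropout),                         # tuned dropout rate (0.2)
            nn.Linear(hidden * multiplier, num_labels),  # final classification layer
        )
        # Label smoothing of 0.1 in the loss, as stated above
        self.loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)

    def forward(self, input_ids, attention_mask, labels=None):
        pooled = self.bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        logits = self.head(pooled)
        loss = self.loss_fn(logits, labels) if labels is not None else None
        return loss, logits
```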

## Performance

The model achieves **70.40%** accuracy on the RTE validation set, with the following training dynamics:

- Best validation accuracy: 70.40% (epoch 3)
- Final validation accuracy: 69.68% (with early stopping)
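
As a sketch, the validation figure could be checked along these lines, assuming the checkpoint loads through `AutoModelForSequenceClassification` as in the Usage section below:

```python
import torch
import evaluate
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("gal-lardo/BERT-RTE-LinearClassifier")
model = AutoModelForSequenceClassification.from_pretrained("gal-lardo/BERT-RTE-LinearClassifier")
model.eval()

rte = load_dataset("glue", "rte", split="validation")
metric = evaluate.load("glue", "rte")

for i in range(0, len(rte), 32):
    batch = rte[i : i + 32]  # dict of lists: sentence1, sentence2, label, idx
    inputs = tokenizer(batch["sentence1"], batch["sentence2"], return_tensors="pt",
                       padding=True, truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    metric.add_batch(predictions=logits.argmax(-1), references=batch["label"])

print(metric.compute())  # expected to be close to {'accuracy': 0.704}
```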

## Hyperparameters

The model was tuned with an Optuna hyperparameter search (a sketch of such a setup follows the table):

| Hyperparameter | Value |
|----------------|-------|
| Learning rate | 1.72e-05 |
| Max sequence length | 128 |
| Dropout rate | 0.2 |
| Hidden size multiplier | 2 |
| Weight decay | 0.04 |
| Batch size | 16 |
| Training epochs | 6 (+2 for final model) |
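
A minimal sketch of how such a search could be wired up with Optuna; the search spaces and the `train_and_evaluate` helper are illustrative assumptions, not the actual assignment code:

```python
import optuna

def objective(trial):
    # Hypothetical search spaces; the real tuning ranges are not documented here
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "dropout": trial.suggest_float("dropout", 0.1, 0.3),
        "hidden_multiplier": trial.suggest_categorical("hidden_multiplier", [1, 2, 4]),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "batch_size": trial.suggest_categorical("batch_size", [16, 32]),
    }
    # train_and_evaluate is a hypothetical helper that fine-tunes the model
    # with these hyperparameters and returns RTE validation accuracy
    return train_and_evaluate(**params)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```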

## Usage

This model can be used for textual entailment classification (determining whether one text logically follows from another):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("gal-lardo/BERT-RTE-LinearClassifier")
model = AutoModelForSequenceClassification.from_pretrained("gal-lardo/BERT-RTE-LinearClassifier")
model.eval()

# Prepare input texts
premise = "The woman is sleeping on the couch."
hypothesis = "There is a woman resting."

# Tokenize the premise-hypothesis pair and predict
inputs = tokenizer(premise, hypothesis, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
prediction = outputs.logits.argmax(-1).item()

# Map the index to a label (GLUE RTE convention: 0 = entailment, 1 = not_entailment)
label = "entailment" if prediction == 0 else "not_entailment"
print(f"Prediction: {label}")
```