---
language: en
license: cc-by-4.0
tags:
- text-classification
repo: https://github.com/AAP9002/COMP34812-NLU-NLI
---

# Model Card for z72819ap-e91802zc-NLI

<!-- Provide a quick summary of what the model is/does. -->

This is a binary classification model trained to detect whether a premise entails a hypothesis.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

This model is an ensemble of RoBERTa models fine-tuned on over 24K premise-hypothesis pairs from the shared task dataset for Natural Language Inference (NLI).

- **Developed by:** Alan Prophett and Zac Curtis
- **Language(s):** English
- **Model type:** Supervised
- **Model architecture:** Transformers
- **Finetuned from model [optional]:** roberta-base

### Model Resources

<!-- Provide links where applicable. -->

- **Repository:** https://huggingface.co/FacebookAI/roberta-base
- **Paper or documentation:** https://arxiv.org/abs/1907.11692

## Training Details

### Training Data

<!-- This is a short stub of information on the training data that was used, and documentation related to data pre-processing or additional filtering (if applicable). -->

24K+ premise-hypothesis pairs from the shared task dataset provided for Natural Language Inference (NLI).
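
The exact schema of the shared-task files is not documented here; the sketch below assumes a CSV with `premise`, `hypothesis`, and a binary `label` column (1 = entailment, 0 = no entailment), which matches the task description above.

```python
# Hypothetical loading sketch -- the column names and label encoding are
# assumptions, not documented properties of the shared-task dataset.
import csv
import io

sample_csv = """premise,hypothesis,label
A man is playing a guitar.,A person is making music.,1
A dog is sleeping on the sofa.,The dog is chasing a ball.,0
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
pairs = [(r["premise"], r["hypothesis"], int(r["label"])) for r in rows]
```

Each `(premise, hypothesis, label)` triple is then tokenized as a sentence pair before fine-tuning.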

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Training Hyperparameters

<!-- This is a summary of the values of hyperparameters used in training the model. -->

All models and datasets

- seed: 42

RoBERTa Large NLI binary classification model

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- num_epochs: 5

Semantic Textual Similarity binary classification model

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- num_epochs: 5

Ensemble meta-model

- learning_rate: 2e-05
- train_batch_size: 128
- eval_batch_size: 16
- num_epochs: 3
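
The card does not spell out how the meta-model combines the two base classifiers. One common stacking setup, sketched here purely for illustration, feeds the base models' entailment probabilities into a small combiner; the weights and threshold below are made up, not the trained meta-model's parameters.

```python
# Illustrative stacking combiner -- the weighted average, the weights, and
# the threshold are all assumptions, not the actual trained meta-model.
def meta_predict(p_nli: float, p_sts: float,
                 w_nli: float = 0.7, w_sts: float = 0.3,
                 threshold: float = 0.5) -> int:
    """Combine the NLI and STS base-model probabilities into a 0/1 label."""
    score = w_nli * p_nli + w_sts * p_sts
    return 1 if score >= threshold else 0
```

In practice the meta-model here is itself trained (note its own learning rate and epochs above), so its decision boundary is learned rather than hand-set as in this sketch.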

#### Speeds, Sizes, Times

<!-- This section provides information about roughly how long it takes to train the model and the size of the resulting model. -->

- overall training time: 309 minutes 30 seconds

RoBERTa Large NLI binary classification model

- duration per training epoch: 11 minutes
- model size: 1.42 GB

Semantic Textual Similarity binary classification model

- duration per training epoch: 4 minutes 30 seconds
- model size: 501 MB

Ensemble meta-model

- duration per training epoch: 4 minutes
- model size: 1.92 GB

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data & Metrics

#### Testing Data

<!-- This should describe any evaluation data used (e.g., the development/validation set provided). -->

A subset of the provided development set, amounting to 5.3K+ pairs for validation and 1.3K+ pairs for testing.

#### Metrics

<!-- These are the evaluation metrics being used. -->

- Precision
- Recall
- F1-score
- Accuracy
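
As a reminder of how these metrics relate, the sketch below computes per-class precision, recall, and F1 from raw counts; the counts are made up for illustration, not taken from the model's confusion matrix.

```python
# Standard precision/recall/F1 definitions, applied to made-up counts
# (tp/fp/fn are NOT the model's actual results).
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
```

Macro scores average these per-class values with equal class weight; weighted scores average them in proportion to each class's support.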

### Results

The ensemble model obtained an F1-score of 91% and an accuracy of 91% on both the validation and test sets.

Validation set

- Macro Precision: 91.0%
- Macro Recall: 91.0%
- Macro F1-score: 91.0%
- Weighted Precision: 91.0%
- Weighted Recall: 91.0%
- Weighted F1-score: 91.0%
- Accuracy: 91.0%
- Support: 5389

Test set

- Macro Precision: 91.0%
- Macro Recall: 91.0%
- Macro F1-score: 91.0%
- Weighted Precision: 91.0%
- Weighted Recall: 91.0%
- Weighted F1-score: 91.0%
- Accuracy: 91.0%
- Support: 1347

## Technical Specifications

### Hardware

- RAM: at least 10 GB
- Storage: at least 4 GB
- GPU: NVIDIA A100 (40 GB)

### Software

- TensorFlow 2.18.0+cu12.4
- Transformers 4.50.3
- Pandas 2.2.2
- NumPy 2.0.2
- Seaborn 0.13.2
- huggingface_hub 0.30.1
- Matplotlib 3.10.0
- scikit-learn 1.6.1

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Any input (the concatenation of the premise and hypothesis) longer than 512 subwords will be truncated by the model.
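
A toy illustration of that limit, using whitespace "tokens" in place of the real subword vocabulary — the actual tokenizer counts subword tokens (including special tokens), and its truncation strategy may differ from the simple tail-drop shown here.

```python
# Toy truncation sketch -- whitespace tokens stand in for subwords, and the
# plain tail truncation only approximates the real tokenizer's behaviour.
MAX_LEN = 512

def truncate_pair(premise: str, hypothesis: str, max_len: int = MAX_LEN) -> list:
    tokens = premise.split() + hypothesis.split()
    return tokens[:max_len]  # anything past max_len is silently dropped

long_premise = ("word " * 600).strip()  # 600 tokens, over the limit
tokens = truncate_pair(long_premise, "short hypothesis")
```

Note that when the premise alone exceeds the limit, the entire hypothesis is lost — such pairs cannot be classified meaningfully.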

## Additional Information

<!-- Any other information that would be useful for other people to know. -->

The hyperparameters were determined by experimentation with different values.