---
language: en
license: cc-by-4.0
tags:
- text-classification
repo: https://github.com/AAP9002/COMP34812-NLU-NLI
---
# Model Card for z72819ap-e91802zc-NLI
<!-- Provide a quick summary of what the model is/does. -->
This is a classification model that was trained to detect whether a premise and hypothesis entail each other or not, using binary classification.
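To make the task format concrete, here is a toy illustration of a premise-hypothesis pair and its binary label. The example pair and the label convention (1 = entailment, 0 = no entailment) are assumptions for illustration, not drawn from the shared-task dataset.

```python
# One invented example in the binary NLI format described above.
example = {
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "A person is performing music.",
    "label": 1,  # assumed convention: 1 = entailment, 0 = no entailment
}

def is_entailment(ex: dict) -> bool:
    """Map the binary label to a boolean entailment decision."""
    return ex["label"] == 1

print(is_entailment(example))  # True for this pair
```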
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
This model is based on an ensemble of RoBERTa models fine-tuned on over 24K premise-hypothesis pairs from the shared-task dataset for Natural Language Inference (NLI).
- **Developed by:** Alan Prophett and Zac Curtis
- **Language(s):** English
- **Model type:** Supervised
- **Model architecture:** Transformers
- **Finetuned from model [optional]:** roberta-base
### Model Resources
<!-- Provide links where applicable. -->
- **Repository:** https://huggingface.co/FacebookAI/roberta-base
- **Paper or documentation:** https://arxiv.org/abs/1907.11692
## Training Details
### Training Data
<!-- This is a short stub of information on the training data that was used, and documentation related to data pre-processing or additional filtering (if applicable). -->
24K+ premise-hypothesis pairs from the shared task dataset provided for Natural Language Inference (NLI).
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Training Hyperparameters
<!-- This is a summary of the values of hyperparameters used in training the model. -->
**All models and datasets**
- seed: 42

**RoBERTa Large NLI Binary Classification Model**
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- num_epochs: 5

**Semantic Textual Similarity Binary Classification Model**
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- num_epochs: 5

**Ensemble Meta Model**
- learning_rate: 2e-05
- train_batch_size: 128
- eval_batch_size: 16
- num_epochs: 3
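The per-model hyperparameters above can be collected as plain configuration dicts. The key names below follow common Hugging Face Trainer argument naming; this is a sketch of the configuration, not the authors' actual training script.

```python
# Shared setting across all models and datasets.
SEED = 42

# Per-model hyperparameters, as listed in the card.
roberta_nli = dict(learning_rate=2e-5, train_batch_size=16,
                   eval_batch_size=16, num_epochs=5)
sts_model = dict(learning_rate=2e-5, train_batch_size=16,
                 eval_batch_size=16, num_epochs=5)
meta_model = dict(learning_rate=2e-5, train_batch_size=128,
                  eval_batch_size=16, num_epochs=3)
```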
#### Speeds, Sizes, Times
<!-- This section provides information about how roughly how long it takes to train the model and the size of the resulting model. -->
- overall training time: 309 minutes 30 seconds

**RoBERTa Large NLI Binary Classification Model**
- duration per training epoch: 11 minutes
- model size: 1.42 GB

**Semantic Textual Similarity Binary Classification Model**
- duration per training epoch: 4 minutes 30 seconds
- model size: 501 MB

**Ensemble Meta Model**
- duration per training epoch: 4 minutes
- model size: 1.92 GB
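The card does not spell out how the Ensemble Meta Model combines the two base classifiers. As a minimal illustration of stacking, the sketch below trains a logistic-regression meta-model on synthetic positive-class probabilities standing in for the RoBERTa NLI and STS base models; the real meta-model is presumably a larger learned component, and all data here is invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic gold labels and hypothetical base-model probabilities.
# Real stacking would use held-out predictions from the fine-tuned models.
labels = rng.integers(0, 2, size=200)
nli_prob = np.clip(labels + rng.normal(0, 0.3, 200), 0, 1)
sts_prob = np.clip(labels + rng.normal(0, 0.4, 200), 0, 1)

# Each example is represented by the two base-model outputs.
features = np.column_stack([nli_prob, sts_prob])

# The meta-model learns how to weight the base predictions.
meta = LogisticRegression().fit(features, labels)
preds = meta.predict(features)
print("stacked accuracy:", (preds == labels).mean())
```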
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data & Metrics
#### Testing Data
<!-- This should describe any evaluation data used (e.g., the development/validation set provided). -->
A subset of the development set provided, amounting to 5.3k+ pairs for validation and 1.3k+ for testing.
#### Metrics
<!-- These are the evaluation metrics being used. -->
- Precision
- Recall
- F1-score
- Accuracy
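The macro and weighted variants of these metrics can be computed with scikit-learn, as sketched below on invented toy labels (not the actual model outputs).

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy gold labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

acc = accuracy_score(y_true, y_pred)
# Macro: unweighted mean over the two classes.
macro_p, macro_r, macro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro")
# Weighted: mean over classes weighted by support.
wt_p, wt_r, wt_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted")

print(f"accuracy={acc:.3f} macro_f1={macro_f1:.3f} weighted_f1={wt_f1:.3f}")
```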
### Results
The Ensemble Model obtained an F1-score of 91% and an accuracy of 91%.

**Validation set**
- Macro Precision: 91.0%
- Macro Recall: 91.0%
- Macro F1-score: 91.0%
- Weighted Precision: 91.0%
- Weighted Recall: 91.0%
- Weighted F1-score: 91.0%
- Accuracy: 91.0%
- Support: 5389

**Test set**
- Macro Precision: 91.0%
- Macro Recall: 91.0%
- Macro F1-score: 91.0%
- Weighted Precision: 91.0%
- Weighted Recall: 91.0%
- Weighted F1-score: 91.0%
- Accuracy: 91.0%
- Support: 1347
## Technical Specifications
### Hardware
- RAM: at least 10 GB
- Storage: at least 4 GB
- GPU: NVIDIA A100 (40 GB)
### Software
- TensorFlow 2.18.0+cu12.4
- Transformers 4.50.3
- Pandas 2.2.2
- NumPy 2.0.2
- Seaborn 0.13.2
- Huggingface_hub 0.30.1
- Matplotlib 3.10.0
- Scikit-learn 1.6.1
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
Any input (the concatenation of the premise and hypothesis) longer than 512 subwords will be truncated by the model.
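The effect of this limit can be sketched in simplified form: the two sequences are joined into a single token-id sequence and anything past the maximum length is dropped. The token ids and separator below are fake; real RoBERTa tokenization inserts its own special tokens.

```python
MAX_LEN = 512  # the model's maximum input length in subwords

def truncate_pair(premise_ids, hypothesis_ids, sep_id=2, max_len=MAX_LEN):
    """Concatenate a pair and drop everything past max_len (simplified)."""
    combined = premise_ids + [sep_id] + hypothesis_ids
    return combined[:max_len]

# 400 + 1 + 200 = 601 ids, so the tail of the hypothesis is lost.
ids = truncate_pair(list(range(400)), list(range(200)))
print(len(ids))  # 512
```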
## Additional Information
<!-- Any other information that would be useful for other people to know. -->
The hyperparameters were determined by experimenting with a range of values.