---
language: en
license: cc-by-4.0
tags:
- text-classification
repo: https://github.com/AAP9002/COMP34812-NLU-NLI
---
# Model Card for z72819ap-e91802zc-NLI
<!-- Provide a quick summary of what the model is/does. -->
This is a classification model that was trained to detect whether a premise entails a hypothesis, framed as binary classification.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
This model is based on an ensemble of RoBERTa models that were fine-tuned on over 24K premise-hypothesis pairs from the shared task dataset for Natural Language Inference (NLI).
- **Developed by:** Alan Prophett and Zac Curtis
- **Language(s):** English
- **Model type:** Supervised
- **Model architecture:** Transformers
- **Finetuned from model [optional]:** roberta-base
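The ensemble described above can be thought of as stacking: each base model outputs an entailment probability, and a meta-model is trained on those outputs. The sketch below illustrates the idea with synthetic probabilities and a logistic-regression meta-model; the variable names (`base_probs_nli`, `base_probs_sts`) and the synthetic data are hypothetical stand-ins, not the actual trained models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Hypothetical stand-ins for the two base models' predicted
# entailment probabilities on 1,000 validation pairs.
y_true = rng.integers(0, 2, size=1000)
base_probs_nli = np.clip(y_true + rng.normal(0, 0.3, size=1000), 0, 1)
base_probs_sts = np.clip(y_true + rng.normal(0, 0.4, size=1000), 0, 1)

# Stack the base-model outputs as features for the meta-model.
features = np.column_stack([base_probs_nli, base_probs_sts])

meta_model = LogisticRegression().fit(features, y_true)
ensemble_preds = meta_model.predict(features)
print("meta-model training accuracy:", (ensemble_preds == y_true).mean())
```

The meta-model learns how much to trust each base model, which is the usual motivation for stacking over simple probability averaging.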
### Model Resources
<!-- Provide links where applicable. -->
- **Repository:** https://huggingface.co/FacebookAI/roberta-base
- **Paper or documentation:** https://arxiv.org/abs/1907.11692
## Training Details
### Training Data
<!-- This is a short stub of information on the training data that was used, and documentation related to data pre-processing or additional filtering (if applicable). -->
24K+ premise-hypothesis pairs from the shared task dataset provided for Natural Language Inference (NLI).
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Training Hyperparameters
<!-- This is a summary of the values of hyperparameters used in training the model. -->
**All models and datasets**
- seed: 42

**RoBERTa Large NLI Binary Classification Model**
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- num_epochs: 5

**Semantic Textual Similarity Binary Classification Model**
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- num_epochs: 5

**Ensemble Meta Model**
- learning_rate: 2e-05
- train_batch_size: 128
- eval_batch_size: 16
- num_epochs: 3
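For reference, the hyperparameters above would map onto Hugging Face `TrainingArguments` roughly as follows (a configuration sketch using the NLI base model's values, not the exact training script; `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

# Sketch of the RoBERTa Large NLI model's settings listed above.
training_args = TrainingArguments(
    output_dir="roberta-large-nli-binary",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    seed=42,
)
```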
#### Speeds, Sizes, Times
<!-- This section provides information about how roughly how long it takes to train the model and the size of the resulting model. -->
- overall training time: 309 minutes 30 seconds

**RoBERTa Large NLI Binary Classification Model**
- duration per training epoch: 11 minutes
- model size: 1.42 GB

**Semantic Textual Similarity Binary Classification Model**
- duration per training epoch: 4 minutes 30 seconds
- model size: 501 MB

**Ensemble Meta Model**
- duration per training epoch: 4 minutes
- model size: 1.92 GB
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data & Metrics
#### Testing Data
<!-- This should describe any evaluation data used (e.g., the development/validation set provided). -->
A subset of the development set provided, amounting to 5.3k+ pairs for validation and 1.3k+ for testing.
#### Metrics
<!-- These are the evaluation metrics being used. -->
- Precision
- Recall
- F1-score
- Accuracy
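The metrics above can be computed with scikit-learn (listed in the software stack below). The labels and predictions here are purely illustrative (0 = no entailment, 1 = entailment):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative gold labels and predictions, not real model outputs.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

# Report both macro (unweighted class mean) and weighted (support-weighted) averages.
for avg in ("macro", "weighted"):
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average=avg)
    print(f"{avg}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")

print("accuracy:", accuracy_score(y_true, y_pred))
```

Macro and weighted averages coincide here because both classes have equal support, as is nearly the case for the balanced validation and test splits reported below.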
### Results
The Ensemble Model obtained an F1-score of 91% and an accuracy of 91%.
**Validation set**
- Macro Precision: 91.0%
- Macro Recall: 91.0%
- Macro F1-score: 91.0%
- Weighted Precision: 91.0%
- Weighted Recall: 91.0%
- Weighted F1-score: 91.0%
- Accuracy: 91.0%
- Support: 5389

**Test set**
- Macro Precision: 91.0%
- Macro Recall: 91.0%
- Macro F1-score: 91.0%
- Weighted Precision: 91.0%
- Weighted Recall: 91.0%
- Weighted F1-score: 91.0%
- Accuracy: 91.0%
- Support: 1347
## Technical Specifications
### Hardware
- RAM: at least 10 GB
- Storage: at least 4 GB
- GPU: NVIDIA A100 (40 GB)
### Software
- TensorFlow 2.18.0+cu12.4
- Transformers 4.50.3
- Pandas 2.2.2
- NumPy 2.0.2
- Seaborn 0.13.2
- Huggingface_hub 0.30.1
- Matplotlib 3.10.0
- Scikit-learn 1.6.1
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
Any inputs (concatenation of two sequences) longer than
512 subwords will be truncated by the model.
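Conceptually, the truncation works as sketched below (simplified: the real RoBERTa tokenizer also reserves positions for special tokens and handles the pair encoding itself):

```python
MAX_LEN = 512  # RoBERTa's maximum sequence length in subword tokens


def truncate_pair(premise_ids, hypothesis_ids, max_len=MAX_LEN):
    """Concatenate two token-id sequences and truncate to max_len.

    Simplified illustration only: real tokenizers insert special tokens
    and typically truncate the longer sequence first.
    """
    combined = list(premise_ids) + list(hypothesis_ids)
    return combined[:max_len]


# Anything past the 512-token limit is silently dropped.
long_pair = truncate_pair(range(400), range(300))
print(len(long_pair))  # 512
```

In practice this means that for very long premise-hypothesis pairs, part of the hypothesis may never be seen by the model, which can affect predictions on such inputs.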
## Additional Information
<!-- Any other information that would be useful for other people to know. -->
The hyperparameters were selected empirically by experimenting with a range of values.