---
language: en
license: cc-by-4.0
tags:
- text-classification
repo: https://github.com/AAP9002/COMP34812-NLU-NLI
---

# Model Card for z72819ap-e91802zc-NLI

This is a binary classification model that was trained to detect whether a premise and a hypothesis entail each other.

## Model Details

### Model Description

This model is based on an ensemble of RoBERTa models that was fine-tuned using over 24K premise-hypothesis pairs from the shared task dataset for Natural Language Inference (NLI).

- **Developed by:** Alan Prophett and Zac Curtis
- **Language(s):** English
- **Model type:** Supervised
- **Model architecture:** Transformers
- **Finetuned from model:** roberta-base

### Model Resources

- **Repository:** https://huggingface.co/FacebookAI/roberta-base
- **Paper or documentation:** https://arxiv.org/abs/1907.11692

## Training Details

### Training Data

24K+ premise-hypothesis pairs from the shared task dataset provided for Natural Language Inference (NLI).
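The ensemble described above combines the outputs of fine-tuned RoBERTa models through a trained meta-model. The card does not reproduce that code, but the stacking idea can be sketched on toy data; the variable names and the use of a logistic-regression meta-classifier are illustrative assumptions, not the actual implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical illustration: entailment probabilities assigned by two
# base models (e.g. an NLI model and a semantic-similarity model) to the
# same premise-hypothesis pairs. In the real system these would come
# from the fine-tuned RoBERTa models.
rng = np.random.default_rng(42)
labels = rng.integers(0, 2, size=200)  # gold 0/1 entailment labels
p_nli = np.clip(labels + rng.normal(0, 0.3, 200), 0, 1)
p_sts = np.clip(labels + rng.normal(0, 0.4, 200), 0, 1)

# Stack the base-model scores as features for a meta-classifier.
X = np.column_stack([p_nli, p_sts])
meta = LogisticRegression().fit(X, labels)

preds = meta.predict(X)
accuracy = (preds == labels).mean()
print(f"meta-model accuracy on the toy data: {accuracy:.2f}")
```

The meta-model learns how much weight to give each base model's score, which is the motivation for stacking rather than simple probability averaging.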
### Training Procedure

#### Training Hyperparameters

All models and datasets:

- seed: 42

RoBERTa-large NLI binary classification model:

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- num_epochs: 5

Semantic Textual Similarity binary classification model:

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- num_epochs: 5

Ensemble meta-model:

- learning_rate: 2e-05
- train_batch_size: 128
- eval_batch_size: 16
- num_epochs: 3

#### Speeds, Sizes, Times

- overall training time: 309 minutes 30 seconds

RoBERTa-large NLI binary classification model:

- duration per training epoch: 11 minutes
- model size: 1.42 GB

Semantic Textual Similarity binary classification model:

- duration per training epoch: 4 minutes 30 seconds
- model size: 501 MB

Ensemble meta-model:

- duration per training epoch: 4 minutes
- model size: 1.92 GB

## Evaluation

### Testing Data & Metrics

#### Testing Data

A subset of the provided development set: 5.3K+ pairs for validation and 1.3K+ pairs for testing.

#### Metrics

- Precision
- Recall
- F1-score
- Accuracy

### Results

The ensemble model obtained an F1-score of 91% and an accuracy of 91%.

Validation set:

- Macro precision: 91.0%
- Macro recall: 91.0%
- Macro F1-score: 91.0%
- Weighted precision: 91.0%
- Weighted recall: 91.0%
- Weighted F1-score: 91.0%
- Accuracy: 91.0%
- Support: 5389

Test set:

- Macro precision: 91.0%
- Macro recall: 91.0%
- Macro F1-score: 91.0%
- Weighted precision: 91.0%
- Weighted recall: 91.0%
- Weighted F1-score: 91.0%
- Accuracy: 91.0%
- Support: 1347

## Technical Specifications

### Hardware

- RAM: at least 10 GB
- Storage: at least 4 GB
- GPU: NVIDIA A100 (40 GB)

### Software

- TensorFlow 2.18.0+cu12.4
- Transformers 4.50.3
- Pandas 2.2.2
- NumPy 2.0.2
- Seaborn 0.13.2
- huggingface_hub 0.30.1
- Matplotlib 3.10.0
- scikit-learn 1.6.1

## Bias, Risks, and Limitations

Any input (the concatenation of a premise-hypothesis pair) longer than 512 subword tokens is truncated by the model.
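The effect of the 512-subword limit can be sketched in plain Python. The stand-in token-id lists and the `truncate_pair` helper below are illustrative only; the real tokenizer also reserves slots for special tokens and offers smarter truncation strategies such as `longest_first`:

```python
MAX_LEN = 512  # RoBERTa's maximum input length in subword tokens

def truncate_pair(premise_ids, hypothesis_ids, max_len=MAX_LEN):
    """Concatenate two token-id sequences and cut the result to max_len.

    A simplified stand-in for what the tokenizer does when a
    premise-hypothesis pair exceeds the model's context window.
    """
    combined = premise_ids + hypothesis_ids
    return combined[:max_len]

premise = list(range(300))     # pretend these are subword ids
hypothesis = list(range(400))
ids = truncate_pair(premise, hypothesis)
print(len(ids))  # 512: the tail of the hypothesis is silently dropped
```

The practical consequence is that for very long pairs the model never sees the end of the hypothesis, which can affect predictions on such inputs.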
## Additional Information

The hyperparameters were selected by experimenting with a range of candidate values and keeping the best-performing configuration.
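That kind of experimentation can be sketched as a small grid search. The toy task and the use of scikit-learn's `GridSearchCV` are assumptions for illustration; the actual search was over fine-tuning settings such as learning rate, batch size, and epoch count:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy stand-in task: a linearly separable binary problem.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Candidate values to try, analogous to sweeping learning rates.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(LogisticRegression(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

Each candidate is scored by cross-validation and the best-scoring configuration is retained, mirroring the trial-and-error process described above.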