---
library_name: transformers
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: roberta_nli_ensemble
  results: []
---

# roberta_nli_ensemble

<!-- Provide a quick summary of what the model is/does. -->

A fine-tuned RoBERTa model for a Natural Language Inference (NLI) task: given a premise and a hypothesis, it classifies the relationship between the pair of sentences.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

This model builds upon the roberta-base architecture, adding a multi-layer classification head for NLI. It computes average-pooled representations of the premise and hypothesis tokens (identified via `token_type_ids`) and concatenates them before passing the result through additional linear and non-linear layers. The final output classifies the sentence pair into one of three classes.
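
The pooling-and-concatenation scheme can be pictured with a short sketch. This is illustrative only: the class name `RobertaNLIHead`, the hidden sizes, and the Tanh/Dropout choices are assumptions, not the actual `roBERTaClassifier` code.

```python
import torch
import torch.nn as nn

class RobertaNLIHead(nn.Module):
    """Illustrative head: average-pool premise and hypothesis tokens
    separately, concatenate, then classify into three classes."""

    def __init__(self, hidden_size: int = 768, num_labels: int = 3):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(2 * hidden_size, hidden_size),
            nn.Tanh(),
            nn.Dropout(0.1),
            nn.Linear(hidden_size, num_labels),
        )

    @staticmethod
    def masked_mean(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Average token embeddings where mask is True, guarding against
        # division by zero for empty segments.
        mask = mask.unsqueeze(-1).float()
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

    def forward(self, hidden_states, token_type_ids, attention_mask):
        # token_type_ids distinguishes premise (0) from hypothesis (1) tokens;
        # attention_mask excludes padding from both pools.
        premise = self.masked_mean(
            hidden_states, (token_type_ids == 0) & (attention_mask == 1))
        hypothesis = self.masked_mean(
            hidden_states, (token_type_ids == 1) & (attention_mask == 1))
        return self.ffn(torch.cat([premise, hypothesis], dim=-1))
```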

- **Developed by:** Dev Soneji
- **Language(s):** English
- **Model type:** Transformer-based sequence classification model
- **Model architecture:** RoBERTa encoder with a multi-layer classification head
- **Finetuned from model:** roberta-base

### Model Resources

<!-- Provide links where applicable. -->

- **Repository:** [Devtrick/roberta_nli_ensemble](https://huggingface.co/Devtrick/roberta_nli_ensemble)
- **Paper or documentation:** [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)

## Training Details

### Training Data

<!-- This is a short stub of information on the training data that was used, and documentation related to data pre-processing or additional filtering (if applicable). -->

The model was trained on a dataset located in `train.csv`, comprising 24K premise-hypothesis pairs, each with a binary label indicating whether the hypothesis is true given the premise (0 = hypothesis is false, 1 = hypothesis is true). No further details were given on the origin or validity of this dataset.

The data was passed through a tokenizer ([AutoTokenizer](https://huggingface.co/docs/transformers/v4.50.0/en/model_doc/auto#transformers.AutoTokenizer)) from the standard Hugging Face library. No other pre-processing was done, aside from relabelling columns to match the expected format.
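
A minimal sketch of the pair-tokenization step. The column names `premise` and `hypothesis` and the length settings are assumptions; note also that the stock `roberta-base` tokenizer does not emit `token_type_ids` by default, so the actual pipeline presumably requests or reconstructs them for the pooling described above.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def tokenize_pairs(batch):
    # Premise and hypothesis are encoded together as a sentence pair;
    # the column names here are illustrative assumptions.
    return tokenizer(
        batch["premise"],
        batch["hypothesis"],
        truncation=True,
        padding="max_length",
        max_length=256,
    )
```

With the `datasets` library, this would typically be applied via `dataset.map(tokenize_pairs, batched=True)`.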

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

The model was trained as follows:
- The [training data](#training-data) was prepared by renaming columns and tokenizing.
- The model was initialised with a custom configuration class, `roBERTaConfig`, setting essential parameters. The model itself, `roBERTaClassifier`, extends the pretrained RoBERTa model with multiple linear layers for pooling and classification.
- Hyperparameters were selected in a separate grid search; the best-performing values are listed under [Training Hyperparameters](#training-hyperparameters), with a `Trainer` sketch after that list.
- The model was validated on the [test data](#testing-data), giving the [results](#results) below.
- Checkpoints were saved after each epoch, and the best checkpoint was then reloaded and pushed to the Hugging Face Hub.

#### Training Hyperparameters

<!-- This is a summary of the values of hyperparameters used in training the model. -->

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 128
- eval_batch_size: 128
- weight_decay: 0.01
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
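
As a rough illustration, these values map onto `TrainingArguments` along the following lines. The `output_dir`, early-stopping patience, monitored metric, and variable names are assumptions, not confirmed settings:

```python
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="roberta_nli_ensemble",   # assumed
    learning_rate=3e-5,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    weight_decay=0.01,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    num_train_epochs=10,
    eval_strategy="epoch",
    save_strategy="epoch",               # checkpoint after each epoch
    load_best_model_at_end=True,         # reload best checkpoint at the end
    metric_for_best_model="accuracy",    # assumed monitored metric
)

trainer = Trainer(
    model=model,                      # the custom roBERTaClassifier instance
    args=args,
    train_dataset=tokenized_train,    # assumed variable names
    eval_dataset=tokenized_dev,
    compute_metrics=compute_metrics,  # see the sketch under Metrics below
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
# trainer.train()
```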

#### Speeds, Sizes, Times

<!-- This section provides information about roughly how long it takes to train the model and the size of the resulting model. -->

- Training time: 12 minutes 17 seconds on the hardware specified below. Training was configured for 10 epochs, but early stopping ended it after 5.
- Model size: 126M parameters.
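
As a quick sanity check, a parameter count can be obtained by summing over `model.parameters()`. The snippet below counts the roberta-base backbone alone (roughly 125M); the custom head layers account for the remainder:

```python
from transformers import AutoModel

# Count backbone parameters; the custom classification head adds ~1M more.
backbone = AutoModel.from_pretrained("roberta-base")
print(f"{sum(p.numel() for p in backbone.parameters()) / 1e6:.0f}M parameters")
```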

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data & Metrics

#### Testing Data

<!-- This should describe any evaluation data used (e.g., the development/validation set provided). -->

The development (and effectively testing) dataset is located in `dev.csv`. It contains 6K pairs in the same format as the training data. No further details were given on the origin or validity of this dataset.

As with the training data, it was passed through [AutoTokenizer](https://huggingface.co/docs/transformers/v4.50.0/en/model_doc/auto#transformers.AutoTokenizer), with no other pre-processing beyond relabelling columns to match the expected format.

#### Metrics

<!-- These are the evaluation metrics being used. -->

- Accuracy: Proportion of correct predictions.
- Matthews Correlation Coefficient (MCC): Correlation coefficient between predicted and true labels, ranging from -1 to 1.
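
A minimal `compute_metrics` function consistent with these two metrics, using scikit-learn (the actual implementation in the training script may differ):

```python
import numpy as np
from sklearn.metrics import accuracy_score, matthews_corrcoef

def compute_metrics(eval_pred):
    # eval_pred packs model logits and true labels, as passed by Trainer.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "mcc": matthews_corrcoef(labels, preds),
    }
```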

### Results

Final results on the evaluation set:

- Loss: 0.4849
- Accuracy: 0.8848
- MCC: 0.7695

| Training Loss | Epoch | Step | Validation Loss | Accuracy | MCC    |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
| 0.6552        | 1.0   | 191  | 0.3383          | 0.8685   | 0.7377 |
| 0.2894        | 2.0   | 382  | 0.3045          | 0.8778   | 0.7559 |
| 0.1891        | 3.0   | 573  | 0.3255          | 0.8854   | 0.7705 |
| 0.1209        | 4.0   | 764  | 0.3963          | 0.8829   | 0.7657 |
| 0.0843        | 5.0   | 955  | 0.4849          | 0.8848   | 0.7695 |

## Technical Specifications

### Hardware

PC specs the model was trained on:

- CPU: AMD Ryzen 7 7700X
- GPU: NVIDIA GeForce RTX 5070 Ti
- Memory: 32GB DDR5
- Motherboard: MSI MAG B650 TOMAHAWK WIFI

### Software

- Transformers 4.50.2
- PyTorch 2.8.0.dev20250326+cu128
- Datasets 3.5.0
- Tokenizers 0.21.1

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

- The model's performance and biases depend on the data it was trained on; since nothing is known about that data's origin, this cannot be assessed.
- The main risk lies in trusting the model's labels without manual verification. Models can make mistakes; verify the outputs.
- The model is limited by training data that cannot cover every premise-hypothesis combination that may occur in real use. Additional training and validation data would have been useful.

## Additional Information

<!-- Any other information that would be useful for other people to know. -->

- This model was pushed to the Hugging Face Hub with `trainer.push_to_hub()` after training locally.
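
Because the architecture is custom, loading the model for inference depends on how `roBERTaClassifier` was registered with the Auto classes. The following is a plausible sketch rather than a confirmed recipe; the `trust_remote_code=True` flag and the example sentences are assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# If the custom class was not registered for auto loading, importing
# roBERTaClassifier directly from the repository code may be required.
tokenizer = AutoTokenizer.from_pretrained("Devtrick/roberta_nli_ensemble")
model = AutoModelForSequenceClassification.from_pretrained(
    "Devtrick/roberta_nli_ensemble", trust_remote_code=True
)

premise = "A man is playing a guitar on stage."
hypothesis = "Someone is performing music."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```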