Devtrick committed on
Commit
3406139
·
verified ·
1 Parent(s): 8d0ef6a

Updated README

Files changed (1)
  1. README.md +154 -64
README.md CHANGED
@@ -1,64 +1,154 @@
- ---
- library_name: transformers
- tags:
- - generated_from_trainer
- metrics:
- - accuracy
- model-index:
- - name: roberta_nli_ensemble
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # roberta_nli_ensemble
-
- This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.4849
- - Accuracy: 0.8848
- - Mcc: 0.7695
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 3e-05
- - train_batch_size: 128
- - eval_batch_size: 128
- - seed: 42
- - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - num_epochs: 10
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Accuracy | Mcc |
- |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
- | 0.6552 | 1.0 | 191 | 0.3383 | 0.8685 | 0.7377 |
- | 0.2894 | 2.0 | 382 | 0.3045 | 0.8778 | 0.7559 |
- | 0.1891 | 3.0 | 573 | 0.3255 | 0.8854 | 0.7705 |
- | 0.1209 | 4.0 | 764 | 0.3963 | 0.8829 | 0.7657 |
- | 0.0843 | 5.0 | 955 | 0.4849 | 0.8848 | 0.7695 |
-
-
- ### Framework versions
-
- - Transformers 4.50.2
- - Pytorch 2.8.0.dev20250326+cu128
- - Datasets 3.5.0
- - Tokenizers 0.21.1
+ ---
+ library_name: transformers
+ tags:
+ - generated_from_trainer
+ metrics:
+ - accuracy
+ model-index:
+ - name: roberta_nli_ensemble
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # roberta_nli_ensemble
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ A fine-tuned RoBERTa model for a Natural Language Inference (NLI) task: given a premise and a hypothesis, it classifies the relationship between the two sentences.
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ This model builds upon the roberta-base architecture, adding a multi-layer classification head for NLI. It computes average-pooled representations of the premise and hypothesis tokens (identified via `token_type_ids`) and concatenates them before passing the result through additional linear and non-linear layers. The final output classifies the sentence pair into one of three classes.
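The pooling-and-concatenation scheme described above can be sketched as follows. This is a hypothetical reconstruction, not the repository's actual `roBERTaClassifier` code; the hidden size, layer count, and activation are assumptions:

```python
import torch
import torch.nn as nn

class NLIHead(nn.Module):
    """Hypothetical sketch of the multi-layer NLI classification head."""

    def __init__(self, hidden_size=768, num_labels=3):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_size, hidden_size),  # concatenation of the two pooled vectors
            nn.Tanh(),                                # non-linear layer (assumed)
            nn.Linear(hidden_size, num_labels),
        )

    def forward(self, hidden_states, token_type_ids, attention_mask):
        # Average-pool premise (segment 0) and hypothesis (segment 1) tokens,
        # ignoring padding via the attention mask.
        mask = attention_mask.unsqueeze(-1).float()
        prem = (token_type_ids == 0).unsqueeze(-1).float() * mask
        hyp = (token_type_ids == 1).unsqueeze(-1).float() * mask
        prem_vec = (hidden_states * prem).sum(1) / prem.sum(1).clamp(min=1.0)
        hyp_vec = (hidden_states * hyp).sum(1) / hyp.sum(1).clamp(min=1.0)
        return self.classifier(torch.cat([prem_vec, hyp_vec], dim=-1))

# Shape check with a dummy encoder output: batch of 2, sequence length 6.
hidden_states = torch.randn(2, 6, 768)
token_type_ids = torch.tensor([[0, 0, 0, 1, 1, 1]] * 2)
attention_mask = torch.ones(2, 6, dtype=torch.long)
logits = NLIHead()(hidden_states, token_type_ids, attention_mask)  # shape (2, 3)
```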
+
+ - **Developed by:** Dev Soneji
+ - **Language(s):** English
+ - **Model type:** [More Information Needed]
+ - **Model architecture:** RoBERTa encoder with a multi-layer classification head
+ - **Finetuned from model:** roberta-base
+
+ ### Model Resources
+
+ <!-- Provide links where applicable. -->
+
+ - **Repository:** [Devtrick/roberta_nli_ensemble](https://huggingface.co/Devtrick/roberta_nli_ensemble)
+ - **Paper or documentation:** [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This is a short stub of information on the training data that was used, and documentation related to data pre-processing or additional filtering (if applicable). -->
+
+ The model was trained on a dataset located in `train.csv`, comprising 24K premise-hypothesis pairs, each labelled to indicate whether the hypothesis is true given the premise (0 = false, 1 = true). No further details were given on the origin or validity of this dataset.
+
+ The data was passed through a tokenizer ([AutoTokenizer](https://huggingface.co/docs/transformers/v4.50.0/en/model_doc/auto#transformers.AutoTokenizer)) from the standard Hugging Face library. No other pre-processing was done, aside from relabelling columns to match the expected format.
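The relabelling step might look like the following sketch. The original column names are not documented in this card, so `sentence1`, `sentence2`, and `gold_label` are purely illustrative:

```python
# Hypothetical column names: the card only states that columns were renamed
# to match the format the tokenizer/Trainer expects.
def relabel(row):
    return {
        "premise": row["sentence1"],
        "hypothesis": row["sentence2"],
        "labels": int(row["gold_label"]),  # 0 = hypothesis false, 1 = hypothesis true
    }

row = {"sentence1": "A man is cooking.",
       "sentence2": "Someone prepares food.",
       "gold_label": "1"}
example = relabel(row)
```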
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ The model was trained as follows:
+ - The [training data](#training-data) was prepared by renaming columns and tokenizing.
+ - The model was initialised with a custom configuration class, `roBERTaConfig`, setting essential parameters. The model itself, `roBERTaClassifier`, extends the pretrained RoBERTa model with pooling and multiple linear layers for classification.
+ - Hyperparameter selection was carried out in a separate grid search; the best-performing values are listed under [Training Hyperparameters](#training-hyperparameters).
+ - The model was validated on the [testing data](#testing-data), giving the [results](#results) below.
+ - Checkpoints were saved after each epoch, and finally the best checkpoint was reloaded and pushed to the Hugging Face Hub.
+
+
+ #### Training Hyperparameters
+
+ <!-- This is a summary of the values of hyperparameters used in training the model. -->
+
+ The following hyperparameters were used during training:
+ - learning_rate: 3e-05
+ - train_batch_size: 128
+ - eval_batch_size: 128
+ - weight_decay: 0.01
+ - seed: 42
+ - optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
+ - lr_scheduler_type: linear
+ - num_epochs: 10
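With `lr_scheduler_type: linear`, 191 steps per epoch, and 10 scheduled epochs, the learning rate decays linearly from 3e-05 toward 0 over 1910 steps. A sketch of that schedule (the card does not state a warmup, so 0 warmup steps is assumed):

```python
def linear_lr(step, base_lr=3e-05, total_steps=1910, warmup_steps=0):
    """Linear decay to zero after an (assumed absent) warmup phase."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

lr_start = linear_lr(0)     # 3e-05 at the first step
lr_mid = linear_lr(955)     # 1.5e-05 halfway through (end of epoch 5)
lr_end = linear_lr(1910)    # 0.0 at the final scheduled step
```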
+
+ #### Speeds, Sizes, Times
+
+ <!-- This section provides information about roughly how long it takes to train the model and the size of the resulting model. -->
+
+ - Training time: 12 minutes 17 seconds on the hardware specified below. Training was configured for 10 epochs, but early stopping ended it after 5.
+
+ - Model size: 126M parameters.
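Early stopping of this kind is typically a patience rule on the validation loss. With the per-epoch validation losses from the results table in this card and an assumed patience of 3 (the actual patience value is not stated), training stops after epoch 5 with epoch 2 as the best checkpoint:

```python
def run_early_stopping(val_losses, patience=3):
    """Return (best_epoch, last_epoch_run) under a simple patience rule.

    patience=3 is an assumption; the card does not document the value used.
    """
    best_loss, best_epoch, bad_epochs = float("inf"), 0, 0
    last_epoch = len(val_losses)
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                last_epoch = epoch  # stop: no improvement for `patience` epochs
                break
    return best_epoch, last_epoch

# Validation losses per epoch, taken from the results table in this card.
best, stopped = run_early_stopping([0.3383, 0.3045, 0.3255, 0.3963, 0.4849])
```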
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data & Metrics
+
+ #### Testing Data
+
+ <!-- This should describe any evaluation data used (e.g., the development/validation set provided). -->
+
+ The development (and, effectively, testing) dataset is located in `dev.csv`: 6K pairs of validation data in the same format as the training data. No further details were given on the origin or validity of this dataset.
+
+ The data was passed through the same tokenizer ([AutoTokenizer](https://huggingface.co/docs/transformers/v4.50.0/en/model_doc/auto#transformers.AutoTokenizer)) as the training data, with no other pre-processing aside from relabelling columns to match the expected format.
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used. -->
+
+ - Accuracy: proportion of correct predictions.
+ - Matthews Correlation Coefficient (MCC): correlation between predicted and true labels, ranging from -1 to 1.
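For binary labels, both metrics can be computed directly. A self-contained sketch (the card does not say which implementation was actually used, e.g. `sklearn` or `evaluate`):

```python
import math

def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mcc(y_true, y_pred):
    """Binary Matthews Correlation Coefficient from the confusion counts."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

acc = accuracy([1, 1, 0, 0], [1, 0, 0, 0])  # 0.75
corr = mcc([1, 1, 0, 0], [1, 0, 0, 0])      # 2 / sqrt(12) ≈ 0.577
```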
+
+ ### Results
+
+ Final results on the evaluation set:
+
+ - Loss: 0.4849
+ - Accuracy: 0.8848
+ - Mcc: 0.7695
+
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy | Mcc |
+ |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
+ | 0.6552 | 1.0 | 191 | 0.3383 | 0.8685 | 0.7377 |
+ | 0.2894 | 2.0 | 382 | 0.3045 | 0.8778 | 0.7559 |
+ | 0.1891 | 3.0 | 573 | 0.3255 | 0.8854 | 0.7705 |
+ | 0.1209 | 4.0 | 764 | 0.3963 | 0.8829 | 0.7657 |
+ | 0.0843 | 5.0 | 955 | 0.4849 | 0.8848 | 0.7695 |
+
+ ## Technical Specifications
+
+ ### Hardware
+
+ The model was trained on a PC with the following specs:
+
+ - CPU: AMD Ryzen 7 7700X
+ - GPU: NVIDIA GeForce RTX 5070 Ti
+ - Memory: 32GB DDR5
+ - Motherboard: MSI MAG B650 TOMAHAWK WIFI
+
+ ### Software
+
+ - Transformers 4.50.2
+ - Pytorch 2.8.0.dev20250326+cu128
+ - Datasets 3.5.0
+ - Tokenizers 0.21.1
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ - The model's performance and biases depend on the data it was trained on; since nothing is known about that data's origin, these cannot be assessed here.
+ - There is risk in trusting any label with confidence and without manual verification: models make mistakes, so verify the outputs.
+ - The model is limited by training data that cannot cover all possible premise-hypothesis combinations, many of which will occur in real use. Additional training and validation data would have been useful.
+
+ ## Additional Information
+
+ <!-- Any other information that would be useful for other people to know. -->
+
+ - This model was pushed to the Hugging Face Hub with `trainer.push_to_hub()` after training locally.