ptran74/DSPFirst-Finetuning-5

@@ -11,97 +11,28 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Important Note:
-`load_best_model_at_end` is not working properly. I created the `combined` metric (55% F1 score + 45% exact match score) to rank the best result, but it still does not work. Here are the settings in `TrainingArguments`:
-```
-  load_best_model_at_end=True,
-  metric_for_best_model='combined',
-  greater_is_better=True,
-```
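The `combined` score described in the note above can be reproduced from the reported metrics. This is a minimal sketch, assuming only the 55%/45% weighting stated in the note. The function name `combined_score` is hypothetical and is not taken from the training notebook.

```python
def combined_score(f1: float, exact: float) -> float:
    """Weighted blend of F1 and exact match (55% / 45%), as stated in the note."""
    return 0.55 * f1 + 0.45 * exact


# F1 and Exact values from the removed evaluation results list.
previous = combined_score(f1=70.2957, exact=64.0557)
print(round(previous, 4))  # 67.4877, matching the reported Combined value
```

A `compute_metrics` function would need to return this value under the key `combined` for `metric_for_best_model='combined'` to refer to it.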
 # DSPFirst-Finetuning-5
-This model is a fine-tuned version of [ahotrod/electra_large_discriminator_squad2_512](https://huggingface.co/ahotrod/electra_large_discriminator_squad2_512) on a question-and-answer dataset generated from the DSPFirst textbook in the SQuAD 2.0 format.<br />
 It achieves the following results on the evaluation set:
-- Loss: 0.9496
-- Exact: 64.0557
-- F1: 70.2957
-- Combined: 67.4877
 ## Model description
 More information needed
-## More accurate metrics:
-### Before fine-tuning:
-```
- 'HasAns_exact': 53.09537088678193,
- 'HasAns_f1': 58.61604504258551,
- 'HasAns_total': 1793,
- 'NoAns_exact': 86.11111111111111,
- 'NoAns_f1': 86.11111111111111,
- 'NoAns_total': 288,
- 'best_exact': 57.66458433445459,
- 'best_exact_thresh': 0.0,
- 'best_f1': 62.42122477720136,
- 'best_f1_thresh': 0.0,
- 'exact': 57.66458433445459,
- 'f1': 62.42122477720133,
- 'total': 2081
-```
-### After fine-tuning:
-```
- 'HasAns_exact': 64.138315672058,
- 'HasAns_f1': 71.25733612355444,
- 'HasAns_total': 1793,
- 'NoAns_exact': 63.19444444444444,
- 'NoAns_f1': 63.19444444444444,
- 'NoAns_total': 288,
- 'best_exact': 63.95963479096588,
- 'best_exact_thresh': 0.0,
- 'best_f1': 70.09341838997268,
- 'best_f1_thresh': 0.0,
- 'exact': 64.00768861124459,
- 'f1': 70.14147221025135,
- 'total': 2081
-```
-# Dataset
-A visualization of the dataset can be found [here](https://github.gatech.edu/pages/VIP-ITS/textbook_SQuAD_explore/explore/textbookv1.0/textbook/).<br />
-The split between train and test is 65% and 35% respectively.
-```
-DatasetDict({
-    train: Dataset({
-        features: ['id', 'title', 'context', 'question', 'answers'],
-        num_rows: 3863
-    })
-    test: Dataset({
-        features: ['id', 'title', 'context', 'question', 'answers'],
-        num_rows: 2081
-    })
-})
-```
 ## Intended uses & limitations
-This model is fine-tuned to answer questions from the DSPFirst textbook. I'm not really sure what I am doing, so you should review it before using it.<br />
-Also, you should improve the dataset, either by using a **better question-and-answer generation model** (currently using https://github.com/patil-suraj/question_generation) or by performing **data augmentation** to increase the dataset size.
 ## Training and evaluation data
-- `batch_size` of 6 results in 14.82 GB VRAM
-- Utilizes `gradient_accumulation_steps` to get total batch size to 516 (batch size should be at least 256)
-- 4.52 GB RAM
-- 35% of the total questions are reserved for evaluation.
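The relationship between the per-device batch size and the total batch size in the bullets above can be checked with simple arithmetic. This is a minimal sketch, assuming a single GPU and taking the total of 516 from the `total_train_batch_size` hyperparameter. The accumulation step count is derived here, not quoted from the training notebook.

```python
per_device_batch_size = 6      # from the bullet list above
total_train_batch_size = 516   # from the training hyperparameters
num_devices = 1                # assumption: a single GPU

# The total batch size is per-device batch size x devices x accumulation steps.
steps, remainder = divmod(total_train_batch_size, per_device_batch_size * num_devices)
assert remainder == 0, "total batch size must be a multiple of the per-step batch"

gradient_accumulation_steps = steps
print(gradient_accumulation_steps)  # 86
```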
 ## Training procedure
-- The model was trained from [Google Colab](https://colab.research.google.com/drive/1dJXNstk2NSenwzdtl9xA8AqjP4LL-Ks_?usp=sharing)
-- Utilizes Tesla P100 16GB, took 6.3 hours to train
-- `load_best_model_at_end` is enabled in TrainingArguments
 ### Training hyperparameters
@@ -114,28 +45,20 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 516
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- num_epochs: 10
-### Model hyperparameters
-- hidden_dropout_prob: 0.36
-- attention_probs_dropout_prob: 0.36
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Exact   | F1      | Combined |
 |:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:--------:|
-| 1.9415        | 0.86  | 20   | 1.1685          | 60.2595 | 68.1892 | 64.6208  |
-| 1.5838        | 1.73  | 40   | 1.0808          | 64.2960 | 71.5666 | 68.2948  |
-| 1.8123        | 2.6   | 60   | 1.1840          | 64.6324 | 72.3829 | 68.8952  |
-| 1.2597        | 3.47  | 80   | 0.9535          | 64.1038 | 70.9803 | 67.8858  |
-| 1.1145        | 4.34  | 100  | 0.8810          | 64.9688 | 71.3201 | 68.4620  |
-| 0.9903        | 5.22  | 120  | 0.9460          | 66.0259 | 72.5939 | 69.6383  |
-| 0.9398        | 6.09  | 140  | 0.8476          | 63.1908 | 69.1036 | 66.4428  |
-| 0.9181        | 6.95  | 160  | 0.9036          | 65.4974 | 71.8701 | 69.0024  |
-| 0.9562        | 7.82  | 180  | 0.9073          | 65.1129 | 71.1841 | 68.4521  |
-| 1.0098        | 8.69  | 200  | 0.9470          | 64.5843 | 70.8046 | 68.0055  |
-| 1.0186        | 9.56  | 220  | 0.9496          | 64.0557 | 70.2957 | 67.4877  |
 ### Framework versions

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 # DSPFirst-Finetuning-5
+This model is a fine-tuned version of [ahotrod/electra_large_discriminator_squad2_512](https://huggingface.co/ahotrod/electra_large_discriminator_squad2_512) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.8529
+- Exact: 66.3117
+- F1: 73.4039
+- Combined: 70.2124
 ## Model description
 More information needed
 ## Intended uses & limitations
+More information needed
 ## Training and evaluation data
+More information needed
 ## Training procedure
 ### Training hyperparameters
 - total_train_batch_size: 516
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
+- num_epochs: 7
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Exact   | F1      | Combined |
 |:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:--------:|
+| 2.3222        | 0.81  | 20   | 1.0363          | 60.3139 | 68.8586 | 65.0135  |
+| 1.6149        | 1.65  | 40   | 0.9702          | 64.7422 | 72.5555 | 69.0395  |
+| 1.2375        | 2.49  | 60   | 1.0007          | 64.6861 | 72.6306 | 69.0556  |
+| 1.0417        | 3.32  | 80   | 0.9963          | 66.0874 | 73.8634 | 70.3642  |
+| 0.9401        | 4.16  | 100  | 0.8803          | 67.0964 | 74.4842 | 71.1597  |
+| 0.8799        | 4.97  | 120  | 0.8652          | 66.7040 | 74.1267 | 70.7865  |
+| 0.8712        | 5.81  | 140  | 0.8921          | 66.3677 | 73.7213 | 70.4122  |
+| 0.8311        | 6.65  | 160  | 0.8529          | 66.3117 | 73.4039 | 70.2124  |
 ### Framework versions