---
tags:
- generated_from_trainer
metrics:
- f1
model-index:
- name: DSPFirst-Finetuning-5
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# Important Note:
I defined a `combined` metric (55% F1 score + 45% exact match score) and load the checkpoint with the best `combined` score at the end of training. These are the relevant settings in `TrainingArguments`:
```
load_best_model_at_end=True,
metric_for_best_model='combined',
greater_is_better=True,
```
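The weighting above can be sketched as a small helper. This is an illustration, not the exact code from the training notebook: the function name `add_combined` and the shape of the input dict are assumptions. `Trainer` looks up `metric_for_best_model` among the keys returned by `compute_metrics` (it adds the `eval_` prefix itself), so the returned dict needs a `combined` key.

```python
def add_combined(metrics, f1_weight=0.55, exact_weight=0.45):
    """Return a copy of `metrics` with a weighted `combined` score added.

    `metrics` is assumed to hold SQuAD-style `exact` and `f1` values (0-100).
    """
    combined = f1_weight * metrics["f1"] + exact_weight * metrics["exact"]
    return {**metrics, "combined": combined}


# Exact and F1 values reported for the best checkpoint in this card.
scores = add_combined({"exact": 67.0964, "f1": 74.4842})
print(round(scores["combined"], 4))
```

With the reported exact match and F1 values, this reproduces the `Combined` score listed in this card.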
# DSPFirst-Finetuning-5
This model is a fine-tuned version of [ahotrod/electra_large_discriminator_squad2_512](https://huggingface.co/ahotrod/electra_large_discriminator_squad2_512) on a question-and-answer dataset generated from the DSPFirst textbook in the SQuAD 2.0 format.<br />
It achieves the following results on the evaluation set:
- Loss: 0.8803
- Exact: 67.0964
- F1: 74.4842
- Combined: 71.1597
## More detailed metrics:
### Before fine-tuning:
```
'HasAns_exact': 54.71817606079797,
'HasAns_f1': 61.08672724332754,
'HasAns_total': 1579,
'NoAns_exact': 88.78048780487805,
'NoAns_f1': 88.78048780487805,
'NoAns_total': 205,
'best_exact': 58.63228699551569,
'best_exact_thresh': 0.0,
'best_f1': 64.26902596256402,
'best_f1_thresh': 0.0,
'exact': 58.63228699551569,
'f1': 64.26902596256404,
'total': 1784
```
### After fine-tuning:
```
'HasAns_exact': 67.57441418619379,
'HasAns_f1': 75.92137683558988,
'HasAns_total': 1579,
'NoAns_exact': 63.41463414634146,
'NoAns_f1': 63.41463414634146,
'NoAns_total': 205,
'best_exact': 67.0964125560538,
'best_exact_thresh': 0.0,
'best_f1': 74.48422310728503,
'best_f1_thresh': 0.0,
'exact': 67.0964125560538,
'f1': 74.48422310728503,
'total': 1784
```
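The overall `exact` score is the average of the answerable (`HasAns`) and unanswerable (`NoAns`) scores, weighted by question count. A short check using the numbers above (the helper name `weighted_overall` is only for illustration):

```python
def weighted_overall(has_ans_score, has_ans_total, no_ans_score, no_ans_total):
    """Combine per-category scores into the overall score, weighted by count."""
    total = has_ans_total + no_ans_total
    return (has_ans_score * has_ans_total + no_ans_score * no_ans_total) / total


# Exact-match values before and after fine-tuning, as reported above.
before = weighted_overall(54.71817606079797, 1579, 88.78048780487805, 205)
after = weighted_overall(67.57441418619379, 1579, 63.41463414634146, 205)
print(round(before, 4), round(after, 4))
```

The breakdown shows where the gain comes from: exact match on answerable questions rises by about 12.9 points, while accuracy on unanswerable questions drops by about 25.4 points. Because answerable questions make up 1579 of the 1784 test questions, the overall score still improves.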
# Dataset
A visualization of the dataset can be found [here](https://github.gatech.edu/pages/VIP-ITS/textbook_SQuAD_explore/explore/textbookv1.0/textbook/).<br />
The dataset is split 70% for training and 30% for testing.
```
DatasetDict({
train: Dataset({
features: ['id', 'title', 'context', 'question', 'answers'],
num_rows: 4160
})
test: Dataset({
features: ['id', 'title', 'context', 'question', 'answers'],
num_rows: 1784
})
})
```
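For orientation, a single record in this layout might look like the following sketch. The field names come from the `features` list above; the nested `answers` layout (`text` and `answer_start` lists, both empty for unanswerable questions) follows the SQuAD 2.0 convention used by the `datasets` library. The example values and the helper name `is_answerable` are made up for illustration.

```python
# Hypothetical records: the values are invented, not taken from the dataset.
context = "A sinusoid is defined by its amplitude, frequency, and phase."
answer = "its amplitude, frequency, and phase"

answerable = {
    "id": "example-0001",
    "title": "Sinusoids",
    "context": context,
    "question": "What defines a sinusoid?",
    # `answer_start` is the character offset of the answer in the context.
    "answers": {"text": [answer], "answer_start": [context.index(answer)]},
}

unanswerable = {
    "id": "example-0002",
    "title": "Sinusoids",
    "context": context,
    "question": "Who invented the sinusoid?",
    # SQuAD 2.0 marks unanswerable questions with empty answer lists.
    "answers": {"text": [], "answer_start": []},
}


def is_answerable(example):
    """True if the record has at least one answer span."""
    return len(example["answers"]["text"]) > 0


print(is_answerable(answerable), is_answerable(unanswerable))
```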
## Intended uses & limitations
This model is fine-tuned to answer questions from the DSPFirst textbook. I am not an expert in this area, so review the model's behavior before relying on it.<br />
The dataset could also be improved, either by using a **better question-and-answer generation model** (currently using https://github.com/patil-suraj/question_generation) or by performing **data augmentation** to increase the dataset size.
## Training and evaluation data
- A `batch_size` of 6 uses 14.03 GB of VRAM
- `gradient_accumulation_steps` raises the total batch size to 516 (the total batch size should be at least 256)
- 4.52 GB of RAM is used
- 30% of the questions are reserved for evaluation
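The batch-size arithmetic behind the list above can be checked with a short sketch. The helper names are illustrative, a single GPU is assumed, and the accumulation value of 86 is the one reported under the training hyperparameters in this card.

```python
import math


def total_batch_size(per_device_batch_size, accumulation_steps, num_devices=1):
    """Number of examples contributing to each optimizer step."""
    return per_device_batch_size * accumulation_steps * num_devices


def min_accumulation_steps(per_device_batch_size, target_total, num_devices=1):
    """Smallest accumulation count that reaches at least `target_total`."""
    return math.ceil(target_total / (per_device_batch_size * num_devices))


print(total_batch_size(6, 86))         # 516
print(min_accumulation_steps(6, 256))  # 43
```

With a per-device batch of 6, at least 43 accumulation steps are needed to reach the 256 threshold; the 86 steps used here give roughly twice that.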
## Training procedure
- The model was trained in [Google Colab](https://colab.research.google.com/drive/1dJXNstk2NSenwzdtl9xA8AqjP4LL-Ks_?usp=sharing)
- Training ran on a Tesla P100 (16 GB) and took 6.3 hours
- `load_best_model_at_end` is enabled in TrainingArguments
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 6
- eval_batch_size: 6
- seed: 42
- gradient_accumulation_steps: 86
- total_train_batch_size: 516
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 7
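As a rough sketch, the values above map onto `TrainingArguments` keyword arguments as follows. The dict name `training_kwargs` and the `output_dir` value are assumptions, and mapping `train_batch_size` to `per_device_train_batch_size` assumes a single GPU. Only the hyperparameters listed above are included; evaluation and checkpoint settings are not reported here, so they are left out.

```python
# Keyword arguments mirroring the hyperparameter list above.
training_kwargs = {
    "output_dir": "dspfirst-finetuning",  # hypothetical path, not from the card
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 6,     # assumes a single GPU
    "per_device_eval_batch_size": 6,
    "seed": 42,
    "gradient_accumulation_steps": 86,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 7,
}

# With transformers installed, these could be passed along as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**training_kwargs)
print(sorted(training_kwargs))
```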
### Model hyperparameters
- hidden_dropout_prob: 0.36
- attention_probs_dropout_prob: 0.36
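One way to apply these dropout values is to override them in the model config before loading the weights. This is a sketch under assumptions, not the notebook's code: the helper name `load_with_dropout` is hypothetical, and it relies on `AutoConfig.from_pretrained` accepting config attributes as keyword overrides.

```python
def load_with_dropout(model_name, dropout=0.36):
    """Load a question-answering model with both dropout rates overridden."""
    # Imported lazily so the helper can be defined without transformers installed.
    from transformers import AutoConfig, AutoModelForQuestionAnswering

    config = AutoConfig.from_pretrained(
        model_name,
        hidden_dropout_prob=dropout,
        attention_probs_dropout_prob=dropout,
    )
    return AutoModelForQuestionAnswering.from_pretrained(model_name, config=config)
```

Calling `load_with_dropout("ahotrod/electra_large_discriminator_squad2_512")` would download the base checkpoint named in this card and apply the higher dropout rates for fine-tuning.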
### Training results
| Training Loss | Epoch | Step | Validation Loss | Exact | F1 | Combined |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:--------:|
| 2.3222 | 0.81 | 20 | 1.0363 | 60.3139 | 68.8586 | 65.0135 |
| 1.6149 | 1.65 | 40 | 0.9702 | 64.7422 | 72.5555 | 69.0395 |
| 1.2375 | 2.49 | 60 | 1.0007 | 64.6861 | 72.6306 | 69.0556 |
| 1.0417 | 3.32 | 80 | 0.9963 | 66.0874 | 73.8634 | 70.3642 |
| 0.9401 | 4.16 | 100 | 0.8803 | 67.0964 | 74.4842 | 71.1597 |
| 0.8799 | 4.97 | 120 | 0.8652 | 66.7040 | 74.1267 | 70.7865 |
| 0.8712 | 5.81 | 140 | 0.8921 | 66.3677 | 73.7213 | 70.4122 |
| 0.8311 | 6.65 | 160 | 0.8529 | 66.3117 | 73.4039 | 70.2124 |
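To see which row `load_best_model_at_end` would pick, the `combined` score can be recomputed from the `Exact` and `F1` columns. A minimal sketch with the table values copied in by hand (the variable and function names are illustrative):

```python
# (step, validation_loss, exact, f1) rows copied from the table above.
rows = [
    (20, 1.0363, 60.3139, 68.8586),
    (40, 0.9702, 64.7422, 72.5555),
    (60, 1.0007, 64.6861, 72.6306),
    (80, 0.9963, 66.0874, 73.8634),
    (100, 0.8803, 67.0964, 74.4842),
    (120, 0.8652, 66.7040, 74.1267),
    (140, 0.8921, 66.3677, 73.7213),
    (160, 0.8529, 66.3117, 73.4039),
]


def combined(exact, f1):
    """55% F1 + 45% exact match, as defined in this card."""
    return 0.55 * f1 + 0.45 * exact


best_by_combined = max(rows, key=lambda row: combined(row[2], row[3]))
best_by_loss = min(rows, key=lambda row: row[1])
print(best_by_combined[0], best_by_loss[0])
```

Step 100 has the highest `combined` score even though step 160 has the lowest validation loss, so selecting on `combined` and selecting on loss would keep different checkpoints.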
### Framework versions
- Transformers 4.18.0
- PyTorch 1.10.0+cu111
- Datasets 2.1.0
- Tokenizers 0.12.1