Model save

Files changed (3)

README.md +51 -55
all_results.json +6 -6
train_results.json +6 -6

README.md CHANGED

@@ -5,18 +5,18 @@ base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
-- name: Se124M100KInfPrompt_endtoken2
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Se124M100KInfPrompt_endtoken2
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6709
 ## Model description
@@ -46,61 +46,57 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_steps: 200
 - num_epochs: 50
 - mixed_precision_training: Native AMP
 ### Training results
-| Training Loss | Epoch   | Step  | Validation Loss |
-|:-------------:|:-------:|:-----:|:---------------:|
-| 0.8989        | 1.0     | 267   | 0.7585          |
-| 0.7706        | 2.0     | 534   | 0.7264          |
-| 0.7441        | 3.0     | 801   | 0.7160          |
-| 0.7327        | 4.0     | 1068  | 0.7091          |
-| 0.7175        | 5.0     | 1335  | 0.7024          |
-| 0.7118        | 6.0     | 1602  | 0.6990          |
-| 0.7079        | 7.0     | 1869  | 0.6931          |
-| 0.6982        | 8.0     | 2136  | 0.6904          |
-| 0.6977        | 9.0     | 2403  | 0.6891          |
-| 0.6971        | 10.0    | 2670  | 0.6869          |
-| 0.6992        | 11.0    | 2937  | 0.6850          |
-| 0.6889        | 12.0    | 3204  | 0.6849          |
-| 0.6924        | 13.0    | 3471  | 0.6845          |
-| 0.6894        | 14.0    | 3738  | 0.6834          |
-| 0.6886        | 15.0    | 4005  | 0.6791          |
-| 0.6906        | 16.0    | 4272  | 0.6812          |
-| 0.6868        | 17.0    | 4539  | 0.6796          |
-| 0.6852        | 18.0    | 4806  | 0.6789          |
-| 0.6797        | 19.0    | 5073  | 0.6784          |
-| 0.6813        | 20.0    | 5340  | 0.6775          |
-| 0.6823        | 21.0    | 5607  | 0.6776          |
-| 0.6803        | 22.0    | 5874  | 0.6758          |
-| 0.6782        | 23.0    | 6141  | 0.6768          |
-| 0.6786        | 24.0    | 6408  | 0.6747          |
-| 0.677         | 25.0    | 6675  | 0.6740          |
-| 0.68          | 26.0    | 6942  | 0.6742          |
-| 0.6733        | 27.0    | 7209  | 0.6735          |
-| 0.6744        | 28.0    | 7476  | 0.6734          |
-| 0.6746        | 29.0    | 7743  | 0.6737          |
-| 0.674         | 30.0    | 8010  | 0.6753          |
-| 0.6694        | 31.0    | 8277  | 0.6731          |
-| 0.6731        | 32.0    | 8544  | 0.6734          |
-| 0.6683        | 33.0    | 8811  | 0.6723          |
-| 0.6712        | 34.0    | 9078  | 0.6723          |
-| 0.668         | 35.0    | 9345  | 0.6720          |
-| 0.6647        | 36.0    | 9612  | 0.6723          |
-| 0.664         | 37.0    | 9879  | 0.6713          |
-| 0.6707        | 38.0    | 10146 | 0.6724          |
-| 0.6704        | 39.0    | 10413 | 0.6715          |
-| 0.6675        | 40.0    | 10680 | 0.6715          |
-| 0.6673        | 41.0    | 10947 | 0.6718          |
-| 0.6656        | 42.0    | 11214 | 0.6713          |
-| 0.6659        | 43.0    | 11481 | 0.6715          |
-| 0.667         | 44.0    | 11748 | 0.6714          |
-| 0.6596        | 45.0    | 12015 | 0.6709          |
-| 0.6673        | 46.0    | 12282 | 0.6710          |
-| 0.6666        | 47.0    | 12549 | 0.6710          |
-| 0.6661        | 48.0    | 12816 | 0.6709          |
-| 0.6637        | 49.0    | 13083 | 0.6709          |
-| 0.665         | 49.8143 | 13300 | 0.6709          |
 ### Framework versions

 tags:
 - generated_from_trainer
 model-index:
+- name: Se124M10KInfPrompt_endtoken_ls
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# Se124M10KInfPrompt_endtoken_ls
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 2.0494
 ## Model description
 - lr_scheduler_warmup_steps: 200
 - num_epochs: 50
 - mixed_precision_training: Native AMP
+- label_smoothing_factor: 0.1
 ### Training results
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 19.0863       | 1.0   | 267   | 2.1942          |
+| 17.6413       | 2.0   | 534   | 2.1318          |
+| 17.3454       | 3.0   | 801   | 2.1143          |
+| 17.2455       | 4.0   | 1068  | 2.0979          |
+| 17.112        | 5.0   | 1335  | 2.0918          |
+| 17.0311       | 6.0   | 1602  | 2.0852          |
+| 16.9714       | 7.0   | 1869  | 2.0805          |
+| 16.8883       | 8.0   | 2136  | 2.0760          |
+| 16.8675       | 9.0   | 2403  | 2.0727          |
+| 16.8491       | 10.0  | 2670  | 2.0699          |
+| 16.8653       | 11.0  | 2937  | 2.0698          |
+| 16.7795       | 12.0  | 3204  | 2.0718          |
+| 16.8033       | 13.0  | 3471  | 2.0635          |
+| 16.7715       | 14.0  | 3738  | 2.0644          |
+| 16.7677       | 15.0  | 4005  | 2.0632          |
+| 16.7682       | 16.0  | 4272  | 2.0615          |
+| 16.7473       | 17.0  | 4539  | 2.0598          |
+| 16.7306       | 18.0  | 4806  | 2.0615          |
+| 16.6896       | 19.0  | 5073  | 2.0586          |
+| 16.7027       | 20.0  | 5340  | 2.0589          |
+| 16.6991       | 21.0  | 5607  | 2.0581          |
+| 16.6864       | 22.0  | 5874  | 2.0573          |
+| 16.6749       | 23.0  | 6141  | 2.0562          |
+| 16.6714       | 24.0  | 6408  | 2.0551          |
+| 16.6603       | 25.0  | 6675  | 2.0546          |
+| 16.6801       | 26.0  | 6942  | 2.0542          |
+| 16.6263       | 27.0  | 7209  | 2.0541          |
+| 16.6436       | 28.0  | 7476  | 2.0531          |
+| 16.6471       | 29.0  | 7743  | 2.0523          |
+| 16.6412       | 30.0  | 8010  | 2.0549          |
+| 16.6017       | 31.0  | 8277  | 2.0529          |
+| 16.6352       | 32.0  | 8544  | 2.0510          |
+| 16.5937       | 33.0  | 8811  | 2.0522          |
+| 16.6165       | 34.0  | 9078  | 2.0511          |
+| 16.5961       | 35.0  | 9345  | 2.0518          |
+| 16.5675       | 36.0  | 9612  | 2.0514          |
+| 16.5565       | 37.0  | 9879  | 2.0499          |
+| 16.6215       | 38.0  | 10146 | 2.0504          |
+| 16.6133       | 39.0  | 10413 | 2.0505          |
+| 16.5901       | 40.0  | 10680 | 2.0492          |
+| 16.5841       | 41.0  | 10947 | 2.0500          |
+| 16.5856       | 42.0  | 11214 | 2.0493          |
+| 16.5775       | 43.0  | 11481 | 2.0494          |
+| 16.5873       | 44.0  | 11748 | 2.0497          |
+| 16.5285       | 45.0  | 12015 | 2.0494          |
 ### Framework versions
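
The new model card adds `label_smoothing_factor: 0.1` to the hyperparameters. As a rough illustration of what that setting does to a cross-entropy value, here is a minimal single-token sketch. It assumes the common formulation, `(1 - epsilon) * NLL + epsilon * (mean negative log-probability over the vocabulary)`. The function names are hypothetical and this is not claimed to be the Trainer's exact code path.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def label_smoothed_loss(logits, target, epsilon=0.1):
    """Hypothetical helper: label-smoothed cross-entropy for one token.

    Mixes the usual negative log-likelihood of the target with the
    mean negative log-probability over all classes (a uniform prior).
    """
    log_probs = log_softmax(logits)
    nll = -log_probs[target]
    uniform = -sum(log_probs) / len(log_probs)
    return (1.0 - epsilon) * nll + epsilon * uniform


# Toy 4-class example; class 0 is the correct, confidently predicted class.
logits = [4.0, 1.0, 0.5, -2.0]
plain = label_smoothed_loss(logits, target=0, epsilon=0.0)
smoothed = label_smoothed_loss(logits, target=0, epsilon=0.1)
print(f"plain={plain:.4f} smoothed={smoothed:.4f}")
```

With a confident correct prediction the smoothed value is larger than the plain one, so losses reported with and without label smoothing may not be directly comparable. The sketch does not attempt to account for the magnitude of the training-loss column in the new table.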

all_results.json CHANGED

@@ -1,13 +1,13 @@
 {
-    "epoch": 49.814258911819884,
     "eval_loss": 0.6708822250366211,
     "eval_runtime": 3.6448,
     "eval_samples_per_second": 501.534,
     "eval_steps_per_second": 62.829,
     "perplexity": 1.9559621584633884,
-    "total_flos": 2.18714101825536e+16,
-    "train_loss": 0.6980596797627614,
-    "train_runtime": 3389.5401,
-    "train_samples_per_second": 125.799,
-    "train_steps_per_second": 3.924
 }

 {
+    "epoch": 45.0,
     "eval_loss": 0.6708822250366211,
     "eval_runtime": 3.6448,
     "eval_samples_per_second": 501.534,
     "eval_steps_per_second": 62.829,
     "perplexity": 1.9559621584633884,
+    "total_flos": 1.975920863064883e+16,
+    "train_loss": 16.87015275197182,
+    "train_runtime": 3210.0048,
+    "train_samples_per_second": 132.835,
+    "train_steps_per_second": 4.143
 }
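
The `perplexity` field is consistent with the exponential of `eval_loss`. The short check below recomputes it from the values shown in the file. Note that the `eval_*` and `perplexity` lines are unchanged in this commit, so they correspond to the eval loss of 0.6709 rather than the 2.0494 reported in the new README. The dictionary is copied from the diff, and the variable names are only illustrative.

```python
import math

# Evaluation fields as they appear (unchanged) in all_results.json.
all_results = {
    "eval_loss": 0.6708822250366211,
    "perplexity": 1.9559621584633884,
}

# Perplexity is the exponential of the mean cross-entropy loss (in nats).
recomputed = math.exp(all_results["eval_loss"])
difference = abs(recomputed - all_results["perplexity"])
print(f"recomputed perplexity: {recomputed:.6f} (difference {difference:.2e})")

# The same formula applied to the loss reported in the new README.
readme_loss = 2.0494
print(f"exp(README loss): {math.exp(readme_loss):.4f}")
```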

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
-    "epoch": 49.814258911819884,
-    "total_flos": 2.18714101825536e+16,
-    "train_loss": 0.6980596797627614,
-    "train_runtime": 3389.5401,
-    "train_samples_per_second": 125.799,
-    "train_steps_per_second": 3.924
 }

 {
+    "epoch": 45.0,
+    "total_flos": 1.975920863064883e+16,
+    "train_loss": 16.87015275197182,
+    "train_runtime": 3210.0048,
+    "train_samples_per_second": 132.835,
+    "train_steps_per_second": 4.143
 }
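
To compare the two runs on equal footing, the totals above can be normalized per epoch and per optimizer step. The sketch below uses only numbers from the diff. The helper name `summarize` is hypothetical, and the derived samples-per-step figure is an estimate from rounded throughput values, not a logged batch size.

```python
# Training summaries copied from the old (-) and new (+) sides of the diff.
old_run = {
    "epoch": 49.814258911819884,
    "train_runtime": 3389.5401,
    "train_samples_per_second": 125.799,
    "train_steps_per_second": 3.924,
}
new_run = {
    "epoch": 45.0,
    "train_runtime": 3210.0048,
    "train_samples_per_second": 132.835,
    "train_steps_per_second": 4.143,
}


def summarize(run):
    """Hypothetical helper: derive per-epoch and per-step figures."""
    return {
        "seconds_per_epoch": run["train_runtime"] / run["epoch"],
        "samples_per_step": run["train_samples_per_second"]
        / run["train_steps_per_second"],
    }


old_summary = summarize(old_run)
new_summary = summarize(new_run)

# Both training tables log step 267 at epoch 1.0, so the last row of the
# new table (epoch 45.0) should land on step 45 * 267.
STEPS_PER_EPOCH = 267
final_step_new = int(new_run["epoch"]) * STEPS_PER_EPOCH

for name, summary in (("old", old_summary), ("new", new_summary)):
    print(name, {key: round(value, 2) for key, value in summary.items()})
print("final step (new run):", final_step_new)
```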