Training in progress, epoch 1

Files changed (10)

README.md +32 -57
adapter_model.safetensors +2 -2
added_tokens.json +3 -2
all_results.json +11 -11
eval_results.json +6 -6
special_tokens_map.json +1 -14
tokenizer.json +12 -3
tokenizer_config.json +13 -6
train_results.json +6 -6
training_args.bin +1 -1

README.md CHANGED

@@ -5,18 +5,18 @@ base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
-- name: Se124M100KInfPrompt_endtoken
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Se124M100KInfPrompt_endtoken
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6957
 ## Model description
@@ -35,12 +35,13 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 5e-05
-- train_batch_size: 32
-- eval_batch_size: 32
 - seed: 42
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
 - num_epochs: 50
 - mixed_precision_training: Native AMP
@@ -48,56 +49,30 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step  | Validation Loss |
 |:-------------:|:-----:|:-----:|:---------------:|
-| 0.2129        | 1.0   | 1430  | 0.8043          |
-| 0.2018        | 2.0   | 2860  | 0.7705          |
-| 0.1949        | 3.0   | 4290  | 0.7588          |
-| 0.1913        | 4.0   | 5720  | 0.7498          |
-| 0.1921        | 5.0   | 7150  | 0.7435          |
-| 0.1903        | 6.0   | 8580  | 0.7371          |
-| 0.1888        | 7.0   | 10010 | 0.7339          |
-| 0.1881        | 8.0   | 11440 | 0.7299          |
-| 0.1872        | 9.0   | 12870 | 0.7267          |
-| 0.187         | 10.0  | 14300 | 0.7251          |
-| 0.184         | 11.0  | 15730 | 0.7229          |
-| 0.1846        | 12.0  | 17160 | 0.7212          |
-| 0.1851        | 13.0  | 18590 | 0.7182          |
-| 0.1804        | 14.0  | 20020 | 0.7153          |
-| 0.1848        | 15.0  | 21450 | 0.7141          |
-| 0.1824        | 16.0  | 22880 | 0.7144          |
-| 0.1796        | 17.0  | 24310 | 0.7116          |
-| 0.18          | 18.0  | 25740 | 0.7108          |
-| 0.1825        | 19.0  | 27170 | 0.7082          |
-| 0.1852        | 20.0  | 28600 | 0.7082          |
-| 0.1785        | 21.0  | 30030 | 0.7072          |
-| 0.1811        | 22.0  | 31460 | 0.7057          |
-| 0.178         | 23.0  | 32890 | 0.7059          |
-| 0.1827        | 24.0  | 34320 | 0.7046          |
-| 0.1813        | 25.0  | 35750 | 0.7033          |
-| 0.1825        | 26.0  | 37180 | 0.7039          |
-| 0.1795        | 27.0  | 38610 | 0.7032          |
-| 0.1801        | 28.0  | 40040 | 0.7017          |
-| 0.1781        | 29.0  | 41470 | 0.7013          |
-| 0.1823        | 30.0  | 42900 | 0.7010          |
-| 0.1781        | 31.0  | 44330 | 0.7012          |
-| 0.1809        | 32.0  | 45760 | 0.6999          |
-| 0.1764        | 33.0  | 47190 | 0.6996          |
-| 0.1791        | 34.0  | 48620 | 0.6983          |
-| 0.1793        | 35.0  | 50050 | 0.6988          |
-| 0.1785        | 36.0  | 51480 | 0.6980          |
-| 0.1777        | 37.0  | 52910 | 0.6980          |
-| 0.1774        | 38.0  | 54340 | 0.6980          |
-| 0.1795        | 39.0  | 55770 | 0.6976          |
-| 0.1772        | 40.0  | 57200 | 0.6974          |
-| 0.1793        | 41.0  | 58630 | 0.6974          |
-| 0.1777        | 42.0  | 60060 | 0.6968          |
-| 0.1777        | 43.0  | 61490 | 0.6965          |
-| 0.1779        | 44.0  | 62920 | 0.6965          |
-| 0.1782        | 45.0  | 64350 | 0.6964          |
-| 0.1765        | 46.0  | 65780 | 0.6961          |
-| 0.1758        | 47.0  | 67210 | 0.6962          |
-| 0.1763        | 48.0  | 68640 | 0.6960          |
-| 0.1788        | 49.0  | 70070 | 0.6958          |
-| 0.1776        | 50.0  | 71500 | 0.6957          |
 ### Framework versions

 tags:
 - generated_from_trainer
 model-index:
+- name: Se124M10KInfPrompt_endtoken
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# Se124M10KInfPrompt_endtoken
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.6872
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 0.0005
+- train_batch_size: 8
+- eval_batch_size: 8
 - seed: 42
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 200
 - num_epochs: 50
 - mixed_precision_training: Native AMP
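The new hyperparameters replace a linear schedule with a cosine schedule and 200 warmup steps at a peak learning rate of 0.0005. A minimal sketch of that shape follows. It assumes the schedule was planned over the full 50 epochs at 610 steps per epoch (30500 steps), which the diff does not state explicitly; the function name `cosine_lr_with_warmup` is hypothetical and not part of any library.

```python
import math

def cosine_lr_with_warmup(step, base_lr=5e-4, warmup_steps=200, total_steps=30500):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero.

    total_steps=30500 is an assumption (610 steps/epoch * 50 planned epochs).
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

# Start, mid-warmup, peak, end of epoch 1, end of epoch 24, planned end.
for step in (0, 100, 200, 610, 14640, 30500):
    print(step, f"{cosine_lr_with_warmup(step):.6g}")
```

Under these assumptions the learning rate is still well above zero at step 14640, where the logged run ends.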
 | Training Loss | Epoch | Step  | Validation Loss |
 |:-------------:|:-----:|:-----:|:---------------:|
+| 0.8085        | 1.0   | 610   | 0.7760          |
+| 0.7801        | 2.0   | 1220  | 0.7436          |
+| 0.7608        | 3.0   | 1830  | 0.7269          |
+| 0.7438        | 4.0   | 2440  | 0.7199          |
+| 0.7413        | 5.0   | 3050  | 0.7118          |
+| 0.7343        | 6.0   | 3660  | 0.7121          |
+| 0.7332        | 7.0   | 4270  | 0.7089          |
+| 0.7319        | 8.0   | 4880  | 0.7025          |
+| 0.7289        | 9.0   | 5490  | 0.7001          |
+| 0.7236        | 10.0  | 6100  | 0.6965          |
+| 0.7147        | 11.0  | 6710  | 0.6970          |
+| 0.7126        | 12.0  | 7320  | 0.6973          |
+| 0.7167        | 13.0  | 7930  | 0.6935          |
+| 0.711         | 14.0  | 8540  | 0.6927          |
+| 0.7057        | 15.0  | 9150  | 0.6940          |
+| 0.7109        | 16.0  | 9760  | 0.6924          |
+| 0.7117        | 17.0  | 10370 | 0.6928          |
+| 0.7086        | 18.0  | 10980 | 0.6882          |
+| 0.7004        | 19.0  | 11590 | 0.6872          |
+| 0.7016        | 20.0  | 12200 | 0.6895          |
+| 0.7027        | 21.0  | 12810 | 0.6884          |
+| 0.6928        | 22.0  | 13420 | 0.6885          |
+| 0.7059        | 23.0  | 14030 | 0.6894          |
+| 0.6916        | 24.0  | 14640 | 0.6875          |
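The reported evaluation loss (0.6872) can be cross-checked against the table above. The sketch below copies the validation losses from the added rows and locates the minimum. Reading the gap between the best epoch and the last logged epoch as early-stopping patience is an assumption: the diff does not show the callback settings, and `training_args.bin` is binary.

```python
# Validation loss per epoch, copied from the added table rows (epochs 1..24).
val_loss = [
    0.7760, 0.7436, 0.7269, 0.7199, 0.7118, 0.7121, 0.7089, 0.7025,
    0.7001, 0.6965, 0.6970, 0.6973, 0.6935, 0.6927, 0.6940, 0.6924,
    0.6928, 0.6882, 0.6872, 0.6895, 0.6884, 0.6885, 0.6894, 0.6875,
]

# Epochs are 1-indexed in the table, list positions are 0-indexed.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_loss = val_loss[best_epoch - 1]
epochs_since_best = len(val_loss) - best_epoch

print(best_epoch, best_loss, epochs_since_best)
```

The minimum matches the loss reported at the top of the model card, and the run ends five epochs later even though `num_epochs` is 50. That pattern is consistent with stopping early and reloading the best checkpoint, but it is not confirmed by anything in this commit.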
 ### Framework versions

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7c8c43647d357635b454fd82af24fa47aad56af1560c4890ba23b7f6187fa3f6
-size 309974336

 version https://git-lfs.github.com/spec/v1
+oid sha256:118e5a3133e42214d1f7eeb3e731adb33f4fe5afa73d16c1bb82263e1c0a682a
+size 309980480
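The size change in the LFS pointer is small enough to reason about by hand. A rough check, assuming the adapter file stores vocabulary-sized matrices in float32 and using GPT-2 124M's hidden size of 768 (neither storage detail is confirmed by the diff):

```python
HIDDEN_SIZE = 768          # GPT-2 124M embedding width
BYTES_PER_FLOAT32 = 4      # assumption: weights are stored as float32

old_size = 309_974_336     # bytes, from the removed pointer
new_size = 309_980_480     # bytes, from the added pointer

delta = new_size - old_size
row_bytes = HIDDEN_SIZE * BYTES_PER_FLOAT32   # one vocabulary row
rows_added = delta / row_bytes

print(delta, row_bytes, rows_added)   # 6144 3072 2.0
```

Two extra rows would fit one new token (`<pad>`) appearing in two vocabulary-sized matrices, for example the input embeddings and the output head. This is only an interpretation of the byte count.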

added_tokens.json CHANGED

@@ -1,4 +1,5 @@
 {
-  "<endofex>": 50258,
-  "<startofex>": 50257
 }

 {
+  "<endofex>": 50257,
+  "<pad>": 50258,
+  "<startofex>": 50259
 }
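Note that `<endofex>` and `<startofex>` change ids between the two versions, so data tokenized with the old mapping would not line up with the new one. The sketch below checks that the new added-token ids extend the base GPT-2 vocabulary (ids 0 to 50256) without gaps and lists which tokens moved. The helper name `is_contiguous` is hypothetical.

```python
import json

BASE_VOCAB_SIZE = 50257  # GPT-2 base vocabulary occupies ids 0..50256

old_tokens = json.loads('{"<endofex>": 50258, "<startofex>": 50257}')
new_tokens = json.loads('{"<endofex>": 50257, "<pad>": 50258, "<startofex>": 50259}')

def is_contiguous(added, base_size):
    """True if the added ids are exactly base_size, base_size + 1, ... with no gaps."""
    ids = sorted(added.values())
    return ids == list(range(base_size, base_size + len(ids)))

# Tokens present in both versions whose id changed.
moved = {t: (old_tokens[t], new_tokens[t])
         for t in old_tokens if t in new_tokens and old_tokens[t] != new_tokens[t]}

new_vocab_size = BASE_VOCAB_SIZE + len(new_tokens)

print(is_contiguous(new_tokens, BASE_VOCAB_SIZE), new_vocab_size, moved)
```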

all_results.json CHANGED

@@ -1,13 +1,13 @@
 {
-    "epoch": 50.0,
-    "eval_loss": 0.6956692934036255,
-    "eval_runtime": 59.8322,
-    "eval_samples_per_second": 163.257,
-    "eval_steps_per_second": 5.114,
-    "perplexity": 2.005050592091695,
-    "total_flos": 1.49878932701184e+17,
-    "train_loss": 0.18437957987751993,
-    "train_runtime": 7319.2549,
-    "train_samples_per_second": 312.395,
-    "train_steps_per_second": 9.769
 }

 {
+    "epoch": 24.0,
+    "eval_loss": 0.6872262954711914,
+    "eval_runtime": 1.8886,
+    "eval_samples_per_second": 559.678,
+    "eval_steps_per_second": 70.423,
+    "perplexity": 1.9881932176157675,
+    "total_flos": 7672437924691968.0,
+    "train_loss": 0.7372308338926138,
+    "train_runtime": 572.2889,
+    "train_samples_per_second": 426.096,
+    "train_steps_per_second": 53.295
 }
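The `perplexity` field in both versions is the exponential of `eval_loss`. A quick check using the values from this file:

```python
import math

# (eval_loss, reported perplexity) pairs copied from the removed and added sides.
results = {
    "old": (0.6956692934036255, 2.005050592091695),
    "new": (0.6872262954711914, 1.9881932176157675),
}

for name, (eval_loss, reported) in results.items():
    computed = math.exp(eval_loss)  # perplexity = exp(mean cross-entropy loss)
    print(name, computed, math.isclose(computed, reported, rel_tol=1e-9))
```

The two runs differ in batch size, schedule, tokenizer, and apparently dataset size, so the perplexities are not a like-for-like comparison.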

eval_results.json CHANGED

@@ -1,8 +1,8 @@
 {
-    "epoch": 50.0,
-    "eval_loss": 0.6956692934036255,
-    "eval_runtime": 59.8322,
-    "eval_samples_per_second": 163.257,
-    "eval_steps_per_second": 5.114,
-    "perplexity": 2.005050592091695
 }

 {
+    "epoch": 24.0,
+    "eval_loss": 0.6872262954711914,
+    "eval_runtime": 1.8886,
+    "eval_samples_per_second": 559.678,
+    "eval_steps_per_second": 70.423,
+    "perplexity": 1.9881932176157675
 }

special_tokens_map.json CHANGED

@@ -6,23 +6,10 @@
       "normalized": false,
       "rstrip": false,
       "single_word": false
-    },
-    {
-      "content": "<endofex>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false
     }
   ],
   "bos_token": "<|endoftext|>",
   "eos_token": "<endofex>",
-  "pad_token": {
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  },
   "unk_token": "<|endoftext|>"
 }

       "normalized": false,
       "rstrip": false,
       "single_word": false
     }
   ],
   "bos_token": "<|endoftext|>",
   "eos_token": "<endofex>",
+  "pad_token": "<pad>",
   "unk_token": "<|endoftext|>"
 }

tokenizer.json CHANGED

@@ -14,12 +14,12 @@
       "single_word": false,
       "lstrip": false,
       "rstrip": false,
-      "normalized": false,
       "special": true
     },
     {
       "id": 50257,
-      "content": "<startofex>",
       "single_word": false,
       "lstrip": false,
       "rstrip": false,
@@ -28,7 +28,16 @@
     },
     {
       "id": 50258,
-      "content": "<endofex>",
       "single_word": false,
       "lstrip": false,
       "rstrip": false,

       "single_word": false,
       "lstrip": false,
       "rstrip": false,
+      "normalized": true,
       "special": true
     },
     {
       "id": 50257,
+      "content": "<endofex>",
       "single_word": false,
       "lstrip": false,
       "rstrip": false,
     },
     {
       "id": 50258,
+      "content": "<pad>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 50259,
+      "content": "<startofex>",
       "single_word": false,
       "lstrip": false,
       "rstrip": false,

tokenizer_config.json CHANGED

@@ -4,13 +4,13 @@
     "50256": {
       "content": "<|endoftext|>",
       "lstrip": false,
-      "normalized": false,
       "rstrip": false,
       "single_word": false,
       "special": true
     },
     "50257": {
-      "content": "<startofex>",
       "lstrip": false,
       "normalized": false,
       "rstrip": false,
@@ -18,7 +18,15 @@
       "special": true
     },
     "50258": {
-      "content": "<endofex>",
       "lstrip": false,
       "normalized": false,
       "rstrip": false,
@@ -27,15 +35,14 @@
     }
   },
   "additional_special_tokens": [
-    "<startofex>",
-    "<endofex>"
   ],
   "bos_token": "<|endoftext|>",
   "clean_up_tokenization_spaces": false,
   "eos_token": "<endofex>",
   "extra_special_tokens": {},
   "model_max_length": 1024,
-  "pad_token": "<|endoftext|>",
   "tokenizer_class": "GPT2Tokenizer",
   "unk_token": "<|endoftext|>"
 }

     "50256": {
       "content": "<|endoftext|>",
       "lstrip": false,
+      "normalized": true,
       "rstrip": false,
       "single_word": false,
       "special": true
     },
     "50257": {
+      "content": "<endofex>",
       "lstrip": false,
       "normalized": false,
       "rstrip": false,
       "special": true
     },
     "50258": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50259": {
+      "content": "<startofex>",
       "lstrip": false,
       "normalized": false,
       "rstrip": false,
     }
   },
   "additional_special_tokens": [
+    "<startofex>"
   ],
   "bos_token": "<|endoftext|>",
   "clean_up_tokenization_spaces": false,
   "eos_token": "<endofex>",
   "extra_special_tokens": {},
   "model_max_length": 1024,
+  "pad_token": "<pad>",
   "tokenizer_class": "GPT2Tokenizer",
   "unk_token": "<|endoftext|>"
 }

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
-    "epoch": 50.0,
-    "total_flos": 1.49878932701184e+17,
-    "train_loss": 0.18437957987751993,
-    "train_runtime": 7319.2549,
-    "train_samples_per_second": 312.395,
-    "train_steps_per_second": 9.769
 }

 {
+    "epoch": 24.0,
+    "total_flos": 7672437924691968.0,
+    "train_loss": 0.7372308338926138,
+    "train_runtime": 572.2889,
+    "train_samples_per_second": 426.096,
+    "train_steps_per_second": 53.295
 }

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:59bd0190032b8576fa67d50e424c4c48cd36048fc82231e1db1117e678eb5423
 size 5432

 version https://git-lfs.github.com/spec/v1
+oid sha256:03155aa7943de7470f4e3e03615d815edc1696005426d2517ae93b9b94930139
 size 5432