Training in progress, epoch 1

Files changed (10)

README.md +58 -36
adapter_model.safetensors +2 -2
added_tokens.json +3 -2
all_results.json +11 -11
eval_results.json +6 -6
special_tokens_map.json +3 -9
tokenizer.json +12 -3
tokenizer_config.json +14 -6
train_results.json +6 -6
training_args.bin +1 -1

README.md CHANGED

@@ -5,18 +5,18 @@ base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
-- name: Se124M10KInfPrompt
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Se124M10KInfPrompt
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.4227
 ## Model description
@@ -35,46 +35,68 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 5e-05
-- train_batch_size: 32
-- eval_batch_size: 32
 - seed: 42
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
 - num_epochs: 50
 - mixed_precision_training: Native AMP
 ### Training results
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:----:|:---------------:|
-| 0.5306        | 1.0   | 169  | 0.9535          |
-| 0.2161        | 2.0   | 338  | 0.6077          |
-| 0.1624        | 3.0   | 507  | 0.5296          |
-| 0.1515        | 4.0   | 676  | 0.4989          |
-| 0.1434        | 5.0   | 845  | 0.4825          |
-| 0.1366        | 6.0   | 1014 | 0.4712          |
-| 0.1328        | 7.0   | 1183 | 0.4663          |
-| 0.1294        | 8.0   | 1352 | 0.4613          |
-| 0.1258        | 9.0   | 1521 | 0.4544          |
-| 0.1259        | 10.0  | 1690 | 0.4544          |
-| 0.1238        | 11.0  | 1859 | 0.4471          |
-| 0.1216        | 12.0  | 2028 | 0.4424          |
-| 0.1214        | 13.0  | 2197 | 0.4420          |
-| 0.1194        | 14.0  | 2366 | 0.4382          |
-| 0.1191        | 15.0  | 2535 | 0.4363          |
-| 0.1183        | 16.0  | 2704 | 0.4350          |
-| 0.1181        | 17.0  | 2873 | 0.4336          |
-| 0.1168        | 18.0  | 3042 | 0.4298          |
-| 0.1165        | 19.0  | 3211 | 0.4292          |
-| 0.1158        | 20.0  | 3380 | 0.4277          |
-| 0.1157        | 21.0  | 3549 | 0.4296          |
-| 0.1144        | 22.0  | 3718 | 0.4256          |
-| 0.113         | 23.0  | 3887 | 0.4253          |
-| 0.1136        | 24.0  | 4056 | 0.4239          |
-| 0.1143        | 25.0  | 4225 | 0.4227          |
-| 0.1143        | 26.0  | 4394 | 0.4235          |
-| 0.1126        | 27.0  | 4563 | 0.4232          |
 ### Framework versions

 tags:
 - generated_from_trainer
 model-index:
+- name: Se124M10KInfPrompt_endtoken_ls
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# Se124M10KInfPrompt_endtoken_ls
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 2.0494
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 0.0005
+- train_batch_size: 4
+- eval_batch_size: 8
 - seed: 42
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 32
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 200
 - num_epochs: 50
 - mixed_precision_training: Native AMP
+- label_smoothing_factor: 0.1
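
Note on the batch-size change: the per-device batch drops from 32 to 4, but with 8 gradient accumulation steps the reported `total_train_batch_size` stays at 32. A minimal sketch of that arithmetic, assuming a single device (the device count is not stated in this diff); `effective_batch_size` is a hypothetical helper for illustration, not a Trainer API:

```python
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int = 1,
                         num_devices: int = 1) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Old run: train_batch_size 32, no gradient accumulation listed.
old_effective = effective_batch_size(32)
# New run: train_batch_size 4, gradient_accumulation_steps 8.
new_effective = effective_batch_size(4, gradient_accumulation_steps=8)

print(old_effective, new_effective)
```

Under the single-device assumption, both runs take optimizer steps over the same number of samples, so the differences in the loss tables are more likely to come from the other changed settings (learning rate, scheduler, label smoothing, tokens) or from the data than from the batch size.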
 ### Training results
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 19.0863       | 1.0   | 267   | 2.1942          |
+| 17.6413       | 2.0   | 534   | 2.1318          |
+| 17.3454       | 3.0   | 801   | 2.1143          |
+| 17.2455       | 4.0   | 1068  | 2.0979          |
+| 17.112        | 5.0   | 1335  | 2.0918          |
+| 17.0311       | 6.0   | 1602  | 2.0852          |
+| 16.9714       | 7.0   | 1869  | 2.0805          |
+| 16.8883       | 8.0   | 2136  | 2.0760          |
+| 16.8675       | 9.0   | 2403  | 2.0727          |
+| 16.8491       | 10.0  | 2670  | 2.0699          |
+| 16.8653       | 11.0  | 2937  | 2.0698          |
+| 16.7795       | 12.0  | 3204  | 2.0718          |
+| 16.8033       | 13.0  | 3471  | 2.0635          |
+| 16.7715       | 14.0  | 3738  | 2.0644          |
+| 16.7677       | 15.0  | 4005  | 2.0632          |
+| 16.7682       | 16.0  | 4272  | 2.0615          |
+| 16.7473       | 17.0  | 4539  | 2.0598          |
+| 16.7306       | 18.0  | 4806  | 2.0615          |
+| 16.6896       | 19.0  | 5073  | 2.0586          |
+| 16.7027       | 20.0  | 5340  | 2.0589          |
+| 16.6991       | 21.0  | 5607  | 2.0581          |
+| 16.6864       | 22.0  | 5874  | 2.0573          |
+| 16.6749       | 23.0  | 6141  | 2.0562          |
+| 16.6714       | 24.0  | 6408  | 2.0551          |
+| 16.6603       | 25.0  | 6675  | 2.0546          |
+| 16.6801       | 26.0  | 6942  | 2.0542          |
+| 16.6263       | 27.0  | 7209  | 2.0541          |
+| 16.6436       | 28.0  | 7476  | 2.0531          |
+| 16.6471       | 29.0  | 7743  | 2.0523          |
+| 16.6412       | 30.0  | 8010  | 2.0549          |
+| 16.6017       | 31.0  | 8277  | 2.0529          |
+| 16.6352       | 32.0  | 8544  | 2.0510          |
+| 16.5937       | 33.0  | 8811  | 2.0522          |
+| 16.6165       | 34.0  | 9078  | 2.0511          |
+| 16.5961       | 35.0  | 9345  | 2.0518          |
+| 16.5675       | 36.0  | 9612  | 2.0514          |
+| 16.5565       | 37.0  | 9879  | 2.0499          |
+| 16.6215       | 38.0  | 10146 | 2.0504          |
+| 16.6133       | 39.0  | 10413 | 2.0505          |
+| 16.5901       | 40.0  | 10680 | 2.0492          |
+| 16.5841       | 41.0  | 10947 | 2.0500          |
+| 16.5856       | 42.0  | 11214 | 2.0493          |
+| 16.5775       | 43.0  | 11481 | 2.0494          |
+| 16.5873       | 44.0  | 11748 | 2.0497          |
+| 16.5285       | 45.0  | 12015 | 2.0494          |
 ### Framework versions

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fe15433e25d5d1a4330d286e7867a8fd46a167aab6e8090f78279ae921a633e0
-size 309974336

 version https://git-lfs.github.com/spec/v1
+oid sha256:98d3be695217498ed522e545a6d2957d39588226441e55cf9056262af2b903e6
+size 309980480

added_tokens.json CHANGED

@@ -1,4 +1,5 @@
 {
-  "<|endofex|>": 50258,
-  "<|startofex|>": 50257
 }

 {
+  "<endofex>": 50259,
+  "<pad>": 50257,
+  "<startofex>": 50258
 }
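
Note on the ID shift: the new IDs are consistent with three tokens being appended to the GPT-2 vocabulary in the order `<pad>`, `<startofex>`, `<endofex>`, starting right after `<|endoftext|>` (ID 50256). The sketch below only illustrates that sequential assignment; the registration order is an assumption inferred from the IDs, and `assign_added_token_ids` is a hypothetical helper, not a tokenizer API:

```python
# GPT-2 base vocabulary size; <|endoftext|> occupies the last base ID, 50256.
BASE_VOCAB_SIZE = 50257


def assign_added_token_ids(new_tokens, base_vocab_size=BASE_VOCAB_SIZE):
    """Give each added token the next free ID after the base vocabulary."""
    return {token: base_vocab_size + offset
            for offset, token in enumerate(new_tokens)}


# Old layout: two added tokens.
old_ids = assign_added_token_ids(["<|startofex|>", "<|endofex|>"])
# New layout: a dedicated pad token is assumed to be registered first.
new_ids = assign_added_token_ids(["<pad>", "<startofex>", "<endofex>"])

print(old_ids)
print(new_ids)
```

If this reading is right, the vocabulary grows from 50259 to 50260 entries, so checkpoints or cached token IDs produced with the old layout would not line up with the new tokenizer.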

all_results.json CHANGED

@@ -1,13 +1,13 @@
 {
-    "epoch": 27.0,
-    "eval_loss": 0.4226756989955902,
-    "eval_runtime": 11.3061,
-    "eval_samples_per_second": 103.484,
-    "eval_steps_per_second": 3.273,
-    "perplexity": 1.5260393196260091,
-    "total_flos": 9564196506697728.0,
-    "train_loss": 0.13943315686140828,
-    "train_runtime": 711.1376,
-    "train_samples_per_second": 379.955,
-    "train_steps_per_second": 11.882
 }

 {
+    "epoch": 45.0,
+    "eval_loss": 2.0493669509887695,
+    "eval_runtime": 3.2905,
+    "eval_samples_per_second": 555.532,
+    "eval_steps_per_second": 69.594,
+    "perplexity": 7.7629852003609585,
+    "total_flos": 1.975920863064883e+16,
+    "train_loss": 16.87015275197182,
+    "train_runtime": 3210.0048,
+    "train_samples_per_second": 132.835,
+    "train_steps_per_second": 4.143
 }
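
Note on the `perplexity` field: in both versions of this file the value appears to be the exponential of `eval_loss`. A minimal check, assuming that relationship (the diff does not show how the training script computed it):

```python
import math


def perplexity_from_loss(eval_loss: float) -> float:
    """Perplexity as exp of the mean cross-entropy loss (in nats)."""
    return math.exp(eval_loss)


# (eval_loss, reported perplexity) pairs copied from the old and new files.
reported = {
    "old": (0.4226756989955902, 1.5260393196260091),
    "new": (2.0493669509887695, 7.7629852003609585),
}

for name, (loss, reported_ppl) in reported.items():
    computed = perplexity_from_loss(loss)
    print(name, computed, reported_ppl,
          math.isclose(computed, reported_ppl, rel_tol=1e-4))
```

The new run uses `label_smoothing_factor: 0.1` and a different token set, so its `eval_loss` (and therefore perplexity) may not be directly comparable with the old run's numbers.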

eval_results.json CHANGED

@@ -1,8 +1,8 @@
 {
-    "epoch": 27.0,
-    "eval_loss": 0.4226756989955902,
-    "eval_runtime": 11.3061,
-    "eval_samples_per_second": 103.484,
-    "eval_steps_per_second": 3.273,
-    "perplexity": 1.5260393196260091
 }

 {
+    "epoch": 45.0,
+    "eval_loss": 2.0493669509887695,
+    "eval_runtime": 3.2905,
+    "eval_samples_per_second": 555.532,
+    "eval_steps_per_second": 69.594,
+    "perplexity": 7.7629852003609585
 }

special_tokens_map.json CHANGED

@@ -1,14 +1,14 @@
 {
   "additional_special_tokens": [
     {
-      "content": "<|startofex|>",
       "lstrip": false,
       "normalized": false,
       "rstrip": false,
       "single_word": false
     },
     {
-      "content": "<|endofex|>",
       "lstrip": false,
       "normalized": false,
       "rstrip": false,
@@ -17,12 +17,6 @@
   ],
   "bos_token": "<|endoftext|>",
   "eos_token": "<|endoftext|>",
-  "pad_token": {
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  },
   "unk_token": "<|endoftext|>"
 }

 {
   "additional_special_tokens": [
     {
+      "content": "<startofex>",
       "lstrip": false,
       "normalized": false,
       "rstrip": false,
       "single_word": false
     },
     {
+      "content": "<endofex>",
       "lstrip": false,
       "normalized": false,
       "rstrip": false,
   ],
   "bos_token": "<|endoftext|>",
   "eos_token": "<|endoftext|>",
+  "pad_token": "<pad>",
   "unk_token": "<|endoftext|>"
 }

tokenizer.json CHANGED

@@ -14,12 +14,12 @@
       "single_word": false,
       "lstrip": false,
       "rstrip": false,
-      "normalized": false,
       "special": true
     },
     {
       "id": 50257,
-      "content": "<|startofex|>",
       "single_word": false,
       "lstrip": false,
       "rstrip": false,
@@ -28,7 +28,16 @@
     },
     {
       "id": 50258,
-      "content": "<|endofex|>",
       "single_word": false,
       "lstrip": false,
       "rstrip": false,

       "single_word": false,
       "lstrip": false,
       "rstrip": false,
+      "normalized": true,
       "special": true
     },
     {
       "id": 50257,
+      "content": "<pad>",
       "single_word": false,
       "lstrip": false,
       "rstrip": false,
     },
     {
       "id": 50258,
+      "content": "<startofex>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 50259,
+      "content": "<endofex>",
       "single_word": false,
       "lstrip": false,
       "rstrip": false,

tokenizer_config.json CHANGED

@@ -4,13 +4,13 @@
     "50256": {
       "content": "<|endoftext|>",
       "lstrip": false,
-      "normalized": false,
       "rstrip": false,
       "single_word": false,
       "special": true
     },
     "50257": {
-      "content": "<|startofex|>",
       "lstrip": false,
       "normalized": false,
       "rstrip": false,
@@ -18,7 +18,15 @@
       "special": true
     },
     "50258": {
-      "content": "<|endofex|>",
       "lstrip": false,
       "normalized": false,
       "rstrip": false,
@@ -27,15 +35,15 @@
     }
   },
   "additional_special_tokens": [
-    "<|startofex|>",
-    "<|endofex|>"
   ],
   "bos_token": "<|endoftext|>",
   "clean_up_tokenization_spaces": false,
   "eos_token": "<|endoftext|>",
   "extra_special_tokens": {},
   "model_max_length": 1024,
-  "pad_token": "<|endoftext|>",
   "tokenizer_class": "GPT2Tokenizer",
   "unk_token": "<|endoftext|>"
 }

     "50256": {
       "content": "<|endoftext|>",
       "lstrip": false,
+      "normalized": true,
       "rstrip": false,
       "single_word": false,
       "special": true
     },
     "50257": {
+      "content": "<pad>",
       "lstrip": false,
       "normalized": false,
       "rstrip": false,
       "special": true
     },
     "50258": {
+      "content": "<startofex>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50259": {
+      "content": "<endofex>",
       "lstrip": false,
       "normalized": false,
       "rstrip": false,
     }
   },
   "additional_special_tokens": [
+    "<startofex>",
+    "<endofex>"
   ],
   "bos_token": "<|endoftext|>",
   "clean_up_tokenization_spaces": false,
   "eos_token": "<|endoftext|>",
   "extra_special_tokens": {},
   "model_max_length": 1024,
+  "pad_token": "<pad>",
   "tokenizer_class": "GPT2Tokenizer",
   "unk_token": "<|endoftext|>"
 }

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
-    "epoch": 27.0,
-    "total_flos": 9564196506697728.0,
-    "train_loss": 0.13943315686140828,
-    "train_runtime": 711.1376,
-    "train_samples_per_second": 379.955,
-    "train_steps_per_second": 11.882
 }

 {
+    "epoch": 45.0,
+    "total_flos": 1.975920863064883e+16,
+    "train_loss": 16.87015275197182,
+    "train_runtime": 3210.0048,
+    "train_samples_per_second": 132.835,
+    "train_steps_per_second": 4.143
 }

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f3f2d25c850399c646c236c39f3ec487e3c380f1174232e2fcbece86f800d0f6
 size 5432

 version https://git-lfs.github.com/spec/v1
+oid sha256:41324b928c15047f3014d419675156ca38c76cb52eec239a84bc17d7df167284
 size 5432