Model save

Files changed (4)

README.md +29 -38
adapter_model.safetensors +1 -1
all_results.json +6 -6
train_results.json +6 -6

README.md CHANGED

@@ -5,18 +5,18 @@ base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
-- name: Se124M500KInfPrompt_endtoken
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Se124M500KInfPrompt_endtoken
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6716
 ## Model description
@@ -36,8 +36,8 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 5e-05
-- train_batch_size: 32
-- eval_batch_size: 32
 - seed: 42
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
@@ -46,39 +46,30 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch | Step   | Validation Loss |
-|:-------------:|:-----:|:------:|:---------------:|
-| 0.1898        | 1.0   | 5427   | 0.7433          |
-| 0.1857        | 2.0   | 10854  | 0.7238          |
-| 0.1843        | 3.0   | 16281  | 0.7118          |
-| 0.1813        | 4.0   | 21708  | 0.7045          |
-| 0.1802        | 5.0   | 27135  | 0.6990          |
-| 0.1785        | 6.0   | 32562  | 0.6944          |
-| 0.1769        | 7.0   | 37989  | 0.6918          |
-| 0.1743        | 8.0   | 43416  | 0.6875          |
-| 0.1752        | 9.0   | 48843  | 0.6854          |
-| 0.1756        | 10.0  | 54270  | 0.6854          |
-| 0.1736        | 11.0  | 59697  | 0.6837          |
-| 0.1756        | 12.0  | 65124  | 0.6812          |
-| 0.173         | 13.0  | 70551  | 0.6798          |
-| 0.1737        | 14.0  | 75978  | 0.6791          |
-| 0.1741        | 15.0  | 81405  | 0.6783          |
-| 0.177         | 16.0  | 86832  | 0.6771          |
-| 0.1734        | 17.0  | 92259  | 0.6765          |
-| 0.1719        | 18.0  | 97686  | 0.6760          |
-| 0.1737        | 19.0  | 103113 | 0.6763          |
-| 0.1716        | 20.0  | 108540 | 0.6747          |
-| 0.1713        | 21.0  | 113967 | 0.6741          |
-| 0.1739        | 22.0  | 119394 | 0.6738          |
-| 0.1694        | 23.0  | 124821 | 0.6737          |
-| 0.1703        | 24.0  | 130248 | 0.6743          |
-| 0.1697        | 25.0  | 135675 | 0.6730          |
-| 0.172         | 26.0  | 141102 | 0.6731          |
-| 0.1711        | 27.0  | 146529 | 0.6720          |
-| 0.1726        | 28.0  | 151956 | 0.6720          |
-| 0.1703        | 29.0  | 157383 | 0.6716          |
-| 0.1732        | 30.0  | 162810 | 0.6716          |
-| 0.171         | 31.0  | 168237 | 0.6719          |
 ### Framework versions

 tags:
 - generated_from_trainer
 model-index:
+- name: Se124M10KInfPrompt_endtoken
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# Se124M10KInfPrompt_endtoken
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.7290
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 5e-05
+- train_batch_size: 8
+- eval_batch_size: 8
 - seed: 42
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
 ### Training results
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 0.9762        | 1.0   | 610   | 0.8681          |
+| 0.8832        | 2.0   | 1220  | 0.8117          |
+| 0.8432        | 3.0   | 1830  | 0.7922          |
+| 0.8183        | 4.0   | 2440  | 0.7809          |
+| 0.8131        | 5.0   | 3050  | 0.7687          |
+| 0.8004        | 6.0   | 3660  | 0.7640          |
+| 0.7972        | 7.0   | 4270  | 0.7584          |
+| 0.7941        | 8.0   | 4880  | 0.7528          |
+| 0.789         | 9.0   | 5490  | 0.7480          |
+| 0.7806        | 10.0  | 6100  | 0.7454          |
+| 0.7736        | 11.0  | 6710  | 0.7442          |
+| 0.769         | 12.0  | 7320  | 0.7444          |
+| 0.7734        | 13.0  | 7930  | 0.7402          |
+| 0.7671        | 14.0  | 8540  | 0.7385          |
+| 0.7605        | 15.0  | 9150  | 0.7365          |
+| 0.7651        | 16.0  | 9760  | 0.7357          |
+| 0.7657        | 17.0  | 10370 | 0.7340          |
+| 0.763         | 18.0  | 10980 | 0.7318          |
+| 0.7552        | 19.0  | 11590 | 0.7305          |
+| 0.7563        | 20.0  | 12200 | 0.7284          |
+| 0.7558        | 21.0  | 12810 | 0.7285          |
+| 0.7465        | 22.0  | 13420 | 0.7290          |
 ### Framework versions

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c97efabc5452645c7121eaff9c657eb938d328b75e72e3460123eba5c9bc7b0a
 size 309980480

 version https://git-lfs.github.com/spec/v1
+oid sha256:b3fcfe7e23f9748484fb3905bab3f496acd2b192c2e093f2ef55297d09b18d3a
 size 309980480

all_results.json CHANGED

@@ -1,13 +1,13 @@
 {
-    "epoch": 31.0,
     "eval_loss": 0.6716023087501526,
     "eval_runtime": 232.3059,
     "eval_samples_per_second": 160.323,
     "eval_steps_per_second": 5.011,
     "perplexity": 1.9573711221830141,
-    "total_flos": 3.528546650263388e+17,
-    "train_loss": 0.17647521918996942,
-    "train_runtime": 17128.7716,
-    "train_samples_per_second": 506.884,
-    "train_steps_per_second": 15.842
 }

 {
+    "epoch": 22.0,
     "eval_loss": 0.6716023087501526,
     "eval_runtime": 232.3059,
     "eval_samples_per_second": 160.323,
     "eval_steps_per_second": 5.011,
     "perplexity": 1.9573711221830141,
+    "total_flos": 7033068097634304.0,
+    "train_loss": 0.8158138496865104,
+    "train_runtime": 912.3512,
+    "train_samples_per_second": 267.276,
+    "train_steps_per_second": 33.43
 }
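
Note on the `perplexity` field in all_results.json: it appears to be derived from `eval_loss` rather than measured separately. The sketch below checks that reading under the assumption that perplexity is the exponential of the mean cross-entropy loss (in nats); the diff itself does not state the formula. The first two values are copied from all_results.json, the third from the new README.md. The eval fields in all_results.json are unchanged by this commit, so they still show 0.6716 while the new README reports 0.7290.

```python
import math

# Values copied from all_results.json as shown in this commit.
eval_loss = 0.6716023087501526
reported_perplexity = 1.9573711221830141

# Assumption: perplexity = exp(mean cross-entropy loss in nats).
# The diff does not state the formula; this only checks consistency.
perplexity = math.exp(eval_loss)
matches = math.isclose(perplexity, reported_perplexity, rel_tol=1e-6)
print(f"exp(eval_loss) = {perplexity:.10f}, matches reported value: {matches}")

# Under the same assumption, the loss reported in the new README.md
# would correspond to this perplexity.
readme_loss = 0.7290
readme_perplexity = math.exp(readme_loss)
print(f"exp({readme_loss}) = {readme_perplexity:.4f}")
```

If the check holds, it suggests the eval block of all_results.json was not regenerated for the 10K run, which may be worth confirming before relying on those numbers.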

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
-    "epoch": 31.0,
-    "total_flos": 3.528546650263388e+17,
-    "train_loss": 0.17647521918996942,
-    "train_runtime": 17128.7716,
-    "train_samples_per_second": 506.884,
-    "train_steps_per_second": 15.842
 }

 {
+    "epoch": 22.0,
+    "total_flos": 7033068097634304.0,
+    "train_loss": 0.8158138496865104,
+    "train_runtime": 912.3512,
+    "train_samples_per_second": 267.276,
+    "train_steps_per_second": 33.43
 }