Model save

Files changed (4)

README.md +55 -55
adapter_model.safetensors +1 -1
all_results.json +5 -5
train_results.json +5 -5

README.md CHANGED

@@ -5,18 +5,18 @@ base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
-- name: Se124M100KInfPrompt_endtoken
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Se124M100KInfPrompt_endtoken
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6695
 ## Model description
@@ -47,58 +47,58 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch | Step   | Validation Loss |
-|:-------------:|:-----:|:------:|:---------------:|
-| 0.7209        | 1.0   | 5717   | 0.7060          |
-| 0.7027        | 2.0   | 11434  | 0.6916          |
-| 0.7005        | 3.0   | 17151  | 0.6865          |
-| 0.7009        | 4.0   | 22868  | 0.6858          |
-| 0.6933        | 5.0   | 28585  | 0.6854          |
-| 0.6922        | 6.0   | 34302  | 0.6825          |
-| 0.6859        | 7.0   | 40019  | 0.6810          |
-| 0.6923        | 8.0   | 45736  | 0.6812          |
-| 0.6919        | 9.0   | 51453  | 0.6809          |
-| 0.6871        | 10.0  | 57170  | 0.6795          |
-| 0.6844        | 11.0  | 62887  | 0.6776          |
-| 0.6923        | 12.0  | 68604  | 0.6780          |
-| 0.6878        | 13.0  | 74321  | 0.6785          |
-| 0.6765        | 14.0  | 80038  | 0.6775          |
-| 0.6864        | 15.0  | 85755  | 0.6769          |
-| 0.6776        | 16.0  | 91472  | 0.6761          |
-| 0.6823        | 17.0  | 97189  | 0.6768          |
-| 0.6743        | 18.0  | 102906 | 0.6751          |
-| 0.682         | 19.0  | 108623 | 0.6776          |
-| 0.6902        | 20.0  | 114340 | 0.6762          |
-| 0.6774        | 21.0  | 120057 | 0.6751          |
-| 0.6748        | 22.0  | 125774 | 0.6747          |
-| 0.6864        | 23.0  | 131491 | 0.6745          |
-| 0.6819        | 24.0  | 137208 | 0.6756          |
-| 0.6818        | 25.0  | 142925 | 0.6745          |
-| 0.6757        | 26.0  | 148642 | 0.6737          |
-| 0.6801        | 27.0  | 154359 | 0.6734          |
-| 0.6717        | 28.0  | 160076 | 0.6724          |
-| 0.6717        | 29.0  | 165793 | 0.6722          |
-| 0.6802        | 30.0  | 171510 | 0.6723          |
-| 0.677         | 31.0  | 177227 | 0.6725          |
-| 0.6764        | 32.0  | 182944 | 0.6712          |
-| 0.6767        | 33.0  | 188661 | 0.6712          |
-| 0.6758        | 34.0  | 194378 | 0.6716          |
-| 0.6772        | 35.0  | 200095 | 0.6715          |
-| 0.679         | 36.0  | 205812 | 0.6717          |
-| 0.6744        | 37.0  | 211529 | 0.6702          |
-| 0.6654        | 38.0  | 217246 | 0.6707          |
-| 0.6723        | 39.0  | 222963 | 0.6704          |
-| 0.6758        | 40.0  | 228680 | 0.6701          |
-| 0.6795        | 41.0  | 234397 | 0.6701          |
-| 0.6681        | 42.0  | 240114 | 0.6698          |
-| 0.6761        | 43.0  | 245831 | 0.6700          |
-| 0.673         | 44.0  | 251548 | 0.6697          |
-| 0.6736        | 45.0  | 257265 | 0.6698          |
-| 0.673         | 46.0  | 262982 | 0.6695          |
-| 0.6686        | 47.0  | 268699 | 0.6695          |
-| 0.666         | 48.0  | 274416 | 0.6696          |
-| 0.663         | 49.0  | 280133 | 0.6695          |
-| 0.6667        | 50.0  | 285850 | 0.6695          |
 ### Framework versions

 tags:
 - generated_from_trainer
 model-index:
+- name: Se124M100KInfPrompt_endtoken2
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# Se124M100KInfPrompt_endtoken2
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.6704
 ## Model description
 ### Training results
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 0.7736        | 1.0   | 1066  | 0.7302          |
+| 0.7318        | 2.0   | 2132  | 0.7070          |
+| 0.7189        | 3.0   | 3198  | 0.7002          |
+| 0.7128        | 4.0   | 4264  | 0.6952          |
+| 0.7031        | 5.0   | 5330  | 0.6878          |
+| 0.6944        | 6.0   | 6396  | 0.6873          |
+| 0.6935        | 7.0   | 7462  | 0.6879          |
+| 0.6987        | 8.0   | 8528  | 0.6837          |
+| 0.7011        | 9.0   | 9594  | 0.6830          |
+| 0.6799        | 10.0  | 10660 | 0.6819          |
+| 0.6754        | 11.0  | 11726 | 0.6793          |
+| 0.6769        | 12.0  | 12792 | 0.6800          |
+| 0.6827        | 13.0  | 13858 | 0.6766          |
+| 0.6905        | 14.0  | 14924 | 0.6808          |
+| 0.6769        | 15.0  | 15990 | 0.6769          |
+| 0.6751        | 16.0  | 17056 | 0.6776          |
+| 0.688         | 17.0  | 18122 | 0.6739          |
+| 0.6922        | 18.0  | 19188 | 0.6772          |
+| 0.6804        | 19.0  | 20254 | 0.6743          |
+| 0.6718        | 20.0  | 21320 | 0.6738          |
+| 0.681         | 21.0  | 22386 | 0.6749          |
+| 0.6757        | 22.0  | 23452 | 0.6729          |
+| 0.6777        | 23.0  | 24518 | 0.6756          |
+| 0.6667        | 24.0  | 25584 | 0.6730          |
+| 0.6758        | 25.0  | 26650 | 0.6719          |
+| 0.6602        | 26.0  | 27716 | 0.6715          |
+| 0.6746        | 27.0  | 28782 | 0.6723          |
+| 0.6647        | 28.0  | 29848 | 0.6721          |
+| 0.6673        | 29.0  | 30914 | 0.6732          |
+| 0.6745        | 30.0  | 31980 | 0.6728          |
+| 0.6659        | 31.0  | 33046 | 0.6710          |
+| 0.6578        | 32.0  | 34112 | 0.6710          |
+| 0.6649        | 33.0  | 35178 | 0.6711          |
+| 0.6665        | 34.0  | 36244 | 0.6710          |
+| 0.6608        | 35.0  | 37310 | 0.6714          |
+| 0.6623        | 36.0  | 38376 | 0.6708          |
+| 0.6789        | 37.0  | 39442 | 0.6704          |
+| 0.6536        | 38.0  | 40508 | 0.6708          |
+| 0.6746        | 39.0  | 41574 | 0.6710          |
+| 0.6634        | 40.0  | 42640 | 0.6704          |
+| 0.65          | 41.0  | 43706 | 0.6710          |
+| 0.6638        | 42.0  | 44772 | 0.6702          |
+| 0.6586        | 43.0  | 45838 | 0.6705          |
+| 0.6546        | 44.0  | 46904 | 0.6706          |
+| 0.651         | 45.0  | 47970 | 0.6701          |
+| 0.6604        | 46.0  | 49036 | 0.6705          |
+| 0.6756        | 47.0  | 50102 | 0.6706          |
+| 0.6612        | 48.0  | 51168 | 0.6705          |
+| 0.6553        | 49.0  | 52234 | 0.6705          |
+| 0.6561        | 50.0  | 53300 | 0.6704          |
 ### Framework versions

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:70fca0a71e66cf9a0d4139c4189e0eee25ba6249aa02c731201e0cc323595050
 size 309980480

 version https://git-lfs.github.com/spec/v1
+oid sha256:7dbedf96eccfbf562fc49ec87ec06f9f4334af5516644a67c35d1b3d6953a060
 size 309980480

all_results.json CHANGED

@@ -5,9 +5,9 @@
     "eval_samples_per_second": 577.411,
     "eval_steps_per_second": 72.176,
     "perplexity": 1.953220034675124,
-    "total_flos": 1.49878932701184e+17,
-    "train_loss": 0.6821009000064568,
-    "train_runtime": 10227.5107,
-    "train_samples_per_second": 223.564,
-    "train_steps_per_second": 27.949
 }

     "eval_samples_per_second": 577.411,
     "eval_steps_per_second": 72.176,
     "perplexity": 1.953220034675124,
+    "total_flos": 2.3883058800820224e+16,
+    "train_loss": 0.6825056470402187,
+    "train_runtime": 1762.4542,
+    "train_samples_per_second": 241.935,
+    "train_steps_per_second": 30.242
 }
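
A quick consistency check on the `perplexity` field, which appears as an unchanged context line in this file. The sketch below assumes perplexity is computed as `exp(eval_loss)`, the convention in the Hugging Face language-modeling example scripts; this commit does not show how the value was produced. The variable names are illustrative and the numbers are copied from the diff.

```python
import math

# Values copied from this commit's diff.
reported_perplexity = 1.953220034675124  # all_results.json, unchanged by the commit
old_card_loss = 0.6695                   # README.md eval loss before the commit
new_card_loss = 0.6704                   # README.md eval loss after the commit


def perplexity(loss: float) -> float:
    """Perplexity under the assumed convention perplexity = exp(loss)."""
    return math.exp(loss)


# Invert the assumed relation to recover the eval loss behind the stored value.
implied_loss = math.log(reported_perplexity)

print(f"loss implied by stored perplexity: {implied_loss:.4f}")
print(f"perplexity for old card loss:      {perplexity(old_card_loss):.4f}")
print(f"perplexity for new card loss:      {perplexity(new_card_loss):.4f}")
```

Under that assumption the stored perplexity matches the old card's loss (0.6695) rather than the new one (0.6704), which suggests the eval metrics in `all_results.json` may not have been regenerated for this run.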

train_results.json CHANGED Viewed

@@ -1,8 +1,8 @@
 {
     "epoch": 50.0,
-    "total_flos": 1.49878932701184e+17,
-    "train_loss": 0.6821009000064568,
-    "train_runtime": 10227.5107,
-    "train_samples_per_second": 223.564,
-    "train_steps_per_second": 27.949
 }

 {
     "epoch": 50.0,
+    "total_flos": 2.3883058800820224e+16,
+    "train_loss": 0.6825056470402187,
+    "train_runtime": 1762.4542,
+    "train_samples_per_second": 241.935,
+    "train_steps_per_second": 30.242
 }
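
The throughput fields can be cross-checked against the step counts in the README training tables. The sketch below is a sanity check, not part of the commit: the dictionary layout and function name are hypothetical, and reading samples-per-step as the batch size assumes a single device with no gradient accumulation, which the diff does not state.

```python
# Values copied from train_results.json and the final table rows in README.md.
runs = {
    "before": {"runtime_s": 10227.5107, "samples_per_s": 223.564,
               "steps_per_s": 27.949, "final_step": 285850, "epochs": 50},
    "after": {"runtime_s": 1762.4542, "samples_per_s": 241.935,
              "steps_per_s": 30.242, "final_step": 53300, "epochs": 50},
}


def summarize(run: dict) -> dict:
    """Derive step and batch figures from the reported throughput."""
    return {
        # steps/s * runtime should land close to the last logged step
        "estimated_steps": run["steps_per_s"] * run["runtime_s"],
        # samples/s divided by steps/s gives samples consumed per step
        "samples_per_step": run["samples_per_s"] / run["steps_per_s"],
        "steps_per_epoch": run["final_step"] / run["epochs"],
    }


summaries = {name: summarize(run) for name, run in runs.items()}
for name, s in summaries.items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```

Both runs work out to roughly 8 samples per step, and the steps per epoch drop from 5717 to 1066, which is consistent with the new run training on a smaller dataset at the same batch size.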