Model save

Files changed (3)

README.md +55 -50
all_results.json +6 -6
train_results.json +6 -6

README.md CHANGED

@@ -5,18 +5,18 @@ base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
-- name: Se124M10KInfPrompt_endtoken
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Se124M10KInfPrompt_endtoken
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.7552
 ## Model description
@@ -46,53 +46,58 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:----:|:---------------:|
-| 0.6082        | 1.0   | 153  | 1.4745          |
-| 0.3175        | 2.0   | 306  | 1.0258          |
-| 0.2798        | 3.0   | 459  | 0.9099          |
-| 0.244         | 4.0   | 612  | 0.8688          |
-| 0.2373        | 5.0   | 765  | 0.8479          |
-| 0.2252        | 6.0   | 918  | 0.8343          |
-| 0.2289        | 7.0   | 1071 | 0.8216          |
-| 0.2209        | 8.0   | 1224 | 0.8143          |
-| 0.2211        | 9.0   | 1377 | 0.8082          |
-| 0.2176        | 10.0  | 1530 | 0.8029          |
-| 0.2157        | 11.0  | 1683 | 0.7990          |
-| 0.2097        | 12.0  | 1836 | 0.7945          |
-| 0.2113        | 13.0  | 1989 | 0.7921          |
-| 0.2099        | 14.0  | 2142 | 0.7891          |
-| 0.2073        | 15.0  | 2295 | 0.7863          |
-| 0.2055        | 16.0  | 2448 | 0.7805          |
-| 0.2051        | 17.0  | 2601 | 0.7806          |
-| 0.2031        | 18.0  | 2754 | 0.7776          |
-| 0.2046        | 19.0  | 2907 | 0.7760          |
-| 0.206         | 20.0  | 3060 | 0.7720          |
-| 0.2043        | 21.0  | 3213 | 0.7725          |
-| 0.204         | 22.0  | 3366 | 0.7707          |
-| 0.2032        | 23.0  | 3519 | 0.7681          |
-| 0.2026        | 24.0  | 3672 | 0.7678          |
-| 0.1991        | 25.0  | 3825 | 0.7665          |
-| 0.2037        | 26.0  | 3978 | 0.7660          |
-| 0.2011        | 27.0  | 4131 | 0.7634          |
-| 0.2015        | 28.0  | 4284 | 0.7635          |
-| 0.2006        | 29.0  | 4437 | 0.7620          |
-| 0.2014        | 30.0  | 4590 | 0.7640          |
-| 0.2           | 31.0  | 4743 | 0.7609          |
-| 0.202         | 32.0  | 4896 | 0.7606          |
-| 0.1989        | 33.0  | 5049 | 0.7599          |
-| 0.1983        | 34.0  | 5202 | 0.7594          |
-| 0.2           | 35.0  | 5355 | 0.7596          |
-| 0.1991        | 36.0  | 5508 | 0.7588          |
-| 0.1978        | 37.0  | 5661 | 0.7576          |
-| 0.1975        | 38.0  | 5814 | 0.7572          |
-| 0.2007        | 39.0  | 5967 | 0.7569          |
-| 0.1987        | 40.0  | 6120 | 0.7563          |
-| 0.2002        | 41.0  | 6273 | 0.7561          |
-| 0.1961        | 42.0  | 6426 | 0.7563          |
-| 0.201         | 43.0  | 6579 | 0.7552          |
-| 0.1993        | 44.0  | 6732 | 0.7553          |
-| 0.1969        | 45.0  | 6885 | 0.7553          |
 ### Framework versions

 tags:
 - generated_from_trainer
 model-index:
+- name: Se124M100KInfPrompt_endtoken
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# Se124M100KInfPrompt_endtoken
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.6957
 ## Model description
 ### Training results
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 0.2129        | 1.0   | 1430  | 0.8043          |
+| 0.2018        | 2.0   | 2860  | 0.7705          |
+| 0.1949        | 3.0   | 4290  | 0.7588          |
+| 0.1913        | 4.0   | 5720  | 0.7498          |
+| 0.1921        | 5.0   | 7150  | 0.7435          |
+| 0.1903        | 6.0   | 8580  | 0.7371          |
+| 0.1888        | 7.0   | 10010 | 0.7339          |
+| 0.1881        | 8.0   | 11440 | 0.7299          |
+| 0.1872        | 9.0   | 12870 | 0.7267          |
+| 0.187         | 10.0  | 14300 | 0.7251          |
+| 0.184         | 11.0  | 15730 | 0.7229          |
+| 0.1846        | 12.0  | 17160 | 0.7212          |
+| 0.1851        | 13.0  | 18590 | 0.7182          |
+| 0.1804        | 14.0  | 20020 | 0.7153          |
+| 0.1848        | 15.0  | 21450 | 0.7141          |
+| 0.1824        | 16.0  | 22880 | 0.7144          |
+| 0.1796        | 17.0  | 24310 | 0.7116          |
+| 0.18          | 18.0  | 25740 | 0.7108          |
+| 0.1825        | 19.0  | 27170 | 0.7082          |
+| 0.1852        | 20.0  | 28600 | 0.7082          |
+| 0.1785        | 21.0  | 30030 | 0.7072          |
+| 0.1811        | 22.0  | 31460 | 0.7057          |
+| 0.178         | 23.0  | 32890 | 0.7059          |
+| 0.1827        | 24.0  | 34320 | 0.7046          |
+| 0.1813        | 25.0  | 35750 | 0.7033          |
+| 0.1825        | 26.0  | 37180 | 0.7039          |
+| 0.1795        | 27.0  | 38610 | 0.7032          |
+| 0.1801        | 28.0  | 40040 | 0.7017          |
+| 0.1781        | 29.0  | 41470 | 0.7013          |
+| 0.1823        | 30.0  | 42900 | 0.7010          |
+| 0.1781        | 31.0  | 44330 | 0.7012          |
+| 0.1809        | 32.0  | 45760 | 0.6999          |
+| 0.1764        | 33.0  | 47190 | 0.6996          |
+| 0.1791        | 34.0  | 48620 | 0.6983          |
+| 0.1793        | 35.0  | 50050 | 0.6988          |
+| 0.1785        | 36.0  | 51480 | 0.6980          |
+| 0.1777        | 37.0  | 52910 | 0.6980          |
+| 0.1774        | 38.0  | 54340 | 0.6980          |
+| 0.1795        | 39.0  | 55770 | 0.6976          |
+| 0.1772        | 40.0  | 57200 | 0.6974          |
+| 0.1793        | 41.0  | 58630 | 0.6974          |
+| 0.1777        | 42.0  | 60060 | 0.6968          |
+| 0.1777        | 43.0  | 61490 | 0.6965          |
+| 0.1779        | 44.0  | 62920 | 0.6965          |
+| 0.1782        | 45.0  | 64350 | 0.6964          |
+| 0.1765        | 46.0  | 65780 | 0.6961          |
+| 0.1758        | 47.0  | 67210 | 0.6962          |
+| 0.1763        | 48.0  | 68640 | 0.6960          |
+| 0.1788        | 49.0  | 70070 | 0.6958          |
+| 0.1776        | 50.0  | 71500 | 0.6957          |
 ### Framework versions

all_results.json CHANGED

@@ -1,13 +1,13 @@
 {
-    "epoch": 45.0,
     "eval_loss": 0.7551774382591248,
     "eval_runtime": 6.5379,
     "eval_samples_per_second": 161.672,
     "eval_steps_per_second": 5.2,
     "perplexity": 2.127989076535295,
-    "total_flos": 1.438582110879744e+16,
-    "train_loss": 0.2186263353822884,
-    "train_runtime": 769.5598,
-    "train_samples_per_second": 316.869,
-    "train_steps_per_second": 9.941
 }

 {
+    "epoch": 50.0,
     "eval_loss": 0.7551774382591248,
     "eval_runtime": 6.5379,
     "eval_samples_per_second": 161.672,
     "eval_steps_per_second": 5.2,
     "perplexity": 2.127989076535295,
+    "total_flos": 1.49878932701184e+17,
+    "train_loss": 0.18437957987751993,
+    "train_runtime": 7319.2549,
+    "train_samples_per_second": 312.395,
+    "train_steps_per_second": 9.769
 }
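
The `perplexity` field in `all_results.json` appears to be the exponential of `eval_loss`, which is how the Hugging Face example language-modeling scripts usually report it. The following is a minimal sketch that checks this relationship for the values shown above; it assumes the loss is a mean per-token cross-entropy in nats, which the diff itself does not state.

```python
import math

# Values copied from all_results.json in this commit.
eval_loss = 0.7551774382591248
reported_perplexity = 2.127989076535295

# Assumption: perplexity = exp(mean cross-entropy loss in nats).
computed_perplexity = math.exp(eval_loss)

print(f"computed perplexity: {computed_perplexity:.6f}")
print(f"reported perplexity: {reported_perplexity:.6f}")
print("match:", math.isclose(computed_perplexity, reported_perplexity, rel_tol=1e-6))
```

Note that `eval_loss` and `perplexity` are unchanged context lines in this diff, so they still correspond to the earlier evaluation rather than to the 0.6957 loss listed in the updated README.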

train_results.json CHANGED Viewed

@@ -1,8 +1,8 @@
 {
-    "epoch": 45.0,
-    "total_flos": 1.438582110879744e+16,
-    "train_loss": 0.2186263353822884,
-    "train_runtime": 769.5598,
-    "train_samples_per_second": 316.869,
-    "train_steps_per_second": 9.941
 }

 {
+    "epoch": 50.0,
+    "total_flos": 1.49878932701184e+17,
+    "train_loss": 0.18437957987751993,
+    "train_runtime": 7319.2549,
+    "train_samples_per_second": 312.395,
+    "train_steps_per_second": 9.769
 }
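
As a rough sanity check, the throughput fields in the new `train_results.json` can be compared against the step counts in the README table. The sketch below uses only numbers from this commit; the interpretation of the samples-to-steps ratio as an effective batch size is an assumption, since the batch size is not visible in this part of the diff.

```python
# Values copied from the new train_results.json and the new README table.
train_runtime = 7319.2549          # seconds
train_steps_per_second = 9.769
train_samples_per_second = 312.395
epochs = 50
final_step = 71500                 # Step column at epoch 50.0

# Steps implied by the reported throughput; should be close to final_step.
estimated_steps = train_steps_per_second * train_runtime

# Optimizer steps per epoch, matching the first table row (Step 1430).
steps_per_epoch = final_step / epochs

# Assumption: samples per step approximates the effective batch size.
samples_per_step = train_samples_per_second / train_steps_per_second

print(f"estimated total steps: {estimated_steps:.0f} (table: {final_step})")
print(f"steps per epoch: {steps_per_epoch:.0f}")
print(f"samples per step: {samples_per_step:.2f}")
```

The small gap between the estimated and tabulated step counts is consistent with the throughput values being rounded to three decimal places.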