augustocsc/Se124M100KInfSimple

Tags: PEFT, Safetensors, Generated from Trainer


augustocsc committed on May 6, 2025

Commit f86293d (verified) · 1 parent: c3ed615

Model save


Files changed (3)

README.md +55 -55
all_results.json +5 -5
train_results.json +5 -5

README.md CHANGED

@@ -5,18 +5,18 @@ base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
-- name: Se124M10KInfMinimalist
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Se124M10KInfMinimalist
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6063
 ## Model description
@@ -46,58 +46,58 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch | Step  | Validation Loss |
-|:-------------:|:-----:|:-----:|:---------------:|
-| 0.3678        | 1.0   | 200   | 1.1110          |
-| 0.2364        | 2.0   | 400   | 0.7918          |
-| 0.2048        | 3.0   | 600   | 0.7298          |
-| 0.1916        | 4.0   | 800   | 0.7019          |
-| 0.1861        | 5.0   | 1000  | 0.6841          |
-| 0.179         | 6.0   | 1200  | 0.6719          |
-| 0.1767        | 7.0   | 1400  | 0.6665          |
-| 0.1722        | 8.0   | 1600  | 0.6568          |
-| 0.1714        | 9.0   | 1800  | 0.6538          |
-| 0.1651        | 10.0  | 2000  | 0.6493          |
-| 0.1674        | 11.0  | 2200  | 0.6446          |
-| 0.1655        | 12.0  | 2400  | 0.6407          |
-| 0.1643        | 13.0  | 2600  | 0.6384          |
-| 0.1638        | 14.0  | 2800  | 0.6346          |
-| 0.1619        | 15.0  | 3000  | 0.6347          |
-| 0.1618        | 16.0  | 3200  | 0.6299          |
-| 0.1625        | 17.0  | 3400  | 0.6292          |
-| 0.1586        | 18.0  | 3600  | 0.6262          |
-| 0.1589        | 19.0  | 3800  | 0.6235          |
-| 0.1616        | 20.0  | 4000  | 0.6229          |
-| 0.1609        | 21.0  | 4200  | 0.6218          |
-| 0.1575        | 22.0  | 4400  | 0.6195          |
-| 0.1601        | 23.0  | 4600  | 0.6200          |
-| 0.1577        | 24.0  | 4800  | 0.6159          |
-| 0.1593        | 25.0  | 5000  | 0.6171          |
-| 0.1574        | 26.0  | 5200  | 0.6185          |
-| 0.1582        | 27.0  | 5400  | 0.6139          |
-| 0.1563        | 28.0  | 5600  | 0.6141          |
-| 0.1563        | 29.0  | 5800  | 0.6146          |
-| 0.1595        | 30.0  | 6000  | 0.6124          |
-| 0.1575        | 31.0  | 6200  | 0.6126          |
-| 0.1537        | 32.0  | 6400  | 0.6121          |
-| 0.1559        | 33.0  | 6600  | 0.6104          |
-| 0.1543        | 34.0  | 6800  | 0.6116          |
-| 0.1562        | 35.0  | 7000  | 0.6098          |
-| 0.1558        | 36.0  | 7200  | 0.6089          |
-| 0.1551        | 37.0  | 7400  | 0.6089          |
-| 0.1537        | 38.0  | 7600  | 0.6085          |
-| 0.1526        | 39.0  | 7800  | 0.6084          |
-| 0.1556        | 40.0  | 8000  | 0.6085          |
-| 0.1548        | 41.0  | 8200  | 0.6080          |
-| 0.1542        | 42.0  | 8400  | 0.6078          |
-| 0.1581        | 43.0  | 8600  | 0.6071          |
-| 0.1555        | 44.0  | 8800  | 0.6066          |
-| 0.1547        | 45.0  | 9000  | 0.6064          |
-| 0.1569        | 46.0  | 9200  | 0.6067          |
-| 0.1524        | 47.0  | 9400  | 0.6063          |
-| 0.1555        | 48.0  | 9600  | 0.6065          |
-| 0.1543        | 49.0  | 9800  | 0.6065          |
-| 0.1559        | 50.0  | 10000 | 0.6063          |
 ### Framework versions

 tags:
 - generated_from_trainer
 model-index:
+- name: Se124M100KInfSimple
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# Se124M100KInfSimple
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.4582
 ## Model description
 ### Training results
+| Training Loss | Epoch | Step   | Validation Loss |
+|:-------------:|:-----:|:------:|:---------------:|
+| 0.1445        | 1.0   | 2205   | 0.5442          |
+| 0.1358        | 2.0   | 4410   | 0.5179          |
+| 0.1317        | 3.0   | 6615   | 0.5090          |
+| 0.1297        | 4.0   | 8820   | 0.5015          |
+| 0.1299        | 5.0   | 11025  | 0.4954          |
+| 0.1301        | 6.0   | 13230  | 0.4917          |
+| 0.1258        | 7.0   | 15435  | 0.4875          |
+| 0.1254        | 8.0   | 17640  | 0.4834          |
+| 0.1231        | 9.0   | 19845  | 0.4816          |
+| 0.1254        | 10.0  | 22050  | 0.4798          |
+| 0.125         | 11.0  | 24255  | 0.4778          |
+| 0.1225        | 12.0  | 26460  | 0.4775          |
+| 0.1233        | 13.0  | 28665  | 0.4753          |
+| 0.1213        | 14.0  | 30870  | 0.4737          |
+| 0.1231        | 15.0  | 33075  | 0.4719          |
+| 0.1233        | 16.0  | 35280  | 0.4716          |
+| 0.1225        | 17.0  | 37485  | 0.4702          |
+| 0.1218        | 18.0  | 39690  | 0.4696          |
+| 0.1213        | 19.0  | 41895  | 0.4678          |
+| 0.1213        | 20.0  | 44100  | 0.4673          |
+| 0.121         | 21.0  | 46305  | 0.4675          |
+| 0.122         | 22.0  | 48510  | 0.4663          |
+| 0.1195        | 23.0  | 50715  | 0.4657          |
+| 0.1221        | 24.0  | 52920  | 0.4647          |
+| 0.1212        | 25.0  | 55125  | 0.4647          |
+| 0.121         | 26.0  | 57330  | 0.4640          |
+| 0.1213        | 27.0  | 59535  | 0.4637          |
+| 0.1184        | 28.0  | 61740  | 0.4629          |
+| 0.12          | 29.0  | 63945  | 0.4627          |
+| 0.1191        | 30.0  | 66150  | 0.4622          |
+| 0.1195        | 31.0  | 68355  | 0.4624          |
+| 0.1188        | 32.0  | 70560  | 0.4619          |
+| 0.1202        | 33.0  | 72765  | 0.4620          |
+| 0.119         | 34.0  | 74970  | 0.4605          |
+| 0.1206        | 35.0  | 77175  | 0.4608          |
+| 0.1197        | 36.0  | 79380  | 0.4601          |
+| 0.1199        | 37.0  | 81585  | 0.4597          |
+| 0.1204        | 38.0  | 83790  | 0.4601          |
+| 0.1185        | 39.0  | 85995  | 0.4596          |
+| 0.1184        | 40.0  | 88200  | 0.4591          |
+| 0.119         | 41.0  | 90405  | 0.4594          |
+| 0.1181        | 42.0  | 92610  | 0.4591          |
+| 0.1178        | 43.0  | 94815  | 0.4588          |
+| 0.1188        | 44.0  | 97020  | 0.4586          |
+| 0.1189        | 45.0  | 99225  | 0.4584          |
+| 0.1183        | 46.0  | 101430 | 0.4583          |
+| 0.1184        | 47.0  | 103635 | 0.4582          |
+| 0.1185        | 48.0  | 105840 | 0.4581          |
+| 0.1198        | 49.0  | 108045 | 0.4582          |
+| 0.1207        | 50.0  | 110250 | 0.4582          |
 ### Framework versions
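
The two results tables can be compared numerically rather than by eye. The sketch below is illustrative only: it embeds the last five rows of the new table (without the leading `+` diff markers), and the helper name `parse_rows` is hypothetical, not something the Trainer produces. It also assumes the four-column layout shown above.

```python
def parse_rows(table: str) -> list[dict]:
    """Parse a Trainer-style markdown results table into a list of dicts."""
    rows = []
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        try:
            # Header and alignment rows are not numeric, so float() raises
            # ValueError and they are skipped.
            train_loss, epoch, step, val_loss = (float(c) for c in cells)
        except ValueError:
            continue
        rows.append({"train_loss": train_loss, "epoch": epoch,
                     "step": int(step), "val_loss": val_loss})
    return rows


# Last five rows of the new (Se124M100KInfSimple) table, copied from the diff.
NEW_TAIL = """
| Training Loss | Epoch | Step   | Validation Loss |
|:-------------:|:-----:|:------:|:---------------:|
| 0.1183        | 46.0  | 101430 | 0.4583          |
| 0.1184        | 47.0  | 103635 | 0.4582          |
| 0.1185        | 48.0  | 105840 | 0.4581          |
| 0.1198        | 49.0  | 108045 | 0.4582          |
| 0.1207        | 50.0  | 110250 | 0.4582          |
"""

rows = parse_rows(NEW_TAIL)
best = min(rows, key=lambda r: r["val_loss"])        # lowest validation loss
steps_per_epoch = rows[-1]["step"] / rows[-1]["epoch"]

print(f"best epoch: {best['epoch']:.0f} (val loss {best['val_loss']})")
print(f"steps per epoch: {steps_per_epoch:.0f}")
```

In these rows the lowest validation loss (0.4581) occurs at epoch 48, one step below the final value of 0.4582 reported in the card. The final validation loss is about 24% lower than the 0.6063 of the previous card, although the README does not say whether the two runs share an evaluation set, so that comparison should be read with care.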

all_results.json CHANGED

@@ -5,9 +5,9 @@
     "eval_samples_per_second": 154.382,
     "eval_steps_per_second": 4.872,
     "perplexity": 1.8336079294454999,
-    "total_flos": 2.08971807326208e+16,
-    "train_loss": 0.16982314805984497,
-    "train_runtime": 1086.9473,
-    "train_samples_per_second": 293.298,
-    "train_steps_per_second": 9.2
 }

     "eval_samples_per_second": 154.382,
     "eval_steps_per_second": 4.872,
     "perplexity": 1.8336079294454999,
+    "total_flos": 2.31193087967232e+17,
+    "train_loss": 0.1232609451103643,
+    "train_runtime": 11754.4686,
+    "train_samples_per_second": 300.056,
+    "train_steps_per_second": 9.379
 }
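
The `perplexity` field appears only as unchanged context in this hunk, so it is worth checking which evaluation loss it corresponds to. The sketch below assumes perplexity is the exponential of the mean evaluation cross-entropy loss, which is the usual convention for causal language models; the diff itself does not say how the field was computed.

```python
import math


def perplexity(eval_loss: float) -> float:
    """Perplexity under the assumption ppl = exp(mean cross-entropy loss)."""
    return math.exp(eval_loss)


REPORTED = 1.8336079294454999   # "perplexity" context line in all_results.json
old_ppl = perplexity(0.6063)    # eval loss from the previous README
new_ppl = perplexity(0.4582)    # eval loss from the updated README

print(f"exp(0.6063) = {old_ppl:.4f}")
print(f"exp(0.4582) = {new_ppl:.4f}")
print(f"reported    = {REPORTED:.4f}")
```

Under that assumption, the reported value matches the previous loss of 0.6063 rather than the new 0.4582. This suggests the evaluation fields in `all_results.json` were not refreshed by this commit, which only touches the training statistics.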

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
     "epoch": 50.0,
-    "total_flos": 2.08971807326208e+16,
-    "train_loss": 0.16982314805984497,
-    "train_runtime": 1086.9473,
-    "train_samples_per_second": 293.298,
-    "train_steps_per_second": 9.2
 }

 {
     "epoch": 50.0,
+    "total_flos": 2.31193087967232e+17,
+    "train_loss": 0.1232609451103643,
+    "train_runtime": 11754.4686,
+    "train_samples_per_second": 300.056,
+    "train_steps_per_second": 9.379
 }
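
The throughput fields can be cross-checked against the step counts in the README tables. This is a rough sketch: the function name `summarize` is hypothetical, and the samples-per-step figure it derives is an inference from the logged rates, not a batch size stated anywhere in the diff.

```python
def summarize(runtime_s: float, samples_per_s: float,
              steps_per_s: float, epochs: float) -> dict:
    """Derive per-epoch counts from the rates logged in train_results.json."""
    steps_per_epoch = steps_per_s * runtime_s / epochs
    samples_per_epoch = samples_per_s * runtime_s / epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "samples_per_epoch": samples_per_epoch,
        # Ratio of the two rates: an estimate of the effective batch size.
        "samples_per_step": samples_per_s / steps_per_s,
    }


# Values copied from the removed (-) and added (+) lines of the diff.
old = summarize(1086.9473, 293.298, 9.2, 50.0)
new = summarize(11754.4686, 300.056, 9.379, 50.0)

for name, run in (("old", old), ("new", new)):
    print(name, {k: round(v, 2) for k, v in run.items()})
```

The derived steps per epoch round to 200 and 2205, matching the first rows of the old and new training tables. Both runs work out to roughly 32 samples per step, and to about 6,376 and 70,540 training samples per epoch respectively.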