augustocsc/Se124M100KInfMinimalist

PEFT

Safetensors

Generated from Trainer

augustocsc committed on May 6, 2025

Commit 1cd801f (verified) · 1 parent: cb0ab0e

Model save

Files changed (3)

README.md +55 -55
all_results.json +5 -5
train_results.json +5 -5

README.md CHANGED

@@ -5,18 +5,18 @@ base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
-- name: Se124M100KInfDelimiter
+- name: Se124M100KInfMinimalist
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Se124M100KInfDelimiter
+# Se124M100KInfMinimalist
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.4823
+- Loss: 0.5392
 ## Model description
@@ -46,58 +46,58 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch | Step   | Validation Loss |
-|:-------------:|:-----:|:------:|:---------------:|
-| 0.1538        | 1.0   | 2090   | 0.5678          |
-| 0.1412        | 2.0   | 4180   | 0.5413          |
-| 0.1405        | 3.0   | 6270   | 0.5325          |
-| 0.1355        | 4.0   | 8360   | 0.5241          |
-| 0.1364        | 5.0   | 10450  | 0.5211          |
-| 0.1341        | 6.0   | 12540  | 0.5170          |
-| 0.1312        | 7.0   | 14630  | 0.5123          |
-| 0.1304        | 8.0   | 16720  | 0.5078          |
-| 0.1301        | 9.0   | 18810  | 0.5064          |
-| 0.1286        | 10.0  | 20900  | 0.5058          |
-| 0.1308        | 11.0  | 22990  | 0.5022          |
-| 0.1292        | 12.0  | 25080  | 0.5007          |
-| 0.1287        | 13.0  | 27170  | 0.5005          |
-| 0.1306        | 14.0  | 29260  | 0.4976          |
-| 0.1312        | 15.0  | 31350  | 0.4975          |
-| 0.1268        | 16.0  | 33440  | 0.4963          |
-| 0.1267        | 17.0  | 35530  | 0.4944          |
-| 0.1273        | 18.0  | 37620  | 0.4932          |
-| 0.1243        | 19.0  | 39710  | 0.4925          |
-| 0.1266        | 20.0  | 41800  | 0.4912          |
-| 0.127         | 21.0  | 43890  | 0.4914          |
-| 0.1278        | 22.0  | 45980  | 0.4905          |
-| 0.1276        | 23.0  | 48070  | 0.4899          |
-| 0.1285        | 24.0  | 50160  | 0.4888          |
-| 0.1264        | 25.0  | 52250  | 0.4889          |
-| 0.1256        | 26.0  | 54340  | 0.4881          |
-| 0.1251        | 27.0  | 56430  | 0.4876          |
-| 0.1291        | 28.0  | 58520  | 0.4869          |
-| 0.1254        | 29.0  | 60610  | 0.4867          |
-| 0.1268        | 30.0  | 62700  | 0.4863          |
-| 0.1247        | 31.0  | 64790  | 0.4857          |
-| 0.126         | 32.0  | 66880  | 0.4855          |
-| 0.1262        | 33.0  | 68970  | 0.4852          |
-| 0.1257        | 34.0  | 71060  | 0.4848          |
-| 0.1246        | 35.0  | 73150  | 0.4846          |
-| 0.1261        | 36.0  | 75240  | 0.4839          |
-| 0.1269        | 37.0  | 77330  | 0.4839          |
-| 0.1244        | 38.0  | 79420  | 0.4836          |
-| 0.1243        | 39.0  | 81510  | 0.4836          |
-| 0.1256        | 40.0  | 83600  | 0.4834          |
-| 0.1237        | 41.0  | 85690  | 0.4827          |
-| 0.1244        | 42.0  | 87780  | 0.4833          |
-| 0.1234        | 43.0  | 89870  | 0.4828          |
-| 0.1255        | 44.0  | 91960  | 0.4824          |
-| 0.1272        | 45.0  | 94050  | 0.4826          |
-| 0.1258        | 46.0  | 96140  | 0.4824          |
-| 0.1264        | 47.0  | 98230  | 0.4825          |
-| 0.1236        | 48.0  | 100320 | 0.4824          |
-| 0.1254        | 49.0  | 102410 | 0.4825          |
-| 0.1242        | 50.0  | 104500 | 0.4823          |
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 0.1691        | 1.0   | 1860  | 0.6314          |
+| 0.1598        | 2.0   | 3720  | 0.6036          |
+| 0.1539        | 3.0   | 5580  | 0.5906          |
+| 0.153         | 4.0   | 7440  | 0.5836          |
+| 0.1507        | 5.0   | 9300  | 0.5790          |
+| 0.1483        | 6.0   | 11160 | 0.5746          |
+| 0.149         | 7.0   | 13020 | 0.5703          |
+| 0.1485        | 8.0   | 14880 | 0.5684          |
+| 0.1462        | 9.0   | 16740 | 0.5656          |
+| 0.1469        | 10.0  | 18600 | 0.5630          |
+| 0.1449        | 11.0  | 20460 | 0.5617          |
+| 0.1469        | 12.0  | 22320 | 0.5581          |
+| 0.1456        | 13.0  | 24180 | 0.5575          |
+| 0.1459        | 14.0  | 26040 | 0.5547          |
+| 0.1432        | 15.0  | 27900 | 0.5544          |
+| 0.1429        | 16.0  | 29760 | 0.5540          |
+| 0.1431        | 17.0  | 31620 | 0.5523          |
+| 0.1432        | 18.0  | 33480 | 0.5512          |
+| 0.1423        | 19.0  | 35340 | 0.5519          |
+| 0.1429        | 20.0  | 37200 | 0.5506          |
+| 0.1429        | 21.0  | 39060 | 0.5490          |
+| 0.1441        | 22.0  | 40920 | 0.5477          |
+| 0.1426        | 23.0  | 42780 | 0.5476          |
+| 0.1436        | 24.0  | 44640 | 0.5463          |
+| 0.1419        | 25.0  | 46500 | 0.5462          |
+| 0.1399        | 26.0  | 48360 | 0.5449          |
+| 0.1412        | 27.0  | 50220 | 0.5452          |
+| 0.14          | 28.0  | 52080 | 0.5440          |
+| 0.1396        | 29.0  | 53940 | 0.5440          |
+| 0.1402        | 30.0  | 55800 | 0.5440          |
+| 0.1404        | 31.0  | 57660 | 0.5437          |
+| 0.1415        | 32.0  | 59520 | 0.5427          |
+| 0.1406        | 33.0  | 61380 | 0.5420          |
+| 0.1387        | 34.0  | 63240 | 0.5422          |
+| 0.1392        | 35.0  | 65100 | 0.5420          |
+| 0.1404        | 36.0  | 66960 | 0.5420          |
+| 0.1436        | 37.0  | 68820 | 0.5411          |
+| 0.1424        | 38.0  | 70680 | 0.5415          |
+| 0.141         | 39.0  | 72540 | 0.5407          |
+| 0.1402        | 40.0  | 74400 | 0.5403          |
+| 0.1412        | 41.0  | 76260 | 0.5407          |
+| 0.139         | 42.0  | 78120 | 0.5403          |
+| 0.1357        | 43.0  | 79980 | 0.5401          |
+| 0.1396        | 44.0  | 81840 | 0.5397          |
+| 0.1398        | 45.0  | 83700 | 0.5394          |
+| 0.1385        | 46.0  | 85560 | 0.5395          |
+| 0.1408        | 47.0  | 87420 | 0.5396          |
+| 0.1371        | 48.0  | 89280 | 0.5392          |
+| 0.1418        | 49.0  | 91140 | 0.5393          |
+| 0.1382        | 50.0  | 93000 | 0.5392          |
 ### Framework versions
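
The training-results table above is easier to compare across commits once parsed. The following is a minimal sketch, not part of the commit: `parse_results_rows` and `best_row` are hypothetical helper names, and the sample rows are the last five rows of the new (Minimalist) table, with their leading `+` diff markers left in place to show that the parser tolerates them.

```python
def parse_results_rows(markdown_table: str):
    """Parse rows of the Trainer results table into
    (training_loss, epoch, step, validation_loss) tuples.

    Hypothetical helper: skips the header and alignment rows and
    tolerates a leading '+' or '-' diff marker on each line.
    """
    rows = []
    for line in markdown_table.strip().splitlines():
        cells = [c.strip() for c in line.strip().lstrip("+- ").strip("|").split("|")]
        try:
            train_loss, epoch, step, val_loss = cells
            rows.append((float(train_loss), float(epoch), int(step), float(val_loss)))
        except ValueError:
            continue  # header, alignment row, or a line that is not a data row
    return rows


def best_row(rows):
    """Return the first row with the lowest validation loss."""
    return min(rows, key=lambda r: r[3])


# Last five rows of the new table, copied from the diff above.
sample = """
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 0.1385        | 46.0  | 85560 | 0.5395          |
+| 0.1408        | 47.0  | 87420 | 0.5396          |
+| 0.1371        | 48.0  | 89280 | 0.5392          |
+| 0.1418        | 49.0  | 91140 | 0.5393          |
+| 0.1382        | 50.0  | 93000 | 0.5392          |
"""

rows = parse_results_rows(sample)
best = best_row(rows)
print(f"lowest validation loss {best[3]} first reached at epoch {best[1]:.0f}")
```

On these rows the lowest validation loss, 0.5392, is first reached at epoch 48 and matched again at epoch 50, which agrees with the `Loss: 0.5392` line in the updated model card.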

all_results.json CHANGED

@@ -5,9 +5,9 @@
     "eval_samples_per_second": 160.588,
     "eval_steps_per_second": 5.02,
     "perplexity": 1.619794707189944,
-    "total_flos": 2.191516447408128e+17,
-    "train_loss": 0.12940116086530914,
-    "train_runtime": 10778.4774,
-    "train_samples_per_second": 310.183,
-    "train_steps_per_second": 9.695
+    "total_flos": 1.950556483878912e+17,
+    "train_loss": 0.14462488290315034,
+    "train_runtime": 9614.4477,
+    "train_samples_per_second": 309.503,
+    "train_steps_per_second": 9.673
 }
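
The `perplexity` field is unchanged in this commit, while the evaluation loss in the model card moved from 0.4823 to 0.5392. Assuming perplexity is the exponential of the mean cross-entropy loss in nats (the usual convention for causal language models, though the script that wrote this file is not shown here), the sketch below checks which loss the stored value corresponds to. `perplexity_from_loss` is a hypothetical helper name.

```python
import math


def perplexity_from_loss(mean_cross_entropy_nats: float) -> float:
    """Perplexity as exp(loss), assuming the loss is a mean cross-entropy in nats."""
    return math.exp(mean_cross_entropy_nats)


# Values copied from the diff on this page.
previous_loss = 0.4823                      # removed README line
current_loss = 0.5392                       # added README line
reported_perplexity = 1.619794707189944     # unchanged line in all_results.json

for label, loss in [("previous", previous_loss), ("current", current_loss)]:
    ppl = perplexity_from_loss(loss)
    print(f"{label}: loss={loss} -> perplexity={ppl:.4f} "
          f"(diff from reported: {abs(ppl - reported_perplexity):.4f})")
```

Under that assumption the stored perplexity matches the previous loss (0.4823) rather than the new one, whose perplexity would be roughly 1.71. One possible reading is that the evaluation fields in `all_results.json` were not refreshed by this commit, but the diff alone does not confirm that.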

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
     "epoch": 50.0,
-    "total_flos": 2.191516447408128e+17,
-    "train_loss": 0.12940116086530914,
-    "train_runtime": 10778.4774,
-    "train_samples_per_second": 310.183,
-    "train_steps_per_second": 9.695
+    "total_flos": 1.950556483878912e+17,
+    "train_loss": 0.14462488290315034,
+    "train_runtime": 9614.4477,
+    "train_samples_per_second": 309.503,
+    "train_steps_per_second": 9.673
 }
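
The throughput fields can be cross-checked against the step counts in the README tables. This is a rough sketch under two assumptions that the commit does not state: that `train_runtime` is in seconds, and that samples per second divided by steps per second approximates the effective batch size. The function names are hypothetical.

```python
def implied_total_steps(train_runtime_s: float, steps_per_second: float) -> float:
    """Approximate optimizer steps, assuming train_runtime is in seconds."""
    return train_runtime_s * steps_per_second


def implied_batch_size(samples_per_second: float, steps_per_second: float) -> float:
    """Approximate samples per step (effective batch size)."""
    return samples_per_second / steps_per_second


# Values copied from train_results.json before and after this commit.
runs = {
    "previous (Delimiter)": {"runtime": 10778.4774, "sps": 310.183, "stps": 9.695, "final_step": 104500},
    "current (Minimalist)": {"runtime": 9614.4477, "sps": 309.503, "stps": 9.673, "final_step": 93000},
}

for name, r in runs.items():
    steps = implied_total_steps(r["runtime"], r["stps"])
    batch = implied_batch_size(r["sps"], r["stps"])
    print(f"{name}: ~{steps:.0f} steps (table says {r['final_step']}), "
          f"~{batch:.1f} samples per step")
```

Both runs come out within a few steps of the final `Step` value in their tables and at about 32 samples per step. If that batch size is right, the drop from 2090 to 1860 steps per epoch would mean a smaller training set for this run, though the dataset itself is not described in the commit.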