augustocsc/Se124M10KInfSimple

augustocsc committed on May 6, 2025

Commit c3169d9 (verified) · 1 parent: a825929

Model save

Files changed (3)

README.md +9 -56
all_results.json +6 -6
train_results.json +6 -6

README.md CHANGED

@@ -5,18 +5,18 @@ base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
-- name: Se124M100KInfMinimalist
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Se124M100KInfMinimalist
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.5392
 ## Model description
@@ -41,63 +41,16 @@ The following hyperparameters were used during training:
 - seed: 42
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
-- num_epochs: 50
 - mixed_precision_training: Native AMP
 ### Training results
-| Training Loss | Epoch | Step  | Validation Loss |
-|:-------------:|:-----:|:-----:|:---------------:|
-| 0.1691        | 1.0   | 1860  | 0.6314          |
-| 0.1598        | 2.0   | 3720  | 0.6036          |
-| 0.1539        | 3.0   | 5580  | 0.5906          |
-| 0.153         | 4.0   | 7440  | 0.5836          |
-| 0.1507        | 5.0   | 9300  | 0.5790          |
-| 0.1483        | 6.0   | 11160 | 0.5746          |
-| 0.149         | 7.0   | 13020 | 0.5703          |
-| 0.1485        | 8.0   | 14880 | 0.5684          |
-| 0.1462        | 9.0   | 16740 | 0.5656          |
-| 0.1469        | 10.0  | 18600 | 0.5630          |
-| 0.1449        | 11.0  | 20460 | 0.5617          |
-| 0.1469        | 12.0  | 22320 | 0.5581          |
-| 0.1456        | 13.0  | 24180 | 0.5575          |
-| 0.1459        | 14.0  | 26040 | 0.5547          |
-| 0.1432        | 15.0  | 27900 | 0.5544          |
-| 0.1429        | 16.0  | 29760 | 0.5540          |
-| 0.1431        | 17.0  | 31620 | 0.5523          |
-| 0.1432        | 18.0  | 33480 | 0.5512          |
-| 0.1423        | 19.0  | 35340 | 0.5519          |
-| 0.1429        | 20.0  | 37200 | 0.5506          |
-| 0.1429        | 21.0  | 39060 | 0.5490          |
-| 0.1441        | 22.0  | 40920 | 0.5477          |
-| 0.1426        | 23.0  | 42780 | 0.5476          |
-| 0.1436        | 24.0  | 44640 | 0.5463          |
-| 0.1419        | 25.0  | 46500 | 0.5462          |
-| 0.1399        | 26.0  | 48360 | 0.5449          |
-| 0.1412        | 27.0  | 50220 | 0.5452          |
-| 0.14          | 28.0  | 52080 | 0.5440          |
-| 0.1396        | 29.0  | 53940 | 0.5440          |
-| 0.1402        | 30.0  | 55800 | 0.5440          |
-| 0.1404        | 31.0  | 57660 | 0.5437          |
-| 0.1415        | 32.0  | 59520 | 0.5427          |
-| 0.1406        | 33.0  | 61380 | 0.5420          |
-| 0.1387        | 34.0  | 63240 | 0.5422          |
-| 0.1392        | 35.0  | 65100 | 0.5420          |
-| 0.1404        | 36.0  | 66960 | 0.5420          |
-| 0.1436        | 37.0  | 68820 | 0.5411          |
-| 0.1424        | 38.0  | 70680 | 0.5415          |
-| 0.141         | 39.0  | 72540 | 0.5407          |
-| 0.1402        | 40.0  | 74400 | 0.5403          |
-| 0.1412        | 41.0  | 76260 | 0.5407          |
-| 0.139         | 42.0  | 78120 | 0.5403          |
-| 0.1357        | 43.0  | 79980 | 0.5401          |
-| 0.1396        | 44.0  | 81840 | 0.5397          |
-| 0.1398        | 45.0  | 83700 | 0.5394          |
-| 0.1385        | 46.0  | 85560 | 0.5395          |
-| 0.1408        | 47.0  | 87420 | 0.5396          |
-| 0.1371        | 48.0  | 89280 | 0.5392          |
-| 0.1418        | 49.0  | 91140 | 0.5393          |
-| 0.1382        | 50.0  | 93000 | 0.5392          |
 ### Framework versions

 tags:
 - generated_from_trainer
 model-index:
+- name: Se124M10KInfSimple
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# Se124M10KInfSimple
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.7416
 ## Model description
 - seed: 42
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
+- num_epochs: 3
 - mixed_precision_training: Native AMP
 ### Training results
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 0.4261        | 1.0   | 237  | 1.0854          |
+| 0.2515        | 2.0   | 474  | 0.7846          |
+| 0.2148        | 3.0   | 711  | 0.7416          |
 ### Framework versions

all_results.json CHANGED

@@ -1,13 +1,13 @@
 {
-    "epoch": 50.0,
     "eval_loss": 0.5392394065856934,
     "eval_runtime": 80.8454,
     "eval_samples_per_second": 158.154,
     "eval_steps_per_second": 4.948,
     "perplexity": 1.7147021748977518,
-    "total_flos": 1.950556483878912e+17,
-    "train_loss": 0.14462488290315034,
-    "train_runtime": 9614.4477,
-    "train_samples_per_second": 309.503,
-    "train_steps_per_second": 9.673
 }

 {
+    "epoch": 3.0,
     "eval_loss": 0.5392394065856934,
     "eval_runtime": 80.8454,
     "eval_samples_per_second": 158.154,
     "eval_steps_per_second": 4.948,
     "perplexity": 1.7147021748977518,
+    "total_flos": 1489415748452352.0,
+    "train_loss": 0.3230170006490458,
+    "train_runtime": 99.3661,
+    "train_samples_per_second": 228.669,
+    "train_steps_per_second": 7.155
 }
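
The `perplexity` field above appears to be the exponential of `eval_loss`. The sketch below checks that relationship, assuming natural-log cross-entropy (which is how Hugging Face example scripts usually derive it; the commit itself does not say). Note that `eval_loss` and `perplexity` are unchanged context lines in this diff, so they do not match the 0.7416 loss reported in the new README; the second computation applies the same formula to that README value as an illustration only.

```python
import math

# Values copied from all_results.json in this commit (unchanged context lines).
eval_loss = 0.5392394065856934
reported_perplexity = 1.7147021748977518

# Assumption: perplexity = exp(mean cross-entropy loss in nats).
recomputed_perplexity = math.exp(eval_loss)
matches_reported = math.isclose(recomputed_perplexity, reported_perplexity, rel_tol=1e-6)

# Same formula applied to the final validation loss listed in the new README table.
readme_loss = 0.7416
readme_perplexity = math.exp(readme_loss)

print(f"recomputed perplexity: {recomputed_perplexity:.6f} (matches reported: {matches_reported})")
print(f"perplexity implied by README loss: {readme_perplexity:.4f}")
```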

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
-    "epoch": 50.0,
-    "total_flos": 1.950556483878912e+17,
-    "train_loss": 0.14462488290315034,
-    "train_runtime": 9614.4477,
-    "train_samples_per_second": 309.503,
-    "train_steps_per_second": 9.673
 }

 {
+    "epoch": 3.0,
+    "total_flos": 1489415748452352.0,
+    "train_loss": 0.3230170006490458,
+    "train_runtime": 99.3661,
+    "train_samples_per_second": 228.669,
+    "train_steps_per_second": 7.155
 }
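
As a rough consistency check, the new throughput figures can be cross-checked against the README step counts (237 steps per epoch, 711 total). The sketch below is arithmetic only; the "samples per step" it derives is an inferred estimate of the effective batch size, since the batch-size hyperparameters are not visible in this diff.

```python
# Values copied from the new train_results.json in this commit.
train_runtime = 99.3661             # seconds
train_samples_per_second = 228.669
train_steps_per_second = 7.155
num_epochs = 3

# Total optimizer steps and samples processed, estimated from throughput.
estimated_steps = train_runtime * train_steps_per_second
estimated_samples = train_runtime * train_samples_per_second

# Inferred (not reported) samples per optimizer step and per epoch.
samples_per_step = train_samples_per_second / train_steps_per_second
samples_per_epoch = estimated_samples / num_epochs

print(f"estimated steps: {estimated_steps:.1f}")
print(f"samples per step (inferred): {samples_per_step:.2f}")
print(f"samples per epoch (inferred): {samples_per_epoch:.0f}")
```

The estimated step count should land close to the 711 steps shown in the README training table; small deviations are expected because the reported rates are rounded.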