augustocsc / Se124M500KInfMinimalist

Generated from Trainer

augustocsc committed on May 6, 2025

Commit 61fbfc5 (verified) · 1 parent: bf638fe

Model save

Files changed (3)

README.md +6 -6
all_results.json +5 -5
train_results.json +5 -5

README.md CHANGED

@@ -5,18 +5,18 @@ base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
-- name: Se124M500KInfDelimiter
+- name: Se124M500KInfMinimalist
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Se124M500KInfDelimiter
+# Se124M500KInfMinimalist
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.5132
+- Loss: 0.5700
 ## Model description
@@ -48,9 +48,9 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step  | Validation Loss |
 |:-------------:|:-----:|:-----:|:---------------:|
-| 0.1376        | 1.0   | 7890  | 0.5302          |
-| 0.1321        | 2.0   | 15780 | 0.5167          |
-| 0.131         | 3.0   | 23670 | 0.5132          |
+| 0.1527        | 1.0   | 7035  | 0.5857          |
+| 0.1485        | 2.0   | 14070 | 0.5736          |
+| 0.1468        | 3.0   | 21105 | 0.5700          |
 ### Framework versions
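The two training-results tables can be read side by side with a minimal Python sketch. It assumes the Step column is the Trainer's cumulative global step, so every row should be an exact multiple of the first epoch's step count; the row values are copied from the diff, and the helper names are illustrative only.

```python
# Rows copied from the two training-results tables in the README diff:
# (training loss, epoch, step, validation loss).
delimiter_rows = [
    (0.1376, 1.0, 7890, 0.5302),
    (0.1321, 2.0, 15780, 0.5167),
    (0.1310, 3.0, 23670, 0.5132),
]
minimalist_rows = [
    (0.1527, 1.0, 7035, 0.5857),
    (0.1485, 2.0, 14070, 0.5736),
    (0.1468, 3.0, 21105, 0.5700),
]


def steps_per_epoch(rows):
    """Return the per-epoch step count, assuming the step column is cumulative."""
    per_epoch = rows[0][2] / rows[0][1]
    # Every later row should sit at an exact multiple of the first epoch's steps.
    assert all(step == per_epoch * epoch for _, epoch, step, _ in rows)
    return int(per_epoch)


def val_loss_deltas(rows):
    """Change in validation loss between consecutive epochs (negative = improvement)."""
    return [round(b[3] - a[3], 4) for a, b in zip(rows, rows[1:])]


for name, rows in [("Delimiter", delimiter_rows), ("Minimalist", minimalist_rows)]:
    print(name, steps_per_epoch(rows), val_loss_deltas(rows))
```

The lower per-epoch step count of the Minimalist run (7035 vs. 7890) fits a smaller training set at the same batch size, though the diff alone does not say why the count changed.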

all_results.json CHANGED

@@ -5,9 +5,9 @@
     "eval_samples_per_second": 157.138,
     "eval_steps_per_second": 4.912,
     "perplexity": 1.670611111648889,
-    "total_flos": 4.96443074370601e+16,
-    "train_loss": 0.1418354825324835,
-    "train_runtime": 2430.2542,
-    "train_samples_per_second": 311.637,
-    "train_steps_per_second": 9.74
+    "total_flos": 4.426734746743603e+16,
+    "train_loss": 0.15721422936277518,
+    "train_runtime": 2145.6584,
+    "train_samples_per_second": 314.741,
+    "train_steps_per_second": 9.836
 }
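The `perplexity` entry is an unchanged context line in this hunk. Assuming it is computed as `exp(eval_loss)` (the convention in the Transformers `run_clm.py` language-modeling example), a quick check suggests it matches the previous run's evaluation loss of 0.5132 rather than the new 0.5700:

```python
import math

# Final evaluation losses from the README diff (previous run vs. this commit).
eval_loss_old = 0.5132  # Se124M500KInfDelimiter
eval_loss_new = 0.5700  # Se124M500KInfMinimalist

# Unchanged context line in all_results.json.
reported_perplexity = 1.670611111648889

# Assumption: perplexity = exp(eval_loss), as in the run_clm.py example script.
ppl_old = math.exp(eval_loss_old)
ppl_new = math.exp(eval_loss_new)

print(f"exp(old loss) = {ppl_old:.4f}")
print(f"exp(new loss) = {ppl_new:.4f}")
print("reported value matches old run:",
      math.isclose(ppl_old, reported_perplexity, rel_tol=1e-3))
print("reported value matches new run:",
      math.isclose(ppl_new, reported_perplexity, rel_tol=1e-3))
```

The tolerance absorbs the four-decimal rounding of the loss shown in the model card.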

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
     "epoch": 3.0,
-    "total_flos": 4.96443074370601e+16,
-    "train_loss": 0.1418354825324835,
-    "train_runtime": 2430.2542,
-    "train_samples_per_second": 311.637,
-    "train_steps_per_second": 9.74
+    "total_flos": 4.426734746743603e+16,
+    "train_loss": 0.15721422936277518,
+    "train_runtime": 2145.6584,
+    "train_samples_per_second": 314.741,
+    "train_steps_per_second": 9.836
 }
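The throughput figures can be cross-checked against the README tables. In this sketch, runtime multiplied by step rate should land near the final step in each table (23670 and 21105). The ratio of sample rate to step rate gives samples per optimizer step, a rough estimate of the effective batch size; treat it as an estimate only, since the hyperparameters section of the README is not part of this diff.

```python
# (train_runtime in seconds, train_samples_per_second, train_steps_per_second)
# copied from train_results.json before and after this commit.
runs = {
    "Delimiter": (2430.2542, 311.637, 9.74),
    "Minimalist": (2145.6584, 314.741, 9.836),
}


def throughput_summary(runtime_s, samples_per_s, steps_per_s):
    """Return (implied total steps, samples per optimizer step)."""
    total_steps = runtime_s * steps_per_s
    # Assumes both rates are averaged over the same wall-clock runtime.
    samples_per_step = samples_per_s / steps_per_s
    return total_steps, samples_per_step


results = {name: throughput_summary(*vals) for name, vals in runs.items()}
for name, (total_steps, samples_per_step) in results.items():
    print(f"{name}: ~{total_steps:.0f} steps, ~{samples_per_step:.1f} samples/step")
```

Both runs come out at about 32 samples per step. The implied step totals differ from the tables by at most about one step, which is consistent with the rounding of the reported rates.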