augustocsc/Se124M500KInfKeyValue

Generated from Trainer


augustocsc committed on May 6, 2025

Commit b4d0f79 · verified · 1 parent: 703286c

Model save

Files changed (3)

README.md +6 -6
all_results.json +5 -5
train_results.json +5 -5

README.md CHANGED

@@ -5,18 +5,18 @@ base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
-- name: Se124M500KInfSimple
+- name: Se124M500KInfKeyValue
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Se124M500KInfSimple
+# Se124M500KInfKeyValue
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.4881
+- Loss: 0.5125
 ## Model description
@@ -48,9 +48,9 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step  | Validation Loss |
 |:-------------:|:-----:|:-----:|:---------------:|
-| 0.1297        | 1.0   | 8317  | 0.5045          |
-| 0.1268        | 2.0   | 16634 | 0.4917          |
-| 0.1248        | 3.0   | 24951 | 0.4881          |
+| 0.1379        | 1.0   | 7890  | 0.5277          |
+| 0.1328        | 2.0   | 15780 | 0.5159          |
+| 0.1316        | 3.0   | 23670 | 0.5125          |
 ### Framework versions
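The training table follows a simple relation: the Step column is cumulative, so each row's step equals its epoch times the steps per epoch, and the card's headline Loss is the last row's validation loss. A minimal Python check over the rows copied from both sides of the README diff (the tuple layout and the helper name `steps_per_epoch` are illustrative, not part of the repository):

```python
# Rows copied from the README diff: (training_loss, epoch, step, validation_loss).
old_run = [  # Se124M500KInfSimple (removed lines)
    (0.1297, 1.0, 8317, 0.5045),
    (0.1268, 2.0, 16634, 0.4917),
    (0.1248, 3.0, 24951, 0.4881),
]
new_run = [  # Se124M500KInfKeyValue (added lines)
    (0.1379, 1.0, 7890, 0.5277),
    (0.1328, 2.0, 15780, 0.5159),
    (0.1316, 3.0, 23670, 0.5125),
]

# Headline "Loss" values from each version of the card.
old_card_loss = 0.4881
new_card_loss = 0.5125


def steps_per_epoch(rows):
    """Hypothetical helper: return steps per epoch, checking Step is cumulative."""
    per_epoch = rows[0][2]
    for _, epoch, step, _ in rows:
        assert step == round(epoch) * per_epoch, (epoch, step)
    return per_epoch


old_steps = steps_per_epoch(old_run)
new_steps = steps_per_epoch(new_run)

# In both cards the headline loss is the final (epoch 3.0) validation loss.
assert old_run[-1][3] == old_card_loss
assert new_run[-1][3] == new_card_loss

print(f"steps/epoch: {old_steps} -> {new_steps} ({new_steps / old_steps:.1%} of before)")
print(f"final validation loss: {old_card_loss} -> {new_card_loss}")
```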

all_results.json CHANGED

@@ -5,9 +5,9 @@
     "eval_samples_per_second": 160.124,
     "eval_steps_per_second": 5.007,
     "perplexity": 1.6292702392482226,
-    "total_flos": 5.233249244912026e+16,
-    "train_loss": 0.13553106638144063,
-    "train_runtime": 2552.5753,
-    "train_samples_per_second": 312.769,
-    "train_steps_per_second": 9.775
+    "total_flos": 4.964450408556134e+16,
+    "train_loss": 0.14119057995788548,
+    "train_runtime": 2419.9093,
+    "train_samples_per_second": 312.97,
+    "train_steps_per_second": 9.781
 }
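Assuming the `perplexity` field was computed as exp(eval_loss), as the Transformers `run_clm.py` example script does, it can be checked against the two headline losses from the README diff. In this hunk the `eval_*` and `perplexity` lines are unchanged context, and the check below suggests the stored value corresponds to the earlier run's loss of 0.4881 rather than the new 0.5125. Whether evaluation was simply not rerun before this save is an inference, not something the diff states.

```python
import math

# Value from the unchanged context lines of all_results.json.
perplexity = 1.6292702392482226

# Assumption: perplexity = exp(eval_loss), so log(perplexity) recovers eval_loss.
implied_eval_loss = math.log(perplexity)

# Headline losses from the README diff (rounded to 4 decimals there).
old_card_loss = 0.4881  # Se124M500KInfSimple
new_card_loss = 0.5125  # Se124M500KInfKeyValue

print(f"implied eval_loss: {implied_eval_loss:.5f}")
print(f"exp(old loss) = {math.exp(old_card_loss):.4f}")
print(f"exp(new loss) = {math.exp(new_card_loss):.4f}")

# Compare at the card's 4-decimal precision.
matches_old = round(implied_eval_loss, 4) == old_card_loss
matches_new = round(implied_eval_loss, 4) == new_card_loss
print(f"matches old card loss: {matches_old}, matches new card loss: {matches_new}")
```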

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
     "epoch": 3.0,
-    "total_flos": 5.233249244912026e+16,
-    "train_loss": 0.13553106638144063,
-    "train_runtime": 2552.5753,
-    "train_samples_per_second": 312.769,
-    "train_steps_per_second": 9.775
+    "total_flos": 4.964450408556134e+16,
+    "train_loss": 0.14119057995788548,
+    "train_runtime": 2419.9093,
+    "train_samples_per_second": 312.97,
+    "train_steps_per_second": 9.781
 }
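The training fields are mutually consistent. Steps per second times runtime recovers the final step count from the README table, and samples per second divided by steps per second gives the number of samples per optimizer step. The sketch below checks both runs. Reading the resulting value of about 32 as the effective batch size (per-device batch × devices × gradient accumulation) is an assumption, since the hyperparameter list is not shown in this diff. So is the premise that `total_flos` scales roughly with the amount of data processed.

```python
# Training metrics from the "-" (old) and "+" (new) sides of train_results.json.
# final_step comes from the Step column of the README table (epoch 3.0 row).
runs = {
    "old": {"total_flos": 5.233249244912026e16, "train_runtime": 2552.5753,
            "train_samples_per_second": 312.769, "train_steps_per_second": 9.775,
            "final_step": 24951},
    "new": {"total_flos": 4.964450408556134e16, "train_runtime": 2419.9093,
            "train_samples_per_second": 312.97, "train_steps_per_second": 9.781,
            "final_step": 23670},
}

derived = {}
for name, m in runs.items():
    # Rates are rounded to 3 decimals in the JSON, so these products are approximate.
    steps = m["train_steps_per_second"] * m["train_runtime"]
    samples_per_step = m["train_samples_per_second"] / m["train_steps_per_second"]
    derived[name] = (steps, samples_per_step)
    print(f"{name}: ~{steps:.0f} steps (table: {m['final_step']}), "
          f"~{samples_per_step:.2f} samples/step")

# Assumption: the FLOP estimate grows with tokens processed, so its ratio should
# track the ratio of optimizer steps when the batch shape is unchanged.
flop_ratio = runs["new"]["total_flos"] / runs["old"]["total_flos"]
step_ratio = runs["new"]["final_step"] / runs["old"]["final_step"]
print(f"FLOP ratio {flop_ratio:.4f} vs step ratio {step_ratio:.4f}")
```

If the per-step batch size really is 32 in both runs, 7890 versus 8317 steps per epoch would correspond to roughly 5% fewer training examples per epoch in the KeyValue run. The diff does not say why.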