augustocsc/Se124M500KInfSimple

augustocsc committed on May 7, 2025

Commit 07ec7f8 (verified) · 1 Parent(s): 14e0e3d

Model save

Files changed (3)

README.md CHANGED

@@ -5,18 +5,18 @@ base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
-- name: Se124M500KInfMinimalist
+- name: Se124M500KInfSimple
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Se124M500KInfMinimalist
+# Se124M500KInfSimple
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.5700
+- Loss: 0.4813
 ## Model description
@@ -36,8 +36,8 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 5e-05
-- train_batch_size: 32
-- eval_batch_size: 32
+- train_batch_size: 24
+- eval_batch_size: 24
 - seed: 42
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
@@ -48,9 +48,9 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step  | Validation Loss |
 |:-------------:|:-----:|:-----:|:---------------:|
-| 0.1527        | 1.0   | 7035  | 0.5857          |
-| 0.1485        | 2.0   | 14070 | 0.5736          |
-| 0.1468        | 3.0   | 21105 | 0.5700          |
+| 0.1713        | 1.0   | 11089 | 0.4982          |
+| 0.1659        | 2.0   | 22178 | 0.4854          |
+| 0.1676        | 3.0   | 33267 | 0.4813          |
 ### Framework versions

all_results.json CHANGED

@@ -5,9 +5,9 @@
     "eval_samples_per_second": 160.628,
     "eval_steps_per_second": 5.022,
     "perplexity": 1.7682359469654831,
-    "total_flos": 4.426734746743603e+16,
-    "train_loss": 0.15721422936277518,
-    "train_runtime": 2145.6584,
-    "train_samples_per_second": 314.741,
-    "train_steps_per_second": 9.836
+    "total_flos": 5.233249244912026e+16,
+    "train_loss": 0.17665502242223255,
+    "train_runtime": 3282.9562,
+    "train_samples_per_second": 243.185,
+    "train_steps_per_second": 10.133
 }
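
Review note: the `perplexity` line in all_results.json is unchanged context in this commit, while README.md reports a new evaluation loss. Assuming perplexity is computed as the exponential of the mean evaluation loss (the usual convention for causal language models, not something this diff states), a quick check suggests the stored value still matches the earlier loss of 0.5700 rather than the new 0.4813. The helper name `perplexity` is illustrative only.

```python
import math


def perplexity(eval_loss: float) -> float:
    """Perplexity under the assumption ppl = exp(mean cross-entropy loss)."""
    return math.exp(eval_loss)


# Value stored in all_results.json (unchanged by this commit).
reported = 1.7682359469654831

# Evaluation losses from the README.md diff (removed and added lines).
old_ppl = perplexity(0.5700)
new_ppl = perplexity(0.4813)

print(f"exp(0.5700) = {old_ppl:.4f}")
print(f"exp(0.4813) = {new_ppl:.4f}")
print(f"reported    = {reported:.4f}")
```

If that assumption holds, the evaluation fields in all_results.json were probably written by a previous run and would only be refreshed by a later evaluation commit.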

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
     "epoch": 3.0,
-    "total_flos": 4.426734746743603e+16,
-    "train_loss": 0.15721422936277518,
-    "train_runtime": 2145.6584,
-    "train_samples_per_second": 314.741,
-    "train_steps_per_second": 9.836
+    "total_flos": 5.233249244912026e+16,
+    "train_loss": 0.17665502242223255,
+    "train_runtime": 3282.9562,
+    "train_samples_per_second": 243.185,
+    "train_steps_per_second": 10.133
 }
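
Review note: the training metrics can be cross-checked against the step counts in README.md. The sketch below estimates the number of training samples per epoch two ways: from throughput and runtime, and from steps per epoch times batch size. It assumes a single device and no gradient accumulation (neither is visible in this diff); the function names and the `before`/`after` labels are illustrative.

```python
EPOCHS = 3.0


def samples_from_throughput(samples_per_second: float, runtime_s: float) -> float:
    """Samples per epoch implied by train_samples_per_second and train_runtime."""
    return samples_per_second * runtime_s / EPOCHS


def samples_from_steps(steps_per_epoch: int, batch_size: int) -> int:
    """Upper bound on samples per epoch (the last batch may be partial)."""
    return steps_per_epoch * batch_size


# Values copied from the removed ("before") and added ("after") diff lines.
runs = {
    "before": {"sps": 314.741, "runtime": 2145.6584, "steps": 7035, "batch": 32},
    "after": {"sps": 243.185, "runtime": 3282.9562, "steps": 11089, "batch": 24},
}

estimates = {}
for label, r in runs.items():
    by_throughput = samples_from_throughput(r["sps"], r["runtime"])
    by_steps = samples_from_steps(r["steps"], r["batch"])
    estimates[label] = (by_throughput, by_steps)
    print(f"{label}: ~{by_throughput:,.0f} (throughput) vs {by_steps:,} (steps x batch)")
```

Under those assumptions the two estimates agree within a few samples for each run, and they suggest the new run saw a larger training set (roughly 266K samples per epoch versus roughly 225K), so the change in loss is not attributable to batch size alone.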