End of training

Files changed (6)

README.md CHANGED

@@ -4,12 +4,25 @@ base_model: hrezaei/flan-t5laa-large
 tags:
 - generated_from_trainer
 datasets:
-- generator
 metrics:
 - accuracy
 model-index:
 - name: flan-t5laa-large
-  results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -17,13 +30,13 @@ should probably proofread and complete it, then remove this comment. -->
 # flan-t5laa-large
-This model is a fine-tuned version of [hrezaei/flan-t5laa-large](https://huggingface.co/hrezaei/flan-t5laa-large) on the generator dataset.
 It achieves the following results on the evaluation set:
 - Perplexity: 1.1522
 - Loss: 0.1417
 - Accuracy: 0.0025
-- Lookahead Perplexity: 524.0447
-- Lookahead Loss: 6.2616
 - Base Perplexity: 1.1386
 - Base Loss: 0.1298

 tags:
 - generated_from_trainer
 datasets:
+- HuggingFaceFW/fineweb
 metrics:
 - accuracy
 model-index:
 - name: flan-t5laa-large
+  results:
+  - task:
+      name: Causal Language Modeling
+      type: text-generation
+    dataset:
+      name: HuggingFaceFW/fineweb sample-350BT
+      type: HuggingFaceFW/fineweb
+      config: default
+      split: train
+      args: sample-350BT
+    metrics:
+    - name: Accuracy
+      type: accuracy
+      value: 0.002520743639921722
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 # flan-t5laa-large
+This model is a fine-tuned version of [hrezaei/flan-t5laa-large](https://huggingface.co/hrezaei/flan-t5laa-large) on the HuggingFaceFW/fineweb sample-350BT dataset.
 It achieves the following results on the evaluation set:
 - Perplexity: 1.1522
 - Loss: 0.1417
 - Accuracy: 0.0025
+- Lookahead Perplexity: 524.0285
+- Lookahead Loss: 6.2615
 - Base Perplexity: 1.1386
 - Base Loss: 0.1298

all_results.json ADDED

+{
+    "eval_accuracy": 0.002520743639921722,
+    "eval_base_loss": 0.1298022617856725,
+    "eval_lookahead_loss": 6.2615461227611995,
+    "eval_loss": 0.14166302978992462,
+    "eval_perplexity": 1.1521883299711884,
+    "eval_runtime": 270.28,
+    "eval_samples": 10000,
+    "eval_samples_per_second": 18.499,
+    "eval_steps_per_second": 0.581,
+    "total_flos": 4.036319640756106e+19,
+    "train_loss": 0.07611130246368703,
+    "train_runtime": 105022.8966,
+    "train_samples": 2000000,
+    "train_samples_per_second": 159.748,
+    "train_steps_per_second": 4.992
+}

eval_results.json ADDED

+{
+    "eval_accuracy": 0.002520743639921722,
+    "eval_base_loss": 0.1298022617856725,
+    "eval_lookahead_loss": 6.2615461227611995,
+    "eval_loss": 0.14166302978992462,
+    "eval_perplexity": 1.1521883299711884,
+    "eval_runtime": 270.28,
+    "eval_samples": 10000,
+    "eval_samples_per_second": 18.499,
+    "eval_steps_per_second": 0.581
+}
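
The perplexity figures in the README hunk appear to be the exponential of the corresponding losses, which is the usual convention when the loss is a mean cross-entropy in nats. A minimal sketch checking this against the values in eval_results.json; the variable names are illustrative, and the relationship is an assumption about how the numbers were derived, not the training code itself:

```python
import math

# Loss values copied from eval_results.json in this commit.
eval_losses = {
    "eval_loss": 0.14166302978992462,
    "eval_base_loss": 0.1298022617856725,
    "eval_lookahead_loss": 6.2615461227611995,
}

# Assumption: perplexity = exp(mean cross-entropy loss in nats).
perplexities = {name: math.exp(loss) for name, loss in eval_losses.items()}

for name, ppl in perplexities.items():
    print(f"{name}: perplexity = {ppl:.4f}")
```

Rounded to four decimals, these agree with the README values for Perplexity, Base Perplexity, and the updated Lookahead Perplexity.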

runs/Nov24_21-11-41_gpu23.viking2.yor.alces.network/events.out.tfevents.1764124471.gpu23.viking2.yor.alces.network.3316656.1 ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:678c4b1269387ce0a9e48e414e6832397607f0ff4606f31b4a824d7cabfda650
+size 710

train_results.json ADDED

+{
+    "total_flos": 4.036319640756106e+19,
+    "train_loss": 0.07611130246368703,
+    "train_runtime": 105022.8966,
+    "train_samples": 2000000,
+    "train_samples_per_second": 159.748,
+    "train_steps_per_second": 4.992
+}
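
The throughput fields can be cross-checked against the runtime. A rough sketch using only numbers from train_results.json: multiplying each rate by train_runtime gives the implied number of samples and steps processed, and the ratio of the two rates gives the implied samples per step. Reading these as total samples seen and effective batch size is an assumption; the commit does not state the batch size, epoch count, or step budget.

```python
# Values copied from train_results.json in this commit.
train_runtime = 105022.8966            # seconds
train_samples = 2000000                # reported dataset size
train_samples_per_second = 159.748
train_steps_per_second = 4.992

# Implied totals over the whole run (approximate: the rates are rounded to 3 decimals).
implied_samples = train_samples_per_second * train_runtime
implied_steps = train_steps_per_second * train_runtime

# Assumption: samples per step corresponds to the effective batch size.
samples_per_step = train_samples_per_second / train_steps_per_second

# How many times larger the implied sample count is than the reported train_samples.
passes_over_data = implied_samples / train_samples

print(f"implied samples:   {implied_samples:,.0f}")
print(f"implied steps:     {implied_steps:,.0f}")
print(f"samples per step:  {samples_per_step:.2f}")
print(f"implied / reported samples: {passes_over_data:.2f}")
```

The implied sample count comes out several times larger than the reported train_samples, which would be consistent with multiple passes over the data or a fixed step budget, though the commit itself does not say which.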

trainer_state.json ADDED

The diff for this file is too large to render.