VERSIL91 committed
Commit 21b607c · verified · 1 Parent(s): 0f7b2a8

End of training

README.md CHANGED
@@ -6,7 +6,7 @@ tags:
  - axolotl
  - generated_from_trainer
  model-index:
- - name: 818864ef-8094-4357-9a8e-fe11c35169de
+ - name: 0c1cc135-b51f-4582-8f62-3de913a18494
    results: []
  ---
 
@@ -18,12 +18,6 @@ should probably proofread and complete it, then remove this comment. -->
 
  axolotl version: `0.4.1`
  ```yaml
- accelerate_config:
- dynamo_backend: inductor
- mixed_precision: bf16
- num_machines: 1
- num_processes: auto
- use_cpu: false
  adapter: lora
  base_model: echarlaix/tiny-random-mistral
  bf16: auto
@@ -45,7 +39,6 @@ datasets:
  system_prompt: ''
  debug: null
  deepspeed: null
- device_map: auto
  early_stopping_patience: null
  eval_max_new_tokens: 128
  eval_table_size: null
@@ -54,14 +47,16 @@ flash_attention: false
  fp16: null
  fsdp: null
  fsdp_config: null
- gradient_accumulation_steps: 16
- gradient_checkpointing: true
+ gradient_accumulation_steps: 4
+ gradient_checkpointing: false
  group_by_length: false
  hub_model_id: null
  hub_repo: null
  hub_strategy: checkpoint
  hub_token: null
- learning_rate: 0.0001
+ learning_rate: 0.0002
+ load_in_4bit: false
+ load_in_8bit: false
  local_rank: null
  logging_steps: 1
  lora_alpha: 16
@@ -70,13 +65,8 @@ lora_fan_in_fan_out: null
  lora_model_dir: null
  lora_r: 8
  lora_target_linear: true
- lora_target_modules:
- - q_proj
- - v_proj
  lr_scheduler: cosine
- max_memory:
- 0: 70GiB
- max_steps: 100
+ max_steps: 10
  micro_batch_size: 2
  mlflow_experiment_name: /tmp/7e930cf543f535d6_train_data.json
  model_type: AutoModelForCausalLM
@@ -84,9 +74,6 @@ num_epochs: 1
  optimizer: adamw_bnb_8bit
  output_dir: miner_id_24
  pad_to_sequence_len: true
- quantization_config:
- llm_int8_enable_fp32_cpu_offload: true
- load_in_8bit: true
  resume_from_checkpoint: null
  s2_attention: null
  sample_packing: false
@@ -97,7 +84,6 @@ special_tokens:
  strict: false
  tf32: false
  tokenizer_type: AutoTokenizer
- torch_compile: true
  train_on_inputs: false
  trust_remote_code: true
  val_set_size: 0.05
@@ -115,7 +101,7 @@ xformers_attention: null
 
  </details><br>
 
- # 818864ef-8094-4357-9a8e-fe11c35169de
+ # 0c1cc135-b51f-4582-8f62-3de913a18494
 
  This model is a fine-tuned version of [echarlaix/tiny-random-mistral](https://huggingface.co/echarlaix/tiny-random-mistral) on the None dataset.
  It achieves the following results on the evaluation set:
@@ -138,25 +124,25 @@ More information needed
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
- - learning_rate: 0.0001
+ - learning_rate: 0.0002
  - train_batch_size: 2
  - eval_batch_size: 2
  - seed: 42
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 32
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 8
  - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 10
- - training_steps: 93
+ - training_steps: 10
 
  ### Training results
 
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 11974.167 | 0.0108 | 1 | nan |
- | 0.0 | 0.2598 | 24 | nan |
- | 0.0 | 0.5196 | 48 | nan |
- | 0.0 | 0.7794 | 72 | nan |
+ | 10.3887 | 0.0027 | 1 | nan |
+ | 0.0 | 0.0081 | 3 | nan |
+ | 0.0 | 0.0162 | 6 | nan |
+ | 0.0 | 0.0244 | 9 | nan |
 
 
  ### Framework versions
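The card above describes a LoRA adapter (rank 8, alpha 16) trained with axolotl on top of `echarlaix/tiny-random-mistral`. For illustration only (not part of this commit), a minimal sketch of loading such an adapter with `transformers` and `peft` could look like the following; the adapter repository id is a hypothetical placeholder:

```python
# Sketch under assumptions: transformers and peft are installed, and
# "user/adapter-repo" is a placeholder for wherever this adapter is hosted.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "echarlaix/tiny-random-mistral"   # base model named in the config above
adapter_id = "user/adapter-repo"            # hypothetical placeholder, not from this commit

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attaches the LoRA weights

inputs = tokenizer("Hello", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output_ids[0]))
```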
adapter_config.json CHANGED
@@ -20,13 +20,13 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "down_proj",
- "o_proj",
  "gate_proj",
- "v_proj",
- "up_proj",
+ "k_proj",
+ "o_proj",
  "q_proj",
- "k_proj"
+ "up_proj",
+ "v_proj",
+ "down_proj"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
last-checkpoint/adapter_config.json CHANGED
@@ -20,13 +20,13 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "down_proj",
- "o_proj",
  "gate_proj",
- "v_proj",
- "up_proj",
+ "k_proj",
+ "o_proj",
  "q_proj",
- "k_proj"
+ "up_proj",
+ "v_proj",
+ "down_proj"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
last-checkpoint/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:025cb83ae7004cde045c6592ec39ee56d5f5c559b2e01ab2378c4372e4d107ca
- size 71654
+ oid sha256:bbe10c074c53af62af0964d6b70d2e6528b906557b640a82231880d56e53359d
+ size 71718
last-checkpoint/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d55ef7e39ffed254f36f6faafb6597d5bd8ba2346f104d20c3b8cca9fcde4d05
+ oid sha256:c12066a9c624fe38430ff3feea2dc6451e9f1a920255c11680c737a33d2c53a0
  size 14244
last-checkpoint/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b6fd302dab6dad7411e029f26a1eed4b8af7dfa13be419c5f8a4344ac60a0c3b
+ oid sha256:bb578e75c11a81e85dda67a691f96ba4793a02960f1409fd3e1511aac873491a
  size 1064
last-checkpoint/trainer_state.json CHANGED
@@ -1,702 +1,121 @@
  {
  "best_metric": null,
  "best_model_checkpoint": null,
- "epoch": 1.006765899864682,
- "eval_steps": 24,
- "global_step": 93,
+ "epoch": 0.02706359945872801,
+ "eval_steps": 3,
+ "global_step": 10,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
- "epoch": 0.010825439783491205,
+ "epoch": 0.0027063599458728013,
  "grad_norm": NaN,
- "learning_rate": 1e-05,
- "loss": 11974.167,
+ "learning_rate": 2e-05,
+ "loss": 10.3887,
  "step": 1
  },
  {
- "epoch": 0.010825439783491205,
+ "epoch": 0.0027063599458728013,
  "eval_loss": NaN,
- "eval_runtime": 14.1198,
- "eval_samples_per_second": 11.048,
- "eval_steps_per_second": 5.524,
+ "eval_runtime": 1.7032,
+ "eval_samples_per_second": 91.593,
+ "eval_steps_per_second": 45.797,
  "step": 1
  },
  {
- "epoch": 0.02165087956698241,
- "grad_norm": NaN,
- "learning_rate": 2e-05,
- "loss": 0.0,
- "step": 2
- },
- {
- "epoch": 0.03247631935047361,
- "grad_norm": NaN,
- "learning_rate": 3e-05,
- "loss": 0.0,
- "step": 3
- },
- {
- "epoch": 0.04330175913396482,
+ "epoch": 0.005412719891745603,
  "grad_norm": NaN,
  "learning_rate": 4e-05,
  "loss": 0.0,
- "step": 4
- },
- {
- "epoch": 0.05412719891745602,
- "grad_norm": NaN,
- "learning_rate": 5e-05,
- "loss": 0.0,
- "step": 5
+ "step": 2
  },
  {
- "epoch": 0.06495263870094722,
+ "epoch": 0.008119079837618403,
  "grad_norm": NaN,
  "learning_rate": 6e-05,
  "loss": 0.0,
- "step": 6
+ "step": 3
  },
  {
- "epoch": 0.07577807848443843,
- "grad_norm": NaN,
- "learning_rate": 7e-05,
- "loss": 0.0,
- "step": 7
+ "epoch": 0.008119079837618403,
+ "eval_loss": NaN,
+ "eval_runtime": 0.9381,
+ "eval_samples_per_second": 166.286,
+ "eval_steps_per_second": 83.143,
+ "step": 3
  },
  {
- "epoch": 0.08660351826792964,
+ "epoch": 0.010825439783491205,
  "grad_norm": NaN,
  "learning_rate": 8e-05,
  "loss": 0.0,
- "step": 8
- },
- {
- "epoch": 0.09742895805142084,
- "grad_norm": NaN,
- "learning_rate": 9e-05,
- "loss": 0.0,
- "step": 9
+ "step": 4
  },
  {
- "epoch": 0.10825439783491204,
+ "epoch": 0.013531799729364006,
  "grad_norm": NaN,
  "learning_rate": 0.0001,
  "loss": 0.0,
- "step": 10
- },
- {
- "epoch": 0.11907983761840325,
- "grad_norm": NaN,
- "learning_rate": 9.996418774081658e-05,
- "loss": 0.0,
- "step": 11
- },
- {
- "epoch": 0.12990527740189445,
- "grad_norm": NaN,
- "learning_rate": 9.985680226398261e-05,
- "loss": 0.0,
- "step": 12
- },
- {
- "epoch": 0.14073071718538566,
- "grad_norm": NaN,
- "learning_rate": 9.967799739815925e-05,
- "loss": 0.0,
- "step": 13
- },
- {
- "epoch": 0.15155615696887687,
- "grad_norm": NaN,
- "learning_rate": 9.942802927959443e-05,
- "loss": 0.0,
- "step": 14
- },
- {
- "epoch": 0.16238159675236807,
- "grad_norm": NaN,
- "learning_rate": 9.910725598521013e-05,
- "loss": 0.0,
- "step": 15
- },
- {
- "epoch": 0.17320703653585928,
- "grad_norm": NaN,
- "learning_rate": 9.871613701966067e-05,
- "loss": 0.0,
- "step": 16
- },
- {
- "epoch": 0.18403247631935046,
- "grad_norm": NaN,
- "learning_rate": 9.825523265709666e-05,
- "loss": 0.0,
- "step": 17
- },
- {
- "epoch": 0.19485791610284167,
- "grad_norm": NaN,
- "learning_rate": 9.772520313857775e-05,
- "loss": 0.0,
- "step": 18
- },
- {
- "epoch": 0.20568335588633288,
- "grad_norm": NaN,
- "learning_rate": 9.712680772628364e-05,
- "loss": 0.0,
- "step": 19
- },
- {
- "epoch": 0.2165087956698241,
- "grad_norm": NaN,
- "learning_rate": 9.646090361587827e-05,
- "loss": 0.0,
- "step": 20
- },
- {
- "epoch": 0.2273342354533153,
- "grad_norm": NaN,
- "learning_rate": 9.572844470858537e-05,
- "loss": 0.0,
- "step": 21
- },
- {
- "epoch": 0.2381596752368065,
- "grad_norm": NaN,
- "learning_rate": 9.493048024473412e-05,
- "loss": 0.0,
- "step": 22
- },
- {
- "epoch": 0.2489851150202977,
- "grad_norm": NaN,
- "learning_rate": 9.406815330073244e-05,
- "loss": 0.0,
- "step": 23
- },
- {
- "epoch": 0.2598105548037889,
- "grad_norm": NaN,
- "learning_rate": 9.314269915162114e-05,
- "loss": 0.0,
- "step": 24
- },
- {
- "epoch": 0.2598105548037889,
- "eval_loss": NaN,
- "eval_runtime": 0.679,
- "eval_samples_per_second": 229.759,
- "eval_steps_per_second": 114.879,
- "step": 24
- },
- {
- "epoch": 0.2706359945872801,
- "grad_norm": NaN,
- "learning_rate": 9.215544350155422e-05,
- "loss": 0.0,
- "step": 25
- },
- {
- "epoch": 0.2814614343707713,
- "grad_norm": NaN,
- "learning_rate": 9.110780058474052e-05,
- "loss": 0.0,
- "step": 26
- },
- {
- "epoch": 0.2922868741542625,
- "grad_norm": NaN,
- "learning_rate": 9.000127113956674e-05,
- "loss": 0.0,
- "step": 27
- },
- {
- "epoch": 0.30311231393775373,
- "grad_norm": NaN,
- "learning_rate": 8.883744025880428e-05,
- "loss": 0.0,
- "step": 28
- },
- {
- "epoch": 0.31393775372124494,
- "grad_norm": NaN,
- "learning_rate": 8.761797511897906e-05,
- "loss": 0.0,
- "step": 29
- },
- {
- "epoch": 0.32476319350473615,
- "grad_norm": NaN,
- "learning_rate": 8.634462259215719e-05,
- "loss": 0.0,
- "step": 30
- },
- {
- "epoch": 0.33558863328822736,
- "grad_norm": NaN,
- "learning_rate": 8.501920674356754e-05,
- "loss": 0.0,
- "step": 31
- },
- {
- "epoch": 0.34641407307171856,
- "grad_norm": NaN,
- "learning_rate": 8.364362621864595e-05,
- "loss": 0.0,
- "step": 32
- },
- {
- "epoch": 0.3572395128552097,
- "grad_norm": NaN,
- "learning_rate": 8.221985152324385e-05,
- "loss": 0.0,
- "step": 33
- },
- {
- "epoch": 0.3680649526387009,
- "grad_norm": NaN,
- "learning_rate": 8.074992220089769e-05,
- "loss": 0.0,
- "step": 34
- },
- {
- "epoch": 0.37889039242219213,
- "grad_norm": NaN,
- "learning_rate": 7.923594391120236e-05,
- "loss": 0.0,
- "step": 35
- },
- {
- "epoch": 0.38971583220568334,
- "grad_norm": NaN,
- "learning_rate": 7.768008541347423e-05,
- "loss": 0.0,
- "step": 36
- },
- {
- "epoch": 0.40054127198917455,
- "grad_norm": NaN,
- "learning_rate": 7.608457546002424e-05,
- "loss": 0.0,
- "step": 37
- },
- {
- "epoch": 0.41136671177266576,
- "grad_norm": NaN,
- "learning_rate": 7.445169960349167e-05,
- "loss": 0.0,
- "step": 38
- },
- {
- "epoch": 0.42219215155615697,
- "grad_norm": NaN,
- "learning_rate": 7.278379692281208e-05,
- "loss": 0.0,
- "step": 39
- },
- {
- "epoch": 0.4330175913396482,
- "grad_norm": NaN,
- "learning_rate": 7.10832566725092e-05,
- "loss": 0.0,
- "step": 40
- },
- {
- "epoch": 0.4438430311231394,
- "grad_norm": NaN,
- "learning_rate": 6.935251486011087e-05,
- "loss": 0.0,
- "step": 41
- },
- {
- "epoch": 0.4546684709066306,
- "grad_norm": NaN,
- "learning_rate": 6.759405075659166e-05,
- "loss": 0.0,
- "step": 42
- },
- {
- "epoch": 0.4654939106901218,
- "grad_norm": NaN,
- "learning_rate": 6.58103833448412e-05,
- "loss": 0.0,
- "step": 43
- },
- {
- "epoch": 0.476319350473613,
- "grad_norm": NaN,
- "learning_rate": 6.400406771124536e-05,
- "loss": 0.0,
- "step": 44
- },
- {
- "epoch": 0.4871447902571042,
- "grad_norm": NaN,
- "learning_rate": 6.21776913855496e-05,
- "loss": 0.0,
- "step": 45
- },
- {
- "epoch": 0.4979702300405954,
- "grad_norm": NaN,
- "learning_rate": 6.0333870634247645e-05,
- "loss": 0.0,
- "step": 46
- },
- {
- "epoch": 0.5087956698240866,
- "grad_norm": NaN,
- "learning_rate": 5.847524671280484e-05,
- "loss": 0.0,
- "step": 47
+ "step": 5
  },
  {
- "epoch": 0.5196211096075778,
+ "epoch": 0.016238159675236806,
  "grad_norm": NaN,
- "learning_rate": 5.660448208208513e-05,
+ "learning_rate": 0.00012,
  "loss": 0.0,
- "step": 48
+ "step": 6
  },
  {
- "epoch": 0.5196211096075778,
+ "epoch": 0.016238159675236806,
  "eval_loss": NaN,
- "eval_runtime": 0.7027,
- "eval_samples_per_second": 222.007,
- "eval_steps_per_second": 111.003,
- "step": 48
- },
- {
- "epoch": 0.530446549391069,
- "grad_norm": NaN,
- "learning_rate": 5.472425659440157e-05,
- "loss": 0.0,
- "step": 49
- },
- {
- "epoch": 0.5412719891745602,
- "grad_norm": NaN,
- "learning_rate": 5.2837263654653715e-05,
- "loss": 0.0,
- "step": 50
- },
- {
- "epoch": 0.5520974289580515,
- "grad_norm": NaN,
- "learning_rate": 5.094620636205095e-05,
- "loss": 0.0,
- "step": 51
- },
- {
- "epoch": 0.5629228687415426,
- "grad_norm": NaN,
- "learning_rate": 4.9053793637949067e-05,
- "loss": 0.0,
- "step": 52
- },
- {
- "epoch": 0.5737483085250338,
- "grad_norm": NaN,
- "learning_rate": 4.7162736345346303e-05,
- "loss": 0.0,
- "step": 53
- },
- {
- "epoch": 0.584573748308525,
- "grad_norm": NaN,
- "learning_rate": 4.527574340559844e-05,
- "loss": 0.0,
- "step": 54
- },
- {
- "epoch": 0.5953991880920162,
- "grad_norm": NaN,
- "learning_rate": 4.3395517917914895e-05,
- "loss": 0.0,
- "step": 55
- },
- {
- "epoch": 0.6062246278755075,
- "grad_norm": NaN,
- "learning_rate": 4.1524753287195165e-05,
- "loss": 0.0,
- "step": 56
- },
- {
- "epoch": 0.6170500676589986,
- "grad_norm": NaN,
- "learning_rate": 3.966612936575235e-05,
- "loss": 0.0,
- "step": 57
- },
- {
- "epoch": 0.6278755074424899,
- "grad_norm": NaN,
- "learning_rate": 3.7822308614450406e-05,
- "loss": 0.0,
- "step": 58
- },
- {
- "epoch": 0.638700947225981,
- "grad_norm": NaN,
- "learning_rate": 3.599593228875465e-05,
- "loss": 0.0,
- "step": 59
- },
- {
- "epoch": 0.6495263870094723,
- "grad_norm": NaN,
- "learning_rate": 3.41896166551588e-05,
- "loss": 0.0,
- "step": 60
- },
- {
- "epoch": 0.6603518267929634,
- "grad_norm": NaN,
- "learning_rate": 3.240594924340835e-05,
- "loss": 0.0,
- "step": 61
- },
- {
- "epoch": 0.6711772665764547,
- "grad_norm": NaN,
- "learning_rate": 3.0647485139889145e-05,
- "loss": 0.0,
- "step": 62
- },
- {
- "epoch": 0.6820027063599459,
- "grad_norm": NaN,
- "learning_rate": 2.8916743327490803e-05,
- "loss": 0.0,
- "step": 63
- },
- {
- "epoch": 0.6928281461434371,
- "grad_norm": NaN,
- "learning_rate": 2.721620307718793e-05,
- "loss": 0.0,
- "step": 64
- },
- {
- "epoch": 0.7036535859269283,
- "grad_norm": NaN,
- "learning_rate": 2.554830039650834e-05,
- "loss": 0.0,
- "step": 65
- },
- {
- "epoch": 0.7144790257104194,
- "grad_norm": NaN,
- "learning_rate": 2.391542453997578e-05,
- "loss": 0.0,
- "step": 66
- },
- {
- "epoch": 0.7253044654939107,
- "grad_norm": NaN,
- "learning_rate": 2.2319914586525777e-05,
- "loss": 0.0,
- "step": 67
- },
- {
- "epoch": 0.7361299052774019,
- "grad_norm": NaN,
- "learning_rate": 2.0764056088797645e-05,
- "loss": 0.0,
- "step": 68
- },
- {
- "epoch": 0.7469553450608931,
- "grad_norm": NaN,
- "learning_rate": 1.9250077799102322e-05,
- "loss": 0.0,
- "step": 69
+ "eval_runtime": 0.9614,
+ "eval_samples_per_second": 162.261,
+ "eval_steps_per_second": 81.131,
+ "step": 6
  },
  {
- "epoch": 0.7577807848443843,
+ "epoch": 0.018944519621109608,
  "grad_norm": NaN,
- "learning_rate": 1.7780148476756147e-05,
+ "learning_rate": 0.00014,
  "loss": 0.0,
- "step": 70
+ "step": 7
  },
  {
- "epoch": 0.7686062246278755,
+ "epoch": 0.02165087956698241,
  "grad_norm": NaN,
- "learning_rate": 1.6356373781354058e-05,
+ "learning_rate": 0.00016,
  "loss": 0.0,
- "step": 71
+ "step": 8
  },
  {
- "epoch": 0.7794316644113667,
+ "epoch": 0.02435723951285521,
  "grad_norm": NaN,
- "learning_rate": 1.4980793256432474e-05,
+ "learning_rate": 0.00018,
  "loss": 0.0,
- "step": 72
+ "step": 9
  },
  {
- "epoch": 0.7794316644113667,
+ "epoch": 0.02435723951285521,
  "eval_loss": NaN,
- "eval_runtime": 0.6819,
- "eval_samples_per_second": 228.779,
- "eval_steps_per_second": 114.39,
- "step": 72
- },
- {
- "epoch": 0.790257104194858,
- "grad_norm": NaN,
- "learning_rate": 1.3655377407842812e-05,
- "loss": 0.0,
- "step": 73
- },
- {
- "epoch": 0.8010825439783491,
- "grad_norm": NaN,
- "learning_rate": 1.2382024881020937e-05,
- "loss": 0.0,
- "step": 74
- },
- {
- "epoch": 0.8119079837618404,
- "grad_norm": NaN,
- "learning_rate": 1.1162559741195733e-05,
- "loss": 0.0,
- "step": 75
- },
- {
- "epoch": 0.8227334235453315,
- "grad_norm": NaN,
- "learning_rate": 9.998728860433276e-06,
- "loss": 0.0,
- "step": 76
- },
- {
- "epoch": 0.8335588633288228,
- "grad_norm": NaN,
- "learning_rate": 8.8921994152595e-06,
- "loss": 0.0,
- "step": 77
- },
- {
- "epoch": 0.8443843031123139,
- "grad_norm": NaN,
- "learning_rate": 7.844556498445788e-06,
- "loss": 0.0,
- "step": 78
- },
- {
- "epoch": 0.8552097428958051,
- "grad_norm": NaN,
- "learning_rate": 6.857300848378856e-06,
- "loss": 0.0,
- "step": 79
- },
- {
- "epoch": 0.8660351826792964,
- "grad_norm": NaN,
- "learning_rate": 5.931846699267557e-06,
- "loss": 0.0,
- "step": 80
- },
- {
- "epoch": 0.8768606224627875,
- "grad_norm": NaN,
- "learning_rate": 5.0695197552659e-06,
- "loss": 0.0,
- "step": 81
- },
- {
- "epoch": 0.8876860622462788,
- "grad_norm": NaN,
- "learning_rate": 4.271555291414636e-06,
- "loss": 0.0,
- "step": 82
- },
- {
- "epoch": 0.8985115020297699,
- "grad_norm": NaN,
- "learning_rate": 3.539096384121743e-06,
- "loss": 0.0,
- "step": 83
- },
- {
- "epoch": 0.9093369418132612,
- "grad_norm": NaN,
- "learning_rate": 2.8731922737163685e-06,
- "loss": 0.0,
- "step": 84
- },
- {
- "epoch": 0.9201623815967523,
- "grad_norm": NaN,
- "learning_rate": 2.274796861422246e-06,
- "loss": 0.0,
- "step": 85
- },
- {
- "epoch": 0.9309878213802436,
- "grad_norm": NaN,
- "learning_rate": 1.7447673429033362e-06,
- "loss": 0.0,
- "step": 86
- },
- {
- "epoch": 0.9418132611637348,
- "grad_norm": NaN,
- "learning_rate": 1.2838629803393342e-06,
- "loss": 0.0,
- "step": 87
- },
- {
- "epoch": 0.952638700947226,
- "grad_norm": NaN,
- "learning_rate": 8.927440147898702e-07,
- "loss": 0.0,
- "step": 88
- },
- {
- "epoch": 0.9634641407307172,
- "grad_norm": NaN,
- "learning_rate": 5.719707204055735e-07,
- "loss": 0.0,
- "step": 89
- },
- {
- "epoch": 0.9742895805142084,
- "grad_norm": NaN,
- "learning_rate": 3.2200260184075406e-07,
- "loss": 0.0,
- "step": 90
- },
- {
- "epoch": 0.9851150202976996,
- "grad_norm": NaN,
- "learning_rate": 1.431977360173975e-07,
- "loss": 0.0,
- "step": 91
- },
- {
- "epoch": 0.9959404600811907,
- "grad_norm": NaN,
- "learning_rate": 3.581225918342646e-08,
- "loss": 0.0,
- "step": 92
+ "eval_runtime": 0.9501,
+ "eval_samples_per_second": 164.195,
+ "eval_steps_per_second": 82.097,
+ "step": 9
  },
  {
- "epoch": 1.006765899864682,
+ "epoch": 0.02706359945872801,
  "grad_norm": NaN,
- "learning_rate": 0.0,
+ "learning_rate": 0.0002,
  "loss": 0.0,
- "step": 93
+ "step": 10
  }
  ],
  "logging_steps": 1,
- "max_steps": 93,
+ "max_steps": 10,
  "num_input_tokens_seen": 0,
- "num_train_epochs": 2,
- "save_steps": 24,
+ "num_train_epochs": 1,
+ "save_steps": 3,
  "stateful_callbacks": {
  "TrainerControl": {
  "args": {
@@ -709,7 +128,7 @@
  "attributes": {}
  }
  },
- "total_flos": 13425906401280.0,
+ "total_flos": 378556022784.0,
  "train_batch_size": 2,
  "trial_name": null,
  "trial_params": null
last-checkpoint/training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:8ccfffad861961f21fb18ee924e1ee63a7d7fa9645e4d98d4830155136777638
3
  size 6776
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5d0f25e1494b3ea83333524598e5a16ef9770e57bea863ec546c9ef9e8356f10
3
  size 6776
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:8ccfffad861961f21fb18ee924e1ee63a7d7fa9645e4d98d4830155136777638
3
  size 6776
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5d0f25e1494b3ea83333524598e5a16ef9770e57bea863ec546c9ef9e8356f10
3
  size 6776
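The `.pt` and `.bin` files in this commit are Git LFS pointers: each records the sha256 and size of the actual binary blob. If the real files have been pulled (for example with `git lfs pull`), one way to sanity-check a local artifact is to hash it and compare against the oid in the pointer; a small sketch, using the new `training_args.bin` pointer shown above:

```python
# Sketch: verify a pulled LFS file against the sha256 recorded in its pointer.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

# New pointer oid for training_args.bin, as shown in this commit:
expected = "5d0f25e1494b3ea83333524598e5a16ef9770e57bea863ec546c9ef9e8356f10"
print(sha256_of("training_args.bin") == expected)
```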