End of training

Files changed (3)

README.md +51 -24
adapter_model.bin +1 -1
adapter_model.safetensors +1 -1

README.md CHANGED

@@ -18,10 +18,10 @@ should probably proofread and complete it, then remove this comment. -->
 axolotl version: `0.4.1`
 ```yaml
-adapter: qlora
-auto_resume_from_checkpoints: true
+adapter: lora
+auto_resume_from_checkpoints: false
 base_model: fxmarty/tiny-random-GemmaForCausalLM
-bf16: auto
+bf16: false
 chat_template: llama3
 dataset_prepared_path: null
 datasets:
@@ -40,23 +40,23 @@ datasets:
     system_prompt: ''
 debug: null
 deepspeed: null
-early_stopping_patience: 4
+early_stopping_patience: 3
 eval_max_new_tokens: 128
-eval_steps: 100
+eval_steps: 1000
 eval_table_size: null
 flash_attention: true
-fp16: false
+fp16: true
 fsdp: null
 fsdp_config: null
 gradient_accumulation_steps: 4
-gradient_checkpointing: true
+gradient_checkpointing: false
 group_by_length: false
 hub_model_id: error577/8a76346a-e5e1-4372-8a33-4ae45d89359b
 hub_repo: null
 hub_strategy: checkpoint
 hub_token: null
 learning_rate: 0.0002
-load_in_4bit: true
+load_in_4bit: false
 load_in_8bit: false
 local_rank: null
 logging_steps: 1
@@ -72,14 +72,14 @@ max_steps: null
 micro_batch_size: 2
 mlflow_experiment_name: /tmp/95621c23f229fe74_train_data.json
 model_type: AutoModelForCausalLM
-num_epochs: 3
-optimizer: adamw_torch_4bit
+num_epochs: 10
+optimizer: adamw_torch
 output_dir: miner_id_24
 pad_to_sequence_len: true
 resume_from_checkpoint: null
 s2_attention: null
 sample_packing: false
-save_steps: 100
+save_steps: 1000
 sequence_len: 512
 strict: false
 tf32: false
@@ -93,7 +93,7 @@ wandb_name: e75973b3-c17e-44e4-b527-21c602afd6c4
 wandb_project: Gradients-On-Demand
 wandb_run: your_name
 wandb_runid: e75973b3-c17e-44e4-b527-21c602afd6c4
-warmup_steps: 30
+warmup_steps: 300
 weight_decay: 0.0
 xformers_attention: null
@@ -105,7 +105,7 @@ xformers_attention: null
 This model is a fine-tuned version of [fxmarty/tiny-random-GemmaForCausalLM](https://huggingface.co/fxmarty/tiny-random-GemmaForCausalLM) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: nan
+- Loss: 12.1613
 ## Model description
@@ -130,21 +130,48 @@ The following hyperparameters were used during training:
 - seed: 42
 - gradient_accumulation_steps: 4
 - total_train_batch_size: 8
-- optimizer: Use OptimizerNames.ADAMW_TORCH_4BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 30
-- num_epochs: 3
+- lr_scheduler_warmup_steps: 300
+- num_epochs: 10
+- mixed_precision_training: Native AMP
 ### Training results
-| Training Loss | Epoch  | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 0.0           | 0.0002 | 1    | nan             |
-| 0.0           | 0.0164 | 100  | nan             |
-| 0.0           | 0.0327 | 200  | nan             |
-| 0.0           | 0.0491 | 300  | nan             |
-| 0.0           | 0.0655 | 400  | nan             |
-| 0.0           | 0.0819 | 500  | nan             |
+| Training Loss | Epoch  | Step  | Validation Loss |
+|:-------------:|:------:|:-----:|:---------------:|
+| 12.4513       | 0.0002 | 1     | 12.4418         |
+| 12.2591       | 0.1637 | 1000  | 12.2520         |
+| 12.2403       | 0.3275 | 2000  | 12.2224         |
+| 12.2069       | 0.4912 | 3000  | 12.2032         |
+| 12.1813       | 0.6550 | 4000  | 12.1945         |
+| 12.2163       | 0.8187 | 5000  | 12.1882         |
+| 12.1597       | 0.9825 | 6000  | 12.1822         |
+| 12.2022       | 1.1462 | 7000  | 12.1761         |
+| 12.2427       | 1.3100 | 8000  | 12.1720         |
+| 12.1622       | 1.4737 | 9000  | 12.1691         |
+| 12.2151       | 1.6375 | 10000 | 12.1676         |
+| 12.18         | 1.8012 | 11000 | 12.1669         |
+| 12.1537       | 1.9650 | 12000 | 12.1656         |
+| 12.1634       | 2.1287 | 13000 | 12.1650         |
+| 12.2148       | 2.2925 | 14000 | 12.1649         |
+| 12.1868       | 2.4562 | 15000 | 12.1646         |
+| 12.1903       | 2.6199 | 16000 | 12.1642         |
+| 12.1781       | 2.7837 | 17000 | 12.1643         |
+| 12.1894       | 2.9474 | 18000 | 12.1638         |
+| 12.2065       | 3.1112 | 19000 | 12.1633         |
+| 12.1887       | 3.2749 | 20000 | 12.1635         |
+| 12.1549       | 3.4387 | 21000 | 12.1626         |
+| 12.1719       | 3.6024 | 22000 | 12.1624         |
+| 12.2151       | 3.7662 | 23000 | 12.1626         |
+| 12.157        | 3.9299 | 24000 | 12.1629         |
+| 12.1682       | 4.0937 | 25000 | 12.1619         |
+| 12.1968       | 4.2574 | 26000 | 12.1619         |
+| 12.1651       | 4.4212 | 27000 | 12.1617         |
+| 12.168        | 4.5849 | 28000 | 12.1612         |
+| 12.1713       | 4.7486 | 29000 | 12.1617         |
+| 12.1767       | 4.9124 | 30000 | 12.1614         |
+| 12.2027       | 5.0761 | 31000 | 12.1613         |
 ### Framework versions
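The new results table can be cross-checked against the config with a little arithmetic. The Python sketch below copies the batch settings and the (step, epoch, validation loss) rows from this diff. Three inputs are assumptions, not facts stated in the README: a single training device, a 256,000-token vocabulary for the tiny Gemma model, and a simplified early-stopping rule (an evaluation counts as an improvement only if its loss is strictly lower than the best so far, with zero threshold). Under those assumptions the logged numbers suggest roughly 6,107 optimizer steps per epoch (about 49k training examples at 8 per step). They also show that `early_stopping_patience: 3` would halt the run at step 31000, around epoch 5.08 of the configured 10. Finally, the sketch compares the first validation loss of 12.4418 with ln(256000) ≈ 12.4529, the loss of a uniform guess over that vocabulary.

```python
import math

# Hyperparameters copied from the updated README (new side of the diff).
MICRO_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
NUM_DEVICES = 1  # assumption: single device, consistent with total_train_batch_size: 8
EARLY_STOPPING_PATIENCE = 3
ASSUMED_VOCAB_SIZE = 256_000  # assumption: the tiny model keeps Gemma's vocabulary size

# (step, epoch, validation loss) rows from the new "Training results" table.
EVAL_ROWS = [
    (1, 0.0002, 12.4418), (1000, 0.1637, 12.2520), (2000, 0.3275, 12.2224),
    (3000, 0.4912, 12.2032), (4000, 0.6550, 12.1945), (5000, 0.8187, 12.1882),
    (6000, 0.9825, 12.1822), (7000, 1.1462, 12.1761), (8000, 1.3100, 12.1720),
    (9000, 1.4737, 12.1691), (10000, 1.6375, 12.1676), (11000, 1.8012, 12.1669),
    (12000, 1.9650, 12.1656), (13000, 2.1287, 12.1650), (14000, 2.2925, 12.1649),
    (15000, 2.4562, 12.1646), (16000, 2.6199, 12.1642), (17000, 2.7837, 12.1643),
    (18000, 2.9474, 12.1638), (19000, 3.1112, 12.1633), (20000, 3.2749, 12.1635),
    (21000, 3.4387, 12.1626), (22000, 3.6024, 12.1624), (23000, 3.7662, 12.1626),
    (24000, 3.9299, 12.1629), (25000, 4.0937, 12.1619), (26000, 4.2574, 12.1619),
    (27000, 4.4212, 12.1617), (28000, 4.5849, 12.1612), (29000, 4.7486, 12.1617),
    (30000, 4.9124, 12.1614), (31000, 5.0761, 12.1613),
]


def effective_batch_size(micro: int, accum: int, devices: int) -> int:
    """Examples consumed per optimizer step."""
    return micro * accum * devices


def steps_per_epoch(step: int, epoch: float) -> float:
    """Back out optimizer steps per epoch from one logged (step, epoch) pair."""
    return step / epoch


def early_stop_step(rows, patience: int):
    """Return the step at which a patience-based stopper would halt, or None.

    Simplified rule (assumption): an evaluation improves only if its loss is
    strictly lower than the best seen so far; otherwise a counter grows, and
    training stops once the counter reaches `patience`.
    """
    best = math.inf
    bad_evals = 0
    for step, _epoch, loss in rows:
        if loss < best:
            best = loss
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return step
    return None


if __name__ == "__main__":
    batch = effective_batch_size(MICRO_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES)
    last_step, last_epoch, _ = EVAL_ROWS[-1]  # the latest row gives the most precise ratio
    spe = steps_per_epoch(last_step, last_epoch)
    print(f"total_train_batch_size: {batch}")  # total_train_batch_size: 8
    print(f"steps per epoch ~ {round(spe)}")  # steps per epoch ~ 6107
    print(f"early stop at step: {early_stop_step(EVAL_ROWS, EARLY_STOPPING_PATIENCE)}")  # 31000
    print(f"ln(vocab) = {round(math.log(ASSUMED_VOCAB_SIZE), 4)}")  # ln(vocab) = 12.4529
```

Because the table rounds losses to four decimals, the replay only shows that the logged numbers are consistent with patience-based stopping. It does not prove that early stopping is why the run ended before `num_epochs: 10`.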

adapter_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8d30d35f390d6930b48fb55590f138015c36014271463949d7691f3871bc705b
+oid sha256:7845f6e98f4180b9c1bdf45b14bbd54e8d1ed1f886df7b416d91d1676a26ac4d
 size 76696
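The binary adapter files are stored with Git LFS, so the diff only touches a small pointer file. The `oid` (a SHA-256 digest of the real content) changed, while `size` stayed at 76696 bytes. The minimal Python sketch below checks a downloaded copy against such a pointer. It assumes the pointer uses the `key value` line layout shown here, and it hashes the whole payload in memory, which is fine for a file this small.

```python
import hashlib

# Pointer text for the new adapter_model.bin, copied from this diff.
NEW_BIN_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7845f6e98f4180b9c1bdf45b14bbd54e8d1ed1f886df7b416d91d1676a26ac4d
size 76696
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    return fields


def matches_pointer(pointer: dict[str, str], data: bytes) -> bool:
    """True if the payload's byte length and SHA-256 digest both match the pointer."""
    algo, _, expected_hex = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")
    return len(data) == int(pointer["size"]) and hashlib.sha256(data).hexdigest() == expected_hex


if __name__ == "__main__":
    ptr = parse_lfs_pointer(NEW_BIN_POINTER)
    print(ptr["size"])  # 76696
```

To check a real download, pass its bytes, for example `matches_pointer(ptr, Path("adapter_model.bin").read_bytes())` after `from pathlib import Path`. The local path is hypothetical.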

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5e31ab3cf726907124f36b62202b35bd1d2de4320ef735f47c1d3589bc486329
+oid sha256:7a0040a69c05456dd3555fc688eedb7c3608bc71c0a16aebc4a707d9eafcb450
 size 72936
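The safetensors pointer changes the same way: the hash is new and the size stays at 72936 bytes. The diff alone therefore does not show what changed inside the adapter. If the file has been downloaded, its tensor names, dtypes and shapes can be listed without loading any weights. A `.safetensors` file starts with an 8-byte little-endian header length, followed by a JSON header. The Python sketch below reads only that header using the standard library. The demo blob is synthetic, not taken from this repository, and the `__metadata__` entry is skipped because it is optional metadata rather than a tensor.

```python
import json
import struct


def read_safetensors_header(blob: bytes) -> dict:
    """Parse the JSON header at the start of a .safetensors byte string."""
    (header_len,) = struct.unpack("<Q", blob[:8])  # unsigned 64-bit little-endian length
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    header.pop("__metadata__", None)  # optional string metadata, not a tensor entry
    return header


def summarize(header: dict) -> list[tuple[str, str, list[int]]]:
    """Return (tensor name, dtype, shape) triples sorted by name."""
    return sorted((name, info["dtype"], info["shape"]) for name, info in header.items())


if __name__ == "__main__":
    # Synthetic blob: one float32 tensor of shape [2, 3] backed by 24 zero bytes.
    demo_header = {
        "__metadata__": {"format": "pt"},
        "weight": {"dtype": "F32", "shape": [2, 3], "data_offsets": [0, 24]},
    }
    header_bytes = json.dumps(demo_header).encode("utf-8")
    blob = struct.pack("<Q", len(header_bytes)) + header_bytes + bytes(24)
    for name, dtype, shape in summarize(read_safetensors_header(blob)):
        print(name, dtype, shape)  # weight F32 [2, 3]
```

For the real adapter, `blob` would be `Path("adapter_model.safetensors").read_bytes()` from a hypothetical local copy. Only the header is parsed, so no tensor data is materialized.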