DeepDream2045 committed on
Commit 572de99 · verified · 1 Parent(s): 50a392c

End of training

Files changed (2):
  1. README.md +6 -6
  2. adapter_model.bin +1 -1
README.md CHANGED
@@ -74,7 +74,7 @@ optimizer: adamw_torch
 output_dir: miner_id_24
 pad_to_sequence_len: true
 resume_from_checkpoint: null
-s2_attention: false
+s2_attention: null
 sample_packing: false
 save_steps: 25
 sequence_len: 2048
@@ -85,14 +85,14 @@ train_on_inputs: false
 trust_remote_code: true
 val_set_size: 0.05
 wandb_entity: null
-wandb_mode: online
+wandb_mode: disabled
 wandb_name: 4bcc03c8-df85-4a35-aec2-a35701f2914d
 wandb_project: Gradients-On-Demand
 wandb_run: your_name
 wandb_runid: 4bcc03c8-df85-4a35-aec2-a35701f2914d
 warmup_ratio: 0.05
 weight_decay: 0.01
-xformers_attention: false
+xformers_attention: true
 
 ```
 
@@ -102,7 +102,7 @@ xformers_attention: false
 
 This model is a fine-tuned version of [Xenova/tiny-random-Phi3ForCausalLM](https://huggingface.co/Xenova/tiny-random-Phi3ForCausalLM) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 10.3641
+- Loss: 10.3652
 
 ## Model description
 
@@ -137,8 +137,8 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
 | 10.3798 | 0.0007 | 1 | 10.3799 |
-| 10.3664 | 0.0164 | 25 | 10.3686 |
-| 10.3805 | 0.0328 | 50 | 10.3641 |
+| 10.3674 | 0.0164 | 25 | 10.3696 |
+| 10.379 | 0.0328 | 50 | 10.3652 |
 
 
 ### Framework versions
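The net effect of this commit on the training config is three key changes: `s2_attention` from `false` to `null`, `wandb_mode` from `online` to `disabled`, and `xformers_attention` from `false` to `true`. A minimal sketch that recovers those deltas from the old and new key values (dict literals transcribed from the diff above; Python `None` stands in for YAML `null`):

```python
# Old and new values of the config keys touched by this commit,
# transcribed from the README diff. YAML null maps to Python None.
old = {"s2_attention": False, "wandb_mode": "online", "xformers_attention": False}
new = {"s2_attention": None, "wandb_mode": "disabled", "xformers_attention": True}

def changed(old: dict, new: dict) -> dict:
    """Return {key: (old_value, new_value)} for keys whose value differs."""
    return {k: (old[k], new[k]) for k in old if old[k] != new[k]}

print(changed(old, new))
```

Note that `wandb_mode: disabled` means the new run logged nothing to Weights & Biases, even though the `wandb_project` and `wandb_runid` keys are still present.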
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:01fe1ee7b38326f2ab85001f6a79848d60870e11682e4a021beb930f0afda060
+oid sha256:4115140bc4c389927425c6be28ec28faaa8660b6e3f0b87bf5b74de8b7566188
 size 120926
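Both versions of `adapter_model.bin` are Git LFS pointer files, not the weights themselves: `oid` is the SHA-256 digest of the actual binary and `size` its byte length (per the spec URL in the pointer). A minimal sketch of checking a downloaded blob against a pointer's fields, using only the standard library (the `matches` helper is illustrative, not part of any LFS tooling):

```python
import hashlib

def lfs_oid(data: bytes) -> str:
    # Git LFS "oid sha256:<hex>" is the SHA-256 of the file's full contents.
    return "sha256:" + hashlib.sha256(data).hexdigest()

def matches(data: bytes, pointer: dict) -> bool:
    """True if the blob's length and digest agree with the pointer's fields."""
    return len(data) == pointer["size"] and lfs_oid(data) == pointer["oid"]

# Fields of the post-commit pointer, copied from the diff above:
pointer = {
    "oid": "sha256:4115140bc4c389927425c6be28ec28faaa8660b6e3f0b87bf5b74de8b7566188",
    "size": 120926,
}
```

Because the adapter weights changed but the serialized size did not, only the `oid` line differs between the two pointer versions.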