SystemAdmin123 committed (verified)
Commit 84524e2 · 1 Parent(s): c38e28b

End of training

Files changed (1): README.md (+15 -4)
README.md CHANGED

@@ -37,7 +37,7 @@ datasets:
   system_prompt: ''
 device_map: auto
 eval_sample_packing: false
-eval_steps: 200
+eval_steps: 20
 flash_attention: true
 gradient_checkpointing: true
 group_by_length: true
@@ -55,11 +55,13 @@ output_dir: /root/.sn56/axolotl/tmp/SmolLM-360M
 pad_to_sequence_len: true
 resize_token_embeddings_to_32x: false
 sample_packing: true
-save_steps: 200
+save_steps: 20
 save_total_limit: 1
 sequence_len: 2048
 tokenizer_type: GPT2TokenizerFast
 torch_dtype: bf16
+training_args_kwargs:
+  hub_private_repo: true
 trust_remote_code: true
 val_set_size: 0.1
 wandb_entity: ''
@@ -78,7 +80,7 @@ warmup_ratio: 0.05
 
 This model is a fine-tuned version of [unsloth/SmolLM-360M](https://huggingface.co/unsloth/SmolLM-360M) on the argilla/databricks-dolly-15k-curated-en dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.0617
+- Loss: 2.0673
 
 ## Model description
 
@@ -115,7 +117,16 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
 | No log | 0.125 | 1 | 2.5584 |
-| 2.011 | 25.0 | 200 | 2.0617 |
+| 2.2406 | 2.5 | 20 | 2.1562 |
+| 2.136 | 5.0 | 40 | 2.0829 |
+| 2.0938 | 7.5 | 60 | 2.0711 |
+| 2.0632 | 10.0 | 80 | 2.0679 |
+| 2.0298 | 12.5 | 100 | 2.0621 |
+| 2.0168 | 15.0 | 120 | 2.0567 |
+| 2.0188 | 17.5 | 140 | 2.0686 |
+| 2.0108 | 20.0 | 160 | 2.0701 |
+| 2.0169 | 22.5 | 180 | 2.0683 |
+| 2.0109 | 25.0 | 200 | 2.0673 |
 
 
 ### Framework versions
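
The step/epoch pairs in the results table imply a fixed number of optimizer steps per epoch, which is why lowering `eval_steps` and `save_steps` from 200 to 20 turns a single end-of-run evaluation into ten intermediate ones. A minimal sketch of that arithmetic (the step and epoch counts come from the table; nothing below is part of the commit itself):

```python
# Sanity-check the eval/save cadence implied by this commit's config change.
# Assumption: steps per epoch are constant, derived from the final table row
# (200 optimizer steps over 25 epochs).
total_steps, total_epochs = 200, 25.0
steps_per_epoch = total_steps / total_epochs  # 8 steps per epoch

eval_steps = 20  # new value; the old value of 200 fired only at the last step
epochs_between_evals = eval_steps / steps_per_epoch

print(steps_per_epoch, epochs_between_evals)  # 8.0 2.5
```

This matches the table: each new evaluation row is 20 steps (2.5 epochs) after the previous one, and with `save_total_limit: 1` only the latest of those checkpoints is kept.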