Mathildeholst committed (verified)
Commit 1ec35ed · Parent(s): 3b26be1

End of training

Files changed (5):
  1. README.md +19 -12
  2. config.json +1 -1
  3. generation_config.json +1 -1
  4. model.safetensors +1 -1
  5. training_args.bin +1 -1
README.md CHANGED
@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [HuggingFaceTB/SmolLM2-135M](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 3.2215
+- Loss: 2.4268
 
 ## Model description
 
@@ -35,28 +35,35 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 0.0005
+- learning_rate: 0.0003
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 128
 - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- num_epochs: 2
+- lr_scheduler_warmup_ratio: 0.12
+- num_epochs: 1
 
 ### Training results
 
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:----:|:---------------:|
-| 3.1835        | 0.32  | 200  | 3.4083          |
-| 2.9042        | 0.64  | 400  | 3.3115          |
-| 2.7019        | 0.96  | 600  | 3.1662          |
-| 1.6689        | 1.28  | 800  | 3.2652          |
-| 1.4683        | 1.6   | 1000 | 3.2215          |
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 3.3182        | 0.1067 | 25   | 2.9325          |
+| 2.7389        | 0.2133 | 50   | 2.7294          |
+| 2.6048        | 0.32   | 75   | 2.6356          |
+| 2.4954        | 0.4267 | 100  | 2.5659          |
+| 2.4568        | 0.5333 | 125  | 2.5168          |
+| 2.4627        | 0.64   | 150  | 2.4760          |
+| 2.3904        | 0.7467 | 175  | 2.4465          |
+| 2.3572        | 0.8533 | 200  | 2.4319          |
+| 2.3606        | 0.96   | 225  | 2.4268          |
 
 ### Framework versions
 
-- Transformers 4.56.1
+- Transformers 4.57.1
 - Pytorch 2.8.0+cu126
 - Datasets 4.0.0
-- Tokenizers 0.22.0
+- Tokenizers 0.22.1
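The updated hyperparameters imply an effective batch size of 8 × 16 = 128, matching the card's `total_train_batch_size`. A minimal sketch of that arithmetic, assuming these keys mirror the corresponding `transformers.TrainingArguments` parameter names (the authoritative run configuration lives in `training_args.bin`):

```python
# Sketch (assumption): the card's hyperparameters expressed as a plain dict
# whose keys mirror transformers.TrainingArguments parameter names.
training_args = {
    "learning_rate": 3e-4,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": 16,
    "optim": "adamw_torch_fused",  # AdamW, betas=(0.9, 0.999), eps=1e-8
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.12,
    "num_train_epochs": 1,
}

# total_train_batch_size = per-device batch * accumulation steps (single device)
effective_batch = (training_args["per_device_train_batch_size"]
                   * training_args["gradient_accumulation_steps"])
print(effective_batch)  # 128
```

Gradient accumulation is why the optimizer sees 128-sample batches even though only 8 samples fit on the device per forward pass.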
config.json CHANGED
@@ -26,7 +26,7 @@
   "rope_scaling": null,
   "rope_theta": 100000,
   "tie_word_embeddings": true,
-  "transformers_version": "4.56.2",
+  "transformers_version": "4.57.1",
   "use_cache": true,
   "vocab_size": 49152
 }
generation_config.json CHANGED
@@ -5,5 +5,5 @@
     0
   ],
   "pad_token_id": 0,
-  "transformers_version": "4.56.1"
+  "transformers_version": "4.57.1"
 }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fe9fa69b3e6cff65351370f70a9c11047e66c7e05093e59d5be00526cd46a1f0
+oid sha256:9d0bdd355eedf081e8dfd7506fb21267722a0b21f65b532cc0928c0d07d894ef
 size 538090408
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:7cac7edf1437b9ea62683231187dee3ae140dd997bf52a70d7eadb32f41471b5
3
  size 5777
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:86ec7eab07ed7cbc01251e1a5d187a48cce3ab06e46342bf9f1f2ef7e932e7e4
3
  size 5777