rbelanec committed
Commit 1413a12 · verified · Parent: dadf356

Model save

Files changed (2)
  1. README.md +26 -24
  2. adapter_model.safetensors +1 -1
README.md CHANGED
@@ -17,10 +17,10 @@ should probably proofread and complete it, then remove this comment. -->
 
 # test
 
-This model is a fine-tuned version of [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on the wsc dataset.
+This model is a fine-tuned version of [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.3459
-- Num Input Tokens Seen: 49376
+- Loss: 0.3516
+- Num Input Tokens Seen: 43600
 
 ## Model description
 
@@ -40,8 +40,8 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-05
-- train_batch_size: 4
-- eval_batch_size: 4
+- train_batch_size: 2
+- eval_batch_size: 2
 - seed: 123
 - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
@@ -50,25 +50,27 @@ The following hyperparameters were used during training:
 
 ### Training results
 
-| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
-|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
-| 10.9709 | 0.056 | 7 | 6.5227 | 2880 |
-| 6.4075 | 0.112 | 14 | 1.3825 | 5920 |
-| 0.5326 | 0.168 | 21 | 0.4987 | 8416 |
-| 0.4144 | 0.224 | 28 | 0.4531 | 11264 |
-| 0.4802 | 0.28 | 35 | 0.3693 | 13824 |
-| 0.3809 | 0.336 | 42 | 0.3873 | 16672 |
-| 0.3844 | 0.392 | 49 | 0.3778 | 19296 |
-| 0.3831 | 0.448 | 56 | 0.4437 | 22432 |
-| 0.5576 | 0.504 | 63 | 0.3503 | 25504 |
-| 0.3242 | 0.56 | 70 | 0.3716 | 28064 |
-| 0.3963 | 0.616 | 77 | 0.3749 | 30720 |
-| 0.3946 | 0.672 | 84 | 0.3604 | 33504 |
-| 0.337 | 0.728 | 91 | 0.3571 | 36128 |
-| 0.4315 | 0.784 | 98 | 0.3520 | 38592 |
-| 0.371 | 0.84 | 105 | 0.3476 | 41280 |
-| 0.364 | 0.896 | 112 | 0.3459 | 44160 |
-| 0.3554 | 0.952 | 119 | 0.3492 | 46944 |
+| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
+|:-------------:|:------:|:----:|:---------------:|:-----------------:|
+| 0.7462 | 0.0522 | 13 | 0.6849 | 2288 |
+| 0.6639 | 0.1044 | 26 | 0.4557 | 4656 |
+| 0.3742 | 0.1566 | 39 | 0.3849 | 6944 |
+| 0.3565 | 0.2088 | 52 | 0.3768 | 9232 |
+| 0.3087 | 0.2610 | 65 | 0.3713 | 11424 |
+| 0.3607 | 0.3133 | 78 | 0.3614 | 13760 |
+| 0.3589 | 0.3655 | 91 | 0.3609 | 16048 |
+| 0.2898 | 0.4177 | 104 | 0.3723 | 18272 |
+| 0.4246 | 0.4699 | 117 | 0.3699 | 20656 |
+| 0.3657 | 0.5221 | 130 | 0.3523 | 23056 |
+| 0.3637 | 0.5743 | 143 | 0.3551 | 25312 |
+| 0.3938 | 0.6265 | 156 | 0.3517 | 27552 |
+| 0.3198 | 0.6787 | 169 | 0.3546 | 29984 |
+| 0.369 | 0.7309 | 182 | 0.3491 | 32080 |
+| 0.3673 | 0.7831 | 195 | 0.3541 | 34176 |
+| 0.3675 | 0.8353 | 208 | 0.3513 | 36512 |
+| 0.3634 | 0.8876 | 221 | 0.3547 | 38912 |
+| 0.3446 | 0.9398 | 234 | 0.3519 | 41120 |
+| 0.3364 | 0.9920 | 247 | 0.3516 | 43600 |
 
 
 ### Framework versions
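The hyperparameters above list `lr_scheduler_type: cosine` with `learning_rate: 5e-05`, and the new training table ends at step 247. A minimal pure-Python sketch of what such a cosine decay looks like — the exact formula (no warmup, decay to zero) is an assumption for illustration and may differ from the trainer's actual schedule:

```python
import math

def cosine_lr(step, total_steps, base_lr=5e-5, min_lr=0.0):
    """Cosine-decay learning rate: starts at base_lr and decays to min_lr.

    total_steps=247 matches the final step in the training table above;
    min_lr=0.0 and the absence of warmup are illustrative assumptions.
    """
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0, 247))    # 5e-05 (full learning rate at step 0)
print(cosine_lr(247, 247))  # 0.0   (fully decayed at the last step)
```

At the halfway point the rate is half of `base_lr`, which is the characteristic shape of a cosine schedule.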
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7163bafc9ea57b4043ed3b6279d4db1efadabb33e78a4e1b957e38283b5e59fe
+oid sha256:4304744297050635f73d3344a18cd231f441a5afcdd8857ed1683f6664a31cf9
 size 335717200