rbelanec committed
Commit 2c0860b · verified · 1 Parent(s): cd622dd

Model save

Files changed (2):
  1. README.md +24 -24
  2. adapter_model.safetensors +1 -1
README.md CHANGED
@@ -17,10 +17,10 @@ should probably proofread and complete it, then remove this comment. -->
 
 # test
 
-This model is a fine-tuned version of [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on the wsc dataset.
+This model is a fine-tuned version of [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.3508
-- Num Input Tokens Seen: 43904
+- Loss: 0.3559
+- Num Input Tokens Seen: 43600
 
 ## Model description
 
@@ -43,7 +43,7 @@ The following hyperparameters were used during training:
 - train_batch_size: 2
 - eval_batch_size: 2
 - seed: 123
-- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 1
@@ -52,31 +52,31 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
 |:-------------:|:------:|:----:|:---------------:|:-----------------:|
-| 0.7661 | 0.0522 | 13 | 0.6882 | 2288 |
-| 0.6839 | 0.1044 | 26 | 0.4648 | 4656 |
-| 0.374 | 0.1566 | 39 | 0.3842 | 6944 |
-| 0.3624 | 0.2088 | 52 | 0.3785 | 9232 |
-| 0.3164 | 0.2610 | 65 | 0.3669 | 11424 |
-| 0.3623 | 0.3133 | 78 | 0.3628 | 13760 |
-| 0.3656 | 0.3655 | 91 | 0.3581 | 16048 |
-| 0.2954 | 0.4177 | 104 | 0.3806 | 18272 |
-| 0.4359 | 0.4699 | 117 | 0.3704 | 20656 |
-| 0.356 | 0.5221 | 130 | 0.3525 | 23056 |
-| 0.3685 | 0.5743 | 143 | 0.3546 | 25312 |
-| 0.3832 | 0.6265 | 156 | 0.3515 | 27552 |
-| 0.3202 | 0.6787 | 169 | 0.3524 | 29984 |
-| 0.3678 | 0.7309 | 182 | 0.3511 | 32080 |
-| 0.3704 | 0.7831 | 195 | 0.3565 | 34176 |
-| 0.3651 | 0.8353 | 208 | 0.3508 | 36512 |
-| 0.3666 | 0.8876 | 221 | 0.3531 | 38912 |
-| 0.3489 | 0.9398 | 234 | 0.3516 | 41120 |
-| 0.3405 | 0.9920 | 247 | 0.3515 | 43600 |
+| 0.7689 | 0.0522 | 13 | 0.6838 | 2288 |
+| 0.6557 | 0.1044 | 26 | 0.4604 | 4656 |
+| 0.3647 | 0.1566 | 39 | 0.3835 | 6944 |
+| 0.3506 | 0.2088 | 52 | 0.3836 | 9232 |
+| 0.3084 | 0.2610 | 65 | 0.3691 | 11424 |
+| 0.3649 | 0.3133 | 78 | 0.3669 | 13760 |
+| 0.3612 | 0.3655 | 91 | 0.3621 | 16048 |
+| 0.2896 | 0.4177 | 104 | 0.3752 | 18272 |
+| 0.4278 | 0.4699 | 117 | 0.3691 | 20656 |
+| 0.3591 | 0.5221 | 130 | 0.3583 | 23056 |
+| 0.3726 | 0.5743 | 143 | 0.3531 | 25312 |
+| 0.3829 | 0.6265 | 156 | 0.3520 | 27552 |
+| 0.3318 | 0.6787 | 169 | 0.3502 | 29984 |
+| 0.3655 | 0.7309 | 182 | 0.3543 | 32080 |
+| 0.3703 | 0.7831 | 195 | 0.3526 | 34176 |
+| 0.3585 | 0.8353 | 208 | 0.3535 | 36512 |
+| 0.3626 | 0.8876 | 221 | 0.3517 | 38912 |
+| 0.3419 | 0.9398 | 234 | 0.3497 | 41120 |
+| 0.3311 | 0.9920 | 247 | 0.3559 | 43600 |
 
 
 ### Framework versions
 
 - PEFT 0.17.1
 - Transformers 4.51.3
-- Pytorch 2.9.1+cu128
+- Pytorch 2.10.0+cu128
 - Datasets 4.0.0
 - Tokenizers 0.21.4
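For readers reproducing this run, the schedule implied by `lr_scheduler_type: cosine` and `lr_scheduler_warmup_ratio: 0.1` can be sketched as below. This is a minimal sketch of the usual linear-warmup-then-cosine-decay shape, not the exact Transformers implementation; the peak learning rate is a placeholder (the card's hunks don't show it), and the total of 249 steps is inferred from the log (13 steps ≈ 0.0522 epochs).

```python
import math

def cosine_with_warmup(step, total_steps, peak_lr, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay toward zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# 13 steps ~ 0.0522 epochs in the log, so one epoch is ~249 optimizer steps.
total = 249
lrs = [cosine_with_warmup(s, total, peak_lr=1.0) for s in range(total + 1)]
```

With `warmup_ratio=0.1` the learning rate climbs for the first ~24 steps, peaks, and then decays smoothly for the remaining ~90% of the epoch.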
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7598b118796b1a30bbf291b15efef6a1ddf11a79a29338c929248412495fa19c
+oid sha256:0e962c186a051742786c79d3b1c1078f44102b5980e7972415ae55fedac6ce56
 size 335717200
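Note that the `adapter_model.safetensors` diff above is over a Git LFS pointer file, not the weights themselves: only the `oid sha256:` checksum and `size` of the real 335,717,200-byte blob change. A minimal sketch of checking a downloaded blob against such a pointer follows; the parsing and verification helpers are hypothetical illustrations, not part of git-lfs or any library, and the demo uses a tiny synthetic blob rather than the real adapter.

```python
import hashlib

def parse_lfs_pointer(text):
    """Split a git-lfs pointer file into its version, hash algorithm, oid, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo, "oid": digest,
            "size": int(fields["size"])}

def verify_blob(pointer_text, blob_bytes):
    """Return True if the blob matches both the size and sha256 oid in the pointer."""
    p = parse_lfs_pointer(pointer_text)
    return (len(blob_bytes) == p["size"]
            and hashlib.sha256(blob_bytes).hexdigest() == p["oid"])

# Demo with a synthetic blob (hypothetical contents, not the real adapter).
blob = b"example adapter bytes"
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(blob).hexdigest()}\n"
    f"size {len(blob)}\n"
)
ok = verify_blob(pointer, blob)
```

The same check, run against the pointer shown in this commit and the downloaded adapter file, confirms that the fetched weights are the ones this commit recorded.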