hrezaei committed on
Commit 49296ad · verified · 1 Parent(s): 371b0a4

Model save

Files changed (3)
  1. README.md +4 -128
  2. model.safetensors +1 -1
  3. training_args.bin +1 -1
README.md CHANGED
@@ -2,36 +2,18 @@
2
  library_name: transformers
3
  tags:
4
  - generated_from_trainer
5
- datasets:
6
- - HuggingFaceFW/fineweb
7
- metrics:
8
- - accuracy
9
  model-index:
10
  - name: T5LA
11
- results:
12
- - task:
13
- name: Causal Language Modeling
14
- type: text-generation
15
- dataset:
16
- name: HuggingFaceFW/fineweb sample-10BT
17
- type: HuggingFaceFW/fineweb
18
- args: sample-10BT
19
- metrics:
20
- - name: Accuracy
21
- type: accuracy
22
- value: 0.03222989830774154
23
  ---
24
 
25
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
26
  should probably proofread and complete it, then remove this comment. -->
27
 
28
- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/uoy/llm_training/runs/elf928gg)
29
  # T5LA
30
 
31
- This model is a fine-tuned version of [](https://huggingface.co/) on the HuggingFaceFW/fineweb sample-10BT dataset.
32
- It achieves the following results on the evaluation set:
33
- - Loss: 5.5470
34
- - Accuracy: 0.0322
35
 
36
  ## Model description
37
 
@@ -60,115 +42,9 @@ The following hyperparameters were used during training:
60
  - total_eval_batch_size: 16
61
  - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
62
  - lr_scheduler_type: linear
63
- - training_steps: 100000
64
  - mixed_precision_training: Native AMP
65
 
66
- ### Training results
67
-
68
- | Training Loss | Epoch | Step | Accuracy | Validation Loss |
69
- |:-------------:|:------:|:------:|:--------:|:---------------:|
70
- | 9.4056 | 0.01 | 1000 | 0.0435 | 9.1215 |
71
- | 8.4062 | 0.02 | 2000 | 0.0443 | 8.1939 |
72
- | 7.7307 | 0.03 | 3000 | 0.0444 | 7.6024 |
73
- | 7.39 | 0.04 | 4000 | 0.0444 | 7.3338 |
74
- | 7.2546 | 0.05 | 5000 | 0.0441 | 7.2452 |
75
- | 7.1985 | 0.06 | 6000 | 0.0369 | 7.1682 |
76
- | 7.1009 | 0.07 | 7000 | 0.0346 | 7.0718 |
77
- | 7.004 | 0.08 | 8000 | 0.0332 | 6.9778 |
78
- | 6.9159 | 0.09 | 9000 | 0.0325 | 6.8964 |
79
- | 6.8548 | 0.1 | 10000 | 0.0325 | 6.8307 |
80
- | 6.7833 | 0.11 | 11000 | 0.0326 | 6.7702 |
81
- | 6.7376 | 0.12 | 12000 | 0.0337 | 6.7163 |
82
- | 6.6821 | 0.13 | 13000 | 0.0346 | 6.6615 |
83
- | 6.6373 | 0.14 | 14000 | 0.0349 | 6.6086 |
84
- | 6.5895 | 0.15 | 15000 | 0.0344 | 6.5569 |
85
- | 6.5421 | 0.16 | 16000 | 0.0354 | 6.5119 |
86
- | 6.5051 | 0.17 | 17000 | 0.0355 | 6.4678 |
87
- | 6.4391 | 0.18 | 18000 | 0.0360 | 6.4324 |
88
- | 6.4242 | 0.19 | 19000 | 0.0355 | 6.4015 |
89
- | 6.3889 | 0.2 | 20000 | 0.0373 | 6.3553 |
90
- | 6.3631 | 0.21 | 21000 | 0.0367 | 6.3285 |
91
- | 6.3296 | 0.22 | 22000 | 0.0369 | 6.3015 |
92
- | 6.3081 | 0.23 | 23000 | 0.0364 | 6.2699 |
93
- | 6.2784 | 0.24 | 24000 | 0.0370 | 6.2454 |
94
- | 6.2589 | 0.25 | 25000 | 0.0374 | 6.2167 |
95
- | 6.2371 | 0.26 | 26000 | 0.0370 | 6.1890 |
96
- | 6.1978 | 0.27 | 27000 | 0.0376 | 6.1660 |
97
- | 6.1895 | 0.28 | 28000 | 0.0375 | 6.1378 |
98
- | 6.1636 | 0.29 | 29000 | 0.0366 | 6.1213 |
99
- | 6.1262 | 0.3 | 30000 | 0.0370 | 6.0967 |
100
- | 6.1345 | 0.31 | 31000 | 0.0361 | 6.0745 |
101
- | 6.1096 | 0.32 | 32000 | 0.0360 | 6.0556 |
102
- | 6.0794 | 0.33 | 33000 | 0.0357 | 6.0413 |
103
- | 6.0643 | 0.34 | 34000 | 0.0363 | 6.0136 |
104
- | 6.057 | 0.35 | 35000 | 0.0362 | 5.9965 |
105
- | 6.0337 | 0.36 | 36000 | 0.0354 | 5.9806 |
106
- | 6.0217 | 0.37 | 37000 | 0.0363 | 5.9584 |
107
- | 6.0045 | 0.38 | 38000 | 0.0359 | 5.9526 |
108
- | 5.9896 | 0.39 | 39000 | 0.0355 | 5.9288 |
109
- | 5.9711 | 0.4 | 40000 | 0.0352 | 5.9152 |
110
- | 5.9629 | 0.41 | 41000 | 0.0349 | 5.8962 |
111
- | 5.9465 | 0.42 | 42000 | 0.0359 | 5.8821 |
112
- | 5.9463 | 0.43 | 43000 | 0.0345 | 5.8692 |
113
- | 5.9317 | 0.44 | 44000 | 0.0343 | 5.8699 |
114
- | 5.9097 | 1.0034 | 45000 | 0.0346 | 5.8483 |
115
- | 5.9107 | 1.0134 | 46000 | 0.0348 | 5.8352 |
116
- | 5.8838 | 1.0234 | 47000 | 0.0343 | 5.8188 |
117
- | 5.887 | 1.0334 | 48000 | 0.0340 | 5.8086 |
118
- | 5.8563 | 1.0434 | 49000 | 0.0338 | 5.7971 |
119
- | 5.8576 | 1.0534 | 50000 | 0.0339 | 5.7968 |
120
- | 5.8567 | 1.0635 | 51000 | 0.0343 | 5.7797 |
121
- | 5.841 | 1.0735 | 52000 | 0.0337 | 5.7677 |
122
- | 5.8192 | 1.0835 | 53000 | 0.0332 | 5.7613 |
123
- | 5.8214 | 1.0935 | 54000 | 0.0338 | 5.7486 |
124
- | 5.8166 | 1.1035 | 55000 | 0.0338 | 5.7409 |
125
- | 5.806 | 1.1135 | 56000 | 0.0333 | 5.7342 |
126
- | 5.7961 | 1.1235 | 57000 | 0.0335 | 5.7236 |
127
- | 5.7847 | 1.1335 | 58000 | 0.0333 | 5.7164 |
128
- | 5.787 | 1.1435 | 59000 | 0.0330 | 5.7096 |
129
- | 5.7711 | 1.1535 | 60000 | 0.0328 | 5.7035 |
130
- | 5.7699 | 1.1635 | 61000 | 0.0331 | 5.6888 |
131
- | 5.763 | 1.1734 | 62000 | 0.0334 | 5.6875 |
132
- | 5.7434 | 1.1835 | 63000 | 0.0330 | 5.6809 |
133
- | 5.7477 | 1.1934 | 64000 | 0.0329 | 5.6686 |
134
- | 5.7409 | 1.2034 | 65000 | 0.0330 | 5.6624 |
135
- | 5.737 | 1.2134 | 66000 | 0.0339 | 5.6758 |
136
- | 5.729 | 1.2234 | 67000 | 0.0326 | 5.6546 |
137
- | 5.7232 | 1.2334 | 68000 | 0.0329 | 5.6467 |
138
- | 5.7127 | 1.2434 | 69000 | 0.0329 | 5.6449 |
139
- | 5.7187 | 1.2534 | 70000 | 0.0329 | 5.6352 |
140
- | 5.717 | 1.2634 | 71000 | 0.0326 | 5.6264 |
141
- | 5.714 | 1.2734 | 72000 | 0.0330 | 5.6219 |
142
- | 5.7079 | 1.2834 | 73000 | 0.0330 | 5.6169 |
143
- | 5.7034 | 1.2934 | 74000 | 0.0326 | 5.6131 |
144
- | 5.6768 | 1.3034 | 75000 | 0.0325 | 5.6125 |
145
- | 5.6955 | 1.3135 | 76000 | 0.0328 | 5.6075 |
146
- | 5.6947 | 1.3235 | 77000 | 0.0325 | 5.6017 |
147
- | 5.7056 | 1.3335 | 78000 | 0.0323 | 5.5956 |
148
- | 5.6636 | 1.3435 | 79000 | 0.0326 | 5.5921 |
149
- | 5.6723 | 1.3535 | 80000 | 0.0326 | 5.5881 |
150
- | 5.659 | 1.3635 | 81000 | 0.0324 | 5.5823 |
151
- | 5.6729 | 1.3735 | 82000 | 0.0326 | 5.5795 |
152
- | 5.6595 | 1.3835 | 83000 | 0.0322 | 5.5794 |
153
- | 5.6565 | 1.3935 | 84000 | 0.0328 | 5.5758 |
154
- | 5.6649 | 1.4034 | 85000 | 0.0325 | 5.5716 |
155
- | 5.6561 | 1.4135 | 86000 | 0.0321 | 5.5695 |
156
- | 5.6405 | 1.4234 | 87000 | 0.0323 | 5.5654 |
157
- | 5.6482 | 1.4335 | 88000 | 0.0321 | 5.5628 |
158
- | 5.6425 | 1.4434 | 89000 | 0.0323 | 5.5622 |
159
- | 5.6379 | 2.0069 | 90000 | 0.0323 | 5.5582 |
160
- | 5.6357 | 2.0169 | 91000 | 0.0322 | 5.5573 |
161
- | 5.6381 | 2.0269 | 92000 | 0.0320 | 5.5568 |
162
- | 5.6427 | 2.0369 | 93000 | 0.0324 | 5.5526 |
163
- | 5.6364 | 2.0469 | 94000 | 0.0323 | 5.5526 |
164
- | 5.626 | 2.0569 | 95000 | 0.0321 | 5.5501 |
165
- | 5.636 | 2.0669 | 96000 | 0.0324 | 5.5492 |
166
- | 5.632 | 2.0769 | 97000 | 0.0323 | 5.5489 |
167
- | 5.6133 | 2.0869 | 98000 | 0.0323 | 5.5479 |
168
- | 5.6291 | 2.0969 | 99000 | 0.0323 | 5.5477 |
169
- | 5.6271 | 2.1069 | 100000 | 0.0322 | 5.5470 |
170
-
171
-
172
  ### Framework versions
173
 
174
  - Transformers 4.49.0.dev0
 
2
  library_name: transformers
3
  tags:
4
  - generated_from_trainer
5
  model-index:
6
  - name: T5LA
7
+ results: []
8
  ---
9
 
10
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
11
  should probably proofread and complete it, then remove this comment. -->
12
 
13
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/uoy/llm_training/runs/pzcq293g)
14
  # T5LA
15
 
16
+ This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
17
 
18
  ## Model description
19
 
 
42
  - total_eval_batch_size: 16
43
  - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
44
  - lr_scheduler_type: linear
45
+ - training_steps: 200000
46
  - mixed_precision_training: Native AMP
47
 
48
  ### Framework versions
49
 
50
  - Transformers 4.49.0.dev0
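
The only hyperparameter this README hunk changes is `training_steps` (100000 to 200000), with `lr_scheduler_type: linear` left as is. As a rough illustration of what that could imply, the sketch below computes the multiplier a linear warmup/decay schedule applies to the base learning rate at a given step. It assumes zero warmup steps, and `linear_lr_factor` is a hypothetical helper name; the actual learning rate and warmup settings fall outside the lines shown in this diff.

```python
def linear_lr_factor(step: int, total_steps: int, warmup_steps: int = 0) -> float:
    """Multiplier on the base learning rate for a linear warmup + linear decay schedule.

    Assumption: warmup_steps defaults to 0 because the diff does not show it.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to 1 during warmup.
        return step / max(1, warmup_steps)
    # Linear decay from 1 down to 0 at total_steps, clamped at 0 afterwards.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Compare the multiplier at step 100000 under the old and new step budgets.
old_factor = linear_lr_factor(100_000, total_steps=100_000)
new_factor = linear_lr_factor(100_000, total_steps=200_000)
print(old_factor, new_factor)
```

Under these assumptions, a run configured for 100000 steps has fully decayed its learning rate by step 100000, whereas a run configured for 200000 steps is only halfway through the decay at that point.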
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:880f5daf0555630aaa1f5971bcbc30735b81263f4959a256d6baedeaab077586
3
  size 439436624
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:558b40774ebb3c775f86849ec49e96e68298f3fc467a56f70324925d9ce915bd
3
  size 439436624
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:bbeb34171b3a4acd5b65788b8a72d6a9a1089a87551e1ba44bc0d9baa0db3f6f
3
  size 5432
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:250fa1af03ffaa8bda0e2278749102dd8574803e5d42d069bf5be28611ad9412
3
  size 5432
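
Both binary files above are stored as Git LFS pointers, so the diff only shows the `oid sha256:` digest changing while `size` stays the same. A minimal sketch of how such a pointer could be parsed and checked against a downloaded file follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any library, and the pointer text is copied from the new side of the `training_args.bin` hunk.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Return True if the file's byte size and SHA-256 digest match the pointer."""
    data_path = Path(path)
    if data_path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with data_path.open("rb") as handle:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:250fa1af03ffaa8bda0e2278749102dd8574803e5d42d069bf5be28611ad9412\n"
    "size 5432\n"
)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("training_args.bin", pointer)` on a local copy would then indicate whether that copy corresponds to this commit's version of the file.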