ak2603 committed · Commit db88dad (verified) · Parent(s): e9d228d

Fine tuning: 0.1 test size, 60 epochs, 12 batch size

README.md ADDED
---
library_name: transformers
license: apache-2.0
base_model: google/mt5-small
tags:
- summarization
- generated_from_trainer
metrics:
- rouge
model-index:
- name: mt5-small-finetuned
  results: []
---

# mt5-small-finetuned

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on an unspecified dataset (0.1 test split).
It achieves the following results on the evaluation set:
- Loss: 2.0192
- Rouge1: 0.3780
- Rouge2: 0.1970
- Rougel: 0.3508
- Rougelsum: 0.3527
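The Rouge1 score above measures unigram overlap between generated and reference summaries (the reported values were produced by the Trainer's metric hook, typically via the `rouge_score` package). A minimal, self-contained sketch of the ROUGE-1 F1 idea, assuming whitespace tokenization and no stemming (so it will not exactly reproduce the official scores):

```python
from collections import Counter

def rouge1_f(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between two strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Clipped overlap: each reference token counts at most as often
    # as it appears in the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(f"{rouge1_f('the cat sat on the mat', 'the cat is on the mat'):.4f}")
```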

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5.6e-05
- train_batch_size: 12
- eval_batch_size: 12
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 60
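The step counts in the results table follow directly from these settings: each epoch covers 12 optimizer steps, so with a batch size of 12 the training split holds roughly 144 examples, and 60 epochs give 720 steps in total. A quick arithmetic check (the training-set size is inferred from the table, not stated in the card):

```python
# Sanity-check the training schedule implied by the hyperparameters.
# The training-set size is an inference (12 steps/epoch x batch 12),
# not a value reported by the card.
train_batch_size = 12
steps_per_epoch = 12   # from the results table: step advances by 12 per epoch
num_epochs = 60

total_steps = steps_per_epoch * num_epochs
approx_train_examples = steps_per_epoch * train_batch_size

print(total_steps, approx_train_examples)  # 720 144
```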

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|
| 21.4055 | 1.0 | 12 | 13.7151 | 0.0189 | 0.0084 | 0.0147 | 0.0189 |
| 17.1792 | 2.0 | 24 | 11.5227 | 0.0189 | 0.0084 | 0.0147 | 0.0189 |
| 15.0485 | 3.0 | 36 | 9.5193 | 0.0 | 0.0 | 0.0 | 0.0 |
| 13.0405 | 4.0 | 48 | 6.8529 | 0.0102 | 0.0 | 0.0102 | 0.0102 |
| 11.7418 | 5.0 | 60 | 5.8151 | 0.0331 | 0.0084 | 0.0303 | 0.0335 |
| 9.659 | 6.0 | 72 | 5.6024 | 0.0344 | 0.0084 | 0.0344 | 0.0357 |
| 8.6025 | 7.0 | 84 | 4.7311 | 0.0273 | 0.0036 | 0.0269 | 0.0277 |
| 7.5747 | 8.0 | 96 | 3.8319 | 0.0510 | 0.0031 | 0.0483 | 0.0456 |
| 6.916 | 9.0 | 108 | 3.5873 | 0.0578 | 0.0 | 0.0540 | 0.0520 |
| 6.3394 | 10.0 | 120 | 3.4854 | 0.0788 | 0.0076 | 0.0792 | 0.0794 |
| 5.5822 | 11.0 | 132 | 3.2956 | 0.0752 | 0.0158 | 0.0694 | 0.0697 |
| 5.0731 | 12.0 | 144 | 3.0977 | 0.0524 | 0.0115 | 0.0470 | 0.0475 |
| 4.7234 | 13.0 | 156 | 2.9120 | 0.0331 | 0.0105 | 0.0279 | 0.0285 |
| 4.3512 | 14.0 | 168 | 2.7709 | 0.0527 | 0.0304 | 0.0375 | 0.0377 |
| 4.136 | 15.0 | 180 | 2.6770 | 0.0616 | 0.0331 | 0.0494 | 0.0495 |
| 3.8591 | 16.0 | 192 | 2.5894 | 0.1028 | 0.0473 | 0.0817 | 0.0826 |
| 3.6558 | 17.0 | 204 | 2.5183 | 0.1814 | 0.0828 | 0.1541 | 0.1532 |
| 3.4821 | 18.0 | 216 | 2.4590 | 0.1940 | 0.0838 | 0.1618 | 0.1621 |
| 3.3248 | 19.0 | 228 | 2.3901 | 0.2062 | 0.0856 | 0.1667 | 0.1676 |
| 3.194 | 20.0 | 240 | 2.3352 | 0.1971 | 0.0918 | 0.1672 | 0.1684 |
| 3.0883 | 21.0 | 252 | 2.2934 | 0.1971 | 0.0918 | 0.1672 | 0.1684 |
| 2.9907 | 22.0 | 264 | 2.2471 | 0.2039 | 0.0943 | 0.1660 | 0.1675 |
| 2.9249 | 23.0 | 276 | 2.2038 | 0.1904 | 0.0843 | 0.1515 | 0.1537 |
| 2.8418 | 24.0 | 288 | 2.1643 | 0.1995 | 0.0939 | 0.1686 | 0.1705 |
| 2.7667 | 25.0 | 300 | 2.1296 | 0.2233 | 0.1002 | 0.1882 | 0.1890 |
| 2.7157 | 26.0 | 312 | 2.1176 | 0.3513 | 0.1825 | 0.3422 | 0.3432 |
| 2.7058 | 27.0 | 324 | 2.0969 | 0.3525 | 0.1803 | 0.3444 | 0.3457 |
| 2.5703 | 28.0 | 336 | 2.0761 | 0.3507 | 0.1847 | 0.3395 | 0.3409 |
| 2.4907 | 29.0 | 348 | 2.0688 | 0.3379 | 0.1741 | 0.3281 | 0.3290 |
| 2.3974 | 30.0 | 360 | 2.0706 | 0.3520 | 0.1872 | 0.3391 | 0.3402 |
| 2.4584 | 31.0 | 372 | 2.0635 | 0.3465 | 0.1840 | 0.3332 | 0.3344 |
| 2.3775 | 32.0 | 384 | 2.0560 | 0.3525 | 0.1826 | 0.3390 | 0.3411 |
| 2.4014 | 33.0 | 396 | 2.0544 | 0.3585 | 0.1860 | 0.3456 | 0.3469 |
| 2.3388 | 34.0 | 408 | 2.0583 | 0.3607 | 0.1865 | 0.3483 | 0.3496 |
| 2.3288 | 35.0 | 420 | 2.0487 | 0.3551 | 0.1835 | 0.3368 | 0.3379 |
| 2.3233 | 36.0 | 432 | 2.0394 | 0.3569 | 0.1803 | 0.3313 | 0.3326 |
| 2.2882 | 37.0 | 444 | 2.0361 | 0.3585 | 0.1867 | 0.3422 | 0.3446 |
| 2.2109 | 38.0 | 456 | 2.0324 | 0.3565 | 0.1858 | 0.3413 | 0.3429 |
| 2.212 | 39.0 | 468 | 2.0327 | 0.3585 | 0.1867 | 0.3422 | 0.3446 |
| 2.2059 | 40.0 | 480 | 2.0310 | 0.3612 | 0.1849 | 0.3421 | 0.3436 |
| 2.1866 | 41.0 | 492 | 2.0352 | 0.3612 | 0.1849 | 0.3421 | 0.3436 |
| 2.2122 | 42.0 | 504 | 2.0369 | 0.3612 | 0.1849 | 0.3421 | 0.3436 |
| 2.1305 | 43.0 | 516 | 2.0351 | 0.3604 | 0.1863 | 0.3419 | 0.3443 |
| 2.1174 | 44.0 | 528 | 2.0358 | 0.3578 | 0.1864 | 0.3397 | 0.3413 |
| 2.0972 | 45.0 | 540 | 2.0356 | 0.3602 | 0.1881 | 0.3390 | 0.3405 |
| 2.1051 | 46.0 | 552 | 2.0325 | 0.3606 | 0.1861 | 0.3359 | 0.3376 |
| 2.0632 | 47.0 | 564 | 2.0329 | 0.3606 | 0.1861 | 0.3359 | 0.3376 |
| 2.0601 | 48.0 | 576 | 2.0301 | 0.3621 | 0.1857 | 0.3346 | 0.3364 |
| 2.0487 | 49.0 | 588 | 2.0301 | 0.3621 | 0.1857 | 0.3346 | 0.3364 |
| 2.0538 | 50.0 | 600 | 2.0314 | 0.3617 | 0.1876 | 0.3380 | 0.3399 |
| 2.071 | 51.0 | 612 | 2.0308 | 0.3608 | 0.1871 | 0.3368 | 0.3385 |
| 2.0415 | 52.0 | 624 | 2.0283 | 0.3777 | 0.1993 | 0.3546 | 0.3559 |
| 2.007 | 53.0 | 636 | 2.0259 | 0.3777 | 0.1993 | 0.3546 | 0.3559 |
| 2.0238 | 54.0 | 648 | 2.0232 | 0.3777 | 0.1993 | 0.3546 | 0.3559 |
| 2.074 | 55.0 | 660 | 2.0207 | 0.3780 | 0.1970 | 0.3508 | 0.3527 |
| 2.0497 | 56.0 | 672 | 2.0202 | 0.3780 | 0.1970 | 0.3508 | 0.3527 |
| 2.0075 | 57.0 | 684 | 2.0200 | 0.3780 | 0.1970 | 0.3508 | 0.3527 |
| 2.0837 | 58.0 | 696 | 2.0193 | 0.3780 | 0.1970 | 0.3508 | 0.3527 |
| 2.0277 | 59.0 | 708 | 2.0194 | 0.3780 | 0.1970 | 0.3508 | 0.3527 |
| 2.0912 | 60.0 | 720 | 2.0192 | 0.3780 | 0.1970 | 0.3508 | 0.3527 |

### Framework versions

- Transformers 4.47.1
- PyTorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
generation_config.json ADDED
{
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0,
  "transformers_version": "4.47.1"
}
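T5-family decoders, including mT5, are primed with the pad token, which is why `decoder_start_token_id` and `pad_token_id` are both 0 here while end-of-sequence is token 1. A small check that the JSON above parses and carries those ids (the string below just restates the file content):

```python
import json

# The generation_config.json content, copied verbatim.
raw = """
{
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0,
  "transformers_version": "4.47.1"
}
"""
cfg = json.loads(raw)

# For T5-family models the decoder start token is the pad token,
# so these two ids are expected to match.
assert cfg["decoder_start_token_id"] == cfg["pad_token_id"] == 0
assert cfg["eos_token_id"] == 1
print("generation config ids are consistent")
```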
runs/Jan09_16-37-36_5189e2874f5a/events.out.tfevents.1736441327.5189e2874f5a.629.1 ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:867e6d3e95016a012a06771ad212c291ac1f3756ae8ca1304f8e53eaf407eb32
size 562