floflodebilbao committed
Commit 29a754e · verified · 1 Parent(s): d5d45ad

End of training

README.md CHANGED
@@ -22,21 +22,21 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [allenai/led-base-16384](https://huggingface.co/allenai/led-base-16384) on an unknown dataset.
 It achieves the following results on the evaluation set:
- - Loss: 4.1201
- - Rouge1: 0.2826
- - Rouge2: 0.1016
- - Rougel: 0.2235
- - Rougelsum: 0.2227
- - Gen Len: 27.48
- - Bleu: 0.0515
- - Precisions: 0.1044
- - Brevity Penalty: 0.8659
- - Length Ratio: 0.8742
- - Translation Length: 1056.0
+ - Loss: 4.1010
+ - Rouge1: 0.3015
+ - Rouge2: 0.0982
+ - Rougel: 0.2325
+ - Rougelsum: 0.234
+ - Gen Len: 27.86
+ - Bleu: 0.0493
+ - Precisions: 0.1077
+ - Brevity Penalty: 0.8669
+ - Length Ratio: 0.875
+ - Translation Length: 1057.0
  - Reference Length: 1208.0
- - Precision: 0.8808
- - Recall: 0.8739
- - F1: 0.8773
+ - Precision: 0.8803
+ - Recall: 0.8763
+ - F1: 0.8783
  - Hashcode: roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1)
 
 ## Model description
@@ -56,29 +56,30 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
- - learning_rate: 0.001
- - train_batch_size: 8
- - eval_batch_size: 8
+ - learning_rate: 0.002
+ - train_batch_size: 1
+ - eval_batch_size: 1
  - seed: 42
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 16
  - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: linear
  - num_epochs: 10
- - mixed_precision_training: Native AMP
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len | Bleu | Precisions | Brevity Penalty | Length Ratio | Translation Length | Reference Length | Precision | Recall | F1 | Hashcode |
 |:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:-------:|:------:|:----------:|:---------------:|:------------:|:------------------:|:----------------:|:---------:|:------:|:------:|:---------------------------------------------------------:|
-| 8.708 | 1.0 | 13 | 6.7992 | 0.2058 | 0.0456 | 0.1594 | 0.159 | 31.68 | 0.0216 | 0.0515 | 1.0 | 1.0737 | 1297.0 | 1208.0 | 0.8535 | 0.8564 | 0.8549 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
-| 5.8473 | 2.0 | 26 | 4.8979 | 0.2553 | 0.0817 | 0.1969 | 0.1972 | 27.54 | 0.035 | 0.0853 | 0.8901 | 0.8957 | 1082.0 | 1208.0 | 0.8761 | 0.8691 | 0.8725 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
-| 4.6072 | 3.0 | 39 | 4.2460 | 0.269 | 0.0781 | 0.2078 | 0.2084 | 28.32 | 0.0414 | 0.0898 | 0.865 | 0.8733 | 1055.0 | 1208.0 | 0.8742 | 0.8722 | 0.8731 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
-| 4.2016 | 4.0 | 52 | 4.1384 | 0.2709 | 0.0894 | 0.2139 | 0.2134 | 27.4 | 0.0495 | 0.0998 | 0.8753 | 0.8825 | 1066.0 | 1208.0 | 0.8792 | 0.8721 | 0.8756 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
-| 4.0062 | 5.0 | 65 | 4.0907 | 0.2755 | 0.0825 | 0.2128 | 0.2125 | 28.64 | 0.0437 | 0.0921 | 0.901 | 0.9056 | 1094.0 | 1208.0 | 0.8733 | 0.8725 | 0.8729 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
-| 3.892 | 6.0 | 78 | 4.0992 | 0.2806 | 0.0934 | 0.2199 | 0.2191 | 28.22 | 0.0388 | 0.0952 | 0.891 | 0.8965 | 1083.0 | 1208.0 | 0.8797 | 0.8754 | 0.8775 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
-| 3.8119 | 7.0 | 91 | 4.0950 | 0.2985 | 0.0916 | 0.2268 | 0.2264 | 28.16 | 0.0284 | 0.0947 | 0.891 | 0.8965 | 1083.0 | 1208.0 | 0.8812 | 0.8763 | 0.8787 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
-| 3.7427 | 8.0 | 104 | 4.1031 | 0.2942 | 0.1025 | 0.2356 | 0.2344 | 27.2 | 0.0526 | 0.1111 | 0.8394 | 0.851 | 1028.0 | 1208.0 | 0.8819 | 0.8758 | 0.8788 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
-| 3.6902 | 9.0 | 117 | 4.1120 | 0.2981 | 0.1028 | 0.2323 | 0.232 | 28.08 | 0.0487 | 0.1036 | 0.8836 | 0.8899 | 1075.0 | 1208.0 | 0.8782 | 0.8755 | 0.8768 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
-| 3.6548 | 10.0 | 130 | 4.1201 | 0.2826 | 0.1016 | 0.2235 | 0.2227 | 27.48 | 0.0515 | 0.1044 | 0.8659 | 0.8742 | 1056.0 | 1208.0 | 0.8808 | 0.8739 | 0.8773 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+| 8.1516 | 1.0 | 7 | 7.5736 | 0.2121 | 0.0491 | 0.1581 | 0.1587 | 32.0 | 0.0174 | 0.0574 | 1.0 | 1.0728 | 1296.0 | 1208.0 | 0.8534 | 0.8583 | 0.8557 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+| 5.8141 | 2.0 | 14 | 5.0888 | 0.2526 | 0.0765 | 0.1991 | 0.2004 | 26.88 | 0.0316 | 0.0882 | 0.822 | 0.8361 | 1010.0 | 1208.0 | 0.8772 | 0.8715 | 0.8742 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+| 4.3777 | 3.0 | 21 | 4.4191 | 0.2668 | 0.0907 | 0.2057 | 0.2072 | 24.04 | 0.0421 | 0.1088 | 0.7134 | 0.7475 | 903.0 | 1208.0 | 0.8824 | 0.8719 | 0.877 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+| 3.9067 | 4.0 | 28 | 4.2179 | 0.2684 | 0.0813 | 0.2084 | 0.2085 | 25.14 | 0.0378 | 0.1006 | 0.7488 | 0.7757 | 937.0 | 1208.0 | 0.8799 | 0.8705 | 0.8751 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+| 3.6847 | 5.0 | 35 | 4.1231 | 0.2897 | 0.0861 | 0.2227 | 0.2226 | 29.34 | 0.0362 | 0.0876 | 0.9412 | 0.9429 | 1139.0 | 1208.0 | 0.8751 | 0.8761 | 0.8756 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+| 3.5317 | 6.0 | 42 | 4.1113 | 0.2644 | 0.0858 | 0.2097 | 0.2107 | 26.66 | 0.0395 | 0.0983 | 0.8063 | 0.8228 | 994.0 | 1208.0 | 0.8826 | 0.8744 | 0.8784 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+| 3.4303 | 7.0 | 49 | 4.0934 | 0.2866 | 0.0945 | 0.2219 | 0.2226 | 27.02 | 0.0407 | 0.1017 | 0.8413 | 0.8526 | 1030.0 | 1208.0 | 0.8827 | 0.8773 | 0.8799 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+| 3.3587 | 8.0 | 56 | 4.0800 | 0.2956 | 0.1007 | 0.2287 | 0.2302 | 28.1 | 0.0467 | 0.1031 | 0.8734 | 0.8808 | 1064.0 | 1208.0 | 0.8805 | 0.8756 | 0.878 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+| 3.3033 | 9.0 | 63 | 4.0926 | 0.2924 | 0.0982 | 0.2205 | 0.2225 | 27.06 | 0.0481 | 0.1062 | 0.8461 | 0.8568 | 1035.0 | 1208.0 | 0.8813 | 0.8747 | 0.8779 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+| 3.2828 | 10.0 | 70 | 4.1010 | 0.3015 | 0.0982 | 0.2325 | 0.234 | 27.86 | 0.0493 | 0.1077 | 0.8669 | 0.875 | 1057.0 | 1208.0 | 0.8803 | 0.8763 | 0.8783 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
 
 
 ### Framework versions
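
A quick sanity check on the new run's final BLEU length statistics: the length ratio and brevity penalty follow directly from the reported translation and reference lengths under the standard BLEU definitions. (The Precision/Recall/F1 rows and the `roberta-large_L17_no-idf` hashcode are BERTScore outputs from the bert-score package, not BLEU.) A minimal check:

```python
# Verify the reported BLEU length statistics for the final epoch of the new run.
# Standard BLEU definitions; nothing here is specific to this training script.
import math

translation_len, reference_len = 1057.0, 1208.0
length_ratio = translation_len / reference_len  # 1057 / 1208 = 0.875
# The candidate corpus is shorter than the reference, so BLEU applies
# BP = exp(1 - ref_len / cand_len); otherwise BP would be 1.0.
brevity_penalty = math.exp(1 - reference_len / translation_len)
print(round(length_ratio, 4), round(brevity_penalty, 4))  # 0.875 0.8669
```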
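
The hyperparameter change trades per-device batch size for gradient accumulation (1 sample per step × 16 accumulation steps = an effective batch of 16, versus 8 before), doubles the learning rate, and drops Native AMP. As a rough sketch, assuming the card was produced by a standard `Seq2SeqTrainer` run, the new settings would map onto `transformers` arguments like this (`output_dir` is a placeholder; generation and logging options are not shown in the diff):

```python
# Hedged sketch: maps the card's updated hyperparameter list onto
# Seq2SeqTrainingArguments. Not the author's actual training script.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="led-base-16384-finetuned",  # placeholder name
    learning_rate=2e-3,                     # was 1e-3 in the previous run
    per_device_train_batch_size=1,          # was 8
    per_device_eval_batch_size=1,           # was 8
    gradient_accumulation_steps=16,         # new: 1 x 16 = total batch of 16
    num_train_epochs=10,
    lr_scheduler_type="linear",
    seed=42,
    # fp16=True was implied by "Native AMP" before; the new card omits it.
)
```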
adapter_config.json CHANGED
@@ -24,10 +24,10 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "v_proj",
-    "out_proj",
     "k_proj",
-    "q_proj"
+    "out_proj",
+    "q_proj",
+    "v_proj"
   ],
   "task_type": "SEQ_2_SEQ_LM",
   "trainable_token_indices": null,
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:534465ae1ca27c2a522d25577df098e585384cd46e787988aafc9c6fba8a771d
+oid sha256:96e8ce08a1e0b31c2ff468249515d2dfd8e19d90d01e9c5e82b5c4a1c2933d99
 size 2372496
runs/Jul29_13-02-27_tardis/events.out.tfevents.1753786948.tardis.19354.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:abce1d136e3e232a6ab62ff027a84c5a5150696f6b76947b9a33c502238553df
+size 19374
tokenizer.json CHANGED
@@ -1,7 +1,21 @@
 {
   "version": "1.0",
-  "truncation": null,
-  "padding": null,
+  "truncation": {
+    "direction": "Right",
+    "max_length": 64,
+    "strategy": "LongestFirst",
+    "stride": 0
+  },
+  "padding": {
+    "strategy": {
+      "Fixed": 64
+    },
+    "direction": "Right",
+    "pad_to_multiple_of": null,
+    "pad_id": 1,
+    "pad_type_id": 0,
+    "pad_token": "<pad>"
+  },
   "added_tokens": [
     {
       "id": 0,
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:993a07156b4546a639efcb5894ce14068d90da7fce3e8d6da188663ca08c17d4
+oid sha256:f9a7bcb03026319cb27e62b6a941aa99fbfe4af3a1e2071641995ca5d6ea421d
 size 5905