CarolGuga committed on
Commit 045351d · verified · 1 parent: b0673d1

Training complete
Files changed (2):
  1. README.md +8 -12
  2. generation_config.json +1 -3
README.md CHANGED

```diff
@@ -19,9 +19,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [facebook/mbart-large-50](https://huggingface.co/facebook/mbart-large-50) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.2519
-- Bleu: 87.0261
-- Gen Len: 18.4048
+- Loss: 0.0138
+- Bleu: 98.4772
+- Gen Len: 18.5104
 
 ## Model description
 
@@ -41,28 +41,24 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 5.6e-05
-- train_batch_size: 1
-- eval_batch_size: 1
+- train_batch_size: 8
+- eval_batch_size: 8
 - seed: 42
-- gradient_accumulation_steps: 8
-- total_train_batch_size: 8
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
-- lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 2
-- mixed_precision_training: Native AMP
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Bleu    | Gen Len |
 |:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|
-| No log        | 1.0   | 472  | 0.2977          | 85.4471 | 18.2429 |
-| 0.9756        | 2.0   | 944  | 0.2519          | 87.0261 | 18.4048 |
+| No log        | 1.0   | 440  | 0.0246          | 98.2861 | 18.5729 |
+| 0.2226        | 2.0   | 880  | 0.0138          | 98.4772 | 18.5104 |
 
 
 ### Framework versions
 
 - Transformers 4.51.2
 - Pytorch 2.10.0+cu128
-- Datasets 4.0.0
+- Datasets 4.6.0
 - Tokenizers 0.21.4
```
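One detail worth noting in the hyperparameter change: the previous run used a per-device batch size of 1 with gradient_accumulation_steps of 8 (total_train_batch_size 8), while this run uses a per-device batch size of 8 with no accumulation listed, so the optimizer still steps on 8 examples at a time. A minimal sketch of that equivalence, using a hypothetical helper (`effective_batch_size` is not part of any library, just illustration):

```python
# Effective train batch size = per-device batch size
# * gradient accumulation steps (* number of devices; 1 here).
# The values below are taken from the two versions of the model card.

def effective_batch_size(per_device: int, grad_accum: int = 1, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * grad_accum * num_devices

old = effective_batch_size(per_device=1, grad_accum=8)  # previous run
new = effective_batch_size(per_device=8)                # this run

print(old, new)  # both 8: the optimizer sees the same batch either way
```

The larger per-device batch trades memory for fewer forward/backward passes per step, which usually speeds up training without changing the optimization dynamics.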
generation_config.json CHANGED

```diff
@@ -3,11 +3,9 @@
   "decoder_start_token_id": 2,
   "early_stopping": true,
   "eos_token_id": 2,
-  "forced_bos_token_id": 250005,
   "forced_eos_token_id": 2,
   "max_length": 200,
   "num_beams": 5,
   "pad_token_id": 1,
-  "transformers_version": "4.51.2",
-  "use_cache": false
+  "transformers_version": "4.51.2"
 }
```
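The generation_config.json edit drops `forced_bos_token_id` and `use_cache: false`. In mBART-50 checkpoints, `forced_bos_token_id` typically pins the first generated token to one target-language code, so with it removed, callers would set the target language at generation time instead (e.g. via the tokenizer's language codes). A sketch of the resulting file built as a plain dict, with values copied from the diff (how the file was actually produced is not stated in the commit):

```python
import json

# Generation settings after this commit: forced_bos_token_id and
# use_cache were removed; every other field is unchanged.
generation_config = {
    "decoder_start_token_id": 2,
    "early_stopping": True,
    "eos_token_id": 2,
    "forced_eos_token_id": 2,
    "max_length": 200,
    "num_beams": 5,
    "pad_token_id": 1,
    "transformers_version": "4.51.2",
}

serialized = json.dumps(generation_config, indent=2)
print(serialized)  # no forced_bos_token_id key in the output
```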