---
library_name: transformers
license: apache-2.0
base_model: google/mt5-base
tags:
- generated_from_trainer
metrics:
- bleu
model-index:
- name: mt5-bleu-durga-q1-clean
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# mt5-bleu-durga-q1-clean

This model is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base). The training dataset was not recorded (the Trainer reported it as `None`).
It achieves the following results on the evaluation set:
- Loss: 1.7819
- Bleu: 0.0359

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 20
- eval_batch_size: 20
- seed: 42
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 30
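
To make the `linear` scheduler setting concrete, here is a minimal sketch of how the learning rate would decay over this run. It assumes zero warmup steps (the card lists none) and 90 total optimizer steps (30 epochs at 3 steps each, per the results table). `linear_lr` is a hypothetical helper written for illustration, not a function from Transformers.

```python
# Sketch of a linear decay schedule with no warmup.
# Assumptions: warmup is 0 (not stated in the card) and the run has
# 90 optimizer steps in total (30 epochs x 3 steps per epoch).

def linear_lr(step: int, base_lr: float = 3e-4, total_steps: int = 90) -> float:
    """Learning rate in effect after `step` optimizer steps."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


if __name__ == "__main__":
    # Print the rate at the start, midpoint, and end of training.
    for step in (0, 45, 90):
        print(f"step {step:>2}: lr = {linear_lr(step):.6f}")
```

Under these assumptions the rate starts at 0.0003, is halved by the end of epoch 15, and reaches zero at step 90.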

### Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu   |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 15.8442       | 1.0   | 3    | 11.1246         | 0.0    |
| 13.0661       | 2.0   | 6    | 9.3553          | 0.0    |
| 11.7048       | 3.0   | 9    | 8.0317          | 0.0    |
| 8.87          | 4.0   | 12   | 7.1382          | 0.0    |
| 11.0893       | 5.0   | 15   | 6.7905          | 0.0    |
| 9.8787        | 6.0   | 18   | 6.5255          | 0.0    |
| 9.8189        | 7.0   | 21   | 6.7007          | 0.0    |
| 8.2022        | 8.0   | 24   | 6.2109          | 0.0    |
| 8.5899        | 9.0   | 27   | 5.9520          | 0.0    |
| 7.5305        | 10.0  | 30   | 5.5748          | 0.0    |
| 7.0381        | 11.0  | 33   | 5.2219          | 0.0054 |
| 6.675         | 12.0  | 36   | 4.8006          | 0.0046 |
| 7.4134        | 13.0  | 39   | 4.3795          | 0.0051 |
| 5.8722        | 14.0  | 42   | 3.9322          | 0.0099 |
| 4.5875        | 15.0  | 45   | 3.5017          | 0.0079 |
| 5.3675        | 16.0  | 48   | 3.1927          | 0.0    |
| 4.2999        | 17.0  | 51   | 2.8956          | 0.0110 |
| 4.3349        | 18.0  | 54   | 2.7138          | 0.0088 |
| 3.9688        | 19.0  | 57   | 2.5350          | 0.0    |
| 4.2931        | 20.0  | 60   | 2.4138          | 0.0    |
| 3.8427        | 21.0  | 63   | 2.3127          | 0.0    |
| 3.2991        | 22.0  | 66   | 2.2054          | 0.0    |
| 3.1351        | 23.0  | 69   | 2.1069          | 0.0    |
| 3.023         | 24.0  | 72   | 2.0208          | 0.0    |
| 3.4366        | 25.0  | 75   | 1.9500          | 0.0272 |
| 2.7941        | 26.0  | 78   | 1.9068          | 0.0370 |
| 2.9454        | 27.0  | 81   | 1.8419          | 0.0365 |
| 2.6117        | 28.0  | 84   | 1.8775          | 0.0367 |
| 2.6785        | 29.0  | 87   | 1.7772          | 0.0361 |
| 2.7523        | 30.0  | 90   | 1.7819          | 0.0359 |
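
The table allows two quick sanity checks, sketched below. First, 3 steps per epoch at a batch size of 20 bounds the training set size, assuming a single device, no gradient accumulation, and a final partial batch that is kept (none of which the card states). Second, the final checkpoint is not the best row on either metric. The variable names are illustrative only, and the rows are copied from the last six epochs of the table.

```python
import math

batch_size = 20        # train_batch_size from the hyperparameters
steps_per_epoch = 3    # step counter advances by 3 per epoch in the table

# If steps_per_epoch == ceil(n / batch_size), n must fall in this range.
# Assumes one device, no gradient accumulation, last partial batch kept.
min_examples = (steps_per_epoch - 1) * batch_size + 1
max_examples = steps_per_epoch * batch_size
assert all(
    math.ceil(n / batch_size) == steps_per_epoch
    for n in range(min_examples, max_examples + 1)
)

# (epoch, validation loss, BLEU) for epochs 25-30, copied from the table.
rows = [
    (25, 1.9500, 0.0272),
    (26, 1.9068, 0.0370),
    (27, 1.8419, 0.0365),
    (28, 1.8775, 0.0367),
    (29, 1.7772, 0.0361),
    (30, 1.7819, 0.0359),
]
best_by_bleu = max(rows, key=lambda r: r[2])
best_by_loss = min(rows, key=lambda r: r[1])

if __name__ == "__main__":
    print(f"training examples: between {min_examples} and {max_examples}")
    print(f"best BLEU: epoch {best_by_bleu[0]} ({best_by_bleu[2]})")
    print(f"lowest validation loss: epoch {best_by_loss[0]} ({best_by_loss[1]})")
```

Under those assumptions the training set holds between 41 and 60 examples. BLEU peaks at epoch 26 and validation loss bottoms out at epoch 29, while the reported results come from epoch 30.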


### Framework versions

- Transformers 4.46.1
- Pytorch 2.5.0+cu121
- Datasets 3.0.2
- Tokenizers 0.20.1