Tags: Transformers · Safetensors · Turkish · mbart · text2text-generation
meliksahturker committed
Commit 18f3d78 (verified)
1 Parent(s): e1d32d1

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED
@@ -43,7 +43,7 @@ The base model is pre-trained on [vngrs-web-corpus](https://huggingface.co/datas
  - **Training objective**: Sentence permutation and span masking (using mask lengths sampled from Poisson distribution λ=3.5, masking 30% of tokens)
  - **Optimizer**: Adam optimizer (β1 = 0.9, β2 = 0.98, ε = 1e-6)
  - **Scheduler**: Custom scheduler from the original Transformers paper (20,000 warm-up steps)
- - - **Weight Initialization**: Model Enlargement from VBART-Large. See the related section in the [paper](https://arxiv.org/abs/2403.01308) for the details.
+ - **Weight Initialization**: Model Enlargement from VBART-Large. See the related section in the [paper](https://arxiv.org/abs/2403.01308) for the details.
  - **Dropout**: 0.1 (dropped to 0.05 and then to 0 in the last 80K and 80K steps, respectively)
  - **Initial Learning rate**: 5e-6
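For context on the "custom scheduler from the original Transformers paper" mentioned in the diff above: the schedule in "Attention Is All You Need" (Vaswani et al., 2017) warms the learning rate up linearly and then decays it with the inverse square root of the step count. The sketch below is a minimal illustration of that formula, not VBART's actual training code; the `d_model` value is an assumption, and the model card's stated initial learning rate (5e-6) suggests the authors applied their own scaling on top of this shape.

```python
def transformer_lr(step: int, d_model: int = 1024, warmup_steps: int = 20_000) -> float:
    """Inverse-square-root schedule with linear warm-up, as in
    "Attention Is All You Need". d_model=1024 is an illustrative
    assumption, not a value taken from the VBART model card."""
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The rate peaks exactly at `warmup_steps` (here 20,000, matching the model card) and decays thereafter, which is why such schedules are usually quoted by their warm-up length alone.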