Update README.md
The base model is pre-trained on vngrs-web-corpus.
- **Training objective**: Sentence permutation and span masking (using mask lengths sampled from a Poisson distribution with λ=3.5, masking 30% of tokens)
- **Optimizer**: Adam optimizer (β1 = 0.9, β2 = 0.98, ε = 1e-6)
- **Scheduler**: Custom scheduler from the original Transformer paper (20,000 warm-up steps)
- **Weight Initialization**: Model Enlargement from VBART-Large. See the related section in the [paper](https://arxiv.org/abs/2403.01308) for the details.
- **Dropout**: 0.1 (dropped to 0.05 and then to 0 in the last 80K and 80K steps, respectively)
- **Initial learning rate**: 5e-6
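The scheduler bullet refers to the learning-rate schedule from the original Transformer paper ("Attention Is All You Need"). A minimal sketch of that formula, assuming the README's 20,000 warm-up steps; the `d_model=1024` default here is an illustrative assumption, and the actual training run likely rescales this curve to match the stated 5e-6 initial learning rate:

```python
def transformer_lr(step: int, d_model: int = 1024, warmup_steps: int = 20_000) -> float:
    """Transformer-paper schedule:
    lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5).
    Rises linearly for `warmup_steps` steps, then decays as 1/sqrt(step).
    """
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The peak value occurs exactly at `step == warmup_steps`, where the two terms inside `min` coincide.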