Update README.md
The base model is pre-trained on vngrs-web-corpus.
- **Training objective**: Sentence permutation and span masking (using mask lengths sampled from a Poisson distribution with λ=3.5, masking 30% of tokens)
- **Optimizer**: Adam optimizer (β1 = 0.9, β2 = 0.98, ε = 1e-6)
- **Scheduler**: Custom scheduler from the original Transformer paper (20,000 warm-up steps)
- **Weight Initialization**: Model Enlargement from VBART-Large. See the related section in the [paper](https://arxiv.org/abs/2403.01308) for the details.
- **Dropout**: 0.1 (dropped to 0.05 and then to 0 in the last 80K and 80K steps, respectively)
- **Initial learning rate**: 5e-6
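The scheduler bullet refers to the learning-rate schedule from the original Transformer paper ("Attention Is All You Need"). A minimal sketch of that formula, assuming the README's 20,000 warm-up steps; the `d_model=1024` default here is an illustrative assumption, and the actual training run likely rescales this curve to match the stated 5e-6 initial learning rate:

```python
def transformer_lr(step: int, d_model: int = 1024, warmup_steps: int = 20_000) -> float:
    """Transformer-paper schedule:
    lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5).
    Rises linearly for `warmup_steps` steps, then decays as 1/sqrt(step).
    """
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The peak value occurs exactly at `step == warmup_steps`, where the two terms inside `min` coincide.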