Commit 14ac632 · Parent: 80cd3eb
Update README.md

README.md CHANGED
@@ -102,13 +102,14 @@ Model includes 60 languages: (iso codes)
 
 ## Training Data Statistics
 
-- Tokens:
+- Tokens: 488 Billion BBPE tokens
+
 
 <img style="text-align:center; display:block;" src="https://huggingface.co/sberbank-ai/mGPT/resolve/main/stats.png">
 "General training corpus statistics"
 
 
 ## Details
-Model was trained with sequence length
+Model was trained with sequence length 512 using Megatron and Deepspeed libs by [SberDevices](https://sberdevices.ru/) team on a dataset of 600 GB of texts in 60 languages. The model has seen 440 billion BPE tokens in total.
 
-Total training time was around
+Total training time was around 12 days on 256 Nvidia V100 GPUs.
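For context, the "Details" text added by this commit says the model was trained with Megatron and DeepSpeed at sequence length 512. A run of that kind is typically launched roughly as sketched below. This is an illustrative sketch only, not the actual mGPT training command: except for the sequence length (512, from the README), every flag value, path, and filename here is a placeholder.

```shell
# Illustrative Megatron-DeepSpeed GPT pretraining launch (NOT the real mGPT command).
# Only --seq-length 512 comes from the README; all other values are placeholders.
deepspeed pretrain_gpt.py \
  --num-layers 24 \
  --hidden-size 2048 \
  --num-attention-heads 16 \
  --seq-length 512 \
  --max-position-embeddings 512 \
  --micro-batch-size 4 \
  --global-batch-size 512 \
  --train-iters 500000 \
  --data-path /path/to/corpus_text_document \
  --vocab-file vocab.json \
  --merge-file merges.txt \
  --fp16 \
  --deepspeed \
  --deepspeed_config ds_config.json
```

The `--deepspeed_config` file would carry the DeepSpeed-side settings (batch sizes, fp16, ZeRO stage), while the Megatron flags above define the model shape and the 512-token context window mentioned in the README.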