Update README.md
README.md CHANGED
@@ -62,7 +62,6 @@ language:
 - et
 - fi
 - hu
-
 pipeline_tag: text-generation
 tags:
 - multilingual
@@ -75,7 +74,7 @@ tags:
 datasets:
 - mc4
 - wikipedia
-thumbnail:
+thumbnail: https://github.com/sberbank-ai/mgpt
 ---
 
 # Multilingual GPT model
@@ -140,4 +139,4 @@ Languages:
 ## Details
 The model was trained with sequence length 512 using the Megatron and DeepSpeed libraries by the [SberDevices](https://sberdevices.ru/) team on a dataset of 600 GB of texts in 61 languages. The model has seen 440 billion BPE tokens in total.
 
-Total training time was around 14 days on 256 Nvidia V100 GPUs.
+Total training time was around 14 days on 256 Nvidia V100 GPUs.
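
As a back-of-envelope check on the training figures in the Details section (not stated in the source): 440 billion tokens over 14 days on 256 GPUs works out to 440e9 / (256 × 14 × 86400 s) ≈ 1.4k tokens per second per GPU.

The `pipeline_tag: text-generation` metadata means the checkpoint is driven as an ordinary causal LM. A minimal sketch follows, assuming the checkpoint is published on the Hub as `sberbank-ai/mGPT` — an id inferred from the thumbnail URL in the diff, not confirmed there:

```python
# Minimal text-generation sketch for the mGPT checkpoint described above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sberbank-ai/mGPT"  # assumed Hub id, inferred from the repo URL in the diff

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Plain autoregressive sampling; keep the total length within the
# 512-token training context mentioned in the Details section.
inputs = tokenizer("The history of the Estonian language", return_tensors="pt")
output_ids = model.generate(**inputs, do_sample=True, top_p=0.95, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```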