Commit 14ac632 · Parent: 80cd3eb
Update README.md

README.md CHANGED
@@ -102,13 +102,14 @@ Model includes 60 languages: (iso codes)
 
 ## Training Data Statistics
 
-- Tokens:
+- Tokens: 488 Billion BBPE tokens
+
 
 <img style="text-align:center; display:block;" src="https://huggingface.co/sberbank-ai/mGPT/resolve/main/stats.png">
 "General training corpus statistics"
 
 
 ## Details
-Model was trained with sequence length
+Model was trained with sequence length 512 using Megatron and Deepspeed libs by [SberDevices](https://sberdevices.ru/) team on a dataset of 600 GB of texts in 60 languages. The model has seen 440 billion BPE tokens in total.
 
-Total training time was around
+Total training time was around 12 days on 256 Nvidia V100 GPUs.
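For context, the "Details" text added by this commit says the model was trained with Megatron and DeepSpeed at sequence length 512. A run of that kind is typically launched roughly as sketched below. This is an illustrative sketch only, not the actual mGPT training command: except for the sequence length (512, from the README), every flag value, path, and filename here is a placeholder.

```shell
# Illustrative Megatron-DeepSpeed GPT pretraining launch (NOT the real mGPT command).
# Only --seq-length 512 comes from the README; all other values are placeholders.
deepspeed pretrain_gpt.py \
  --num-layers 24 \
  --hidden-size 2048 \
  --num-attention-heads 16 \
  --seq-length 512 \
  --max-position-embeddings 512 \
  --micro-batch-size 4 \
  --global-batch-size 512 \
  --train-iters 500000 \
  --data-path /path/to/corpus_text_document \
  --vocab-file vocab.json \
  --merge-file merges.txt \
  --fp16 \
  --deepspeed \
  --deepspeed_config ds_config.json
```

The `--deepspeed_config` file would carry the DeepSpeed-side settings (batch sizes, fp16, ZeRO stage), while the Megatron flags above define the model shape and the 512-token context window mentioned in the README.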