meliksahturker committed (verified) · Commit 020ea4e · Parent(s): 18f3d78

Update README.md

Files changed (1): README.md (+2, -3)
README.md CHANGED
@@ -31,12 +31,11 @@ VBART-XLarge improves the results compared to VBART-Large albeit in small margin
 
 ### Pre-training Data
 The base model is pre-trained on [vngrs-web-corpus](https://huggingface.co/datasets/vngrs-ai/vngrs-web-corpus). It is curated by cleaning and filtering Turkish parts of [OSCAR-2201](https://huggingface.co/datasets/oscar-corpus/OSCAR-2201) and [mC4](https://huggingface.co/datasets/mc4) datasets. These datasets consist of documents of unstructured web crawl data. More information about the dataset can be found on their respective pages. Data is filtered using a set of heuristics and certain rules, explained in the appendix of our [paper](https://arxiv.org/abs/2403.01308).
-#### Hardware
-- **GPUs**: 8 x Nvidia A100-80 GB
 #### Software
 - TensorFlow
 #### Pre-training Setting
-- **Duration**: Pre-trained for 30 days.
+- **Duration**: Pre-trained for 8 days.
+- **GPUs**: 8 x Nvidia A100-80 GB
 - **Training tokens**: 84B
 - **Context Length**: 1024 for both encoder and decoder
 - **Training regime:** fp16 mixed precision
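
For context on the "fp16 mixed precision" regime named in the updated Pre-training Setting: in TensorFlow (the software listed above), mixed precision is typically switched on through a global Keras policy plus loss scaling. The sketch below is illustrative only; the toy model and optimizer are assumptions, not the actual VBART training code.

```python
import tensorflow as tf

# Run compute in float16 while keeping variables in float32,
# i.e. the usual fp16 mixed-precision regime in Keras.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Toy stand-in network; the real VBART encoder-decoder is not shown here.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    # Keep the final output in float32 for numerical stability.
    tf.keras.layers.Dense(10, dtype="float32"),
])

# Loss scaling guards against fp16 gradient underflow.
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam())
model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy")
```

Under the `mixed_float16` policy Keras wraps optimizers in a loss-scale optimizer automatically; the explicit wrapper above just makes that step visible.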