# t5-small-wikitext
t5-small trained on [wikitext/wikitext-103-raw-v1](wikitext/wikitext-103-raw-v1) for 50k steps (around 2 hours of training), following the training procedure from the [T5 paper](https://arxiv.org/pdf/1910.10683.pdf):
* batch_size: 32
* max_seq_length: 128
* optim: Adafactor
* scheduler: inverse square root (10k warm-up steps)
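The inverse square root schedule above can be sketched in plain PyTorch. This is a minimal illustration, assuming the formula from the T5 paper, lr = 1/sqrt(max(n, k)) with k the number of warm-up steps; it is not the exact training script used for this model (which may instead rely on Adafactor's built-in relative-step schedule).

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

def inverse_sqrt_schedule(optimizer, warmup_steps=10_000):
    """Inverse square root LR schedule as described in the T5 paper:
    lr(n) = 1 / sqrt(max(n, warmup_steps)).
    The learning rate is held constant at 1/sqrt(warmup_steps) during
    warm-up, then decays proportionally to 1/sqrt(step)."""
    def lr_lambda(step):
        # LambdaLR multiplies the optimizer's base lr by this factor,
        # so setting base lr to 1.0 yields the schedule exactly.
        return 1.0 / (max(step, warmup_steps) ** 0.5)
    return LambdaLR(optimizer, lr_lambda)

# Toy usage (a Linear layer stands in for t5-small; SGD stands in
# for Adafactor just to show how the scheduler attaches):
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
scheduler = inverse_sqrt_schedule(optimizer, warmup_steps=10_000)

for _ in range(3):  # training loop sketch
    optimizer.step()
    scheduler.step()
```

With `warmup_steps=10_000`, the learning rate stays at 1/sqrt(10000) = 1e-2 through warm-up, then falls to 5e-3 by step 40k.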
---
language:
- en
datasets:
- wikitext
---