---
license: apache-2.0
datasets:
- occiglot/occiglot-fineweb-v1.0
- HuggingFaceFW/fineweb
- HuggingFaceFW/fineweb-edu
language:
- en
- de
---
# Occiglot5

Occiglot5 is a modern [T5](https://arxiv.org/abs/1910.10683) model for German with 1.42B parameters and the following features:

* Pretrained on the German Occiglot FineWeb corpus (excluding deWaC and Open Legal Data) and on the 10BT subsets of FineWeb and FineWeb-Edu
* Uses [UL2](https://arxiv.org/abs/2205.05131) as the pretraining objective
* Uses the efficient T5 architecture from the ["Scale Efficiently"](https://arxiv.org/abs/2109.10686) paper
* Pretrained for 5M steps with a batch size of 128 and an input/output sequence length of 512
* Trained in a single uninterrupted run on a v4-32 TPU Pod for 22.3 days, without any crashes
# Acknowledgments

Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
Many thanks for providing access to the TPUs over many years ❤️

Made in the Bavarian Oberland with ❤️ and 🥨.