---
license: apache-2.0
datasets:
- occiglot/occiglot-fineweb-v1.0
- HuggingFaceFW/fineweb
- HuggingFaceFW/fineweb-edu
language:
- en
- de
---

# Occiglot5

![Occiglot5](occiglot5_logo.png)

Occiglot5 is a modern [T5](https://arxiv.org/abs/1910.10683) model for German with 1.42B parameters and the following features:

* Pretrained on the German Occiglot FineWeb corpus (except deWaC and Open Legal Data) and on the 10BT subsets of FineWeb and FineWeb-Edu
* [UL2](https://arxiv.org/abs/2205.05131) is used as the pretraining objective
* Uses the efficient T5 architecture from the ["Scale Efficiently"](https://arxiv.org/abs/2109.10686) paper
* Pretrained for 5M steps with a batch size of 128 and an input/output sequence length of 512
* One-shot training on a v4-32 TPU Pod for 22.3 days without any crashes

# Acknowledgments

Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC). Many thanks for providing access to the TPUs over many years ❤️

Made in the Bavarian Oberland with ❤️ and 🥨.
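To illustrate the denoising setup behind objectives like UL2, here is a minimal Python sketch of T5-style span corruption, one of the denoisers mixed in UL2. The `span_corrupt` helper and its explicit span-selection interface are illustrative assumptions for this sketch, not the actual training code (real pipelines sample span positions and lengths randomly and operate on token IDs, not strings):

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) token span with a sentinel <extra_id_i>.

    Returns the corrupted input and the target sequence, where the
    target lists each sentinel followed by the tokens it replaced,
    terminated by a final sentinel (T5's span-corruption convention).
    Spans must be non-overlapping and sorted by start index.
    """
    inp, tgt = [], []
    prev = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp += tokens[prev:start] + [sentinel]   # drop the span, keep a marker
        tgt += [sentinel] + tokens[start:end]    # target reconstructs the span
        prev = end
    inp += tokens[prev:]                         # trailing unmasked tokens
    tgt += [f"<extra_id_{len(spans)}>"]          # closing sentinel
    return inp, tgt


tokens = "The quick brown fox jumps over the lazy dog".split()
inp, tgt = span_corrupt(tokens, [(1, 3), (6, 7)])
# inp: The <extra_id_0> fox jumps over <extra_id_1> lazy dog
# tgt: <extra_id_0> quick brown <extra_id_1> the <extra_id_2>
```

UL2 extends this idea into a mixture of denoisers that varies span length and corruption rate, covering regimes from short-span infilling to sequential (prefix-LM-like) denoising.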