---
license: apache-2.0
datasets:
- occiglot/occiglot-fineweb-v1.0
- HuggingFaceFW/fineweb
- HuggingFaceFW/fineweb-edu
language:
- en
- de
---
# Occiglot5
![Occiglot5](occiglot5_logo.png)
Occiglot5 is a modern [T5](https://arxiv.org/abs/1910.10683) model for German with 1.42B parameters and the following features:
* Pretrained on the German Occiglot FineWeb corpus (excluding deWaC and Open Legal Data) as well as on the 10BT subsets of FineWeb and FineWeb-Edu
* Uses [UL2](https://arxiv.org/abs/2205.05131) as the pretraining objective
* Built on the efficient T5 architecture from the ["Scale Efficiently"](https://arxiv.org/abs/2109.10686) paper
* Pretrained for 5M steps with a batch size of 128 and an input/output sequence length of 512
* Trained in a single run on a v4-32 TPU Pod for 22.3 days without any crashes
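UL2 builds on T5-style span corruption: random spans of the input are replaced by sentinel tokens, and the model learns to reconstruct the masked spans. The sketch below is a toy illustration of that input/target format only (plain word lists stand in for real tokenizer IDs, and the parameters are illustrative, not the actual Occiglot5 preprocessing):

```python
import random

def span_corrupt(tokens, corruption_rate=0.15, mean_span_length=3, seed=0):
    """Toy T5/UL2-style span corruption.

    Masks random spans of `tokens` and returns (inputs, targets),
    where each masked span is replaced by a sentinel token in the
    inputs and spelled out after the same sentinel in the targets.
    """
    rng = random.Random(seed)
    n = len(tokens)
    num_to_mask = max(1, round(n * corruption_rate))

    # Sample span start positions and lengths until enough tokens are masked.
    masked = set()
    while len(masked) < num_to_mask:
        start = rng.randrange(n)
        length = max(1, round(rng.gauss(mean_span_length, 1)))
        for i in range(start, min(n, start + length)):
            masked.add(i)
            if len(masked) >= num_to_mask:
                break

    # Build encoder inputs and decoder targets with shared sentinels.
    inputs, targets = [], []
    sentinel = 0
    i = 0
    while i < n:
        if i in masked:
            tok = f"<extra_id_{sentinel}>"
            inputs.append(tok)
            targets.append(tok)
            # Consume the whole contiguous masked run into the targets.
            while i < n and i in masked:
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

toks = "the quick brown fox jumps over the lazy dog".split()
inp, tgt = span_corrupt(toks)
```

UL2 itself mixes several such denoisers (short spans with low corruption rates, long spans, and prefix-LM-style sequential denoising), each signaled to the model via a mode token; the span mechanics are the same in each case.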
# Acknowledgments
Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
Many thanks for providing access to the TPUs over many years ❤️
Made in the Bavarian Oberland with ❤️ and 🥨.