---
license: apache-2.0
datasets:
  - occiglot/occiglot-fineweb-v1.0
  - HuggingFaceFW/fineweb
  - HuggingFaceFW/fineweb-edu
language:
  - en
  - de
---

# Occiglot5

Occiglot5 is a modern T5 model for German with 1.42B parameters and the following features:

- Pretrained on the German Occiglot FineWeb corpus (excluding deWaC and Open Legal Data) and on the 10BT subsets of FineWeb and FineWeb-Edu
- UL2 is used as the pretraining objective
- The efficient T5 architecture from the "Scale Efficiently" paper is used
- Pretrained for 5M steps with a batch size of 128 and an input/output sequence length of 512
- Trained in a single run on a v4-32 TPU Pod for 22.3 days without any crashes
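The UL2 objective extends T5-style span corruption with a mixture of denoisers. As a rough illustration of the underlying mechanism, the following is a minimal, self-contained sketch of span corruption with sentinel tokens; the function name, span-selection strategy, and sentinel format are illustrative and not taken from the actual pretraining code.

```python
import random

def span_corrupt(tokens, noise_density=0.15, mean_span_len=3, seed=0):
    """Illustrative T5-style span corruption: replace random spans with
    sentinel markers and return an (inputs, targets) pair.
    This is a simplified sketch, not the actual pretraining implementation."""
    rng = random.Random(seed)
    num_noise = max(1, round(len(tokens) * noise_density))
    num_spans = max(1, round(num_noise / mean_span_len))

    # Greedily pick non-overlapping spans of roughly mean_span_len tokens.
    masked, spans, attempts = set(), [], 0
    while len(spans) < num_spans and attempts < 100:
        attempts += 1
        start = rng.randrange(len(tokens))
        end = min(len(tokens), start + mean_span_len)
        if any(i in masked for i in range(start, end)):
            continue
        spans.append((start, end))
        masked.update(range(start, end))
    spans.sort()

    # Inputs keep unmasked tokens with sentinels in place of each span;
    # targets list each sentinel followed by the tokens it replaced.
    inputs, targets = [], []
    for i, (start, end) in enumerate(spans):
        prev_end = spans[i - 1][1] if i > 0 else 0
        inputs.extend(tokens[prev_end:start])
        inputs.append(f"<extra_id_{i}>")
        targets.append(f"<extra_id_{i}>")
        targets.extend(tokens[start:end])
    inputs.extend(tokens[spans[-1][1]:])
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return inputs, targets
```

During UL2 pretraining, variants of this corruption (differing in noise density and span length) are mixed, so the model sees both short-span infilling and long-span, generation-like denoising.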

# Acknowledgments

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs over many years! ❤️

Made in the Bavarian Oberland with ❤️ and 🥨.