---
license: apache-2.0
datasets:
- occiglot/occiglot-fineweb-v1.0
- HuggingFaceFW/fineweb
- HuggingFaceFW/fineweb-edu
language:
- en
- de
---

# Occiglot5

![Occiglot5](occiglot5_logo.png)

Occiglot5 is a modern [T5](https://arxiv.org/abs/1910.10683) model for German with 1.42B parameters and the following features:

* Pretrained on the German Occiglot FineWeb corpus (excluding deWaC and Open Legal Data) and on the 10BT subsets of FineWeb and FineWeb-Edu
* [UL2](https://arxiv.org/abs/2205.05131) is used as the pretraining objective
* The efficient T5 architecture from the ["Scale Efficiently"](https://arxiv.org/abs/2109.10686) paper is used
* Pretrained for 5M steps using a batch size of 128 and an input/output sequence length of 512
* Trained in a single run on a v4-32 TPU Pod for 22.3 days, without any crashes
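To put the training setup above into perspective, the following sketch derives the rough pretraining budget from the stated numbers (5M steps, batch size 128, sequence length 512, 22.3 days). It assumes every example fills the full 512-token input window; packing and padding details are not stated here, so the token count is an upper bound, not a reported figure.

```python
# Rough training-budget arithmetic based on the numbers in this model card.
# Assumption: every example fills the full 512-token input window, so the
# token count below is an upper bound (packing/padding details are not stated).

STEPS = 5_000_000   # pretraining steps
BATCH_SIZE = 128    # sequences per step
SEQ_LEN = 512       # input (and output) sequence length
TRAIN_DAYS = 22.3   # wall-clock training time on the v4-32 TPU Pod

sequences_seen = STEPS * BATCH_SIZE
max_input_tokens = sequences_seen * SEQ_LEN
train_seconds = TRAIN_DAYS * 24 * 60 * 60
steps_per_second = STEPS / train_seconds

print(f"sequences seen:       {sequences_seen:,}")
print(f"max input tokens:     {max_input_tokens:,} (~{max_input_tokens / 1e9:.1f}B)")
print(f"avg steps per second: {steps_per_second:.2f}")
```

Under these assumptions, the model sees 640M training sequences, or at most about 328B input tokens, at an average of roughly 2.6 steps per second.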

# Acknowledgments

Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
Many thanks for providing access to the TPUs over many years ❤️

Made in the Bavarian Oberland with ❤️ and 🥨.