Small German Tokenizer
This is a small tokenizer for German text, released under a public-domain-like license.
Special Tokens
- End-of-Sequence token: [EOS]
- Padding token: [PAD]
Training
This tokenizer was trained on the context column of the task1 and task4 configs of the deutsche-telekom/Ger-RAG-eval dataset.
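The exact training setup is not published here; the following is a minimal sketch of how a comparable tokenizer could be trained with the Hugging Face tokenizers library. The inline toy corpus stands in for the Ger-RAG-eval context column, and the model type (BPE) and vocabulary size are assumptions:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Toy stand-in for the Ger-RAG-eval "context" column (assumption;
# the real training data is loaded from the dataset itself).
corpus = [
    "Die Deutsche Telekom ist ein Telekommunikationsunternehmen.",
    "Das Modell beantwortet Fragen auf Deutsch.",
]

# Small BPE model; the actual tokenizer's model type is an assumption.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=500,  # assumed value, not from the model card
    special_tokens=["[PAD]", "[EOS]", "[UNK]"],
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

# The special tokens are now part of the vocabulary.
print(tokenizer.token_to_id("[PAD]"), tokenizer.token_to_id("[EOS]"))
```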
Limitations
Due to its small training corpus, this tokenizer may split words into more pieces than a larger tokenizer would. Some uncommon special tokens are also missing; add them manually if you need them.
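Missing special tokens can be registered after loading. This is a minimal sketch using the tokenizers library; the stand-in tokenizer, the file path in the comment, and the [SEP] token are all illustrative assumptions, not part of this model:

```python
from tokenizers import Tokenizer, models

# Stand-in tokenizer for illustration; in practice you would load the
# real one, e.g. Tokenizer.from_file("tokenizer.json") (path assumed).
tokenizer = Tokenizer(models.BPE())

# Register a special token the small training corpus did not produce.
# "[SEP]" is only an example of a token you might need.
added = tokenizer.add_special_tokens(["[SEP]"])
print(added)  # number of tokens actually added

# The new token now has an id and will not be split during encoding.
print(tokenizer.token_to_id("[SEP]"))
```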