Small German Tokenizer

This is a small, public-domain-like tokenizer optimized for German.

Special Tokens

End-of-Sequence token: [EOS]
Padding token: [PAD]
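As a rough illustration of how these two tokens are typically used when batching, the sketch below appends [EOS] to each sequence and right-pads shorter sequences with [PAD]. The helper name finish_and_pad, the choice of right padding, and the word-level example tokens are assumptions made for illustration; they are not taken from the tokenizer's configuration, and real tokenizer output would be subword pieces.

```python
from typing import List

# The two special tokens listed above.
EOS = "[EOS]"
PAD = "[PAD]"


def finish_and_pad(batch: List[List[str]]) -> List[List[str]]:
    """Append EOS to every sequence, then right-pad to the longest length."""
    finished = [seq + [EOS] for seq in batch]
    # default=0 keeps an empty batch from raising in max().
    width = max((len(seq) for seq in finished), default=0)
    return [seq + [PAD] * (width - len(seq)) for seq in finished]


# Hypothetical, already-split sequences (whole words for readability).
batch = [["Guten", "Morgen"], ["Hallo"]]
for row in finish_and_pad(batch):
    print(row)
# ['Guten', 'Morgen', '[EOS]']
# ['Hallo', '[EOS]', '[PAD]']
```

Libraries that load the tokenizer can usually do this padding for you; the point here is only to show where the two tokens end up.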

Training

This tokenizer was trained on the context column of the task1 and task4 configs of the deutsche-telekom/Ger-RAG-eval dataset.

Limitations

Because it was trained on a small corpus, this tokenizer may split words, particularly ones that are rare or absent in the training data, into several smaller pieces. Some uncommon special tokens are also missing; you'll have to add them manually if needed.
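One possible way to add missing special tokens is sketched below. It assumes the repository can be loaded with Tokenizer.from_pretrained from the tokenizers library (that is, that it ships a tokenizer.json file), which this card does not confirm. The helper names and the [BOS] and [UNK] tokens are hypothetical examples, not tokens the card says you need.

```python
from typing import Iterable, List


def add_missing_special_tokens(tokenizer, wanted: Iterable[str]) -> List[str]:
    """Add every token in `wanted` that the vocabulary lacks; return those added."""
    # token_to_id returns None for tokens the vocabulary does not contain.
    missing = [tok for tok in wanted if tokenizer.token_to_id(tok) is None]
    if missing:
        # add_special_tokens registers the strings as atomic special tokens.
        tokenizer.add_special_tokens(missing)
    return missing


def load_and_extend(repo_id: str = "qikp/small-german-tokenizer"):
    """Download the tokenizer (needs network access) and add example tokens."""
    # Third-party dependency: pip install tokenizers
    from tokenizers import Tokenizer

    # Assumption: the repository contains a tokenizer.json file.
    tokenizer = Tokenizer.from_pretrained(repo_id)
    # [BOS] and [UNK] are hypothetical examples of tokens you might want.
    added = add_missing_special_tokens(tokenizer, ["[BOS]", "[UNK]"])
    return tokenizer, added
```

Calling load_and_extend() returns the extended tokenizer together with the list of tokens that were actually added, so tokens already in the vocabulary are left untouched.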


Repository: qikp/small-german-tokenizer
