CodeParrot

This is a small version of the CodeParrot tokenizer, trained on the CodeParrot Python code dataset. It is the tokenizer trained in Chapter 10, "Training Transformers from Scratch", of the NLP with Transformers book. The full code is available in the accompanying GitHub repository.
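
As a rough usage sketch, the tokenizer can probably be loaded with `AutoTokenizer` from the `transformers` library and applied to Python source line by line. The repository id below is a placeholder rather than the real one, and `load_tokenizer` and `tokens_per_line` are hypothetical helper names introduced only for illustration.

```python
# Placeholder repository id (an assumption): replace with this tokenizer's Hub id.
REPO_ID = "your-username/codeparrot-small-tokenizer"


def load_tokenizer(repo_id=REPO_ID):
    """Download the tokenizer files from the Hub (needs network access)."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def tokens_per_line(tokenizer, source):
    """Return (line, token_count) pairs for each non-empty line of source.

    `tokenizer` can be any object with a `tokenize(text)` method that
    returns a list of tokens, such as the one from `load_tokenizer`.
    """
    return [
        (line, len(tokenizer.tokenize(line)))
        for line in source.splitlines()
        if line.strip()
    ]
```

Calling `tokens_per_line(load_tokenizer(), "def add(a, b):\n    return a + b\n")` gives a quick view of how many tokens each line of a snippet costs, which is one way to compare this tokenizer with a general-purpose one on code.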