```python
from tokenizers import models, pre_tokenizers, trainers, Tokenizer

# Build a WordPiece tokenizer with an unknown-token fallback
tokenizer = Tokenizer(model=models.WordPiece(unk_token="[UNK]"))

# Split on whitespace and punctuation before applying WordPiece
# (without a pre-tokenizer, each full line would be treated as a single word)
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

special_tokens = ["[UNK]", "[PAD]", "[CLS]", "[SEP]", "[MASK]"]
trainer = trainers.WordPieceTrainer(vocab_size=25000, special_tokens=special_tokens)
tokenizer.train(["wikitext-2.txt"], trainer=trainer)

encoding = tokenizer.encode("Let's test this tokenizer...", "on a pair of sentences.")
print(encoding.ids)
```
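To see what the encoding of a sentence pair contains beyond the raw `ids`, here is a minimal, self-contained sketch that trains on a tiny in-memory corpus (the three toy sentences and the vocab size of 100 are illustrative values, not from the original) and inspects the `tokens` and `type_ids` attributes:

```python
from tokenizers import models, pre_tokenizers, trainers, Tokenizer

tokenizer = Tokenizer(model=models.WordPiece(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.WordPieceTrainer(
    vocab_size=100, special_tokens=["[UNK]", "[PAD]", "[CLS]", "[SEP]", "[MASK]"]
)

# Train from an in-memory iterator instead of a file on disk
corpus = ["a pair of sentences", "test this tokenizer", "let us test a pair"]
tokenizer.train_from_iterator(corpus, trainer=trainer)

encoding = tokenizer.encode("test this", "a pair")
print(encoding.tokens)    # the subword strings behind each id
print(encoding.type_ids)  # 0 for the first sequence, 1 for the second
```

`type_ids` is what distinguishes the two sentences of the pair; it is the basis for the `token_type_ids` that BERT-style models expect as input.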