Tokenizer cannot recognise unused tokens in its vocab

#13
by ArvinZhuang - opened


Sometimes we want to leverage the unused tokens already reserved in the vocab (e.g. [unused39]) for downstream tasks, but the tokenizer splits them into subword pieces instead of keeping them as single tokens.
This can be worked around with self.tokenizer.add_special_tokens({"additional_special_tokens": ["[unused39]"]}). A sketch is shown below.
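A minimal sketch of the workaround, assuming a BERT-style checkpoint whose vocabulary reserves [unusedN] tokens; the checkpoint name below is a placeholder, not necessarily the model in this repo:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint with reserved [unusedN] entries in its vocab.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Without the workaround the unused token is split into subwords,
# even though it already has its own ID in the vocab.
print(tokenizer.tokenize("[unused39]"))               # split into pieces, e.g. ['[', 'unused', '##39', ']']
print(tokenizer.convert_tokens_to_ids("[unused39]"))  # the reserved single-token ID still exists

# Registering it as an additional special token makes the tokenizer keep it
# intact. Since the token is already in the vocab, the vocab size does not grow.
tokenizer.add_special_tokens({"additional_special_tokens": ["[unused39]"]})
print(tokenizer.tokenize("some text [unused39] more text"))
# -> ['some', 'text', '[unused39]', 'more', 'text']
```

Because the token already exists in the vocabulary, no embedding resize is needed after calling add_special_tokens this way.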

ArvinZhuang changed discussion title from Tokenizer cannot recognise unused tokens. to Tokenizer cannot recognise unused tokens in its vocab
