How to use unused tokens? (UNUSED_0, UNUSED_1, etc.)

#6
by RifqiAnshariR - opened

Hi. I have a question about the unused tokens in inobert-base-p1. I want to fine-tune the model after adding some "new" special tokens. Should I assign my new vocabulary entries to the [UNUSED_X] tokens? And why does an [UNUSED_X] token get split into multiple sub-tokens when I run:

encoded = tokenizer_p1.encode("[UNUSED_0]")
encoded

Is this actually a reserved unused token in the model vocab?
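
A possible explanation, sketched here as a toy model and not the actual library code: BERT-style tokenizers typically lowercase the text and split it on punctuation before any vocabulary lookup, unless the string is registered as a special (never-split) token. If that applies here, "[UNUSED_0]" would be broken into pieces even when it exists as a single vocab entry. The vocabulary below is hypothetical, the lookup is simplified to whole pieces (no WordPiece sub-word matching), and whether this model's tokenizer behaves this way is an assumption.

```python
import string


def pre_tokenize(text, never_split=()):
    """Toy BERT-style pre-tokenizer: split on whitespace, lowercase,
    and isolate every punctuation character as its own piece."""
    pieces = []
    for word in text.split():
        if word in never_split:
            # Protected tokens pass through untouched.
            pieces.append(word)
            continue
        current = ""
        for ch in word.lower():
            if ch in string.punctuation:
                if current:
                    pieces.append(current)
                    current = ""
                pieces.append(ch)
            else:
                current += ch
        if current:
            pieces.append(current)
    return pieces


def encode(text, vocab, never_split=()):
    """Map each piece to its id, falling back to [UNK] (simplified lookup)."""
    unk = vocab["[UNK]"]
    return [vocab.get(piece, unk) for piece in pre_tokenize(text, never_split)]


# Hypothetical toy vocabulary, not the real model vocab.
vocab = {"[PAD]": 0, "[UNK]": 1, "[UNUSED_0]": 2, "[UNUSED_1]": 3}

# Without protection the string is lowercased and split on "[", "_" and "]".
split_ids = encode("[UNUSED_0]", vocab)

# Registered as never-split, it maps to its single vocab id.
whole_ids = encode("[UNUSED_0]", vocab, never_split={"[UNUSED_0]"})

print(pre_tokenize("[UNUSED_0]"))
print(split_ids)
print(whole_ids)
```

With the real tokenizer, the analogous checks would presumably be `tokenizer_p1.convert_tokens_to_ids("[UNUSED_0]")` to see whether the string is a single vocab entry, and `tokenizer_p1.add_special_tokens({"additional_special_tokens": ["[UNUSED_0]"]})` to stop it from being split. Neither has been verified against this checkpoint.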
