How to use unused tokens? (UNUSED_0, UNUSED_1, etc.)

by RifqiAnshariR - opened Jan 18

Hi. I have a question about the unused tokens in indobert-base-p1. I want to fine-tune the model after adding some new special tokens. Should I assign my new vocabulary entries to the [UNUSED_X] tokens? And why does a [UNUSED_X] token turn into multiple sub-tokens when I do:

encoded = tokenizer_p1.encode("[UNUSED_0]")
encoded

Is this actually a reserved unused token in the model vocab?
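
One possible explanation, sketched in plain Python so it runs without downloading the model: BERT-style tokenizers lowercase the text and split on punctuation before the WordPiece lookup, so a bracketed string can be broken apart even when the vocabulary contains it, unless it is registered as a token that must not be split. The vocabulary and the helper functions below are hypothetical toys for illustration. They are not the real indobert-base-p1 vocab and not the library's implementation.

```python
import string


def basic_split(text, never_split=()):
    """Toy pre-tokenizer: split on whitespace, lowercase, isolate punctuation.

    Words listed in never_split are passed through untouched.
    """
    pieces = []
    for word in text.split():
        if word in never_split:
            pieces.append(word)
            continue
        current = ""
        for ch in word.lower():
            if ch in string.punctuation:
                if current:
                    pieces.append(current)
                    current = ""
                pieces.append(ch)
            else:
                current += ch
        if current:
            pieces.append(current)
    return pieces


def wordpiece(piece, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece lookup for a single piece."""
    tokens, start = [], 0
    while start < len(piece):
        end, match = len(piece), None
        while start < end:
            candidate = piece[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-piece prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        tokens.append(match)
        start = end
    return tokens


def tokenize(text, vocab, never_split=()):
    out = []
    for piece in basic_split(text, never_split):
        out.extend(wordpiece(piece, vocab))
    return out


# Hypothetical miniature vocabulary (an assumption, not the model's vocab).
toy_vocab = {"[UNK]", "[UNUSED_0]", "[", "]", "_", "unused", "0"}

split_up = tokenize("[UNUSED_0]", toy_vocab)
kept_whole = tokenize("[UNUSED_0]", toy_vocab, never_split=("[UNUSED_0]",))

print(split_up)    # ['[', 'unused', '_', '0', ']']
print(kept_whole)  # ['[UNUSED_0]']
```

If this is what is happening with the real tokenizer, it may be worth first checking whether the exact string exists with `"[UNUSED_0]" in tokenizer_p1.get_vocab()`, and then registering it with `tokenizer_p1.add_special_tokens({"additional_special_tokens": ["[UNUSED_0]"]})` so that it is kept as a single token. Whether the checkpoint spells its reserved entries exactly as `[UNUSED_0]` is an assumption to verify against the vocab file.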
