Special pad token

#6
by dk2502 - opened

Hi!

The special pad token has id 32000, but the model vocab size is also 32000. Doesn't this make the tokenizer vocab larger than the model vocab? By that, I mean token id 32000 can't be mapped to a vector by the embedding layer, since valid indices only run from 0 to 31999?

So printing

```python
len(tokenizer.get_vocab())
```

gives you 32001, but `model.model.embed_tokens` gives you

```python
Embedding(32000, 4096, padding_idx=0)
```
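To make the mismatch concrete, here is a minimal pure-Python sketch (no `transformers` dependency, and a tiny embedding dimension instead of the real 4096) of why a token id equal to the embedding row count is unreachable. In practice, the usual fix when extra special tokens are added is `model.resize_token_embeddings(len(tokenizer))`:

```python
# An embedding table with vocab_size rows can only map ids 0 .. vocab_size - 1.
vocab_size = 32000      # rows in model.model.embed_tokens
pad_token_id = 32000    # id assigned to the special pad token
embed_dim = 4           # toy dimension; the real model uses 4096

embedding_table = [[0.0] * embed_dim for _ in range(vocab_size)]

def embed(token_id):
    # Mirrors what an embedding lookup does: index into the table.
    if not 0 <= token_id < len(embedding_table):
        raise IndexError(
            f"token id {token_id} is out of range for embedding "
            f"of size {len(embedding_table)}"
        )
    return embedding_table[token_id]

embed(31999)          # last valid id: works
try:
    embed(pad_token_id)   # id 32000 has no row in a 32000-row table
except IndexError as e:
    print(e)
```

So a 32001-entry tokenizer vocab paired with a 32000-row embedding does leave the pad token without a row, unless the embedding matrix is resized to match.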
