Tokenizer fix
#1
by justinbarton - opened
Presently, loading the tokenizer via:
from transformers import T5Tokenizer
tokeniser = T5Tokenizer.from_pretrained("Exscientia/IgT5", do_lower_case=False)
yields the following error:
ValueError: Non-consecutive added token '<extra_id_99>' found. Should have index 128 but has index 28 in saved vocabulary.
This PR should resolve the issue.
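For context, the error comes from a sanity check on the saved vocabulary: added tokens (such as the `<extra_id_*>` sentinels) are expected to occupy consecutive indices immediately after the base vocabulary, and older transformers versions raised exactly this ValueError when an index was out of place. A minimal sketch of that check (the `check_added_tokens` helper below is hypothetical, not the actual transformers implementation):

```python
def check_added_tokens(added_tokens, vocab_size):
    """Verify added tokens occupy consecutive indices starting at vocab_size.

    added_tokens: mapping of token string -> index in the saved vocabulary.
    Raises ValueError in the same style as the tokenizer loader above.
    """
    for expected, (token, index) in enumerate(
            sorted(added_tokens.items(), key=lambda kv: kv[1]),
            start=vocab_size):
        if index != expected:
            raise ValueError(
                f"Non-consecutive added token '{token}' found. "
                f"Should have index {expected} but has index {index} "
                "in saved vocabulary.")

# Consecutive indices pass the check silently.
check_added_tokens({"<extra_id_0>": 128, "<extra_id_1>": 129}, vocab_size=128)

# A token saved at the wrong index reproduces the error above.
try:
    check_added_tokens({"<extra_id_99>": 28}, vocab_size=128)
except ValueError as e:
    print(e)
```

In other words, the fix is simply to save the added-token entries with indices that line up with the base vocabulary size, which is what this PR does.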
justinbarton changed pull request status to open
Hi @justinbarton , thank you for your interest in our work! What version of transformers are you using? I tried this line in a Colab notebook with both the transformers version we developed in (4.35.2) and the latest version (4.39.3), and both loaded the tokeniser without any errors.
How odd. I was using 4.30.2.
exs-fdreyer changed pull request status to closed