Instructions for using microsoft/trocr-small-handwritten with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use microsoft/trocr-small-handwritten with Transformers (a full inference sketch follows the links below):
```python
# Use a pipeline as a high-level helper
# Warning: Pipeline type "image-to-text" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
#   pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("image-to-text", model="microsoft/trocr-small-handwritten")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForImageTextToText

tokenizer = AutoTokenizer.from_pretrained("microsoft/trocr-small-handwritten")
model = AutoModelForImageTextToText.from_pretrained("microsoft/trocr-small-handwritten")
```

- Notebooks
- Google Colab
- Kaggle
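For reference, a minimal end-to-end inference sketch. It assumes `sentencepiece` and `Pillow` are installed (see the discussion below for why `sentencepiece` matters), and `line.png` is a hypothetical path to an image of a single handwritten text line:

```python
# Minimal TrOCR inference sketch: encode one image of a handwritten
# text line and decode the generated token ids back to a string.
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-small-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-small-handwritten")

image = Image.open("line.png").convert("RGB")  # hypothetical input image
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```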
This model does not work
Running the code provided in the README throws the following error:
```
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a tokenizers library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
```
It is possible to run the model by using the processor from the repo of one of the larger models:

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained('microsoft/trocr-base-handwritten')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-small-handwritten')
```

This works.
However, I am not sure whether there is a difference between the processors of different model sizes.
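One way to check is to compare the two processors directly. This is a quick sketch, not verified against these repos; it assumes `sentencepiece` is installed (so the small processor loads at all) and a recent transformers version where processors expose an `image_processor` attribute (older versions call it `feature_extractor`):

```python
# Hypothetical sanity check: compare the image preprocessing settings
# and tokenizers of the base and small processors to see where they differ.
from transformers import TrOCRProcessor

base = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
small = TrOCRProcessor.from_pretrained("microsoft/trocr-small-handwritten")

# Identical image preprocessing would make the swap safe on the encoder side.
print(base.image_processor.to_dict() == small.image_processor.to_dict())

# Different tokenizer classes or vocab sizes would mean decoded text can diverge.
print(type(base.tokenizer).__name__, type(small.tokenizer).__name__)
print(base.tokenizer.vocab_size, small.tokenizer.vocab_size)
```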
Hi Chrisxx,
I had the same problem and fixed it with `pip install sentencepiece`.
I found this solution at https://stackoverflow.com/questions/65431837/transformers-v4-x-convert-slow-tokenizer-to-fast-tokenizer
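With that dependency installed, the processor from the small repo itself should load without the ValueError. A minimal sketch of the resolved setup:

```python
# Fix: install the missing dependency first:
#   pip install sentencepiece
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-small-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-small-handwritten")
```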
Works!