Instructions for using microsoft/trocr-small-handwritten with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use microsoft/trocr-small-handwritten with Transformers (a full inference sketch follows the links below):
```python
# Use a pipeline as a high-level helper
# Warning: Pipeline type "image-to-text" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
#   pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("image-to-text", model="microsoft/trocr-small-handwritten")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForImageTextToText

tokenizer = AutoTokenizer.from_pretrained("microsoft/trocr-small-handwritten")
model = AutoModelForImageTextToText.from_pretrained("microsoft/trocr-small-handwritten")
```

- Notebooks
- Google Colab
- Kaggle
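For reference, a minimal end-to-end inference sketch. It assumes `sentencepiece` and `Pillow` are installed (see the discussion below for why `sentencepiece` matters), and `line.png` is a hypothetical path to an image of a single handwritten text line:

```python
# Minimal TrOCR inference sketch: encode one image of a handwritten
# text line and decode the generated token ids back to a string.
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-small-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-small-handwritten")

image = Image.open("line.png").convert("RGB")  # hypothetical input image
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```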
This model does not work
Running the code provided in the README throws the following error:
```
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a tokenizers library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
```
It is possible to run the model by using the processor from the repo of one of the larger models:

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained('microsoft/trocr-base-handwritten')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-small-handwritten')
```

This works.
However, I am not sure whether there is a difference between the processors of different model sizes.
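One way to check is to compare the two processors directly. This is a quick sketch, not verified against these repos; it assumes `sentencepiece` is installed (so the small processor loads at all) and a recent transformers version where processors expose an `image_processor` attribute (older versions call it `feature_extractor`):

```python
# Hypothetical sanity check: compare the image preprocessing settings
# and tokenizers of the base and small processors to see where they differ.
from transformers import TrOCRProcessor

base = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
small = TrOCRProcessor.from_pretrained("microsoft/trocr-small-handwritten")

# Identical image preprocessing would make the swap safe on the encoder side.
print(base.image_processor.to_dict() == small.image_processor.to_dict())

# Different tokenizer classes or vocab sizes would mean decoded text can diverge.
print(type(base.tokenizer).__name__, type(small.tokenizer).__name__)
print(base.tokenizer.vocab_size, small.tokenizer.vocab_size)
```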
Hi Chrisxx,
I had the same problem and fixed it with `pip install sentencepiece`.
I found this solution at https://stackoverflow.com/questions/65431837/transformers-v4-x-convert-slow-tokenizer-to-fast-tokenizer
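With that dependency installed, the processor from the small repo itself should load without the ValueError. A minimal sketch of the resolved setup:

```python
# Fix: install the missing dependency first:
#   pip install sentencepiece
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-small-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-small-handwritten")
```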
Works!