Datasets I cleaned with an image, a prompt question (like "transcribe the text in this image") and an answer. Can be used to train VLMs.