ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
Paper: arXiv:2102.03334
How to use hf-internal-testing/tiny-vilt-random-vqa with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("visual-question-answering", model="hf-internal-testing/tiny-vilt-random-vqa")

# Load model directly
from transformers import AutoProcessor, AutoModelForVisualQuestionAnswering
processor = AutoProcessor.from_pretrained("hf-internal-testing/tiny-vilt-random-vqa")
model = AutoModelForVisualQuestionAnswering.from_pretrained("hf-internal-testing/tiny-vilt-random-vqa")

A tiny randomly-initialized ViLT used for unit tests in the Transformers VQA pipeline.
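The snippets above load the model but do not call it. Below is a minimal sketch of how the pipeline might be run on one image and question. The blank in-memory image, the question text, and the `best_answer` helper are illustrative assumptions, not part of this model card. Because the weights are random, the returned answer carries no meaning; the example only shows the call shape and the list of `{"score", "answer"}` dicts the pipeline returns.

```python
def best_answer(predictions):
    """Return the highest-scoring answer from a VQA pipeline result.

    `predictions` is expected to be a list of {"score": float, "answer": str}
    dicts, the shape returned by the visual-question-answering pipeline.
    (Hypothetical helper, introduced here for illustration only.)
    """
    if not predictions:
        return None
    return max(predictions, key=lambda p: p["score"])["answer"]


def main():
    # Imported here so the helper above can be reused without these packages.
    from PIL import Image
    from transformers import pipeline

    pipe = pipeline(
        "visual-question-answering",
        model="hf-internal-testing/tiny-vilt-random-vqa",
    )

    # Assumption: a blank in-memory image stands in for a real photo.
    image = Image.new("RGB", (384, 384), color="white")

    # top_k limits how many candidate answers are returned.
    predictions = pipe(image=image, question="What color is the image?", top_k=2)

    # The weights are random, so the printed answer is not meaningful.
    print(best_answer(predictions))


if __name__ == "__main__":
    main()
```

Running the script downloads the checkpoint from the Hub on first use, so it needs network access along with the `transformers`, `torch`, and `Pillow` packages.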