Cannot reproduce results on InfographicsVQA

by zhuowan - opened May 30, 2023

edited May 30, 2023

I am using the pix2struct-infographics-vqa-base and pix2struct-infographics-vqa-large models here to run inference on InfographicsVQA. However, I get 29.53 ANLS for base and 34.31 ANLS for large, which do not match the 38.2 and 40.0 reported in the original paper. Could anyone help with this?

Here is my inference code:

import requests
from PIL import Image
import torch
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

# Load the fine-tuned checkpoint and its matching processor.
model = Pix2StructForConditionalGeneration.from_pretrained("google/pix2struct-infographics-vqa-base").to("cuda")
processor = Pix2StructProcessor.from_pretrained("google/pix2struct-infographics-vqa-base")

# Fetch one infographic and pair it with a question.
image_url = "https://blogs.constantcontact.com/wp-content/uploads/2019/03/Social-Media-Infographic.png"
image = Image.open(requests.get(image_url, stream=True).raw)
question = "Which social platform has heavy female audience?"
inputs = processor(images=image, text=question, return_tensors="pt").to("cuda")

# Generate with the default settings and decode the answer.
predictions = model.generate(**inputs)
pred = processor.decode(predictions[0], skip_special_tokens=True)
gt = 'pinterest'  # ground-truth answer for this question

print(pred)
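
For reference when comparing scores, here is a minimal sketch of how ANLS (Average Normalized Levenshtein Similarity) is commonly computed. The function names `levenshtein` and `anls` are illustrative. The lowercasing, whitespace stripping, and 0.5 threshold are assumptions based on the usual definition of the metric. This is not the evaluation script behind the numbers above or those in the paper, so differences in normalization could shift the score.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,         # deletion
                               current[j - 1] + 1,      # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions: Sequence[str],
         references: Sequence[List[str]],
         threshold: float = 0.5) -> float:
    """Average, over questions, of the best similarity to any accepted answer.

    Assumption: strings are lowercased and stripped before comparison, and
    similarities with normalized distance >= threshold count as 0.
    """
    scores = []
    for pred, answers in zip(predictions, references):
        p = pred.strip().lower()
        best = 0.0
        for ans in answers:
            a = ans.strip().lower()
            longest = max(len(p), len(a))
            nl = levenshtein(p, a) / longest if longest else 0.0
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

With the example above, `anls([pred], [[gt]])` would give the score for that single question. Over the full dataset, `references` would hold every accepted answer for each question.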
