This demo loads the FlaxCLIPVisionBertForSequenceClassification model present in the model directory of this repository. The checkpoint is loaded from ckpt/ckpt-60k-5999, a checkpoint pre-trained for 60k steps and then fine-tuned for 5999 steps. 100 random examples are provided in dummy_vqa_multilingual.tsv, with their respective images in the images/val2014 directory.

You can also upload your own image using the Upload your image file uploader and type in a question of your choosing.

We provide an English Translation of the question for users who are not acquainted with the other languages. This is done using mtranslate to keep things flexible, and it needs an internet connection because it uses the Google Translate API.

The model predicts answers from a list of 3129 answers, whose labels are stored in answer_reverse_mapping.json.

Lastly, one can choose the Answer Language, which is backed by a saved dictionary created with the mtranslate library for the 3129 answer options.

The top-5 predictions are displayed below, and their respective confidence scores are shown as a bar plot.
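The top-5 step described above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the function name top_k_answers is hypothetical, and it assumes that answer_reverse_mapping.json maps label indices (as string keys, as JSON requires) to answer strings. A small toy mapping stands in for the real 3129-entry file.

```python
import math


def top_k_answers(logits, reverse_mapping, k=5):
    """Return the k most likely (answer, probability) pairs.

    `reverse_mapping` is assumed to map str(label_index) -> answer text.
    """
    # Numerically stable softmax: subtract the max before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Label indices ordered by descending probability, truncated to k.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

    # JSON object keys are strings, so look up str(index).
    return [(reverse_mapping[str(i)], probs[i]) for i in top]


# Toy stand-in for the real mapping and model output (assumed shapes).
toy_mapping = {"0": "yes", "1": "no", "2": "2", "3": "white", "4": "red", "5": "blue"}
toy_logits = [2.0, 1.0, 0.5, -1.0, 0.0, 3.0]

for answer, prob in top_k_answers(toy_logits, toy_mapping):
    print(f"{answer}: {prob:.3f}")
```

With the real file, the mapping could be loaded with json.load, and the resulting probabilities are the kind of confidence scores a bar plot would display.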