Create README.md
#2
by
AEnigmista
- opened
README.md
ADDED
|
@@ -0,0 +1,43 @@
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
language:
|
| 4 |
+
- en
|
| 5 |
+
tags:
|
| 6 |
+
- visual_bert
|
| 7 |
+
- vqa
|
| 8 |
+
- easy_vqa
|
| 9 |
+
---
|
| 10 |
+
# Visual BERT finetuned on easy_vqa
|
| 11 |
+
This model is a fine-tuned version of the VisualBERT model on the easy_vqa dataset. The dataset is available in the following [GitHub repo](https://github.com/vzhou842/easy-VQA/tree/master/easy_vqa).
|
| 12 |
+
|
| 13 |
+
## VisualBERT
|
| 14 |
+
VisualBERT is a multi-modal vision and language model. It can be used for tasks such as visual question answering, multiple choice and visual reasoning.
|
| 15 |
+
For more info on VisualBERT, please refer to the [documentation](https://huggingface.co/docs/transformers/model_doc/visual_bert#overview)
|
| 16 |
+
|
| 17 |
+
## Dataset
|
| 18 |
+
The dataset easy_vqa, with which the model was fine-tuned, can be easily installed via the package easy_vqa:
|
| 19 |
+
```bash
|
| 20 |
+
pip install easy_vqa
|
| 21 |
+
```
|
| 22 |
+
|
| 23 |
+
An instance of the dataset is composed of a question, the answer to the question (a label), and the ID of the image related to the question.
|
| 24 |
+
Each image is 64x64 and contains a shape (rectangle, triangle or circle) filled with a single color (blue, red, green, yellow, black, gray, brown or teal)
|
| 25 |
+
in a random position.
|
| 26 |
+
|
| 27 |
+
The questions of the dataset inquire about the shape (e.g. What is the blue shape?), the color of the shape (e.g. What color is the triangle?)
|
| 28 |
+
and the presence of a particular shape/color in both affirmative and negative form (e.g. Is there a red shape?).
|
| 29 |
+
Therefore, the possible answers to a question are: the three possible shapes, the eight possible colors, yes and no.
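The instance layout and answer space described above can be sketched in plain Python. This is only an illustration: the names `VqaInstance`, `ANSWERS`, `ANSWER_TO_ID`, and `label_id` are hypothetical, and the label ordering is an assumption that may not match the ordering used by the easy_vqa package or by the fine-tuned model.

```python
from dataclasses import dataclass

# Answer space as described in the text: 3 shapes, 8 colors, plus yes/no.
SHAPES = ["rectangle", "triangle", "circle"]
COLORS = ["blue", "red", "green", "yellow", "black", "gray", "brown", "teal"]
YES_NO = ["yes", "no"]

# ASSUMPTION: this ordering is illustrative only; the real label order
# used during fine-tuning may differ.
ANSWERS = SHAPES + COLORS + YES_NO
ANSWER_TO_ID = {answer: index for index, answer in enumerate(ANSWERS)}


@dataclass(frozen=True)
class VqaInstance:
    """One dataset instance: a question, its answer label, and an image id."""
    question: str
    answer: str
    image_id: int


def label_id(instance: VqaInstance) -> int:
    """Map the instance's answer string to its integer class id."""
    return ANSWER_TO_ID[instance.answer]


# Hypothetical instance; the image id 0 is made up for the example.
example = VqaInstance(
    question="What color is the triangle?",
    answer="teal",
    image_id=0,
)
print(len(ANSWERS), label_id(example))
```

With this layout, question answering on easy_vqa reduces to a 13-way classification problem over a single shared answer vocabulary.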
|
| 30 |
+
|
| 31 |
+
More information about the package functions that load the images and the questions can be found in the dataset's [repo](https://github.com/vzhou842/easy-VQA/tree/master/easy_vqa)
|
| 32 |
+
as well as a utility script to generate new instances of the dataset in case data augmentation is needed.
|
| 33 |
+
|
| 34 |
+
## How to Use
|
| 35 |
+
Load the image processor and the model with the following code:
|
| 36 |
+
```python
|
| 37 |
+
from transformers import ViltProcessor, VisualBertForQuestionAnswering

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
|
| 38 |
+
|
| 39 |
+
model = VisualBertForQuestionAnswering.from_pretrained("daki97/visualbert_finetuned_easy_vqa")
|
| 40 |
+
```
|
| 41 |
+
|
| 42 |
+
## COLAB Demo
|
| 43 |
+
An example of the usage of the model with the easy_vqa dataset is available [here](https://colab.research.google.com/drive/1yQfmz6wiSasRl6z-DmP-X403r3lZFqQS#scrollTo=HeVnH8BKkYCI)
|