Trofish committed fc2e618 (verified) · 1 parent: 6a87b40

Create README.md

README.md added (+24 −0):
<div align="center">

# RoViQA<br>(Open-ended) Visual Question Answering Model

![PyTorch](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?style=for-the-badge&logo=PyTorch&logoColor=white)
![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)
![Visual Studio Code](https://img.shields.io/badge/Visual%20Studio%20Code-0078d7.svg?style=for-the-badge&logo=visual-studio-code&logoColor=white)
<br><b>RoViQA: a Visual Question Answering model built by combining RoBERTa and ViT</b><br><br>
This repository contains the code for **RoViQA**, a Visual Question Answering (VQA) model that combines image features extracted with a Vision Transformer (ViT) and text features extracted with RoBERTa. The project includes training, inference, and various utility scripts.
</div>

## Model Architecture
<p align="center">
<img src="https://github.com/Tro-fish/Visual-Question-Answering/assets/79634774/f7d0eb20-f3b4-4f69-880d-d412ed32ab68" alt="RoViQA model architecture" width="100%" />
</p>

## RoViQA Overview
RoViQA is a Visual Question Answering (VQA) model that leverages Vision Transformer (ViT) and RoBERTa to understand and answer questions about images. By combining the strengths of these two models, RoViQA can process and interpret both visual and textual information to provide accurate answers.

## Model Parameters
- Base models
  - RoBERTa-base: 110M parameters
  - ViT-base: 86M parameters
- **RoViQA**: 215M parameters
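
The repository does not show the exact fusion mechanism in this README, but a common way to combine a ViT image encoder with a RoBERTa text encoder is late fusion: concatenate the two pooled [CLS] embeddings (each 768-dimensional for the base models) and feed them to a classification head over the answer vocabulary. The sketch below illustrates only that fusion step; the module name, hidden size, and answer-vocabulary size are illustrative assumptions, and the random tensors stand in for real ViT/RoBERTa outputs.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Illustrative late-fusion head (not the repo's actual module):
    concatenates pooled ViT and RoBERTa embeddings, then maps the
    joint vector to answer logits."""

    def __init__(self, img_dim=768, txt_dim=768, hidden_dim=512, num_answers=3129):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden_dim),  # fuse 768 + 768 -> 512
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, num_answers),        # logits over answer set
        )

    def forward(self, img_feat, txt_feat):
        # img_feat, txt_feat: (batch, 768) pooled encoder outputs
        fused = torch.cat([img_feat, txt_feat], dim=-1)
        return self.classifier(fused)

# Stand-ins for the pooled features ViT-base / RoBERTa-base would produce
img = torch.randn(2, 768)
txt = torch.randn(2, 768)
logits = FusionHead()(img, txt)
print(logits.shape)  # torch.Size([2, 3129])
```

In a full model, `img_feat` and `txt_feat` would come from the pretrained encoders (e.g. via the Transformers library), and the whole stack would be fine-tuned end to end on VQA question-answer pairs.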