Spaces:
Build error
Build error
| title: Visual QNA | |
| emoji: π | |
| colorFrom: green | |
| colorTo: indigo | |
| sdk: streamlit | |
| sdk_version: 1.43.2 | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| short_description: Streamlit app for Visual QA using VILT model to answer image | |
| # Image-Based Question Answering System | |
| ## Overview | |
| This repository contains two projects: | |
| 1. **Complete Web Application** β A full-stack web app built using Streamlit for both frontend and backend. | |
| 2. **Flask API Backend** β A standalone Flask-based backend API. | |
| Both implementations allow users to upload an image and ask questions about it. The system uses the **dandelin/vilt-b32-finetuned-vqa** model to analyze and respond to queries based on the provided image. | |
| ## Features | |
| - Users can upload an image. | |
| - Users can ask questions related to the uploaded image. | |
| - The model processes the image and answers questions based on its content. | |
| - Two implementations: | |
| - **Streamlit Web App:** A complete frontend and backend application. | |
| - **Flask API:** A RESTful API for backend processing. | |
| ## Technology Stack | |
| - **Frontend:** Streamlit (for the web app UI) | |
| - **Backend:** Flask (for the API) | |
| - **Model:** `dandelin/vilt-b32-finetuned-vqa` | |
| - **Libraries:** PyTorch, Transformers, Pillow, OpenCV, Requests | |
| --- | |
| ## Live Demo | |
| You can test the application live at: | |
| [Visual QNA with image](https://huggingface.co/spaces/Tahir5/Visual-QNA) | |
| ## Installation & Setup | |
| ### 1. Clone the Repository | |
| ```bash | |
| git clone https://github.com/your-repo/image-vqa.git | |
| cd image-vqa | |
| ``` | |
| ### 2. Install Dependencies | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| ### 3. Run the Streamlit Web App | |
| ```bash | |
| streamlit run stream.py | |
| ``` | |
| ### 4. Run the Flask API | |
| ```bash | |
| python flask_app.py | |
| ``` | |
| --- | |
| ## API Endpoints (For Flask Backend) | |
| ### 1. Visual Question Answering (VQA) | |
| **Endpoint:** `POST /vqa` | |
| - **Description:** Processes an image and a question, returning an answer. | |
| - **Request Format:** Multipart form-data | |
| - `image`: The uploaded image file. | |
| - `question`: The question related to the image. | |
| - **Response Format:** JSON | |
| **Example Request (cURL):** | |
| ```bash | |
| curl -X POST "http://127.0.0.1:5000/vqa" \ | |
| -F "image=@path/to/image.jpg" \ | |
| -F "question=What is in the image?" | |
| ``` | |
| **Example Response:** | |
| ```json | |
| { | |
| "question": "What is in the image?", | |
| "answer": "A cat sitting on a table." | |
| } | |
| ``` | |
| --- | |
| ## Testing with Postman | |
| ### Steps to Test the Flask API in Postman | |
| 1. Open **Postman**. | |
| 2. Select **POST** request. | |
| 3. Enter the request URL: `http://127.0.0.1:5000/vqa`. | |
| 4. Navigate to the **Body** tab and select **form-data**. | |
| 5. Add two key-value pairs: | |
| - **Key:** `image` β Select an image file. | |
| - **Key:** `question` β Enter a text question related to the image. | |
| 6. Click **Send**. | |
| 7. View the response containing the model's answer in JSON format. | |
| --- | |
| ## Example Usage | |
| ### Streamlit Web App | |
| 1. Open the app in the browser. | |
| 2. Upload an image. | |
| 3. Enter a question. | |
| 4. View the model's response. | |
| ### Flask API | |
| 1. Send a `POST` request to `/vqa` with an image and a question. | |
| 2. Receive the model-generated answer in JSON format. | |
| --- | |
| ## Model Information | |
| - **Name:** `dandelin/vilt-b32-finetuned-vqa` | |
| - **Functionality:** Vision-and-Language Transformer (ViLT) model fine-tuned for Visual Question Answering (VQA). | |
| - **Source:** [Hugging Face Model Hub](https://huggingface.co/dandelin/vilt-b32-finetuned-vqa) | |
| --- | |
| ## Contributing | |
| Feel free to contribute by opening issues or submitting pull requests. | |
| --- | |
| ## License | |
| This project is licensed under the MIT License. | |