Spaces:
Runtime error
Runtime error
| title: Visual Question Answering (VQA) System | |
| emoji: 🏞️ | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: streamlit | |
| sdk_version: 1.43.1 | |
| app_file: app.py | |
| pinned: false | |
| # Visual Question Answering (VQA) System | |
| A multi-modal AI application that allows users to upload images and ask questions about them. This project uses pre-trained models from Hugging Face to analyze images and answer natural language questions. | |
| ## Features | |
| - Upload images in common formats (jpg, png, etc.) | |
| - Ask questions about image content in natural language | |
| - Get AI-generated answers based on image content | |
| - User-friendly Streamlit interface | |
| - Support for various types of questions (objects, attributes, counting, etc.) | |
| ## Technical Stack | |
| - **Python**: Main programming language | |
| - **PyTorch & Transformers**: Deep learning frameworks for running the models | |
| - **Streamlit**: Interactive web application framework | |
| - **HuggingFace Models**: Pre-trained visual question answering models | |
| - **PIL**: Image processing | |
| ## Setup Instructions | |
| 1. Clone this repository: | |
| ``` | |
| git clone | |
| cd visual-question-answering | |
| ``` | |
| 2. Create a virtual environment (recommended): | |
| ``` | |
| python -m venv venv | |
| # On Windows | |
| venv\Scripts\activate | |
| # On macOS/Linux | |
| source venv/bin/activate | |
| ``` | |
| 3. Install dependencies: | |
| ``` | |
| pip install -r requirements.txt | |
| ``` | |
| 4. Run the application: | |
| ``` | |
| python app.py | |
| ``` | |
| Or directly with Streamlit: | |
| ``` | |
| streamlit run app.py | |
| ``` | |
| 5. Open a web browser and go to `http://localhost:8501` | |
| ## Usage | |
| 1. Upload an image using the file upload area | |
| 2. Type your question about the image in the text field | |
| 3. Select a model from the sidebar (BLIP or ViLT) | |
| 4. Click "Get Answer" to get an AI-generated response | |
| 5. View the answer displayed on the right side of the screen | |
| ## Models Used | |
| This application uses the following pre-trained models from Hugging Face: | |
| - **BLIP**: For general visual question answering with free-form answers | |
| - **ViLT**: For detailed understanding of image content and yes/no questions | |
| ## Project Structure | |
| - `models/`: Contains model handling code | |
| - `utils/`: Utility functions for image processing and more | |
| - `static/`: Static files including uploaded images | |
| - `app.py`: Script to run the application | |
| - | |
| ## Acknowledgments | |
| - Hugging Face for their excellent pre-trained models | |
| - The open-source community for various libraries used in this project |