---
title: VQA
emoji: π
colorFrom: gray
colorTo: yellow
sdk: docker
pinned: false
license: mit
short_description: VQA API Endpoint
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# VizWiz Visual Question Answering API

This repository contains a FastAPI backend for a Visual Question Answering (VQA) system trained on the VizWiz dataset.

## Features

- Upload images and ask questions about them
- Get answers with confidence scores
- Session management for asking multiple questions about the same image
- Health check endpoint for monitoring
- Interactive API documentation (Swagger UI and ReDoc)

## Project Structure

```
project_root/
├── app/
│   ├── main.py                 # Main FastAPI application
│   ├── models/                 # Model definitions
│   │   ├── __init__.py
│   │   └── vqa_model.py        # VQA model implementation
│   ├── routers/                # API route definitions
│   │   ├── __init__.py
│   │   └── vqa.py              # VQA-related endpoints
│   ├── services/               # Business logic
│   │   ├── __init__.py
│   │   ├── model_service.py    # Model loading and inference
│   │   └── session_service.py  # Session management
│   ├── utils/                  # Utility functions
│   │   ├── __init__.py
│   │   └── image_utils.py      # Image processing utilities
│   └── config.py               # Application configuration
├── models/                     # Directory for model files
├── uploads/                    # Directory for uploaded images
├── .env                        # Environment variables
└── requirements.txt            # Project dependencies
```

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/dixisouls/vizwiz-vqa-api.git
   cd vizwiz-vqa-api
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Create the required directories:

   ```bash
   mkdir -p models uploads
   ```

5. Place your trained model in the `models` directory. By default the application loads `./models/vqa_model_best.pt`; set `MODEL_PATH` to use a different file.

6. Update the `.env` file with your configuration (see Environment Variables below).

## Running the Application

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

The `--reload` flag restarts the server whenever the code changes. It is intended for development, so omit it in production.

The API will be available at http://localhost:8000.

API documentation is available at:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

## API Endpoints

### Health Check

```
GET /health
```

Returns the health status of the API.

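A monitoring script could poll this endpoint along the lines of the minimal, standard-library-only sketch below. It assumes the server runs at the default `http://localhost:8000` and that `/health` returns a JSON object; the helper names `health_url` and `check_health` are hypothetical, and the response fields are not documented in this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default host/port from the uvicorn command


def health_url(base_url: str = BASE_URL) -> str:
    """Build the health-check URL, tolerating a trailing slash on base_url."""
    return base_url.rstrip("/") + "/health"


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET /health and return the decoded JSON body.

    Assumes a JSON response (fields undocumented). Raises
    urllib.error.URLError if the server cannot be reached.
    """
    with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(check_health())
```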
### Upload Image

```
POST /api/vqa/upload
```

Upload an image and create a new session.

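Because this endpoint receives a file, the request is presumably `multipart/form-data`. The standard-library sketch below builds such a body by hand. The form field name `file` is an assumption (check the Swagger UI at `/docs` for the actual parameter name), and `build_multipart` and `upload_image` are hypothetical helpers.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:8000"  # default host/port from the uvicorn command


def build_multipart(field: str, filename: str, data: bytes) -> tuple:
    """Encode a single file as a multipart/form-data body.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_image(path: str, base_url: str = BASE_URL, field: str = "file") -> dict:
    """POST an image to /api/vqa/upload and return the decoded JSON reply.

    The default field name "file" is an assumption, not confirmed by the README.
    """
    image = Path(path)
    body, content_type = build_multipart(field, image.name, image.read_bytes())
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/vqa/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `upload_image("photo.jpg")` returns the server's JSON reply. The session ID that the other endpoints need is presumably in it, but its key name is not documented here.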
### Ask Question

```
POST /api/vqa/ask
```

Ask a question about an uploaded image.

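A question refers to a previously uploaded image, presumably through the session created by the upload. The sketch below assumes the endpoint accepts a JSON body with `session_id` and `question` fields; those names, and JSON rather than form encoding, are assumptions to verify against `/docs`. `build_ask_request` and `ask_question` are hypothetical helpers.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default host/port from the uvicorn command


def build_ask_request(session_id: str, question: str,
                      base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /api/vqa/ask request with a JSON body.

    The field names below are assumptions, not confirmed by the README.
    """
    payload = json.dumps({"session_id": session_id, "question": question}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/vqa/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_question(session_id: str, question: str, base_url: str = BASE_URL) -> dict:
    """Send the question and return the decoded JSON reply (answer and confidence)."""
    with urllib.request.urlopen(build_ask_request(session_id, question, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If the endpoint turns out to expect form fields instead, encode the payload with `urllib.parse.urlencode` and send `Content-Type: application/x-www-form-urlencoded`.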
### Get Session

```
GET /api/vqa/session/{session_id}
```

Get session information, including the question history.

### Reset Session

```
DELETE /api/vqa/session/{session_id}
```

Reset a session to start fresh.

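Both session endpoints share one path and differ only in the HTTP method, so a single request builder can cover them. The sketch below is hypothetical: the helper names are illustrative, and because the response shapes are not documented here, an empty reply (for example from `DELETE`) is treated as `{}`.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # default host/port from the uvicorn command


def session_request(session_id: str, method: str = "GET",
                    base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET (fetch) or DELETE (reset) request for /api/vqa/session/{session_id}."""
    if method not in ("GET", "DELETE"):
        raise ValueError("session endpoints are documented for GET and DELETE only")
    # Percent-encode the ID so it always forms a single, safe path segment.
    sid = urllib.parse.quote(session_id, safe="")
    return urllib.request.Request(f"{base_url.rstrip('/')}/api/vqa/session/{sid}", method=method)


def _send(req: urllib.request.Request) -> dict:
    """Send a request and decode its JSON body, treating an empty body as {}."""
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw.decode("utf-8")) if raw else {}


def get_session(session_id: str, base_url: str = BASE_URL) -> dict:
    """Fetch session information, including the question history."""
    return _send(session_request(session_id, "GET", base_url))


def reset_session(session_id: str, base_url: str = BASE_URL) -> dict:
    """Reset the session so the next question starts fresh."""
    return _send(session_request(session_id, "DELETE", base_url))
```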
## Environment Variables

- `DEBUG`: Enable debug mode (default: `False`)
- `MODEL_PATH`: Path to the trained model (default: `./models/vqa_model_best.pt`)
- `TEXT_MODEL`: Name of the text model (default: `bert-base-uncased`)
- `VISION_MODEL`: Name of the vision model (default: `google/vit-base-patch16-384`)
- `HUGGINGFACE_TOKEN`: Hugging Face API token
- `UPLOAD_DIR`: Directory for uploaded images (default: `./uploads`)

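The variables above, with their documented defaults, could be read roughly as follows. This is a hypothetical sketch, not the contents of `app/config.py`: `Settings`, `load_settings`, and the set of strings accepted as true for `DEBUG` are illustrative assumptions. It reads only the process environment; loading the `.env` file itself would need a tool such as python-dotenv, which this README does not mention.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _parse_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else (e.g. "False") is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    debug: bool
    model_path: str
    text_model: str
    vision_model: str
    huggingface_token: Optional[str]
    upload_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, applying the README defaults."""
    return Settings(
        debug=_parse_bool(env.get("DEBUG", "False")),
        model_path=env.get("MODEL_PATH", "./models/vqa_model_best.pt"),
        text_model=env.get("TEXT_MODEL", "bert-base-uncased"),
        vision_model=env.get("VISION_MODEL", "google/vit-base-patch16-384"),
        huggingface_token=env.get("HUGGINGFACE_TOKEN"),  # no default documented
        upload_dir=env.get("UPLOAD_DIR", "./uploads"),
    )
```

Passing an explicit mapping, as in `load_settings({"DEBUG": "true"})`, keeps the function easy to test without touching the real environment.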
## License

[MIT License](LICENSE)