---
title: VQA Backend
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---
# GenVQA — Generative Visual Question Answering
**A neuro-symbolic VQA system that detects objects with a neural model, retrieves structured facts from Wikidata, and generates grounded answers with Groq.**
[![Backend CI](https://github.com/DevaRajan8/Generative-vqa/actions/workflows/backend-ci.yml/badge.svg)](https://github.com/DevaRajan8/Generative-vqa/actions/workflows/backend-ci.yml)
[![UI CI](https://github.com/DevaRajan8/Generative-vqa/actions/workflows/ui-ci.yml/badge.svg)](https://github.com/DevaRajan8/Generative-vqa/actions/workflows/ui-ci.yml)


---
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ 📱 Expo Mobile App (React Native) │
│ • Image upload + question input │
│ • Displays answer + accessibility description │
└────────────────────────┬────────────────────────────────────┘
│ HTTP POST /api/answer
▼
┌─────────────────────────────────────────────────────────────┐
│ BACKEND LAYER (FastAPI) │
│ backend_api.py │
│ • Request handling, session management │
│ • Conversation Manager → multi-turn context tracking │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ ROUTING LAYER (ensemble_vqa_app.py) │
│ │
│ CLIP encodes question → compares against: │
│ "reasoning question" vs "visual/perceptual question" │
│ │
│ Reasoning? Visual? │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────────┐ │
│ │ NEURO-SYMBOLIC │ │ NEURAL VQA PATH │ │
│ │ │ │ │ │
│ │ 1. VQA model │ │ VQA model (GRU + │ │
│ │ detects obj │ │ Attention) predicts │ │
│ │ │ │ answer directly │ │
│ │ 2. Wikidata API │ └──────────┬──────────┘ │
│ │ fetches facts│ │ │
│ │ (P31, P2101, │ │ │
│ │ P2054, P186,│ │ │
│ │ P366 ...) │ │ │
│ │ │ │ │
│ │ 3. Groq LLM │ │ │
│ │ verbalizes │ │ │
│ │ from facts │ │ │
│ └─────────┬───────┘ │ │
│ └──────────────┬──────────┘ │
└────────────────────────── │ ─────────────────────────────┘
│
▼
┌─────────────────┐
│ GROQ SERVICE │
│ Accessibility │
│ description │
│ (2 sentences, │
│ screen-reader │
│ friendly) │
└────────┬────────┘
│
▼
JSON response
{ answer, model_used,
kg_enhancement,
wikidata_entity,
description }
```
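The routing step above amounts to zero-shot classification: embed the question and pick whichever label prompt it is most similar to. A minimal sketch of that decision logic, with a stub bag-of-words encoder standing in for CLIP's text encoder so the snippet stays self-contained (the real `ensemble_vqa_app.py` presumably uses actual CLIP embeddings, and the label prompts here are illustrative):

```python
import math
from collections import Counter

def encode(text: str) -> Counter:
    """Placeholder encoder: bag-of-words stands in for a CLIP text embedding."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Illustrative label prompts for the two routes.
LABELS = {
    "reasoning": "a reasoning question about properties facts or knowledge",
    "visual": "a visual perceptual question about what is in the image",
}

def route(question: str) -> str:
    # Send the question down whichever path its embedding is closest to.
    q = encode(question)
    return max(LABELS, key=lambda name: cosine(q, encode(LABELS[name])))
```

Reasoning questions then flow to the neuro-symbolic path; visual ones go straight to the neural VQA model.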
| Layer | Component | Role |
|---|---|---|
| **Client** | Expo React Native | Image upload, question input, answer display |
| **API** | FastAPI (`backend_api.py`) | Routing, sessions, conversation state |
| **Conversation** | `conversation_manager.py` | Multi-turn context, history tracking |
| **Router** | CLIP (in `ensemble_vqa_app.py`) | Classifies question as reasoning vs visual |
| **Neural VQA** | GRU + Attention (`model.py`) | Answers visual questions directly from image |
| **Neuro-Symbolic** | `semantic_neurosymbolic_vqa.py` | VQA detects objects → Wikidata fetches facts → Groq verbalizes |
| **Accessibility** | `groq_service.py` | Generates spoken-friendly 2-sentence description for every answer |
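The conversation layer in the table can be as simple as a per-session history buffer that is flattened into a prompt prefix on each turn. A minimal sketch (class and method names are illustrative, not the actual `conversation_manager.py` API):

```python
import uuid

class ConversationManager:
    """Tracks per-session question/answer history for multi-turn context."""

    def __init__(self, max_turns: int = 10):
        self.max_turns = max_turns
        self.sessions: dict[str, list[dict]] = {}

    def new_session(self) -> str:
        session_id = str(uuid.uuid4())
        self.sessions[session_id] = []
        return session_id

    def add_turn(self, session_id: str, question: str, answer: str) -> None:
        history = self.sessions.setdefault(session_id, [])
        history.append({"question": question, "answer": answer})
        # Keep only the most recent turns so prompts stay bounded.
        del history[:-self.max_turns]

    def context(self, session_id: str) -> str:
        # Flatten history into a prompt prefix for the LLM.
        return "\n".join(
            f"Q: {t['question']}\nA: {t['answer']}"
            for t in self.sessions.get(session_id, [])
        )
```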
---
## Features
- 🔍 **Visual Question Answering** — trained on VQAv2, fine-tuned on custom data
- 🧠 **Neuro-Symbolic Routing** — CLIP semantically classifies questions as _reasoning_ vs _visual_, routes accordingly
- 🌐 **Live Wikidata Facts** — queries physical properties, categories, materials, uses in real time
- 🤖 **Groq Verbalization** — Llama 3.3 70B verbalizes answers from retrieved structured facts, grounding them rather than hallucinating
- 💬 **Conversational Support** — multi-turn conversation manager with context tracking
- 📱 **Expo Mobile UI** — React Native app for iOS/Android/Web
- ♿ **Accessibility** — Groq generates spoken-friendly descriptions for every answer
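The "Live Wikidata Facts" step maps to Wikidata's public claims API (e.g. `action=wbgetclaims` on `www.wikidata.org/w/api.php`), which returns claims per property such as P2101 (melting point). A sketch of extracting a quantity value from such a response, using a trimmed sample payload so the snippet runs offline (the field layout follows the Wikibase claims format; the sample value is illustrative):

```python
import json

# Trimmed example of a `wbgetclaims` response for a quantity property.
SAMPLE = json.loads("""
{
  "claims": {
    "P2101": [
      {
        "mainsnak": {
          "snaktype": "value",
          "property": "P2101",
          "datavalue": {
            "type": "quantity",
            "value": {"amount": "+0", "unit": "http://www.wikidata.org/entity/Q25267"}
          }
        }
      }
    ]
  }
}
""")

def quantity_claims(response: dict, prop: str) -> list[float]:
    """Extract numeric amounts for a quantity property (e.g. P2101, melting point)."""
    values = []
    for claim in response.get("claims", {}).get(prop, []):
        snak = claim["mainsnak"]
        if snak["snaktype"] == "value" and snak["datavalue"]["type"] == "quantity":
            values.append(float(snak["datavalue"]["value"]["amount"]))
    return values
```

Facts extracted this way are what the Groq verbalizer turns into the final grounded answer.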
---
## Quick Start
### 1 — Backend
```bash
# Clone and install
git clone https://github.com/DevaRajan8/Generative-vqa.git
cd Generative-vqa
pip install -r requirements_api.txt
# Set your Groq API key
cp .env.example .env
# Edit .env → GROQ_API_KEY=your_key_here
# Start API
python backend_api.py
# → http://localhost:8000
```
### 2 — Mobile UI
```bash
cd ui
npm install
npx expo start --clear
```
> Scan the QR code with Expo Go, or press `w` for browser.
---
## API
| Endpoint | Method | Description |
|---|---|---|
| `/api/answer` | POST | Answer a question about an uploaded image |
| `/api/health` | GET | Health check |
| `/api/conversation/new` | POST | Start a new conversation session |
**Example:**
```bash
curl -X POST http://localhost:8000/api/answer \
-F "image=@photo.jpg" \
-F "question=Can this melt?"
```
**Response:**
```json
{
"answer": "ice",
"model_used": "neuro-symbolic",
"kg_enhancement": "Yes — ice can melt. [Wikidata P2101: melting point = 0.0 °C]",
"knowledge_source": "VQA (neural) + Wikidata (symbolic) + Groq (verbalize)",
"wikidata_entity": "Q86"
}
```
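The same request can be made from Python. A small client sketch using `requests`, with the endpoint and multipart field names taken from the curl example above (the helper name is illustrative):

```python
import requests

API_URL = "http://localhost:8000/api/answer"

def ask(image_path: str, question: str, timeout: float = 60.0) -> dict:
    """POST an image and a question to the backend; return the parsed JSON answer."""
    with open(image_path, "rb") as f:
        resp = requests.post(
            API_URL,
            files={"image": f},           # multipart file field, as in the curl call
            data={"question": question},  # plain form field
            timeout=timeout,
        )
    resp.raise_for_status()
    return resp.json()
```

Usage: `ask("photo.jpg", "Can this melt?")["answer"]`.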
---
## Project Structure
```
├── backend_api.py # FastAPI server
├── ensemble_vqa_app.py # VQA orchestrator (routing + inference)
├── semantic_neurosymbolic_vqa.py # Wikidata KB + Groq verbalizer
├── groq_service.py # Groq accessibility descriptions
├── conversation_manager.py # Multi-turn conversation tracking
├── model.py # VQA model definition
├── train.py # Training pipeline
├── ui/ # Expo React Native app
│ └── src/screens/HomeScreen.js
└── .github/
├── workflows/ # CI — backend lint + UI build
└── ISSUE_TEMPLATE/
```
---
## Environment Variables
| Variable | Required | Description |
|---|---|---|
| `GROQ_API_KEY` | ✅ | Groq API key — [get one free](https://console.groq.com) |
| `MODEL_PATH` | optional | Path to VQA checkpoint (default: `vqa_checkpoint.pt`) |
| `PORT` | optional | API server port (default: `8000`) |
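Resolving these settings with the documented defaults might look like the following stdlib-only sketch (the function name is illustrative; if you keep secrets in `.env`, load it first with something like `python-dotenv`):

```python
import os

def load_config(env=os.environ) -> dict:
    """Resolve settings from the environment, applying the documented defaults."""
    key = env.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to .env or the environment")
    return {
        "groq_api_key": key,
        "model_path": env.get("MODEL_PATH", "vqa_checkpoint.pt"),
        "port": int(env.get("PORT", "8000")),
    }
```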
---
## Requirements
- Python 3.10+
- CUDA GPU recommended (CPU works but is slow)
- Node.js 20+ (for UI)
- Groq API key (free tier available)
---
## License
MIT © [DevaRajan8](https://github.com/DevaRajan8)