---
title: VQA Backend
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---
<div align="center">

# GenVQA — Generative Visual Question Answering

**A neuro-symbolic VQA system that detects objects with a neural model, retrieves structured facts from Wikidata, and generates grounded answers with Groq.**

[![Backend CI](https://github.com/DevaRajan8/Generative-vqa/actions/workflows/backend-ci.yml/badge.svg)](https://github.com/DevaRajan8/Generative-vqa/actions/workflows/backend-ci.yml)
[![UI CI](https://github.com/DevaRajan8/Generative-vqa/actions/workflows/ui-ci.yml/badge.svg)](https://github.com/DevaRajan8/Generative-vqa/actions/workflows/ui-ci.yml)
![Python](https://img.shields.io/badge/Python-3.10%2B-blue?logo=python)
![License](https://img.shields.io/badge/License-MIT-green)

</div>
---
## Architecture
```
┌──────────────────────────────────────────────────────────────┐
│                         CLIENT LAYER                         │
│               📱 Expo Mobile App (React Native)              │
│   • Image upload + question input                            │
│   • Displays answer + accessibility description              │
└──────────────────────────────┬───────────────────────────────┘
                               │ HTTP POST /api/answer
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                   BACKEND LAYER (FastAPI)                    │
│                        backend_api.py                        │
│   • Request handling, session management                     │
│   • Conversation Manager → multi-turn context tracking       │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│             ROUTING LAYER (ensemble_vqa_app.py)              │
│                                                              │
│   CLIP encodes question → compares against:                  │
│   "reasoning question" vs "visual/perceptual question"       │
│                                                              │
│       Reasoning?                     Visual?                 │
│            │                            │                    │
│            ▼                            ▼                    │
│   ┌─────────────────┐        ┌─────────────────────┐         │
│   │ NEURO-SYMBOLIC  │        │   NEURAL VQA PATH   │         │
│   │                 │        │                     │         │
│   │ 1. VQA model    │        │ VQA model (GRU +    │         │
│   │    detects obj  │        │ Attention) predicts │         │
│   │                 │        │ answer directly     │         │
│   │ 2. Wikidata API │        └──────────┬──────────┘         │
│   │    fetches facts│                   │                    │
│   │    (P31, P2101, │                   │                    │
│   │    P2054, P186, │                   │                    │
│   │    P366 ...)    │                   │                    │
│   │                 │                   │                    │
│   │ 3. Groq LLM     │                   │                    │
│   │    verbalizes   │                   │                    │
│   │    from facts   │                   │                    │
│   └────────┬────────┘                   │                    │
│            └─────────────┬──────────────┘                    │
└──────────────────────────┼───────────────────────────────────┘
                           │
                           ▼
                  ┌─────────────────┐
                  │  GROQ SERVICE   │
                  │  Accessibility  │
                  │  description    │
                  │  (2 sentences,  │
                  │  screen-reader  │
                  │  friendly)      │
                  └────────┬────────┘
                           │
                           ▼
                     JSON response
                     { answer, model_used,
                       kg_enhancement,
                       wikidata_entity,
                       description }
```
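The verbalization step in the diagram hands the detected entity and its Wikidata facts to the LLM and asks it to answer only from those facts. A minimal sketch of how such a prompt might be assembled; the exact prompt wording and the `build_messages` helper are illustrative, not the code in `semantic_neurosymbolic_vqa.py`:

```python
# Sketch of the fact-grounded prompt the Groq verbalization step might send.
# The system/user wording here is an assumption for illustration.
def build_messages(question: str, entity: str, facts: dict) -> list:
    """Build a chat payload that constrains the LLM to the given facts."""
    fact_lines = "\n".join(f"- {k}: {v}" for k, v in facts.items())
    return [
        {"role": "system",
         "content": "Answer the question using only the structured facts provided."},
        {"role": "user",
         "content": f"Question: {question}\nEntity: {entity}\nFacts:\n{fact_lines}"},
    ]

messages = build_messages("Can this melt?", "ice (Q86)",
                          {"melting point": "0.0 °C"})
# With the groq SDK this payload would go to something like:
# client.chat.completions.create(model="llama-3.3-70b-versatile", messages=messages)
```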
| Layer | Component | Role |
|---|---|---|
| **Client** | Expo React Native | Image upload, question input, answer display |
| **API** | FastAPI (`backend_api.py`) | Routing, sessions, conversation state |
| **Conversation** | `conversation_manager.py` | Multi-turn context, history tracking |
| **Router** | CLIP (in `ensemble_vqa_app.py`) | Classifies question as reasoning vs visual |
| **Neural VQA** | GRU + Attention (`model.py`) | Answers visual questions directly from image |
| **Neuro-Symbolic** | `semantic_neurosymbolic_vqa.py` | VQA detects objects → Wikidata fetches facts → Groq verbalizes |
| **Accessibility** | `groq_service.py` | Generates spoken-friendly 2-sentence description for every answer |
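The router's decision reduces to nearest-anchor classification over text embeddings. The toy sketch below substitutes a bag-of-letters embedding for CLIP's text encoder so it runs standalone; the real router in `ensemble_vqa_app.py` compares CLIP embeddings, and the anchor phrasings here are assumptions:

```python
import numpy as np

# Stand-in for CLIP's text encoder: a normalized letter-count vector keeps
# the sketch self-contained. Swap in CLIP embeddings for the real thing.
def embed(text: str) -> np.ndarray:
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# One anchor prompt per path (wording is illustrative).
ANCHORS = {
    "neuro-symbolic": embed("reasoning question about properties and facts"),
    "neural": embed("visual perceptual question about the image"),
}

def route(question: str) -> str:
    """Send the question down the path with the most similar anchor."""
    q = embed(question)
    return max(ANCHORS, key=lambda name: float(q @ ANCHORS[name]))
```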
---
## Features
- 🔍 **Visual Question Answering** — trained on VQAv2, fine-tuned on custom data
- 🧠 **Neuro-Symbolic Routing** — CLIP semantically classifies questions as _reasoning_ vs _visual_ and routes them accordingly
- 🌐 **Live Wikidata Facts** — queries physical properties, categories, materials, and uses in real time
- 🤖 **Groq Verbalization** — Llama 3.3 70B grounds answers in structured facts rather than hallucinating
- 💬 **Conversational Support** — multi-turn conversation manager with context tracking
- 📱 **Expo Mobile UI** — React Native app for iOS/Android/Web
- ♿ **Accessibility** — Groq generates spoken-friendly descriptions for every answer
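The "live facts" step amounts to reading claim values out of Wikidata's JSON. A sketch of that parsing, using a hand-written mock of the `wbgetclaims` payload shape; the property-label map and mock values are for illustration only:

```python
# Labels for the property IDs the pipeline queries (assumed subset).
PROPERTY_LABELS = {
    "P31": "instance of",
    "P2101": "melting point",
    "P2054": "density",
    "P186": "made from material",
    "P366": "has use",
}

def extract_facts(claims: dict) -> dict:
    """Pull plain values out of a Wikidata wbgetclaims-style payload."""
    facts = {}
    for pid, label in PROPERTY_LABELS.items():
        for claim in claims.get(pid, []):
            value = claim["mainsnak"]["datavalue"]["value"]
            # Quantity values arrive as {"amount": "+0.0", "unit": ...}
            if isinstance(value, dict) and "amount" in value:
                value = float(value["amount"])
            facts[label] = value
    return facts

# Hand-written mock of the payload shape for ice (Q86), illustration only.
mock_claims = {
    "P2101": [{"mainsnak": {"datavalue": {"value":
        {"amount": "+0.0", "unit": "degree Celsius (simplified)"}}}}],
}
```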
---
## Quick Start
### 1 — Backend
```bash
# Clone and install
git clone https://github.com/DevaRajan8/Generative-vqa.git
cd Generative-vqa
pip install -r requirements_api.txt
# Set your Groq API key
cp .env.example .env
# Edit .env → GROQ_API_KEY=your_key_here
# Start API
python backend_api.py
# → http://localhost:8000
```
### 2 — Mobile UI
```bash
cd ui
npm install
npx expo start --clear
```
> Scan the QR code with Expo Go, or press `w` for browser.
---
## API
| Endpoint | Method | Description |
|---|---|---|
| `/api/answer` | POST | Answer a question about an uploaded image |
| `/api/health` | GET | Health check |
| `/api/conversation/new` | POST | Start a new conversation session |
**Example:**
```bash
curl -X POST http://localhost:8000/api/answer \
  -F "image=@photo.jpg" \
  -F "question=Can this melt?"
```
**Response:**
```json
{
  "answer": "ice",
  "model_used": "neuro-symbolic",
  "kg_enhancement": "Yes — ice can melt. [Wikidata P2101: melting point = 0.0 °C]",
  "knowledge_source": "VQA (neural) + Wikidata (symbolic) + Groq (verbalize)",
  "wikidata_entity": "Q86"
}
```
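The same call from Python, sketched with `requests`; the in-memory image bytes are placeholders, and the sketch only builds the multipart request so it works without a running server (send it through a `Session` once the backend is up):

```python
import io
import requests

BASE_URL = "http://localhost:8000"  # matches the Quick Start default port

def ask(image_bytes: bytes, question: str) -> requests.PreparedRequest:
    """Build the multipart POST for /api/answer without sending it."""
    req = requests.Request(
        "POST",
        f"{BASE_URL}/api/answer",
        files={"image": ("photo.jpg", io.BytesIO(image_bytes), "image/jpeg")},
        data={"question": question},
    )
    return req.prepare()

prepared = ask(b"fake-image-bytes", "Can this melt?")
# With the backend running: requests.Session().send(prepared).json()
```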
---
## Project Structure
```
├── backend_api.py                  # FastAPI server
├── ensemble_vqa_app.py             # VQA orchestrator (routing + inference)
├── semantic_neurosymbolic_vqa.py   # Wikidata KB + Groq verbalizer
├── groq_service.py                 # Groq accessibility descriptions
├── conversation_manager.py         # Multi-turn conversation tracking
├── model.py                        # VQA model definition
├── train.py                        # Training pipeline
├── ui/                             # Expo React Native app
│   └── src/screens/HomeScreen.js
└── .github/
    ├── workflows/                  # CI — backend lint + UI build
    └── ISSUE_TEMPLATE/
```
---
## Environment Variables
| Variable | Required | Description |
|---|---|---|
| `GROQ_API_KEY` | ✅ | Groq API key — [get one free](https://console.groq.com) |
| `MODEL_PATH` | optional | Path to VQA checkpoint (default: `vqa_checkpoint.pt`) |
| `PORT` | optional | API server port (default: `8000`) |
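A minimal loader mirroring the table's defaults; the `load_config` name is illustrative, and the actual backend may read these variables differently:

```python
import os

def load_config() -> dict:
    """Read the environment variables from the table, applying defaults."""
    key = os.environ.get("GROQ_API_KEY")
    if not key:
        # Required: fail fast with a pointer to where keys come from.
        raise RuntimeError("GROQ_API_KEY is required (see https://console.groq.com)")
    return {
        "groq_api_key": key,
        "model_path": os.environ.get("MODEL_PATH", "vqa_checkpoint.pt"),
        "port": int(os.environ.get("PORT", "8000")),
    }
```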
---
## Requirements
- Python 3.10+
- CUDA GPU recommended (CPU works but is slow)
- Node.js 20+ (for UI)
- Groq API key (free tier available)
---
## License
MIT © [DevaRajan8](https://github.com/DevaRajan8)