# VQA Accessibility Enhancement - Setup Guide
## Backend Setup
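### 1. Install Python Dependencies
Run the following from the project root. The path below is an example; substitute the location of your own checkout.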
```bash
cd c:\Users\rdeva\Downloads\vqa_coes
pip install -r requirements_api.txt
```
### 2. Configure Groq API Key
1. Get your Groq API key from: https://console.groq.com/keys
2. Create a `.env` file in the project root by copying the example file (on macOS or Linux, use `cp` instead of `copy`):
```bash
copy .env.example .env
```
3. Edit `.env` and add your API key:
```
GROQ_API_KEY=your_actual_groq_api_key_here
```
### 3. Start Backend Server
```bash
python backend_api.py
```
The server starts on `http://localhost:8000`.
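To confirm the server is actually listening, a quick check like the following can help. This is a minimal sketch that only tests whether a TCP connection to port 8000 succeeds; the helper name `is_port_open` is illustrative and not part of the project.

```python
import socket


def is_port_open(host="localhost", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    status = "reachable" if is_port_open() else "not reachable"
    print(f"Backend on localhost:8000 is {status}")
```

A successful connection shows only that something is listening on the port, not that the models have finished loading.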
---
## Frontend Setup
### 1. Install Node Dependencies
```bash
cd ui
npm install
```
This installs the `expo-speech` package, which provides the text-to-speech functionality.
### 2. Start Expo App
```bash
npm start
```
Then:
- Press `a` for Android emulator
- Press `i` for iOS simulator
- Scan the QR code with the Expo Go app to run on a physical device
---
## Testing the Features
### Image Display Fix
1. Open the app
2. Tap "Camera" or "Gallery" to select an image
3. **Expected**: The image displays correctly (no blank screen)
### LLM Description Feature
1. Upload an image
2. Enter a question (e.g., "What color is the car?")
3. Tap "Ask Question"
4. **Expected**:
- The original answer appears in the "Answer" card
- An "Accessible Description" card appears below it with a two-sentence description
- A speaker icon button is visible
### Text-to-Speech
1. Ask a question and wait for the answer and its description to appear
2. Tap the speaker icon (🔊) in the "Accessible Description" card
3. **Expected**: The description is read aloud
4. Tap the stop icon (⏹️) to stop playback
---
## Troubleshooting
### Backend Issues
**Groq API Key Error**
```
ValueError: Groq API key not found
```
**Solution**: Make sure a `.env` file exists in the project root and contains `GROQ_API_KEY=your_key`.
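How `backend_api.py` actually reads the key is not shown in this guide. As a rough sketch, a startup check along these lines would produce the error above when the key is missing. `load_env_file` and `get_groq_api_key` are hypothetical helper names, and the parser only handles simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_groq_api_key(env_path=".env"):
    """Return the key from the environment or the .env file, else raise."""
    key = os.environ.get("GROQ_API_KEY") or load_env_file(env_path).get("GROQ_API_KEY")
    if not key:
        raise ValueError("Groq API key not found")
    return key
```

Running `get_groq_api_key()` from the project root is a quick way to tell whether the file is being found and parsed as expected.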
**Models Not Loading**
```
❌ Base checkpoint not found
```
**Solution**: Ensure `vqa_checkpoint.pt` and `vqa_spatial_checkpoint.pt` are in the project root
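To see which file is missing before starting the server, a small check such as this can be run from the project root. The two file names come from this guide; the function name `missing_checkpoints` is made up for illustration.

```python
from pathlib import Path

# Checkpoint files this guide says must sit in the project root.
REQUIRED_CHECKPOINTS = ("vqa_checkpoint.pt", "vqa_spatial_checkpoint.pt")


def missing_checkpoints(project_root="."):
    """Return the names of required checkpoint files not found in project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```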
### Frontend Issues
**Image Not Displaying**
- Make sure you've run `npm install` to get the latest `expo-image` package
- Check console logs for image URI format issues
**Text-to-Speech Not Working**
- Ensure device volume is turned up
- Check that `expo-speech` package is installed
- On the iOS simulator, speech may not work; test on a physical device instead
**Cannot Connect to Backend**
- Verify that the backend is running on port 8000
- Update `ui/src/config/api.js` with the correct backend URL
- For physical devices, use ngrok or your computer's local IP
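For the local-IP option, the following sketch prints a candidate backend URL to put in `ui/src/config/api.js`. It assumes the phone and the computer share a network and that the backend listens on port 8000. `local_ip` and `backend_url` are illustrative names, and the detected address is a best guess that falls back to `127.0.0.1`.

```python
import socket


def local_ip():
    """Best-effort LAN address of this machine; falls back to 127.0.0.1."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only picks the
        # outgoing interface. The target address is arbitrary.
        sock.connect(("8.8.8.8", 80))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


def backend_url(port=8000):
    """Build the URL a device on the same network would use."""
    return f"http://{local_ip()}:{port}"


if __name__ == "__main__":
    print(backend_url())
```

If the printed address is `127.0.0.1`, the machine has no usable network route and a tunnel such as ngrok is the more practical choice.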
---
## Features Summary
✅ **Fixed**: Image display issue (using `expo-image` instead of react-native Image)
✅ **Added**: Groq LLM integration for 2-sentence descriptions
✅ **Added**: Text-to-speech accessibility feature
✅ **Added**: Visual distinction between raw answer and description
✅ **Added**: Fallback mode when Groq API is unavailable
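The guide does not describe how the fallback mode is implemented. One plausible shape, shown purely as an assumption, is to wrap the LLM call and return a templated two-sentence description if it fails. The function name, the `llm_describe` parameter, and the template wording are all hypothetical.

```python
def describe_with_fallback(question, answer, llm_describe=None):
    """Return an LLM description, or a templated one if the call fails.

    llm_describe is a hypothetical callable (question, answer) -> str that
    stands in for the Groq request; it is not the project's real API.
    """
    if llm_describe is not None:
        try:
            return llm_describe(question, answer)
        except Exception:
            # Any failure (network error, missing key, rate limit) falls
            # through to the templated description below.
            pass
    return f'The question was "{question}". The model answered "{answer}".'


if __name__ == "__main__":
    def unavailable(question, answer):
        raise RuntimeError("API unavailable")

    print(describe_with_fallback("What color is the car?", "red", unavailable))
```

The point of this design is that the app still has something to display and read aloud when the API cannot be reached.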