# VQA Accessibility Enhancement - Setup Guide

## Backend Setup

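### 1. Install Python Dependencies

Run these commands from the project root. The path below is an example from a Windows machine; substitute the location of your own checkout.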
```bash
cd c:\Users\rdeva\Downloads\vqa_coes
pip install -r requirements_api.txt
```

### 2. Configure Groq API Key

1. Get your Groq API key from: https://console.groq.com/keys
2. Create a `.env` file in the project root by copying the example file (on macOS or Linux, use `cp` instead of `copy`):
   ```bash
   copy .env.example .env
   ```
3. Edit `.env` and add your API key:
   ```
   GROQ_API_KEY=your_actual_groq_api_key_here
   ```
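
Before starting the server, it can help to confirm that the key is actually readable. The following is a minimal sketch, not the project's own loader: it assumes a plain `KEY=value` `.env` format, and the helper names `read_env_file` and `groq_key_configured` are hypothetical. How `backend_api.py` itself reads the key is not shown in this guide.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines from a .env file (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def groq_key_configured(path=".env"):
    """Return True if GROQ_API_KEY is set in the environment or the .env file."""
    key = os.environ.get("GROQ_API_KEY") or read_env_file(path).get("GROQ_API_KEY", "")
    # Treat the placeholder value from this guide as "not configured".
    return bool(key) and key != "your_actual_groq_api_key_here"


if __name__ == "__main__":
    print("GROQ_API_KEY configured:", groq_key_configured())
```

Run it from the project root so that the relative `.env` path resolves to the file created in step 2.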

### 3. Start Backend Server
```bash
python backend_api.py
```

The server starts on `http://localhost:8000`.
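
To confirm that something is listening on that port, a quick TCP check is enough. This is a minimal sketch using only the standard library; the function name `backend_is_listening` is hypothetical, and the check only proves that the port accepts connections, not that the API behind it is healthy.

```python
import socket


def backend_is_listening(host="localhost", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or unresolvable host.
        return False


if __name__ == "__main__":
    print("Backend reachable:", backend_is_listening())
```

If this prints `False` while the server appears to be running, check the server's console output for the address and port it bound to.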

---

## Frontend Setup

### 1. Install Node Dependencies
```bash
cd ui
npm install
```

This installs the app's dependencies, including the `expo-speech` package used for text-to-speech.

### 2. Start Expo App
```bash
npm start
```

Then:
- Press `a` for the Android emulator
- Press `i` for the iOS simulator
- Scan the QR code with the Expo Go app to run on a physical device

---

## Testing the Features

### Image Display Fix
1. Open the app
2. Tap "Camera" or "Gallery" to select an image
3. **Expected**: Image should display correctly (no blank screen)

### LLM Description Feature
1. Upload an image
2. Enter a question (e.g., "What color is the car?")
3. Tap "Ask Question"
4. **Expected**:
   - Original answer appears in the "Answer" card
   - "Accessible Description" card appears below it with a two-sentence description
   - Speaker icon button is visible

### Text-to-Speech
1. Ask a question and wait for the answer and its description to appear
2. Tap the speaker icon (🔊) in the "Accessible Description" card
3. **Expected**: The description is read aloud
4. Tap the stop icon (⏹️) to stop playback

---

## Troubleshooting

### Backend Issues

**Groq API Key Error**
```
ValueError: Groq API key not found
```
**Solution**: Make sure the `.env` file exists in the project root and contains `GROQ_API_KEY=your_key`

**Models Not Loading**
```
❌ Base checkpoint not found
```
**Solution**: Ensure `vqa_checkpoint.pt` and `vqa_spatial_checkpoint.pt` are in the project root
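
A quick way to see which checkpoint is missing is to test for both files directly. This is a minimal sketch: the file names come from this guide, the helper name `missing_checkpoints` is hypothetical, and the project root is assumed to be the current directory.

```python
from pathlib import Path

# File names taken from this guide.
REQUIRED_CHECKPOINTS = ("vqa_checkpoint.pt", "vqa_spatial_checkpoint.pt")


def missing_checkpoints(project_root="."):
    """Return the names of required checkpoint files not found in project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("Missing checkpoints:", missing or "none")
```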

### Frontend Issues

**Image Not Displaying**
- Make sure you've run `npm install` so that the `expo-image` package is installed
- Check console logs for image URI format issues

**Text-to-Speech Not Working**
- Ensure device volume is turned up
- Check that the `expo-speech` package is installed
- On the iOS simulator, speech may not work; test on a physical device instead

**Cannot Connect to Backend**
- Verify that the backend is running on port 8000
- Update `ui/src/config/api.js` with the correct backend URL
- For physical devices, `localhost` refers to the device itself, so use ngrok or your computer's local IP address instead
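
If you go the local-IP route, the address can be looked up from Python. This is a best-effort sketch using a common standard-library trick; the function name `local_ip` is hypothetical. It only helps if the phone is on the same network and the backend accepts connections on that interface, which this guide does not specify.

```python
import socket


def local_ip():
    """Best-effort guess at this machine's LAN address; falls back to 127.0.0.1."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only makes the OS pick
        # the outgoing interface, whose address is then read back.
        sock.connect(("10.255.255.255", 1))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


if __name__ == "__main__":
    print(f"Try this backend URL on the device: http://{local_ip()}:8000")
```

A result of `127.0.0.1` means no usable network interface was found, and that address will not work from a separate device.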

---

## Features Summary

- ✅ **Fixed**: Image display issue (using `expo-image` instead of the react-native `Image` component)
- ✅ **Added**: Groq LLM integration for two-sentence descriptions
- ✅ **Added**: Text-to-speech accessibility feature
- ✅ **Added**: Visual distinction between raw answer and description
- ✅ **Added**: Fallback mode when Groq API is unavailable
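
This guide does not describe what the fallback mode returns, so the following is only an illustration of the general pattern, not the project's implementation. The function `describe_with_fallback`, the `llm_describe` callable that stands in for the Groq request, and the template used for the fallback text are all assumptions.

```python
def describe_with_fallback(question, answer, llm_describe):
    """Return (description, used_fallback).

    llm_describe is a hypothetical callable standing in for the Groq request;
    any exception it raises, or an empty result, triggers the fallback.
    """
    try:
        description = llm_describe(question, answer)
        if description:
            return description, False
    except Exception:
        # Network errors, a missing key, and rate limits are all treated alike here.
        pass
    # Assumed fallback: a plain template built from the raw answer.
    return f"Question: {question} Answer: {answer}.", True
```

Returning a flag alongside the text lets the caller show the user that the description did not come from the LLM.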