# VQA Accessibility Enhancement - Setup Guide

## Backend Setup

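### 1. Install Python Dependencies
Change to the project directory (the path below comes from the original setup; replace it with the location of your own checkout) and install the requirements:
```bash
cd c:\Users\rdeva\Downloads\vqa_coes
pip install -r requirements_api.txt
```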

### 2. Configure Groq API Key

1. Get your Groq API key from: https://console.groq.com/keys
2. Create a `.env` file in the project root by copying the example file (on macOS or Linux, use `cp` instead of `copy`):
   ```bash
   copy .env.example .env
   ```
3. Edit `.env` and add your API key:
   ```
   GROQ_API_KEY=your_actual_groq_api_key_here
   ```
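To confirm the key is visible before starting the server, a small check along these lines may help. This is a minimal sketch, not the project's own loader: `read_env_file` and `check_groq_key` are hypothetical helper names, and the parser only handles simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path

# Placeholder value taken from the step above; a real key must differ from it.
PLACEHOLDER = "your_actual_groq_api_key_here"


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file; skip blanks and comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_groq_key(path=".env"):
    """Return True if GROQ_API_KEY is set (environment first, then the file)."""
    key = os.environ.get("GROQ_API_KEY") or read_env_file(path).get("GROQ_API_KEY", "")
    return bool(key) and key != PLACEHOLDER


if __name__ == "__main__":
    print("GROQ_API_KEY configured:", check_groq_key())
```

Run it from the project root. If it reports `False`, the `.env` file is missing, empty, or still contains the placeholder value.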

### 3. Start Backend Server
```bash
python backend_api.py
```

The server starts on `http://localhost:8000`.
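To verify that something is listening on that port, a plain TCP connection test is enough. The sketch below uses only the standard library and makes no assumption about the backend's routes; `is_port_open` is a hypothetical helper name.

```python
import socket


def is_port_open(host="localhost", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("Backend reachable:", is_port_open())
```

A `True` result only shows that the port accepts connections. It does not confirm that the models loaded correctly, so check the server log as well.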

---

## Frontend Setup

### 1. Install Node Dependencies
```bash
cd ui
npm install
```

This installs all frontend dependencies, including the `expo-speech` package used for text-to-speech.

### 2. Start Expo App
```bash
npm start
```

Then:
- Press `a` for Android emulator
- Press `i` for iOS simulator
- Scan the QR code with the Expo Go app to run on a physical device

---

## Testing the Features

### Image Display Fix
1. Open the app
2. Tap "Camera" or "Gallery" to select an image
3. **Expected**: Image should display correctly (no blank screen)

### LLM Description Feature
1. Upload an image
2. Enter a question (e.g., "What color is the car?")
3. Tap "Ask Question"
4. **Expected**:
   - The original answer appears in the "Answer" card
   - An "Accessible Description" card appears below it with a two-sentence description
   - A speaker icon button is visible

### Text-to-Speech
1. Ask a question and wait for the answer and its description to appear
2. Tap the speaker icon (🔊) in the "Accessible Description" card
3. **Expected**: The description is read aloud
4. Tap the stop icon (⏹️) to stop playback

---

## Troubleshooting

### Backend Issues

**Groq API Key Error**
```
ValueError: Groq API key not found
```
**Solution**: Make sure a `.env` file exists in the project root and contains `GROQ_API_KEY=your_key`.

**Models Not Loading**
```
❌ Base checkpoint not found
```
**Solution**: Ensure `vqa_checkpoint.pt` and `vqa_spatial_checkpoint.pt` are in the project root
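A quick way to see which checkpoint is missing is to test for both files directly. This is a minimal sketch that assumes it is run from the project root; `missing_checkpoints` is a hypothetical helper, and only the two file names come from this guide.

```python
from pathlib import Path

# File names as listed in this guide.
REQUIRED_CHECKPOINTS = ("vqa_checkpoint.pt", "vqa_spatial_checkpoint.pt")


def missing_checkpoints(project_root="."):
    """Return the required checkpoint file names not found in project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```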

### Frontend Issues

**Image Not Displaying**
- Make sure you've run `npm install` to get the latest `expo-image` package
- Check console logs for image URI format issues

**Text-to-Speech Not Working**
- Ensure the device volume is turned up
- Check that the `expo-speech` package is installed
- On the iOS simulator, speech may not work; test on a physical device

**Cannot Connect to Backend**
- Verify that the backend is running on port 8000
- Update `ui/src/config/api.js` with the correct backend URL
- For physical devices, use ngrok or your computer's local IP address instead of `localhost`
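If you go the local IP route, the address can usually be found with a short script run on the machine that hosts the backend. The sketch below is a best-effort approach using the standard library; `local_ip` and `backend_url` are hypothetical helpers, and the result may be wrong on machines with several network interfaces or a VPN.

```python
import socket


def local_ip():
    """Best-effort guess at this machine's LAN address; falls back to 127.0.0.1."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only selects a route,
        # which reveals the address of the outgoing interface.
        sock.connect(("192.0.2.1", 80))  # documentation-only address (TEST-NET-1)
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


def backend_url(host, port=8000):
    """Build the base URL to place in the frontend API config."""
    return f"http://{host}:{port}"


if __name__ == "__main__":
    print(backend_url(local_ip()))
```

The phone and the computer must be on the same network for this URL to work, and the backend must accept connections from other hosts rather than only from `localhost`.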

---

## Features Summary

- ✅ **Fixed**: Image display issue (using `expo-image` instead of the React Native `Image` component)
- ✅ **Added**: Groq LLM integration for two-sentence descriptions
- ✅ **Added**: Text-to-speech accessibility feature
- ✅ **Added**: Visual distinction between raw answer and description
- ✅ **Added**: Fallback mode when Groq API is unavailable
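
The fallback behaviour can be pictured roughly as follows. This is an illustrative sketch only, not the code in `backend_api.py`: `describe_with_fallback` is a hypothetical function, `llm_describe` stands in for whatever call the backend makes to Groq, and the template sentence is invented for the example.

```python
def describe_with_fallback(question, answer, llm_describe):
    """Try the LLM first; on failure or empty output, build a description locally.

    Returns a (description, source) pair, where source is "llm" or "fallback".
    """
    try:
        text = llm_describe(question, answer)
        if text and text.strip():
            return text.strip(), "llm"
    except Exception:
        # Network error, missing key, rate limit, and similar failures.
        pass
    # Hypothetical template used when no LLM description is available.
    return f'The question was: "{question}". The answer is: {answer}.', "fallback"
```

Returning the source alongside the text lets the UI indicate when a description did not come from the LLM.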