# VQA Assistant - React Native Mobile App

A beautiful React Native mobile application for Visual Question Answering (VQA) using ensemble AI models with Google authentication.

## Features

- πŸ” **Google OAuth Authentication** - Secure sign-in with Google
- πŸ“Έ **Image Selection** - Pick from gallery or capture with camera
- πŸ€– **Ensemble VQA** - Automatic routing between base and spatial models
- 🎨 **Beautiful UI** - Modern gradient design with smooth animations
- ⚑ **Real-time Answers** - Fast question answering with model visualization
- πŸ“± **Cross-Platform** - Works on iOS and Android via Expo Go

## Architecture

### Backend
- **FastAPI** server wrapping the ensemble VQA system
- Two models:
  - **Base Model**: General VQA (39.4% accuracy)
  - **Spatial Model**: Spatial reasoning (28.5% accuracy)
- Automatic question routing based on spatial keywords

### Frontend
- **React Native** with Expo
- **Navigation**: React Navigation
- **State Management**: React Context API
- **UI Components**: Custom components with Material icons
- **Styling**: Custom theme with gradients

## Prerequisites

### Backend Requirements
- Python 3.8+
- CUDA-capable GPU (recommended) or CPU
- VQA model checkpoints:
  - `vqa_checkpoint.pt` (base model)
  - `vqa_spatial_checkpoint.pt` (spatial model)

### Frontend Requirements
- Node.js 16+
- npm or yarn
- Expo Go app on your mobile device
- Google Cloud OAuth credentials

## Setup Instructions

### 1. Backend Setup

```bash
# Navigate to the project root
cd path/to/vqa_coes

# Install API dependencies
pip install -r requirements_api.txt

# Ensure model checkpoints are in the root directory
# - vqa_checkpoint.pt
# - vqa_spatial_checkpoint.pt

# Start the backend server
python backend_api.py
```

The backend will start on `http://0.0.0.0:8000`. You can confirm it is up with `curl http://localhost:8000/health`.

**Important**: Note your computer's local IP address for mobile testing:
- Windows: Run `ipconfig` and find your IPv4 Address
- Mac/Linux: Run `ifconfig` or `ip addr`
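If you prefer not to read `ipconfig`/`ifconfig` output by hand, the usual UDP-socket trick gives a best-effort guess of the LAN address (a sketch; it assumes the machine has a default route):

```python
import socket

def local_ip() -> str:
    """Best-effort guess at this machine's LAN IP address.

    Connecting a UDP socket to a public address sends no packets;
    it only asks the OS which local interface would be used.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]
    finally:
        s.close()
```

Use the printed address as `YOUR_LOCAL_IP` in the frontend configuration below.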

### 2. Google OAuth Setup

1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select existing
3. Configure the **OAuth consent screen** (the legacy Google+ API has been shut down and no longer needs to be enabled)
4. Go to **Credentials** > **Create Credentials** > **OAuth 2.0 Client ID**
5. Create credentials for:
   - **Web application** (for Expo Go)
   - **iOS** (if building standalone iOS app)
   - **Android** (if building standalone Android app)
6. Update `ui/src/config/google.js` with your client IDs

### 3. Frontend Setup

```bash
# Navigate to UI folder
cd ui

# Install dependencies
npm install

# Update API configuration
# Edit ui/src/config/api.js and replace with your local IP:
# export const API_BASE_URL = 'http://YOUR_LOCAL_IP:8000';

# Update Google OAuth configuration
# Edit ui/src/config/google.js with your client IDs
```

### 4. Running the App

```bash
# Make sure backend is running first!

# Start Expo development server
npm start

# Or use specific platform
npm run android  # For Android
npm run ios      # For iOS (Mac only)
```

**Testing on Physical Device:**
1. Install **Expo Go** app from App Store or Play Store
2. Scan the QR code from the terminal
3. Ensure your phone and computer are on the same network
4. The app should load automatically

## Usage

### 1. Sign In
- Open the app
- Tap "Sign in with Google"
- Complete the OAuth flow
- You'll be redirected to the home screen

### 2. Ask Questions
1. **Select Image**:
   - Tap "Camera" to take a photo
   - Tap "Gallery" to choose from library
2. **Enter Question**:
   - Type your question in the text field
   - Examples:
     - "What color is the car?" (uses base model)
     - "What is to the right of the table?" (uses spatial model)
3. **Get Answer**:
   - Tap "Ask Question"
   - View the answer with model type indicator

### 3. Understanding Model Routing

The app automatically routes questions to the appropriate model:

**Spatial Model** (πŸ“) - Used for questions containing:
- Directional: right, left, above, below, top, bottom
- Positional: front, behind, next to, beside, near
- Relational: closest, farthest, nearest

**Base Model** (πŸ”) - Used for all other questions:
- Object identification
- Color questions
- Counting
- General descriptions
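The routing rule above amounts to a simple keyword check. A minimal sketch, based on the keyword lists in this section (the actual logic in `backend_api.py` may use a different or longer list):

```python
# Keywords taken from the routing description above.
SPATIAL_KEYWORDS = {
    "right", "left", "above", "below", "top", "bottom",
    "front", "behind", "next to", "beside", "near",
    "closest", "farthest", "nearest",
}

def route_question(question: str) -> str:
    """Return which model a question would be routed to.

    Note: plain substring matching is used here for brevity, so e.g.
    "stopped" would match "top"; a real router should match whole words.
    """
    q = question.lower()
    if any(keyword in q for keyword in SPATIAL_KEYWORDS):
        return "spatial"
    return "base"

# route_question("What is to the right of the table?")  -> "spatial"
# route_question("What color is the car?")              -> "base"
```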

## Project Structure

```
ui/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ config/
β”‚   β”‚   β”œβ”€β”€ api.js              # API configuration
β”‚   β”‚   └── google.js           # Google OAuth config
β”‚   β”œβ”€β”€ contexts/
β”‚   β”‚   └── AuthContext.js      # Authentication state
β”‚   β”œβ”€β”€ screens/
β”‚   β”‚   β”œβ”€β”€ LoginScreen.js      # Login screen
β”‚   β”‚   └── HomeScreen.js       # Main VQA screen
β”‚   β”œβ”€β”€ services/
β”‚   β”‚   └── api.js              # API client
β”‚   └── styles/
β”‚       β”œβ”€β”€ theme.js            # Theme configuration
β”‚       └── globalStyles.js     # Global styles
β”œβ”€β”€ App.js                      # Main app component
β”œβ”€β”€ app.json                    # Expo configuration
└── package.json                # Dependencies
```

## API Endpoints

### Backend API

- `GET /` - Root endpoint with API info
- `GET /health` - Health check
- `POST /api/answer` - Answer VQA question
  - Body: `multipart/form-data`
  - Fields: `image` (file), `question` (string)
  - Response: `{ answer, model_used, confidence, question_type }`
- `GET /api/models/info` - Get model information
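A client consuming `POST /api/answer` only needs to read the documented response fields. A sketch of the parsing side (field names follow the response shape listed above; the example values are hypothetical):

```python
import json

def parse_answer(body: str):
    """Extract the answer and the model that produced it from the
    documented /api/answer response shape."""
    resp = json.loads(body)
    return resp["answer"], resp["model_used"]

# Hypothetical response for "What color is the car?":
sample = '{"answer": "red", "model_used": "base", "confidence": 0.91, "question_type": "general"}'
answer, model = parse_answer(sample)
```

From a shell, the same endpoint can be exercised with `curl -F "image=@photo.jpg" -F "question=What color is the car?" http://YOUR_LOCAL_IP:8000/api/answer` (assuming the backend is reachable).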

## Configuration

### API Configuration (`ui/src/config/api.js`)
```javascript
export const API_BASE_URL = 'http://YOUR_LOCAL_IP:8000';
```

### Google OAuth (`ui/src/config/google.js`)
```javascript
export const GOOGLE_CONFIG = {
  webClientId: 'YOUR_WEB_CLIENT_ID.apps.googleusercontent.com',
  iosClientId: 'YOUR_IOS_CLIENT_ID.apps.googleusercontent.com',
  androidClientId: 'YOUR_ANDROID_CLIENT_ID.apps.googleusercontent.com',
};
```

## Troubleshooting

### Cannot Connect to Backend
- Ensure backend server is running (`python backend_api.py`)
- Check that `API_BASE_URL` in `ui/src/config/api.js` matches your local IP
- Verify phone and computer are on the same network
- Check firewall settings

### Google Login Not Working
- Verify OAuth credentials are correctly configured
- Check that redirect URI matches Expo configuration
- Ensure the OAuth consent screen and client IDs are configured in Cloud Console

### Image Upload Fails
- Check camera/gallery permissions
- Verify image size is reasonable (< 10MB)
- Check backend logs for errors
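The 10 MB guideline above can be checked before upload. A sketch (shown in Python for illustration; the limit is this README's suggestion, not a hard server constant):

```python
import os

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # the ~10 MB guideline from this section

def image_small_enough(path: str) -> bool:
    """Return True if the file is under the suggested upload limit."""
    return os.path.getsize(path) < MAX_UPLOAD_BYTES
```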

### Model Loading Issues
- Ensure checkpoint files are in the correct location
- Check GPU/CPU availability
- Verify all Python dependencies are installed

## Building for Production

### Android
```bash
eas build --platform android
```

### iOS
```bash
eas build --platform ios
```

Note: You'll need an Expo account and EAS CLI configured.

## Technologies Used

### Frontend
- React Native
- Expo
- React Navigation
- Expo Auth Session (Google OAuth)
- Expo Image Picker
- Axios
- React Native Paper
- Expo Linear Gradient

### Backend
- FastAPI
- Uvicorn
- PyTorch
- CLIP (OpenAI)
- GPT-2 (Hugging Face)
- Pillow

## Performance

- **Base Model**: 39.4% accuracy on general VQA
- **Spatial Model**: 28.5% accuracy on spatial questions
- **Inference Time**: ~2-5 seconds per question (GPU)
- **Model Size**: ~2GB total (both models)

## License

This project is for educational purposes.

## Support

For issues or questions:
1. Check the troubleshooting section
2. Review backend logs
3. Check Expo console for frontend errors

## Credits

- VQA Models: Custom ensemble system
- UI Design: Modern gradient aesthetic
- Icons: Material Community Icons
- Authentication: Google OAuth 2.0