# VQA Assistant - React Native Mobile App

A beautiful React Native mobile application for Visual Question Answering (VQA) using ensemble AI models with Google authentication.
## Features

- **Google OAuth Authentication** - Secure sign-in with Google
- **Image Selection** - Pick from gallery or capture with camera
- **Ensemble VQA** - Automatic routing between base and spatial models
- **Beautiful UI** - Modern gradient design with smooth animations
- **Real-time Answers** - Fast question answering with model visualization
- **Cross-Platform** - Works on iOS and Android via Expo Go
## Architecture

### Backend

- FastAPI server wrapping the ensemble VQA system
- Two models:
  - **Base Model**: General VQA (39.4% accuracy)
  - **Spatial Model**: Spatial reasoning (28.5% accuracy)
- Automatic question routing based on spatial keywords
### Frontend

- React Native with Expo
- **Navigation**: React Navigation
- **State Management**: React Context API
- **UI Components**: Custom components with Material icons
- **Styling**: Custom theme with gradients
## Prerequisites

### Backend Requirements

- Python 3.8+
- CUDA-capable GPU (recommended) or CPU
- VQA model checkpoints:
  - `vqa_checkpoint.pt` (base model)
  - `vqa_spatial_checkpoint.pt` (spatial model)

### Frontend Requirements

- Node.js 16+
- npm or yarn
- Expo Go app on your mobile device
- Google Cloud OAuth credentials
## Setup Instructions

### 1. Backend Setup

```bash
# Navigate to project root
cd c:\Users\rdeva\Downloads\vqa_coes

# Install API dependencies
pip install -r requirements_api.txt

# Ensure model checkpoints are in the root directory:
# - vqa_checkpoint.pt
# - vqa_spatial_checkpoint.pt

# Start the backend server
python backend_api.py
```

The backend will start on `http://0.0.0.0:8000`.
**Important**: Note your computer's local IP address for mobile testing:

- **Windows**: Run `ipconfig` and find your IPv4 Address
- **Mac/Linux**: Run `ifconfig` or `ip addr`
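Before moving on to the app, it can save time to confirm the API is reachable from another machine on the network. A minimal sketch using the `requests` library and the `/health` endpoint listed under API Endpoints (`check_backend` is a hypothetical helper, not part of this project):

```python
import requests

def check_backend(base_url: str = "http://localhost:8000") -> bool:
    """Return True if the VQA backend answers its /health endpoint."""
    try:
        # .ok is True for any 2xx/3xx status code
        return requests.get(f"{base_url}/health", timeout=5).ok
    except requests.RequestException:
        # Connection refused, timeout, DNS failure, etc.
        return False
```

Run it with your local IP (e.g. `check_backend("http://192.168.1.10:8000")`) from the phone's network to rule out firewall issues early.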
### 2. Google OAuth Setup

- Go to the Google Cloud Console
- Create a new project or select an existing one
- Enable the Google+ API
- Go to Credentials > Create Credentials > OAuth 2.0 Client ID
- Create credentials for:
  - Web application (for Expo Go)
  - iOS (if building a standalone iOS app)
  - Android (if building a standalone Android app)
- Update `ui/src/config/google.js` with your client IDs
### 3. Frontend Setup

```bash
# Navigate to UI folder
cd ui

# Install dependencies
npm install

# Update API configuration:
# edit ui/src/config/api.js and replace with your local IP:
# export const API_BASE_URL = 'http://YOUR_LOCAL_IP:8000';

# Update Google OAuth configuration:
# edit ui/src/config/google.js with your client IDs
```
### 4. Running the App

```bash
# Make sure the backend is running first!

# Start the Expo development server
npm start

# Or target a specific platform
npm run android   # For Android
npm run ios       # For iOS (Mac only)
```
**Testing on a physical device**:

- Install the Expo Go app from the App Store or Play Store
- Scan the QR code from the terminal
- Ensure your phone and computer are on the same network
- The app should load automatically
## Usage

### 1. Sign In

- Open the app
- Tap "Sign in with Google"
- Complete the OAuth flow
- You'll be redirected to the home screen
### 2. Ask Questions

- **Select Image**:
  - Tap "Camera" to take a photo
  - Tap "Gallery" to choose from your library
- **Enter Question**:
  - Type your question in the text field
  - Examples:
    - "What color is the car?" (uses base model)
    - "What is to the right of the table?" (uses spatial model)
- **Get Answer**:
  - Tap "Ask Question"
  - View the answer with a model type indicator
### 3. Understanding Model Routing

The app automatically routes questions to the appropriate model:

**Spatial Model** - Used for questions containing:

- Directional: right, left, above, below, top, bottom
- Positional: front, behind, next to, beside, near
- Relational: closest, farthest, nearest

**Base Model** - Used for all other questions:

- Object identification
- Color questions
- Counting
- General descriptions
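The routing described above can be sketched as a plain keyword check. This is illustrative only; the actual logic in `backend_api.py` may differ, and the keyword tuple is assembled from the lists above:

```python
# Keywords from the Directional/Positional/Relational lists above.
SPATIAL_KEYWORDS = (
    "right", "left", "above", "below", "top", "bottom",
    "front", "behind", "next to", "beside", "near",
    "closest", "farthest", "nearest",
)

def route_question(question: str) -> str:
    """Return which model a question would be routed to: 'spatial' or 'base'."""
    q = question.lower()
    # Simple substring match: any spatial keyword sends it to the spatial model.
    return "spatial" if any(kw in q for kw in SPATIAL_KEYWORDS) else "base"
```

For example, `route_question("What is to the right of the table?")` routes to the spatial model, while a color question falls through to the base model.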
## Project Structure

```
ui/
├── src/
│   ├── config/
│   │   ├── api.js            # API configuration
│   │   └── google.js         # Google OAuth config
│   ├── contexts/
│   │   └── AuthContext.js    # Authentication state
│   ├── screens/
│   │   ├── LoginScreen.js    # Login screen
│   │   └── HomeScreen.js     # Main VQA screen
│   ├── services/
│   │   └── api.js            # API client
│   └── styles/
│       ├── theme.js          # Theme configuration
│       └── globalStyles.js   # Global styles
├── App.js                    # Main app component
├── app.json                  # Expo configuration
└── package.json              # Dependencies
```
## API Endpoints

### Backend API

- `GET /` - Root endpoint with API info
- `GET /health` - Health check
- `POST /api/answer` - Answer a VQA question
  - Body: `multipart/form-data`
  - Fields: `image` (file), `question` (string)
  - Response: `{ answer, model_used, confidence, question_type }`
- `GET /api/models/info` - Get model information
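To illustrate the `/api/answer` contract, here is how a Python client might call it. This is a sketch assuming the `requests` library; the `ask_question` helper and the image path are hypothetical, not part of the project:

```python
import requests

API_BASE_URL = "http://YOUR_LOCAL_IP:8000"  # same value as in ui/src/config/api.js

def ask_question(image_path: str, question: str) -> dict:
    """POST an image and question as multipart/form-data; return the JSON answer."""
    with open(image_path, "rb") as f:
        resp = requests.post(
            f"{API_BASE_URL}/api/answer",
            files={"image": f},           # file field
            data={"question": question},  # string field
        )
    resp.raise_for_status()
    # Expected shape per the docs: {answer, model_used, confidence, question_type}
    return resp.json()
```

A spatial question such as `ask_question("photo.jpg", "What is to the right of the table?")` should come back with `model_used` indicating the spatial model.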
## Configuration

### API Configuration (`ui/src/config/api.js`)

```javascript
export const API_BASE_URL = 'http://YOUR_LOCAL_IP:8000';
```

### Google OAuth (`ui/src/config/google.js`)

```javascript
export const GOOGLE_CONFIG = {
  webClientId: 'YOUR_WEB_CLIENT_ID.apps.googleusercontent.com',
  iosClientId: 'YOUR_IOS_CLIENT_ID.apps.googleusercontent.com',
  androidClientId: 'YOUR_ANDROID_CLIENT_ID.apps.googleusercontent.com',
};
```
## Troubleshooting

### Cannot Connect to Backend

- Ensure the backend server is running (`python backend_api.py`)
- Check that `API_BASE_URL` in `ui/src/config/api.js` matches your local IP
- Verify phone and computer are on the same network
- Check firewall settings
### Google Login Not Working
- Verify OAuth credentials are correctly configured
- Check that redirect URI matches Expo configuration
- Ensure Google+ API is enabled in Cloud Console
### Image Upload Fails
- Check camera/gallery permissions
- Verify image size is reasonable (< 10MB)
- Check backend logs for errors
### Model Loading Issues
- Ensure checkpoint files are in the correct location
- Check GPU/CPU availability
- Verify all Python dependencies are installed
## Building for Production

### Android

```bash
eas build --platform android
```

### iOS

```bash
eas build --platform ios
```

**Note**: You'll need an Expo account and the EAS CLI configured.
## Technologies Used

### Frontend
- React Native
- Expo
- React Navigation
- Expo Auth Session (Google OAuth)
- Expo Image Picker
- Axios
- React Native Paper
- Expo Linear Gradient
### Backend
- FastAPI
- Uvicorn
- PyTorch
- CLIP (OpenAI)
- GPT-2 (Hugging Face)
- Pillow
## Performance
- Base Model: 39.4% accuracy on general VQA
- Spatial Model: 28.5% accuracy on spatial questions
- Inference Time: ~2-5 seconds per question (GPU)
- Model Size: ~2GB total (both models)
## License
This project is for educational purposes.
## Support
For issues or questions:
- Check the troubleshooting section
- Review backend logs
- Check Expo console for frontend errors
## Credits
- VQA Models: Custom ensemble system
- UI Design: Modern gradient aesthetic
- Icons: Material Community Icons
- Authentication: Google OAuth 2.0