| # VQA Accessibility Enhancement - Setup Guide | |
| ## Backend Setup | |
| ### 1. Install Python Dependencies | |
| ```bash | |
| cd c:\Users\rdeva\Downloads\vqa_coes | |
| pip install -r requirements_api.txt | |
| ``` | |
| ### 2. Configure Groq API Key | |
| 1. Get your Groq API key from: https://console.groq.com/keys | |
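| 2. Create a `.env` file in the project root by copying the example file (the command below is for Windows; on macOS or Linux, use `cp` instead of `copy`): | |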
| ```bash | |
| copy .env.example .env | |
| ``` | |
| 3. Edit `.env` and add your API key: | |
| ``` | |
| GROQ_API_KEY=your_actual_groq_api_key_here | |
| ``` | |
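To confirm the key is picked up before starting the server, a small check like the following can help. This is a minimal sketch, not the project's actual loading code: the helper names `read_env_file` and `get_groq_api_key` are hypothetical, and it assumes the `.env` file uses plain `KEY=VALUE` lines as shown above.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse KEY=VALUE lines from a .env-style file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_groq_api_key(env_path=".env", environ=None):
    """Return the key from the environment or the .env file, else raise."""
    environ = os.environ if environ is None else environ
    key = environ.get("GROQ_API_KEY") or read_env_file(env_path).get("GROQ_API_KEY")
    # Treat the placeholder from the example above as "not configured".
    if not key or key == "your_actual_groq_api_key_here":
        raise ValueError("Groq API key not found")
    return key
```

Calling `get_groq_api_key()` from the project root should return your key; a `ValueError` means the file is missing, empty, or still contains the placeholder.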
| ### 3. Start Backend Server | |
| ```bash | |
| python backend_api.py | |
| ``` | |
| The server starts on `http://localhost:8000`. | |
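One way to confirm the server is actually listening, without assuming anything about its routes, is to try opening a TCP connection to the port. The sketch below uses only the standard library; `is_port_open` is a hypothetical helper name, not part of the project.

```python
import socket


def is_port_open(host="localhost", port=8000, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"Backend on port 8000 is {state}")
```

This only shows that a process is bound to the port; it does not verify that the process is the VQA backend or that its models have loaded.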
| --- | |
| ## Frontend Setup | |
| ### 1. Install Node Dependencies | |
| ```bash | |
| cd ui | |
| npm install | |
| ``` | |
| This will install the new `expo-speech` package for text-to-speech functionality. | |
| ### 2. Start Expo App | |
| ```bash | |
| npm start | |
| ``` | |
| Then: | |
| - Press `a` to open the Android emulator | |
| - Press `i` to open the iOS simulator | |
| - Scan the QR code with the Expo Go app to run on a physical device | |
| --- | |
| ## Testing the Features | |
| ### Image Display Fix | |
| 1. Open the app | |
| 2. Tap "Camera" or "Gallery" to select an image | |
| 3. **Expected**: Image should display correctly (no blank screen) | |
| ### LLM Description Feature | |
| 1. Upload an image | |
| 2. Enter a question (e.g., "What color is the car?") | |
| 3. Tap "Ask Question" | |
| 4. **Expected**: | |
| - Original answer appears in the "Answer" card | |
| - "Accessible Description" card appears below with a two-sentence description | |
| - Speaker icon button is visible | |
| ### Text-to-Speech | |
| 1. Ask a question and wait for the answer and description to appear | |
| 2. Tap the speaker icon (🔊) in the "Accessible Description" card | |
| 3. **Expected**: The description is read aloud | |
| 4. Tap the stop icon (⏹️) to stop playback | |
| --- | |
| ## Troubleshooting | |
| ### Backend Issues | |
| **Groq API Key Error** | |
| ``` | |
| ValueError: Groq API key not found | |
| ``` | |
| **Solution**: Make sure `.env` file exists with `GROQ_API_KEY=your_key` | |
| **Models Not Loading** | |
| ``` | |
| Base checkpoint not found | |
| ``` | |
| **Solution**: Ensure `vqa_checkpoint.pt` and `vqa_spatial_checkpoint.pt` are in the project root | |
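A quick way to see which checkpoint file is missing is to check for both names in the project root. This is a minimal sketch using the two file names from the solution above; the function name `missing_checkpoints` is hypothetical.

```python
from pathlib import Path

# File names taken from the troubleshooting note above.
REQUIRED_CHECKPOINTS = ("vqa_checkpoint.pt", "vqa_spatial_checkpoint.pt")


def missing_checkpoints(project_root="."):
    """Return the required checkpoint names that are absent from project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found")
```

Run it from the project root, or pass the root path explicitly if you run it from elsewhere.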
| ### Frontend Issues | |
| **Image Not Displaying** | |
| - Make sure you've run `npm install` to get the latest `expo-image` package | |
| - Check console logs for image URI format issues | |
| **Text-to-Speech Not Working** | |
| - Ensure device volume is turned up | |
| - Check that `expo-speech` package is installed | |
| - On the iOS simulator, speech may not work; test on a physical device | |
| **Cannot Connect to Backend** | |
| - Verify backend is running on port 8000 | |
| - Update `ui/src/config/api.js` with correct backend URL | |
| - For physical devices, use ngrok or your computer's local IP | |
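When testing on a physical device, `localhost` refers to the phone itself, so the app needs the computer's address on the local network. The sketch below is one common way to find that address with the standard library; `local_ip` and `backend_url` are hypothetical helper names, and the result is only useful if the phone and computer are on the same network.

```python
import socket


def local_ip():
    """Best-effort guess at this machine's LAN IPv4 address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only makes the OS
        # choose the outgoing interface, whose address we then read back.
        sock.connect(("10.255.255.255", 1))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


def backend_url(port=8000):
    """Build the URL to place in the app's API config."""
    return f"http://{local_ip()}:{port}"


if __name__ == "__main__":
    print(backend_url())
```

The printed URL is a candidate value for `ui/src/config/api.js`. If it shows `127.0.0.1`, the machine has no usable network route and a tunnel such as ngrok may be the simpler option.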
| --- | |
| ## Features Summary | |
| ✅ **Fixed**: Image display issue (using expo-image instead of react-native Image) | |
| ✅ **Added**: Groq LLM integration for 2-sentence descriptions | |
| ✅ **Added**: Text-to-speech accessibility feature | |
| ✅ **Added**: Visual distinction between raw answer and description | |
| ✅ **Added**: Fallback mode when Groq API is unavailable | |
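The fallback behaviour can be pictured roughly as follows. This is an illustrative sketch only, not the backend's actual code: the function name `describe_answer`, the `llm` callable, and the template wording are all assumptions made for the example.

```python
def describe_answer(question, answer, llm=None):
    """Return an LLM description, or a templated one if the LLM is unavailable.

    `llm` is a hypothetical callable taking (question, answer) and returning text.
    """
    fallback = f'The answer to "{question}" is {answer}.'
    if llm is None:
        # No client configured, for example because the API key is missing.
        return fallback
    try:
        text = llm(question, answer)
    except Exception:
        # Network errors, rate limits, and similar failures degrade gracefully.
        return fallback
    return text.strip() or fallback
```

The point of the pattern is that the app always has something to display and read aloud, even when the description service cannot be reached.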