bigquery_metadata_generator / docs /troubleshooting.md
MrHoosk's picture
Add initial unit tests and utility modules for Schema Descriptor application
1521ef5
# Troubleshooting Guide
This guide covers common issues you might encounter when using Schema Descriptor and how to resolve them.
## Installation Issues
### Dependency Errors
**Problem**: Error messages about incompatible dependency versions.
**Solution**:
- Follow the exact order of installation in the [README.md](../README.md)
- Install key dependencies individually with their exact versions before others:
```
pip install protobuf==3.20.3
pip install altair==4.2.2
pip install streamlit==1.12.0
pip install openai==0.28.0
```
- Use `--no-deps` when installing the rest of the requirements
### ModuleNotFoundError: No module named 'altair.vegalite.v4'
**Problem**: This error occurs when Streamlit tries to use Altair features but the wrong version is installed.
**Solution**:
- Ensure you have exactly Altair 4.2.2 installed: `pip install altair==4.2.2`
- Reinstall Streamlit after installing Altair: `pip install streamlit==1.12.0`
### Import Errors with Google Cloud Libraries
**Problem**: Errors importing Google Cloud libraries or protobuf-related errors.
**Solution**:
- Check that protobuf version is exactly 3.20.3
- Check that the version of Google libraries match what's in requirements.txt
- Try uninstalling and reinstalling the Google libraries in order
## Authentication Issues
### Google Cloud Authentication Errors
**Problem**: "Failed to authenticate with Google Cloud" or similar errors.
**Solution**:
- Verify your service account key file is valid and not expired
- Ensure the service account has the necessary BigQuery permissions
- Check that you're using the correct project ID
- Try authenticating with gcloud CLI separately to verify credentials
### OpenAI API Errors
**Problem**: "Invalid API key" or "API key not found" errors.
**Solution**:
- Verify your OpenAI API key is valid and not expired
- Check if you've reached your API request limits
- Ensure you're using the right key type (e.g., not using a test key in production)
## Runtime Issues
### Timeout When Processing Large Datasets
**Problem**: The application times out or fails when processing large datasets.
**Solution**:
- Reduce the "Sample Size" parameter to sample fewer rows
- Use date filters to process a smaller time range
- Process tables individually instead of the entire dataset
- Check BigQuery query quotas in your Google Cloud project
### "Out of Memory" Errors
**Problem**: Streamlit crashes with memory-related errors.
**Solution**:
- Process fewer tables at once
- Reduce the "Maximum Parallel Tables" setting if available
- Restart the application to clear the cache
- Run Streamlit with more memory if possible
### LLM Response Errors
**Problem**: OpenAI returns errors or incomplete responses.
**Solution**:
- Check if responses exceed token limits
- Verify your OpenAI account has API access
- Review for any inappropriate content in your data samples
- Try reducing the complexity of data being sent to the API
## BigQuery Issues
### Permission Denied When Writing Metadata
**Problem**: Errors when trying to update BigQuery metadata.
**Solution**:
- Verify your service account has both read AND write permissions for BigQuery
- Check specifically for `bigquery.tables.update` permissions
- Ensure you're not trying to modify a dataset that's outside your project scope
### "Table Not Found" or "Dataset Not Found" Errors
**Problem**: BigQuery can't find the tables or datasets you're trying to access.
**Solution**:
- Check that the project ID and dataset ID are correct
- Verify the tables actually exist in the specified dataset
- Ensure your service account has access to the specific dataset
- Check for typos in table or dataset names
## Application Behavior Issues
### Progress Gets Stuck
**Problem**: The progress indicator stops moving during processing.
**Solution**:
- Check application logs for hidden errors
- For very large tables, the sampling process might take a long time
- Try refreshing the page or restarting the application
- Reduce the sample size for better performance
### Generated Descriptions Are Poor Quality
**Problem**: The LLM generates inaccurate or generic descriptions.
**Solution**:
- Increase the sample size to give the LLM more context
- Add specific instructions in the "Additional Instructions" field
- Manually review and edit descriptions before committing
- Check if sampling captured representative data from your tables
## Still Having Issues?
If you encounter problems not covered in this guide:
1. Check the console where you started the Streamlit application for detailed logs
2. Review the [DEPENDENCY_NOTES.md](../DEPENDENCY_NOTES.md) file for known issues
3. Submit an issue on the GitHub repository with:
- A clear description of the problem
- Steps to reproduce
- Complete error messages and logs
- Your environment details (Python version, OS, etc.)