Troubleshooting Guide

This guide covers common issues you might encounter when using Schema Descriptor and how to resolve them.

Installation Issues

Dependency Errors

Problem: Error messages about incompatible dependency versions.

Solution:

  • Follow the installation order given in README.md exactly
  • Install key dependencies individually with their exact versions before others:
    pip install protobuf==3.20.3
    pip install altair==4.2.2
    pip install streamlit==1.12.0
    pip install openai==0.28.0
    
  • Use --no-deps when installing the rest of the requirements
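To confirm that the pinned versions actually ended up in the active environment, a small check like the following may help. It is a minimal sketch: the `PINNED` table simply mirrors the install commands above, and the helper names `installed_version` and `find_mismatches` are illustrative, not part of Schema Descriptor.

```python
from importlib import metadata

# Pinned versions copied from the install commands above.
PINNED = {
    "protobuf": "3.20.3",
    "altair": "4.2.2",
    "streamlit": "1.12.0",
    "openai": "0.28.0",
}


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each mismatched package to a (wanted, found) pair."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```

Run it with the same Python interpreter that launches Streamlit. No output means every pinned package matches.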

ModuleNotFoundError: No module named 'altair.vegalite.v4'

Problem: Streamlit 1.12.0 imports the altair.vegalite.v4 module, which is missing when an incompatible Altair version (such as 5.x) is installed.

Solution:

  • Ensure you have exactly Altair 4.2.2 installed: pip install altair==4.2.2
  • Reinstall Streamlit after installing Altair: pip install streamlit==1.12.0

Import Errors with Google Cloud Libraries

Problem: Errors importing Google Cloud libraries or protobuf-related errors.

Solution:

  • Check that the installed protobuf version is exactly 3.20.3
  • Check that the versions of the Google libraries match those in requirements.txt
  • Try uninstalling and reinstalling the Google libraries in order

Authentication Issues

Google Cloud Authentication Errors

Problem: "Failed to authenticate with Google Cloud" or similar errors.

Solution:

  • Verify your service account key file is valid and not expired
  • Ensure the service account has the necessary BigQuery permissions
  • Check that you're using the correct project ID
  • Try authenticating with gcloud CLI separately to verify credentials
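Before digging into permissions, it can be worth checking the key file's structure locally. The sketch below assumes a standard JSON service account key as downloaded from Google Cloud. The helper name `check_key_file` is hypothetical. It only inspects the file's shape and cannot tell whether the key has been revoked or has expired on the server side.

```python
import json

# Fields present in a standard service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")


def check_key_file(path, expected_project=None):
    """Return a list of human-readable problems found in the key file."""
    try:
        with open(path, encoding="utf-8") as handle:
            key = json.load(handle)
    except OSError as exc:
        return [f"cannot read file: {exc}"]
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    # A user credential file (for example "authorized_user") is not a service account key.
    if key.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {key['type']!r}")
    # Optionally compare the key's project with the project ID you entered.
    if expected_project and key.get("project_id") not in (None, expected_project):
        problems.append(f"key belongs to project {key['project_id']!r}, not {expected_project!r}")
    return problems
```

An empty list means the file looks structurally sound, in which case the problem is more likely permissions or the project ID.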

OpenAI API Errors

Problem: "Invalid API key" or "API key not found" errors.

Solution:

  • Verify your OpenAI API key is valid and not expired
  • Check if you've reached your API request limits
  • Ensure you're using the right key type (e.g., not using a test key in production)
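Many "invalid key" errors come from copy-paste damage rather than a bad key. The following sketch checks for a few such problems. It assumes the key is supplied through the `OPENAI_API_KEY` environment variable (Schema Descriptor may accept it elsewhere) and that keys start with `sk-`. The function name `diagnose_api_key` is illustrative.

```python
import os


def diagnose_api_key(value):
    """Return a list of likely formatting problems with an API key string."""
    if value is None:
        return ["key is not set"]
    stripped = value.strip()
    if not stripped:
        return ["key is empty"]
    problems = []
    if value != stripped:
        problems.append("key has leading or trailing whitespace")
    if stripped[0] in "\"'" or stripped[-1] in "\"'":
        problems.append("key is wrapped in quote characters")
    # Assumption: OpenAI keys begin with "sk-".
    if not stripped.strip("\"'").startswith("sk-"):
        problems.append("key does not start with 'sk-' (assumed prefix)")
    return problems


if __name__ == "__main__":
    # Assumption: the key is provided through this environment variable.
    for problem in diagnose_api_key(os.environ.get("OPENAI_API_KEY")):
        print(problem)
```

A clean result only rules out formatting problems. Whether the key is active still has to be confirmed in your OpenAI account.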

Runtime Issues

Timeout When Processing Large Datasets

Problem: The application times out or fails when processing large datasets.

Solution:

  • Reduce the "Sample Size" parameter to sample fewer rows
  • Use date filters to process a smaller time range
  • Process tables individually instead of the entire dataset
  • Check BigQuery query quotas in your Google Cloud project
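To illustrate how a smaller sample size and a date filter bound the work, here is a sketch that builds a sampling query. It is not the query Schema Descriptor actually runs. The function name `build_sample_query` and its parameters are hypothetical, and the date column is assumed to be of type DATE.

```python
from datetime import date


def build_sample_query(table_id, sample_size, date_column=None, start=None, end=None):
    """Build a BigQuery Standard SQL query that reads a bounded sample."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Identifiers are interpolated directly, so pass only trusted names.
    query = f"SELECT * FROM `{table_id}`"
    if date_column and start and end:
        query += (
            f" WHERE `{date_column}` BETWEEN "
            f"DATE '{start.isoformat()}' AND DATE '{end.isoformat()}'"
        )
    return f"{query} LIMIT {int(sample_size)}"


if __name__ == "__main__":
    print(build_sample_query(
        "my-project.sales.orders", 100,
        date_column="order_date", start=date(2024, 1, 1), end=date(2024, 1, 31),
    ))
```

On tables partitioned by the date column, the date filter is usually what reduces the data scanned. `LIMIT` caps the rows returned but does not by itself reduce the bytes read on non-clustered tables.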

"Out of Memory" Errors

Problem: Streamlit crashes with memory-related errors.

Solution:

  • Process fewer tables at once
  • Reduce the "Maximum Parallel Tables" setting if available
  • Restart the application to clear the cache
  • Run Streamlit on a machine or container with more memory, if possible
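If you script around the application, "process fewer tables at once" amounts to splitting the table list into small batches so that only one batch of samples is held in memory at a time. A minimal sketch, in which `batched` is an illustrative helper and the per-batch work is left as a placeholder:

```python
def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    tables = ["orders", "customers", "payments", "refunds", "events"]
    for batch in batched(tables, 2):
        # Placeholder: process this batch, then let its data go out of scope.
        print(batch)
```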

LLM Response Errors

Problem: OpenAI returns errors or incomplete responses.

Solution:

  • Check whether the prompt or the response exceeds the model's token limit
  • Verify your OpenAI account has API access
  • Review your data samples for any inappropriate content
  • Try reducing the complexity of data being sent to the API
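One way to reduce what is sent to the API is to cap the sample rows at a rough token budget. The sketch below uses the common rule of thumb of about four characters per token, which is only an approximation and not a real tokenizer. The names `estimate_tokens` and `trim_rows_to_budget` are hypothetical and do not reflect how Schema Descriptor builds its prompts.

```python
import json

CHARS_PER_TOKEN = 4  # Rough rule of thumb for English text, not an exact tokenizer.


def estimate_tokens(text):
    """Approximate the token count from the character length (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def trim_rows_to_budget(rows, max_tokens):
    """Keep the leading rows whose JSON encoding fits within max_tokens."""
    kept, used = [], 0
    for row in rows:
        # default=str lets dates and decimals serialize without errors.
        cost = estimate_tokens(json.dumps(row, default=str))
        if used + cost > max_tokens:
            break
        kept.append(row)
        used += cost
    return kept
```

Leave headroom below the model's limit, since the instructions and the generated response also count toward it.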

BigQuery Issues

Permission Denied When Writing Metadata

Problem: Errors when trying to update BigQuery metadata.

Solution:

  • Verify your service account has both read AND write permissions for BigQuery
  • Check specifically for the bigquery.tables.update permission
  • Ensure you're not trying to modify a dataset that's outside your project scope

"Table Not Found" or "Dataset Not Found" Errors

Problem: BigQuery can't find the tables or datasets you're trying to access.

Solution:

  • Check that the project ID and dataset ID are correct
  • Verify the tables actually exist in the specified dataset
  • Ensure your service account has access to the specific dataset
  • Check for typos in table or dataset names
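Typos in identifiers are often formatting problems that can be caught before a query is sent. The following sketch checks a fully qualified `project.dataset.table` ID. The helper `check_table_id` is illustrative, and it assumes a plain project ID; domain-scoped project IDs that themselves contain a dot would be misreported.

```python
import re

# Dataset IDs may contain only letters, digits, and underscores.
DATASET_PATTERN = re.compile(r"^[A-Za-z0-9_]+$")


def check_table_id(table_id):
    """Return a list of formatting problems in a 'project.dataset.table' ID."""
    problems = []
    if table_id != table_id.strip():
        problems.append("leading or trailing whitespace")
    parts = table_id.strip().strip("`").split(".")
    if len(parts) != 3:
        return problems + [f"expected 3 dot-separated parts, found {len(parts)}"]
    project, dataset, table = parts
    if not project:
        problems.append("project ID is empty")
    if not DATASET_PATTERN.match(dataset):
        problems.append(
            f"dataset ID {dataset!r} has characters other than letters, digits, underscores"
        )
    if not table:
        problems.append("table name is empty")
    return problems
```

A well-formed ID can still point at something that does not exist, so the checks in the list above remain necessary.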

Application Behavior Issues

Progress Gets Stuck

Problem: The progress indicator stops moving during processing.

Solution:

  • Check application logs for hidden errors
  • Allow extra time for very large tables, since the sampling process can be slow
  • Try refreshing the page or restarting the application
  • Reduce the sample size for better performance

Generated Descriptions Are Poor Quality

Problem: The LLM generates inaccurate or generic descriptions.

Solution:

  • Increase the sample size to give the LLM more context
  • Add specific instructions in the "Additional Instructions" field
  • Manually review and edit descriptions before committing
  • Check if sampling captured representative data from your tables

Still Having Issues?

If you encounter problems not covered in this guide:

  1. Check the console where you started the Streamlit application for detailed logs
  2. Review the DEPENDENCY_NOTES.md file for known issues
  3. Submit an issue on the GitHub repository with:
    • A clear description of the problem
    • Steps to reproduce
    • Complete error messages and logs
    • Your environment details (Python version, OS, etc.)
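To gather the environment details for an issue report, a short script such as this one can be pasted into a Python session. It is a sketch: the package list is an assumption based on the dependencies named in this guide, and `environment_report` is an illustrative name.

```python
import platform
import sys
from importlib import metadata

# Assumption: these are the packages most relevant to a report.
PACKAGES = ("streamlit", "altair", "protobuf", "openai")


def environment_report(packages=PACKAGES):
    """Return a plain-text summary of Python, OS, and package versions."""
    lines = [
        f"Python: {sys.version.split()[0]}",
        f"OS: {platform.system()} {platform.release()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```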