bigquery_metadata_generator / DEPENDENCY_NOTES.md
MrHoosk's picture
Add initial unit tests and utility modules for Schema Descriptor application
1521ef5

A newer version of the Streamlit SDK is available: 1.56.0

Upgrade

Dependency Management Notes

This document outlines known dependency constraints and issues in the Schema Descriptor project. It's intended for maintainers and contributors to understand the dependency landscape.

Critical Dependencies

Protobuf (3.20.3)

This project requires protobuf version 3.20.3 due to several conflicting constraints:

  • Streamlit 1.12.0 requires protobuf < 4.0.0
  • Google Cloud BigQuery libraries require protobuf >= 3.19.5
  • gRPC-status requires protobuf >= 4.21.6 (which we've worked around)

Using any other version of protobuf will cause dependency conflicts:

  • Lower versions won't work with Google Cloud libraries
  • Higher versions (4.x) won't work with Streamlit

Altair (4.2.2)

Altair must be exactly version 4.2.2 because:

  • Streamlit 1.12.0 depends on the altair.vegalite.v4 module
  • Newer versions of Altair (>= 5.0.0) removed this module structure
  • This causes ModuleNotFoundError: No module named 'altair.vegalite.v4' errors

Streamlit (1.12.0)

The application is built with Streamlit 1.12.0. Upgrading to newer versions would:

  • Require extensive refactoring
  • Potentially solve some dependency issues
  • But introduce backwards compatibility concerns

OpenAI (0.28.0)

OpenAI version 0.28.0 is used because:

  • It has a compatible API with our LLM service integration
  • Newer client libraries (>=1.0.0) have completely different APIs

Dependency Resolution Strategy

The following installation order helps resolve dependency conflicts:

  1. Install protobuf first: pip install protobuf==3.20.3
  2. Install Altair: pip install altair==4.2.2
  3. Install Streamlit: pip install streamlit==1.12.0
  4. Install OpenAI: pip install openai==0.28.0
  5. Install remaining dependencies: pip install -r requirements.txt --no-deps

CI/CD Pipeline

In the GitHub Actions workflow:

  1. We explicitly install test dependencies first
  2. Then install key dependencies in the correct order
  3. Use --no-deps to avoid dependency resolution issues
  4. Skip tests that are known to fail due to dependency mocking issues

Future Improvements

Potential improvements to dependency management:

  1. Upgrade Streamlit: Moving to a newer version would resolve several issues but require code changes
  2. Use Docker: Containerization would provide a more consistent environment
  3. Split Requirements: Create separate requirement files for main, dev, and test dependencies
  4. Fix Test Mocks: Update tests to work better with the current dependency constraints

Known Dependency-Related Issues

  1. Test failures: Some tests fail when mocking dependencies due to strict type checking
  2. Install errors: Non-ordered installation may lead to dependency resolution failures
  3. Version control: Pin exact versions vs compatible versions (== vs ~=) trade-off