bigquery_metadata_generator / DEPENDENCY_NOTES.md
MrHoosk's picture
Add initial unit tests and utility modules for Schema Descriptor application
1521ef5
# Dependency Management Notes
This document outlines known dependency constraints and issues in the Schema Descriptor project. It's intended for maintainers and contributors to understand the dependency landscape.
## Critical Dependencies
### Protobuf (3.20.3)
This project requires protobuf version 3.20.3 due to several conflicting constraints:
- Streamlit 1.12.0 requires protobuf < 4.0.0
- Google Cloud BigQuery libraries require protobuf >= 3.19.5
- gRPC-status requires protobuf >= 4.21.6 (which we've worked around)
Using any other version of protobuf will cause dependency conflicts:
- Lower versions won't work with Google Cloud libraries
- Higher versions (4.x) won't work with Streamlit
### Altair (4.2.2)
Altair must be exactly version 4.2.2 because:
- Streamlit 1.12.0 depends on the `altair.vegalite.v4` module
- Newer versions of Altair (>= 5.0.0) removed this module structure
- This causes `ModuleNotFoundError: No module named 'altair.vegalite.v4'` errors
### Streamlit (1.12.0)
The application is built with Streamlit 1.12.0. Upgrading to newer versions would:
- Require extensive refactoring
- Potentially solve some dependency issues
- But introduce backwards compatibility concerns
### OpenAI (0.28.0)
OpenAI version 0.28.0 is used because:
- It has a compatible API with our LLM service integration
- Newer client libraries (>=1.0.0) have completely different APIs
## Dependency Resolution Strategy
The following installation order helps resolve dependency conflicts:
1. Install protobuf first: `pip install protobuf==3.20.3`
2. Install Altair: `pip install altair==4.2.2`
3. Install Streamlit: `pip install streamlit==1.12.0`
4. Install OpenAI: `pip install openai==0.28.0`
5. Install remaining dependencies: `pip install -r requirements.txt --no-deps`
## CI/CD Pipeline
In the GitHub Actions workflow:
1. We explicitly install test dependencies first
2. Then install key dependencies in the correct order
3. Use `--no-deps` to avoid dependency resolution issues
4. Skip tests that are known to fail due to dependency mocking issues
## Future Improvements
Potential improvements to dependency management:
1. **Upgrade Streamlit**: Moving to a newer version would resolve several issues but require code changes
2. **Use Docker**: Containerization would provide a more consistent environment
3. **Split Requirements**: Create separate requirement files for main, dev, and test dependencies
4. **Fix Test Mocks**: Update tests to work better with the current dependency constraints
## Known Dependency-Related Issues
1. **Test failures**: Some tests fail when mocking dependencies due to strict type checking
2. **Install errors**: Non-ordered installation may lead to dependency resolution failures
3. **Version control**: Pin exact versions vs compatible versions (`==` vs `~=`) trade-off