Testing Strategy
The test suite is designed to protect the repo from regressions that would weaken its value as a reproducible RAG evaluation artifact.
Covered areas
- CSV bundle loading and schema checks.
- Primary key presence and uniqueness for core tables.
- Required foreign-key presence and referential integrity across examples, retrieval events, chunks, documents, and scenarios.
- Strict numeric validation and standardization for required and optional numeric fields, including rejection of corrupted non-numeric values and of missing values in required numeric fields.
- Metric and policy output contracts.
- Numeric regression checks for risk scoring, retrieval outcome classification, evidence-strength proxy normalization, review-weight normalization, and policy monotonicity.
- Config leaderboard and risk-slice behavior.
- Project hygiene checks for docs, Docker, CI, Streamlit smoke coverage, Docker health-smoke coverage, Trace Explorer literal search behavior, and view separation.
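The integrity checks in the list above can be illustrated with a minimal, standard-library sketch. It is not the repo's actual test code: the helper names (duplicate_keys, orphaned_keys, parse_numeric) and the column names (chunk_id, document_id, score) are assumptions chosen for illustration only.

```python
import csv
import io
import math


def duplicate_keys(rows, key):
    """Return primary-key values that are blank or appear more than once."""
    seen, bad = set(), set()
    for row in rows:
        value = (row.get(key) or "").strip()
        if not value or value in seen:
            bad.add(value)
        seen.add(value)
    return bad


def orphaned_keys(child_rows, fk, parent_rows, pk):
    """Return foreign-key values in child_rows that match no parent row."""
    parent_ids = {row[pk] for row in parent_rows}
    return {row[fk] for row in child_rows if row[fk] not in parent_ids}


def parse_numeric(raw, required):
    """Parse a CSV cell strictly; blanks are allowed only for optional fields."""
    text = (raw or "").strip()
    if not text:
        if required:
            raise ValueError("missing required numeric value")
        return None
    value = float(text)  # raises ValueError on non-numeric corruption
    if not math.isfinite(value):
        raise ValueError(f"non-finite numeric value: {raw!r}")
    return value


# Hypothetical tables; the real bundle's files and columns may differ.
documents = list(csv.DictReader(io.StringIO(
    "document_id,title\nd1,Guide\nd2,FAQ\n"
)))
chunks = list(csv.DictReader(io.StringIO(
    "chunk_id,document_id,score\nc1,d1,0.8\nc2,d9,abc\nc2,d2,\n"
)))

duplicate_chunk_ids = duplicate_keys(chunks, "chunk_id")
orphaned_document_ids = orphaned_keys(chunks, "document_id", documents, "document_id")
```

In this sample, the repeated chunk_id c2 is reported as a duplicate and d9 is reported as an orphaned reference. Calling parse_numeric("abc", required=True) raises ValueError, while a blank optional cell parses to None.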
Local checks
make check
This runs:
ruff check app.py src tests scripts
python -m compileall app.py src tests scripts
python scripts/run_pytest.py -q
python -c "import app; from src.dashboard import CommandCenterApp; CommandCenterApp()"
CI checks
GitHub Actions runs lint, compile, tests, and a Streamlit import smoke check on Python 3.11 and 3.12. After the matrix passes, a separate job builds the Docker image through scripts/docker_smoke.py, starts the container, verifies that the container remains running, and probes Streamlit's /_stcore/health endpoint.
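The health probe step can be sketched as a small polling loop. This is an illustration under stated assumptions, not the contents of scripts/docker_smoke.py: the function name wait_for_health, the timeout values, and the choice of urllib are all hypothetical.

```python
import http.client
import time
import urllib.request


def wait_for_health(url, timeout=30.0, interval=0.5):
    """Poll url until it answers HTTP 200 or the timeout elapses."""
    # Bypass any configured HTTP proxy; the probe targets a local container.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # server not ready or returned an error; retry until the deadline
        time.sleep(interval)
    return False
```

A smoke script could call wait_for_health("http://localhost:8501/_stcore/health") and fail the job when it returns False. Port 8501 is Streamlit's default; whether this repo's container maps that port is an assumption.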
Runtime-binding regression checks
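The test suite includes AST-level checks for class methods defined without self and without @staticmethod. Such definitions are valid Python syntax and compile cleanly, but calling them on an instance raises a TypeError because Python passes the instance as an implicit first argument. The AST checks catch this in CI instead of letting it surface in a Streamlit render path.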
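A check of this kind can be sketched with the standard ast module. The version below is a minimal illustration, not the repo's test: the function name unbound_methods and the ExampleView sample are hypothetical, and skipping @classmethod is an added assumption that the text above does not mention.

```python
import ast


def unbound_methods(source):
    """Return 'Class.method' names whose first parameter is not self
    and which carry no @staticmethod (or @classmethod) decorator."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for item in node.body:
            if not isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            # Collect decorator names for both @name and @module.name forms.
            decorators = {
                d.id if isinstance(d, ast.Name) else getattr(d, "attr", None)
                for d in item.decorator_list
            }
            if decorators & {"staticmethod", "classmethod"}:
                continue
            positional = item.args.posonlyargs + item.args.args
            first = positional[0].arg if positional else None
            if first != "self":
                offenders.append(f"{node.name}.{item.name}")
    return offenders


# Hypothetical source: compiles fine, but ExampleView().helper() raises TypeError.
SAMPLE = '''
class ExampleView:
    def render(self):
        pass

    def helper():
        pass

    @staticmethod
    def util():
        pass
'''

offenders = unbound_methods(SAMPLE)
```

For SAMPLE, only ExampleView.helper is reported. A test can then assert that the list is empty for every source file under src.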
In environments where Streamlit is installed, smoke tests also verify selected controller/view helper methods through an instantiated CommandCenterApp.