# Limitations

This repository is designed as a reproducible RAG evaluation command center over bundled synthetic/offline artifacts.
## Scope boundaries

- No live LLM or embedding calls.
- No production vector database.
- No online document ingestion pipeline.
- No authentication, RBAC, multi-tenant controls, or API-key lifecycle.
- No scheduled monitoring jobs or alert delivery.
- No persistent user state beyond Streamlit session state.
- No production incident automation.
## Data boundaries

- The bundled tables are synthetic/offline evaluation artifacts.
- Metrics should be interpreted as evaluation diagnostics, not production SLOs.
- Policy simulation uses offline evidence-strength signals and should not be treated as a deployable gate without fresh validation.
- Evidence-strength scores are retrieval-side diagnostics, not calibrated model confidence.
## Implementation boundaries

- Filtering functions favor defensive copies over in-place mutation. This keeps Streamlit reruns predictable at the bundled dataset size, but very large evaluation tables may require a more memory-conscious filtering strategy.
- Risk and configuration scores use documented deterministic review weights, not learned coefficients. Recalibrate them before applying the dashboard to a real production corpus.
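Both points can be sketched in a few lines. This is a minimal illustration, not the repository's actual code: the function names (`filter_by_min_score`, `risk_score`), the column name `evidence_strength`, and the weight values are all hypothetical stand-ins, and plain lists of dicts stand in for the bundled tables.

```python
import copy

# Hypothetical review weights for illustration only; the repository
# documents its own deterministic values.
RISK_WEIGHTS = {"recall_gap": 0.5, "stale_evidence": 0.3, "config_drift": 0.2}

def filter_by_min_score(rows, min_score):
    """Return a new, deep-copied list of matching rows.

    The caller's list is never mutated, so Streamlit reruns that
    re-filter the same table always start from identical input.
    """
    return [copy.deepcopy(r) for r in rows if r["evidence_strength"] >= min_score]

def risk_score(signals):
    """Deterministic weighted sum of 0-1 signals, not learned coefficients."""
    return round(sum(RISK_WEIGHTS[k] * signals.get(k, 0.0) for k in RISK_WEIGHTS), 4)

rows = [
    {"doc": "a", "evidence_strength": 0.2},
    {"doc": "b", "evidence_strength": 0.7},
    {"doc": "c", "evidence_strength": 0.9},
]
strong = filter_by_min_score(rows, 0.5)  # two rows survive; `rows` is untouched
score = risk_score({"recall_gap": 1.0, "stale_evidence": 0.5})  # 0.5 + 0.15 = 0.65
```

The deep copy is the memory trade-off mentioned above: cheap and predictable at this dataset size, but worth replacing with views or lazy filtering for very large tables.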
## Why these boundaries are intentional

The goal of v1.0.0 is to be deterministic, inspectable, testable, and easy to run from a clean clone. Live adapters and production integrations would add secrets, network variance, cost, and non-deterministic test behavior.