# Limitations

This repository is designed as a reproducible RAG evaluation command center over bundled synthetic/offline artifacts.

## Scope boundaries

- No live LLM or embedding calls.
- No production vector database.
- No online document ingestion pipeline.
- No authentication, RBAC, multi-tenant controls, or API-key lifecycle.
- No scheduled monitoring jobs or alert delivery.
- No persistent user state beyond Streamlit session state.
- No production incident automation.

## Data boundaries

- The bundled tables are synthetic/offline evaluation artifacts.
- Metrics should be interpreted as evaluation diagnostics, not production SLOs.
- Policy simulation uses offline evidence-strength signals and should not be treated as a deployable gate without fresh validation (see the gate sketch at the end of this section).
- Evidence-strength scores are retrieval-side diagnostics, not calibrated model confidence.

## Implementation boundaries

- Filtering functions favor defensive copies over in-place mutation. This keeps Streamlit reruns predictable for the bundled dataset size, but very large evaluation tables may require a more memory-conscious filtering strategy (see the filtering sketch below).
- Risk and configuration scores use documented deterministic review weights, not learned coefficients. Recalibrate them before applying the dashboard to a real production corpus (see the scoring sketch below).

## Why these boundaries are intentional

The goal of v1.0.0 is to be deterministic, inspectable, testable, and easy to run from a clean clone. Live adapters and production integrations would add secrets, network variance, cost, and non-deterministic test behavior.
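## Illustrative sketches

The sketches below illustrate the boundaries above; they are hypothetical examples, not the repository's actual API. First, the defensive-copy filtering pattern, assuming pandas-backed tables; the function and column names (`filter_by_min_score`, `evidence_strength`) are placeholders:

```python
import pandas as pd

def filter_by_min_score(runs: pd.DataFrame, min_score: float) -> pd.DataFrame:
    """Return a new, filtered frame; never mutate the caller's table.

    A defensive copy keeps Streamlit reruns predictable: every rerun
    filters the same pristine source frame instead of a frame that an
    earlier rerun already narrowed in place.
    """
    mask = runs["evidence_strength"] >= min_score
    return runs.loc[mask].copy()

# Hypothetical usage inside a Streamlit page:
# filtered = filter_by_min_score(all_runs, st.slider("Min evidence strength", 0.0, 1.0, 0.5))
```

The trade-off is that each copy costs memory proportional to the filtered rows, which is negligible at the bundled dataset size but is why very large tables would need a more memory-conscious strategy.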
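Next, a sketch of what "documented deterministic review weights" means in practice: a fixed, hand-set weight table rather than fitted coefficients. The signal names and weight values below are illustrative placeholders, not the repository's documented values:

```python
# Illustrative, hand-set review weights (NOT learned, NOT the repo's real values).
RISK_WEIGHTS = {
    "retrieval_miss_rate": 0.40,
    "stale_source_ratio": 0.35,
    "config_drift_flags": 0.25,
}

def risk_score(signals: dict[str, float]) -> float:
    """Deterministic weighted sum over normalized [0, 1] signals."""
    return sum(RISK_WEIGHTS[name] * signals[name] for name in RISK_WEIGHTS)

print(risk_score({"retrieval_miss_rate": 0.2,
                  "stale_source_ratio": 0.1,
                  "config_drift_flags": 0.0}))
# -> 0.115 (modulo float rounding): the same inputs always yield the
# same score, which keeps tests deterministic but is also why the
# weights need recalibration before use on a real corpus.
```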
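Finally, a sketch of why the policy simulation is diagnostic rather than deployable: in essence it is an offline threshold check over stored evidence-strength values, so passing it says nothing about live traffic. The threshold and names are hypothetical:

```python
def simulate_policy_gate(evidence_strength: float, threshold: float = 0.6) -> bool:
    """Offline 'would this answer pass?' check over a stored diagnostic.

    evidence_strength is a retrieval-side score computed on the bundled
    artifacts, not calibrated model confidence, so deploying this as a
    production gate would require fresh validation first.
    """
    return evidence_strength >= threshold
```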