# Limitations

This repository is a reproducible RAG evaluation command center that runs entirely on bundled synthetic/offline artifacts.

## Scope boundaries

- No live LLM or embedding calls.
- No production vector database.
- No online document ingestion pipeline.
- No authentication, RBAC, multi-tenant controls, or API-key lifecycle.
- No scheduled monitoring jobs or alert delivery.
- No persistent user state beyond Streamlit session state.
- No production incident automation.

## Data boundaries

- The bundled tables are synthetic/offline evaluation artifacts.
- Metrics should be interpreted as evaluation diagnostics, not production SLOs.
- Policy simulation uses offline evidence-strength signals and should not be treated as a deployable gate without fresh validation.
- Evidence-strength scores are retrieval-side diagnostics, not calibrated model confidence.
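To make the data boundaries concrete, the following is a minimal sketch of what an offline policy simulation over evidence-strength scores might look like. The names `EvalRow` and `simulate_policy`, the `evidence_strength` and `is_correct` fields, the sample rows, and the 0.6 threshold are all hypothetical and are not taken from this repository's code.

```python
from dataclasses import dataclass
from typing import List, Optional, TypedDict


@dataclass(frozen=True)
class EvalRow:
    """Hypothetical row of an offline evaluation table."""
    evidence_strength: float  # retrieval-side diagnostic, NOT a calibrated probability
    is_correct: bool          # offline label for the generated answer


class PolicySummary(TypedDict):
    answered: int
    abstained: int
    answered_accuracy: Optional[float]


def simulate_policy(rows: List[EvalRow], threshold: float) -> PolicySummary:
    """Replay an answer/abstain gate over offline rows and summarize the result."""
    answered = [r for r in rows if r.evidence_strength >= threshold]
    correct = sum(1 for r in answered if r.is_correct)
    return {
        "answered": len(answered),
        "abstained": len(rows) - len(answered),
        # None when nothing was answered, to avoid dividing by zero.
        "answered_accuracy": correct / len(answered) if answered else None,
    }


# Made-up rows for illustration only.
rows = [
    EvalRow(0.9, True),
    EvalRow(0.7, False),
    EvalRow(0.4, True),
    EvalRow(0.2, False),
]
summary = simulate_policy(rows, threshold=0.6)
print(summary)
```

The summary describes only the offline rows it was given. A threshold that looks acceptable here is an evaluation diagnostic, and it would need fresh validation on real traffic before being used as a gate.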

## Implementation boundaries

- Filtering functions favor defensive copies over in-place mutation. This keeps Streamlit reruns predictable for the bundled dataset size, but very large evaluation tables may require a more memory-conscious filtering strategy.
- Risk and configuration scores use documented deterministic review weights, not learned coefficients. Recalibrate them before applying the dashboard to a real production corpus.
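The two implementation boundaries above can be illustrated with a short sketch. It assumes rows are plain dictionaries and that risk is a weighted sum of boolean flags; `filter_rows`, `risk_score`, `REVIEW_WEIGHTS`, and the flag names are hypothetical stand-ins, and the weight values are illustrative rather than the repository's documented ones.

```python
import copy
from typing import Any, Dict, List

# Hypothetical fixed review weights, in integer points. They are hand-set,
# not learned, so the same flags always produce the same score.
REVIEW_WEIGHTS: Dict[str, int] = {
    "low_evidence": 5,
    "missing_citation": 3,
    "stale_source": 2,
}


def filter_rows(rows: List[Dict[str, Any]], min_evidence: float) -> List[Dict[str, Any]]:
    """Return deep copies of matching rows so callers cannot mutate the input."""
    return [copy.deepcopy(r) for r in rows if r["evidence_strength"] >= min_evidence]


def risk_score(flags: Dict[str, bool]) -> int:
    """Deterministic weighted sum over whichever risk flags are set."""
    return sum(w for name, w in REVIEW_WEIGHTS.items() if flags.get(name, False))


# Made-up rows for illustration only.
rows = [
    {"id": "q1", "evidence_strength": 0.8, "tags": ["a"]},
    {"id": "q2", "evidence_strength": 0.3, "tags": []},
]

kept = filter_rows(rows, min_evidence=0.5)
kept[0]["tags"].append("reviewed")  # mutates the copy only

score = risk_score({"low_evidence": True, "stale_source": True})
print(rows[0]["tags"], kept[0]["tags"], score)
```

Copying keeps the source rows unchanged across reruns, at the cost of duplicating every kept row in memory, which is the trade-off that may not scale to very large tables. The fixed weights are equally simple to inspect and test, but they encode reviewer judgment and should be recalibrated for a real corpus.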

## Why these boundaries are intentional

The goal of v1.0.0 is to be deterministic, inspectable, testable, and easy to run from a clean clone. Live adapters and production integrations would add secrets, network variance, cost, and non-deterministic test behavior.