feat: K8s corpus config entry, ingestion target, curation policy

configs/default.yaml gains a corpora block with FastAPI and
Kubernetes entries plus default_corpus=fastapi. FastAPI keeps the
existing 0.02 refusal_threshold (matches legacy rag.refusal_threshold
exactly, so production behavior is unchanged). K8s ships a placeholder
0.30 pending the K8s golden dataset sweep: K8s has more cross-
referenced concepts than FastAPI, so relevance spreads across more
chunks per query and the threshold likely lands higher.
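
To make the per-corpus threshold concrete, here is a minimal sketch of how a refusal decision might be resolved per corpus. The `CorpusConfig` class, the `should_refuse` helper, and the "refuse when the best score is below the threshold" rule are illustrative assumptions, not the project's actual code; only the labels, store paths, and threshold values come from this commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CorpusConfig:
    """Hypothetical mirror of one entry in the corpora block."""
    label: str
    store_path: str
    refusal_threshold: float


# Values copied from the corpora block added to configs/default.yaml.
CORPORA = {
    "fastapi": CorpusConfig("FastAPI Docs", ".cache/store", 0.02),
    "k8s": CorpusConfig("Kubernetes", ".cache/store_k8s", 0.30),
}
DEFAULT_CORPUS = "fastapi"


def should_refuse(top_score: float, corpus: Optional[str] = None) -> bool:
    """Assumed rule: refuse when the best retrieval score is below the
    selected corpus's threshold. Falls back to the default corpus."""
    name = corpus or DEFAULT_CORPUS
    if name not in CORPORA:
        raise KeyError(f"unknown corpus: {name!r}")
    return top_score < CORPORA[name].refusal_threshold
```

Under these assumptions, the same retrieval score of 0.1 would be answered for the FastAPI corpus but refused for K8s, which is why the 0.30 placeholder needs tuning before it is trusted.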

security.output.secret_check: true is made explicit in the YAML so
reviewers can see the Task-11 adversarial-review fix is enabled.

New Makefile target 'ingest-k8s' wraps scripts/ingest.py with
--doc-dir and --store-path flags, targeting data/k8s_docs and
.cache/store_k8s. scripts/ingest.py already supports those flags, so
no script change is needed.

data/k8s_docs/ is created with a .gitkeep and a SOURCES.md that
documents the curation policy: ~30-40 pages chosen around recruiter-
likely concepts (Pod, Deployment, Service, Ingress, ConfigMap, RBAC)
plus cross-referencing overview pages that stress the reranker.
Tutorials, cluster admin deep-dives, and kubectl reference are
explicitly out of scope. Each ingested page will record its URL, the
date pulled, and a one-line rationale.
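
The provenance rule (URL, date pulled, rationale) could be checked mechanically once the table in SOURCES.md is filled in. The sketch below is a hypothetical helper, not part of this commit; it assumes the three-column markdown table shown in SOURCES.md, ISO-formatted dates, and https URLs, and the dated row in the usage example is made up for illustration.

```python
from datetime import date


def parse_sources_table(markdown: str) -> list:
    """Collect data rows from a 3-column markdown table (URL, date, rationale)."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 3:
            continue
        # Skip the header row and the |-----| separator row.
        if cells[0] == "URL" or (cells[0] and set(cells[0]) <= {"-"}):
            continue
        rows.append({"url": cells[0], "date_pulled": cells[1], "rationale": cells[2]})
    return rows


def incomplete_rows(rows: list) -> list:
    """Return rows missing a usable URL, ISO date, or rationale."""
    bad = []
    for row in rows:
        ok = row["url"].startswith("https://") and row["rationale"] not in ("", "_TBD_")
        try:
            date.fromisoformat(row["date_pulled"])
        except ValueError:
            ok = False
        if not ok:
            bad.append(row)
    return bad


SAMPLE = """
| URL | Date pulled | Rationale |
|-----|------------|-----------|
| _TBD_ | _TBD_ | _TBD_ |
| https://kubernetes.io/docs/concepts/workloads/pods/ | 2026-04-12 | Core workload concept |
"""
```

Running `incomplete_rows(parse_sources_table(SAMPLE))` flags only the `_TBD_` placeholder row, so a check like this could gate `make ingest-k8s` until curation is finished.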

app.py: the Task-2 rag.refusal_threshold warning is tightened to
fire only on genuine drift (legacy value non-default AND not equal
to the default corpus's threshold). The default.yaml in this commit
has both at 0.02, so the warning is silent, as intended.

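
The tightened condition can be isolated as a small predicate. This is a sketch that restates the `if` added in app.py; the function name and the `unset_value` parameter are hypothetical, and 0.0 is taken as the "default" legacy value because that is what the diff compares against.

```python
def legacy_threshold_drifted(
    legacy: float,
    default_corpus_threshold: float,
    unset_value: float = 0.0,
) -> bool:
    """True only when the legacy rag.refusal_threshold was set explicitly
    (differs from unset_value) AND disagrees with the default corpus."""
    return legacy != unset_value and legacy != default_corpus_threshold
```

With the values in this commit, `legacy_threshold_drifted(0.02, 0.02)` is False, so startup stays quiet; a legacy value of 0.05 against a corpus threshold of 0.02 would trigger the warning. The comparison is exact float equality, as in the diff, which is reasonable here because both values are read verbatim from config rather than computed.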
- Makefile +4 -1
- agent_bench/serving/app.py +14 -8
- configs/default.yaml +29 -0
- data/k8s_docs/.gitkeep +0 -0
- data/k8s_docs/SOURCES.md +62 -0
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
 PYTHON ?= /usr/local/opt/python@3.11/bin/python3.11
 
-.PHONY: install test lint serve ingest evaluate-fast evaluate-full benchmark evaluate-langchain docker modal-deploy modal-stop vllm-up benchmark-all k8s-dev k8s-prod tf-plan tf-validate
+.PHONY: install test lint serve ingest ingest-k8s evaluate-fast evaluate-full benchmark evaluate-langchain docker modal-deploy modal-stop vllm-up benchmark-all k8s-dev k8s-prod tf-plan tf-validate
 
 install:
 	$(PYTHON) -m pip install -e ".[dev]"
@@ -19,6 +19,9 @@ serve:
 ingest:
 	$(PYTHON) scripts/ingest.py --config configs/tasks/tech_docs.yaml
 
+ingest-k8s: ## Ingest Kubernetes docs into .cache/store_k8s
+	$(PYTHON) scripts/ingest.py --doc-dir data/k8s_docs --store-path .cache/store_k8s
+
 evaluate-fast:
 	$(PYTHON) scripts/evaluate.py --config configs/default.yaml --mode deterministic
 
--- a/agent_bench/serving/app.py
+++ b/agent_bench/serving/app.py
@@ -175,15 +175,21 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
         providers=list(providers.keys()),
     )
 
-    #
-    # per-corpus refusal_threshold is authoritative.
-    #
-
+    # Legacy rag.refusal_threshold is ignored in multi-corpus mode;
+    # per-corpus refusal_threshold is authoritative. Only warn when the
+    # legacy value is non-default AND differs from the default corpus's
+    # threshold - that is the actual drift case. A legacy value that
+    # matches the default corpus is benign (someone kept both in sync).
+    legacy_thresh = config.rag.refusal_threshold
+    default_thresh = config.corpora[config.default_corpus].refusal_threshold
+    if legacy_thresh != 0.0 and legacy_thresh != default_thresh:
         log.warning(
-            "
-            legacy_value=
-
-
+            "rag_refusal_threshold_drift_in_multi_corpus_mode",
+            legacy_value=legacy_thresh,
+            default_corpus=config.default_corpus,
+            default_corpus_value=default_thresh,
+            hint="rag.refusal_threshold is ignored; "
+            "update corpora.<name>.refusal_threshold instead",
         )
 
     # AppConfig._validate_default_corpus guarantees default_corpus is in
--- a/configs/default.yaml
+++ b/configs/default.yaml
@@ -74,9 +74,38 @@ security:
     enabled: true
     pii_check: true
     url_check: true
+    secret_check: true
     blocklist: []
   audit:
     enabled: true
     path: logs/audit.jsonl
     max_size_mb: 100
     rotate: true
+
+# --- Multi-corpus ---
+# Per-corpus store paths, refusal thresholds, and iteration limits.
+# Default_corpus must be a key in corpora (enforced by AppConfig validator).
+#
+# NOTE: rag.refusal_threshold above is ignored when corpora is non-empty.
+# Each corpus declares its own refusal_threshold below; a startup warning
+# fires if the legacy field is non-default to surface drift.
+default_corpus: fastapi
+
+corpora:
+  fastapi:
+    label: "FastAPI Docs"
+    store_path: .cache/store
+    data_path: data/tech_docs
+    refusal_threshold: 0.02  # matches legacy rag.refusal_threshold
+    top_k: 5
+    max_iterations: 3
+  k8s:
+    label: "Kubernetes"
+    store_path: .cache/store_k8s
+    data_path: data/k8s_docs
+    # PLACEHOLDER - tune against K8s golden dataset once it exists.
+    # K8s has more cross-referenced concepts than FastAPI, so relevance
+    # spreads across more chunks; the threshold likely lands higher.
+    refusal_threshold: 0.30
+    top_k: 5
+    max_iterations: 3
data/k8s_docs/.gitkeep: file without changes
--- /dev/null
+++ b/data/k8s_docs/SOURCES.md
@@ -0,0 +1,62 @@
+# Kubernetes Corpus Sources
+
+**Status:** Placeholder - curation scheduled as a separate work session
+outside the multi-corpus refactor.
+
+**Target:** ~30-40 markdown files from kubernetes.io/docs covering the
+concepts a technical reviewer would naturally type into the demo -
+not comprehensive K8s coverage.
+
+## Scope
+
+**Include:**
+
+- Core workload concepts: Pod, Deployment, StatefulSet, DaemonSet, Job,
+  CronJob, ReplicaSet
+- Networking: Service, Ingress, NetworkPolicy, EndpointSlice
+- Config + state: ConfigMap, Secret, Volume, PersistentVolume, Namespace
+- Access control: RBAC (Role, RoleBinding, ServiceAccount)
+- Cross-referencing overview pages: "Connecting Applications with
+  Services", "Workload Resources", "Services, Load Balancing, and
+  Networking" - these stress the reranker because relevance spreads
+  across multiple chunks per query
+
+**Exclude:**
+
+- Cluster administration deep-dives (etcd, kubelet, kube-apiserver
+  internals) - wrong audience for a recruiter-facing demo
+- Tutorials (long-form, chunk poorly, hurt retrieval precision)
+- kubectl command reference and API reference - wrong shape for RAG,
+  better served by `--help`
+- Release notes and version history - no lasting value for Q&A
+
+## Curation policy
+
+This corpus targets **recruiter-likely questions**, not coverage. A
+question about etcd raft internals will be correctly refused - the
+refusal mechanism is part of the demo story, not a failure mode.
+
+Each ingested file below must have:
+
+- A URL (source of truth, for re-scraping if content drifts)
+- A date pulled (provenance, for audit)
+- A one-line rationale (why this page is in scope)
+
+| URL | Date pulled | Rationale |
+|-----|------------|-----------|
+| _TBD_ | _TBD_ | _TBD_ |
+
+See `docs/plans/2026-04-12-multi-corpus-refactor-design.md` section
+"Corpus Curation - Kubernetes" for the full policy.
+
+## Ingestion
+
+Once curated files are in place, run:
+
+```bash
+make ingest-k8s
+```
+
+This populates `.cache/store_k8s/` with embeddings + BM25 index matching
+the FastAPI corpus's chunker settings (recursive, 512-token chunks,
+64-token overlap).