Committed by Nomearod
Commit 3c0089e · 1 Parent(s): 6e8d2ee

feat: K8s corpus config entry, ingestion target, curation policy

configs/default.yaml gains a corpora block with FastAPI and
Kubernetes entries, plus default_corpus: fastapi. FastAPI keeps the
existing refusal_threshold of 0.02 (identical to the legacy
rag.refusal_threshold, so production behavior is unchanged). K8s ships
with a placeholder of 0.30 pending the K8s golden dataset sweep. K8s
has more cross-referenced concepts than FastAPI, so relevance spreads
across more chunks per query and the threshold likely lands higher.
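
To make the shape of the new block concrete, here is a minimal Python sketch of how per-corpus settings could be modeled and checked. The names CorpusConfig, validate_default_corpus, and threshold_for are illustrative assumptions, not the project's AppConfig; only the field names and values are taken from the YAML in this commit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusConfig:
    """Hypothetical mirror of one entry under `corpora` in default.yaml."""

    label: str
    store_path: str
    data_path: str
    refusal_threshold: float
    top_k: int = 5
    max_iterations: int = 3


def validate_default_corpus(default_corpus: str, corpora: dict[str, CorpusConfig]) -> None:
    """Raise ValueError if default_corpus does not name an entry in corpora."""
    if default_corpus not in corpora:
        known = ", ".join(sorted(corpora)) or "<none>"
        raise ValueError(f"default_corpus {default_corpus!r} not in corpora ({known})")


def threshold_for(corpus: str | None, default_corpus: str, corpora: dict[str, CorpusConfig]) -> float:
    """Return the corpus's refusal threshold, falling back to the default corpus."""
    return corpora[corpus or default_corpus].refusal_threshold


# Values copied from the corpora block added in this commit.
CORPORA = {
    "fastapi": CorpusConfig("FastAPI Docs", ".cache/store", "data/tech_docs", 0.02),
    "k8s": CorpusConfig("Kubernetes", ".cache/store_k8s", "data/k8s_docs", 0.30),
}

validate_default_corpus("fastapi", CORPORA)
```

With this layout a request that names no corpus resolves to the FastAPI threshold, which is how the sketch keeps existing behavior unchanged.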

security.output.secret_check: true is made explicit in the YAML so
reviewers can see the Task-11 adversarial-review fix is enabled.

New Makefile target 'ingest-k8s' wraps scripts/ingest.py with
--doc-dir and --store-path flags, targeting data/k8s_docs and
.cache/store_k8s. scripts/ingest.py already supports those flags, so
no script change is needed.
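
For readers unfamiliar with the script, the invocation maps onto a command-line interface along these lines. This is a hypothetical argparse sketch, not the contents of scripts/ingest.py: the flag names --config, --doc-dir, and --store-path come from this commit, while the function name, help text, and defaults are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the three flags named in this commit."""
    parser = argparse.ArgumentParser(description="Ingest a document directory into a store.")
    parser.add_argument("--config", type=Path, default=None, help="optional task config file")
    parser.add_argument("--doc-dir", type=Path, default=None, help="directory of source documents")
    parser.add_argument("--store-path", type=Path, default=None, help="output directory for the store")
    return parser


# Same arguments that `make ingest-k8s` passes to the script.
args = build_parser().parse_args(
    ["--doc-dir", "data/k8s_docs", "--store-path", ".cache/store_k8s"]
)
print(args.doc_dir, args.store_path)
```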

data/k8s_docs/ is created with a .gitkeep and a SOURCES.md that
documents the curation policy: ~30-40 pages chosen around
recruiter-likely concepts (Pod, Deployment, Service, Ingress,
ConfigMap, RBAC) plus cross-referencing overview pages that stress the
reranker. Tutorials, cluster admin deep-dives, and kubectl reference
are explicitly out of scope. Each ingested page will be listed with its
URL, date pulled, and a one-line rationale.
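
The three-field requirement lends itself to a mechanical check. The sketch below is an assumption about how such a check might look, not something this commit adds: parse_sources_table and incomplete_rows are hypothetical helpers, and the parser assumes the file contains a single pipe table shaped like the one in SOURCES.md.

```python
def _cells(line: str) -> list[str]:
    """Split one pipe-table line into stripped cell values."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_sources_table(markdown: str) -> list[dict[str, str]]:
    """Return the table's data rows as dicts keyed by the header cells."""
    lines = [ln for ln in markdown.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []
    header = _cells(lines[0])
    # lines[1] is the |---|---| separator row, so data starts at index 2.
    return [dict(zip(header, _cells(ln))) for ln in lines[2:]]


def incomplete_rows(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return rows where any field is empty or still marked _TBD_."""
    return [row for row in rows if any(v in ("", "_TBD_") for v in row.values())]


# The placeholder table as it ships in this commit.
SAMPLE = """\
| URL | Date pulled | Rationale |
|-----|------------|-----------|
| _TBD_ | _TBD_ | _TBD_ |
"""

rows = parse_sources_table(SAMPLE)
print(len(incomplete_rows(rows)))  # 1: the placeholder row is flagged
```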

app.py: the Task-2 rag.refusal_threshold warning is tightened to
fire only on genuine drift: the legacy value is non-default (not 0.0)
AND differs from the default corpus's threshold. The default.yaml in
this commit sets both to 0.02, so the warning stays silent, as intended.
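
The condition can be read as a small truth table. The sketch below isolates it in a standalone function; the function name is an assumption made for illustration, while the comparison itself mirrors the one in the app.py diff.

```python
def should_warn_on_legacy_threshold(legacy: float, default_corpus_threshold: float) -> bool:
    """True only when the legacy value is set (non-zero) and disagrees with the default corpus."""
    return legacy != 0.0 and legacy != default_corpus_threshold


print(should_warn_on_legacy_threshold(0.0, 0.02))   # False: legacy field left at its default
print(should_warn_on_legacy_threshold(0.02, 0.02))  # False: both values kept in sync
print(should_warn_on_legacy_threshold(0.05, 0.02))  # True: genuine drift
```

The comparison is exact float equality, as in the commit. That works when both values are parsed from the same literal in the YAML; a tolerance-based comparison would be a different design choice.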

Makefile CHANGED
@@ -1,6 +1,6 @@
 PYTHON ?= /usr/local/opt/python@3.11/bin/python3.11
 
-.PHONY: install test lint serve ingest evaluate-fast evaluate-full benchmark evaluate-langchain docker modal-deploy modal-stop vllm-up benchmark-all k8s-dev k8s-prod tf-plan tf-validate
+.PHONY: install test lint serve ingest ingest-k8s evaluate-fast evaluate-full benchmark evaluate-langchain docker modal-deploy modal-stop vllm-up benchmark-all k8s-dev k8s-prod tf-plan tf-validate
 
 install:
 	$(PYTHON) -m pip install -e ".[dev]"
@@ -19,6 +19,9 @@ serve:
 ingest:
 	$(PYTHON) scripts/ingest.py --config configs/tasks/tech_docs.yaml
 
+ingest-k8s: ## Ingest Kubernetes docs into .cache/store_k8s
+	$(PYTHON) scripts/ingest.py --doc-dir data/k8s_docs --store-path .cache/store_k8s
+
 evaluate-fast:
 	$(PYTHON) scripts/evaluate.py --config configs/default.yaml --mode deterministic
 
agent_bench/serving/app.py CHANGED
@@ -175,15 +175,21 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
         providers=list(providers.keys()),
     )
 
-    # Fix #3: legacy rag.refusal_threshold is ignored in multi-corpus mode;
-    # per-corpus refusal_threshold is authoritative. Warn loudly so config
-    # drift surfaces at startup instead of becoming a silent divergence.
-    if config.rag.refusal_threshold != 0.0:
+    # Legacy rag.refusal_threshold is ignored in multi-corpus mode;
+    # per-corpus refusal_threshold is authoritative. Only warn when the
+    # legacy value is non-default AND differs from the default corpus's
+    # threshold; that is the actual drift case. A legacy value that
+    # matches the default corpus is benign (someone kept both in sync).
+    legacy_thresh = config.rag.refusal_threshold
+    default_thresh = config.corpora[config.default_corpus].refusal_threshold
+    if legacy_thresh != 0.0 and legacy_thresh != default_thresh:
         log.warning(
-            "rag_refusal_threshold_ignored_in_multi_corpus_mode",
-            legacy_value=config.rag.refusal_threshold,
-            authoritative_source="corpora.<name>.refusal_threshold",
-            hint="remove rag.refusal_threshold from config to silence",
+            "rag_refusal_threshold_drift_in_multi_corpus_mode",
+            legacy_value=legacy_thresh,
+            default_corpus=config.default_corpus,
+            default_corpus_value=default_thresh,
+            hint="rag.refusal_threshold is ignored; "
+            "update corpora.<name>.refusal_threshold instead",
         )
 
     # AppConfig._validate_default_corpus guarantees default_corpus is in
configs/default.yaml CHANGED
@@ -74,9 +74,38 @@ security:
     enabled: true
     pii_check: true
     url_check: true
+    secret_check: true
     blocklist: []
   audit:
     enabled: true
     path: logs/audit.jsonl
     max_size_mb: 100
     rotate: true
+
+# --- Multi-corpus ---
+# Per-corpus store paths, refusal thresholds, and iteration limits.
+# Default_corpus must be a key in corpora (enforced by AppConfig validator).
+#
+# NOTE: rag.refusal_threshold above is ignored when corpora is non-empty.
+# Each corpus declares its own refusal_threshold below; a startup warning
+# fires if the legacy field is non-default to surface drift.
+default_corpus: fastapi
+
+corpora:
+  fastapi:
+    label: "FastAPI Docs"
+    store_path: .cache/store
+    data_path: data/tech_docs
+    refusal_threshold: 0.02  # matches legacy rag.refusal_threshold
+    top_k: 5
+    max_iterations: 3
+  k8s:
+    label: "Kubernetes"
+    store_path: .cache/store_k8s
+    data_path: data/k8s_docs
+    # PLACEHOLDER: tune against K8s golden dataset once it exists.
+    # K8s has more cross-referenced concepts than FastAPI, so relevance
+    # spreads across more chunks; the threshold likely lands higher.
+    refusal_threshold: 0.30
+    top_k: 5
+    max_iterations: 3
data/k8s_docs/.gitkeep ADDED
File without changes
data/k8s_docs/SOURCES.md ADDED
@@ -0,0 +1,62 @@
+# Kubernetes Corpus Sources
+
+**Status:** Placeholder; curation scheduled as a separate work session
+outside the multi-corpus refactor.
+
+**Target:** ~30–40 markdown files from kubernetes.io/docs covering the
+concepts a technical reviewer would naturally type into the demo,
+not comprehensive K8s coverage.
+
+## Scope
+
+**Include:**
+
+- Core workload concepts: Pod, Deployment, StatefulSet, DaemonSet, Job,
+  CronJob, ReplicaSet
+- Networking: Service, Ingress, NetworkPolicy, EndpointSlice
+- Config + state: ConfigMap, Secret, Volume, PersistentVolume, Namespace
+- Access control: RBAC (Role, RoleBinding, ServiceAccount)
+- Cross-referencing overview pages: "Connecting Applications with
+  Services", "Workload Resources", "Services, Load Balancing, and
+  Networking". These stress the reranker because relevance spreads
+  across multiple chunks per query
+
+**Exclude:**
+
+- Cluster administration deep-dives (etcd, kubelet, kube-apiserver
+  internals): wrong audience for a recruiter-facing demo
+- Tutorials (long-form, chunk poorly, hurt retrieval precision)
+- kubectl command reference and API reference: wrong shape for RAG,
+  better served by `--help`
+- Release notes and version history: no lasting value for Q&A
+
+## Curation policy
+
+This corpus targets **recruiter-likely questions**, not coverage. A
+question about etcd raft internals will be correctly refused; the
+refusal mechanism is part of the demo story, not a failure mode.
+
+Each ingested file below must have:
+
+- A URL (source of truth, for re-scraping if content drifts)
+- A date pulled (provenance, for audit)
+- A one-line rationale (why this page is in scope)
+
+| URL | Date pulled | Rationale |
+|-----|------------|-----------|
+| _TBD_ | _TBD_ | _TBD_ |
+
+See `docs/plans/2026-04-12-multi-corpus-refactor-design.md` section
+"Corpus Curation - Kubernetes" for the full policy.
+
+## Ingestion
+
+Once curated files are in place, run:
+
+```bash
+make ingest-k8s
+```
+
+This populates `.cache/store_k8s/` with embeddings + BM25 index matching
+the FastAPI corpus's chunker settings (recursive, 512-token chunks,
+64-token overlap).
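
To illustrate what the 512/64 settings mean in practice, here is a simplified Python sketch of overlapping windows. It is not the project's recursive chunker: it uses a plain fixed-size sliding window over a pre-tokenized list, and chunk_tokens is a hypothetical name. Only the size and overlap values come from SOURCES.md.

```python
def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # each new window starts this many tokens after the last
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Stand-in tokens; a real pipeline would use the embedding model's tokenizer.
tokens = [str(i) for i in range(1000)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # [512, 512, 104]
```

Each window begins 448 tokens after the previous one, so the last 64 tokens of one chunk reappear at the start of the next.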