Nomearod Claude Opus 4.6 (1M context) committed on
Commit 3241b7c · 1 parent: 23de799

docs(k8s): Week 1 step 2 — lock SOURCES.md categories + author QUESTION_PLAN.md

Week 1 step 2 of the v1.1 plan: lock the K8s corpus scope and author
the structural guide for step 5's 25-question golden-set authoring.
The scope is deliberately narrower than "commit to 30-40 verified URLs
in one session": per cross-cutting #8 pilot-first discipline, per-URL
resolution and per-page license verification are deferred to step 4
ingestion. A category-level lock plus an explicit step-4 checklist fits
the 1-hour budget the plan allots to step 2.

SOURCES.md changes:
- Status flipped from "Placeholder" to "Locked at category level".
- 28-page category breakdown table (9 core workloads, 5 networking,
5 config+state, 4 scheduling, 1 access, 2 health/autoscaling,
2 security). 25 questions at ~1/page with 3 pages of headroom for
multi-hop fan-out.
- 8 already-pulled pages documented with best-known URLs + pilot
evidence (k8s_network_policies.md is called out as the pilot_005
flavor-B target so step 4 does not re-ingest it under a new file
name).
- 20 remaining pages listed per category with a step-4 verification
checklist (URL resolution, license confirmation, pull-date record,
rationale re-check against QUESTION_PLAN.md).
- Content license documented: CC BY 4.0 default with per-page
verification discipline (same pattern as the v1.1 plan's
Lynx/HaluBench CC BY-NC handling).
- Post-ingest smoke-query gate added before step 5 authoring.
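The per-category counts in this list can be cross-checked mechanically. A minimal Python sketch, assuming the numbers are copied by hand from the SOURCES.md tables in this commit; `CATEGORIES` and `breakdown_problems` are illustrative names, not part of the repo:

```python
# Illustrative consistency check for the SOURCES.md category lock.
# Tuples are (target_pages, already_pulled, to_add_at_step_4), copied by hand
# from the "Locked category breakdown", "Already-pulled pages" and
# "Pages to pull at step 4" sections.
CATEGORIES = {
    "Core workloads": (9, 3, 6),
    "Networking": (5, 1, 4),
    "Config + state": (5, 2, 3),
    "Scheduling + resources": (4, 1, 3),
    "Access control": (1, 0, 1),
    "Health + autoscaling": (2, 0, 2),
    "Security": (2, 1, 1),
}
QUESTIONS = 25


def breakdown_problems(categories: dict[str, tuple[int, int, int]]) -> list[str]:
    """Return one message per category whose pulled + to-add count misses its target."""
    problems = []
    for name, (target, pulled, to_add) in categories.items():
        if pulled + to_add != target:
            problems.append(f"{name}: {pulled} + {to_add} != {target}")
    return problems


# Column-wise totals: [target, already pulled, to add].
totals = [sum(column) for column in zip(*CATEGORIES.values())]
print(totals)                          # [28, 8, 20]
print(totals[0] - QUESTIONS)           # 3 (pages of headroom)
print(breakdown_problems(CATEGORIES))  # []
```

If step 4 swaps a page for a reasoned alternative, editing the matching tuple keeps the 28 / 8 / 20 totals honest.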

QUESTION_PLAN.md new file (261 lines):
- Target CRAG distribution (5–6 simple, 3–4 simple_w_condition,
3–4 comparison, 5–6 multi_hop, 3–4 false_premise, 0–3 set/agg/pph).
- Per-type source-page mapping — each CRAG type points to specific
pages from SOURCES.md that support questions of that type. The
mapping is the authoring guide step 5 consults when drafting
specific question texts.
- false_premise split: at least 1 flavor A (pure refusal) + at
least 1 flavor B (documented negative) with pilot_005 called out
as the existing flavor-B reference and three candidate flavor-B
pages listed for expansion (Pod Security Standards, RBAC, more
NetworkPolicy clauses).
- time_sensitive flag placement: 2–3 questions distributed across
≥2 CRAG types, each tied to a specific K8s version state
(HPA v1 vs v2, PSA stable at 1.25, PSP removal at 1.25).
- Difficulty distribution guidance (8–10 easy, 10–12 medium, 4–6
hard).
- Authoring checklist per question — 14 required schema fields with
explicit notes on which are flavor-A-specific, which match the
v1.1 plan's source-attribution methodology, and which may be
retired (is_multi_hop → question_type migration contingent on
harness.py update).
- Pilot-first validation gates BEFORE the 25-question authoring
session: (1) step 4 ingestion verified via smoke queries;
(2) existing 6-question pilot must still pass its gates against
the expanded corpus; (3) 2–3 hand-drafted questions tested
through the pipeline before bulk authoring. Each gate honors the
cross-cutting #8 discipline that caught six issues across four
sessions with zero false positives.

What this commit does NOT contain:
- Specific 25-question texts (step 5 authoring, fresh session).
- Verified kubernetes.io URLs for the 20 remaining pages (step 4).
- Pulled markdown content for the 20 remaining pages (step 4).
- Updates to agent_bench/evaluation/datasets/k8s_golden_pilot.json
(the 6-question pilot stays as-is until step 5 replaces it).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

data/k8s_docs/QUESTION_PLAN.md ADDED
@@ -0,0 +1,261 @@
+ # K8s Golden Dataset — Question Plan
+
+ **Status:** Structural guide for Week 1 step 5 authoring (v1.1 plan).
+ This document defines the 25-question target distribution, per-type
+ source-page mapping, and authoring constraints. It does NOT contain
+ the 25 specific question texts — those are authored during step 5 in
+ a fresh session, per cross-cutting #8 pilot-first discipline.
+
+ **Upstream contracts:**
+ - Taxonomy: CRAG 8-type (Yang et al., NeurIPS 2024) — see DECISIONS.md
+ "K8s golden dataset uses CRAG's 8-type taxonomy as the schema".
+ - Source pages: see `SOURCES.md` (28 pages, category-locked; 8 already
+ pulled, 20 to pull at step 4).
+ - Schema: see `agent_bench/evaluation/harness.py` `GoldenQuestion`
+ plus the v1.1 plan's methodology #3 source-attribution fields.
+ - Flavor A/B for `false_premise`: see DECISIONS.md "False-premise
+ questions come in two flavors".
+
+ ---
+
+ ## Target distribution (25 questions total)
+
+ | CRAG type | Count | Schema field | Notes |
+ |---|---|---|---|
+ | `simple` | 5–6 | `question_type: "simple"` | Baseline retrieval: direct lookup in 1 page, 1–2 sentence answer. |
+ | `simple_w_condition` | 3–4 | `question_type: "simple_w_condition"` | Answer depends on a condition stated in the question (enforcement level, volume type, Pod phase). |
+ | `comparison` | 3–4 | `question_type: "comparison"` | Answer compares two concepts across 2 pages; reranker stress. |
+ | `multi_hop` | 5–6 | `question_type: "multi_hop"` | Answer synthesizes 2–4 pages; reranker-stressing by construction. |
+ | `false_premise` | 3–4 | `question_type: "false_premise"` | Grounded refusal stress. Flavor A (pure refusal) + flavor B (documented negative). |
+ | `set` / `aggregation` / `post_processing_heavy` | 0–3 | respective values | Optional. Include only if natural from corpus content. |
+ | **Total** | **25** | | |
+
+ **Orthogonal flag:** `time_sensitive: bool` on 2–3 questions. It does
+ NOT replace `question_type`; it is an independent property for
+ version-bounded content (feature state, API version migration,
+ deprecations).
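A quick way to sanity-check a proposed allocation against these ranges. This Python sketch is illustrative only: `TARGET_RANGES` simply transcribes the table above, lumping the three optional types into one `optional` bucket, and `allocation_problems` is a hypothetical helper rather than anything in `agent_bench`. It also computes `required_max`, the combined ceiling of the five required rows, which is worth knowing before deciding how many optional slots to fill.

```python
# Ranges transcribed from the "Target distribution" table (inclusive bounds).
TARGET_RANGES = {
    "simple": (5, 6),
    "simple_w_condition": (3, 4),
    "comparison": (3, 4),
    "multi_hop": (5, 6),
    "false_premise": (3, 4),
    "optional": (0, 3),  # set + aggregation + post_processing_heavy combined
}
TOTAL = 25


def allocation_problems(counts: dict[str, int]) -> list[str]:
    """List every range violation plus a total mismatch, if any."""
    problems = []
    for qtype, (low, high) in TARGET_RANGES.items():
        n = counts.get(qtype, 0)
        if not low <= n <= high:
            problems.append(f"{qtype}: {n} outside {low}-{high}")
    total = sum(counts.values())
    if total != TOTAL:
        problems.append(f"total {total} != {TOTAL}")
    return problems


# Upper bound of the five required rows combined.
required_max = sum(high for qtype, (_, high) in TARGET_RANGES.items() if qtype != "optional")
print(required_max)  # 24

plan = {"simple": 6, "simple_w_condition": 4, "comparison": 4,
        "multi_hop": 6, "false_premise": 4, "optional": 1}
print(allocation_problems(plan))                     # []
print(allocation_problems({**plan, "optional": 0}))  # ['total 24 != 25']
```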
+
+ ---
+
+ ## Per-type source-page mapping
+
+ Each subsection below lists the K8s concept pages that questions of
+ that type should draw from. Multi-hop and comparison rows list
+ multiple pages intentionally.
+
+ ### simple (5–6 slots)
+
+ Questions whose 1–2 sentence answer lives inside a single page.
+
+ | Candidate source | CRAG slot justification |
+ |---|---|
+ | `k8s_pods.md` | Pod IP semantics, container sharing, ephemeral containers |
+ | `k8s_deployment.md` | What a Deployment is, declarative update mechanic |
+ | `k8s_configmap.md` | What a ConfigMap is, immutable field |
+ | `k8s_secret.md` | What a Secret is, volume mount modes |
+ | RBAC Authorization *(step 4 page)* | RBAC primitive definitions (Role, RoleBinding, ClusterRole) |
+ | StatefulSet *(step 4 page)* | StatefulSet identity guarantees |
+ | DaemonSet *(step 4 page)* | One-per-node scheduling contract |
+ | Namespaces *(step 4 page)* | Namespace scoping for resources |
+
+ **Authoring rule:** Each `simple` question must have exactly one
+ expected source page and 1–2 source snippets. Target a keyword hit
+ rate (KHR) ≥ 0.60 on the authored keywords.
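One plausible reading of the KHR target, sketched in Python: the fraction of `expected_answer_keywords` that appear in the generated answer, matched case-insensitively as substrings. This is an assumption for illustration; the authoritative metric is whatever `keyword_hit_rate` computes in the evaluator, which may normalize differently. The keyword list below is hypothetical.

```python
def khr_sketch(answer: str, keywords: list[str]) -> float:
    """Illustrative keyword hit rate: share of expected keywords found in the answer.

    Assumes case-insensitive substring matching with no stemming or synonym
    handling; the real `keyword_hit_rate` metric may normalize differently.
    """
    if not keywords:
        return 0.0
    text = answer.lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in text)
    return hits / len(keywords)


answer = ("Each Pod gets its own IP address; containers in the same Pod "
          "share it and can reach each other over localhost.")
keywords = ["IP address", "localhost", "network namespace"]  # hypothetical keyword set
score = khr_sketch(answer, keywords)
print(round(score, 2), score >= 0.60)  # 0.67 True
```

Under this reading, the 0.60 target means at least 2 of 3 keywords must hit, or 3 of 5, so the keyword count chosen per question directly sets how strict the gate is.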
+
+ ### simple_w_condition (3–4 slots)
+
+ Questions whose answer explicitly depends on a condition named in the
+ question.
+
+ | Candidate source | Condition that shapes the answer |
+ |---|---|
+ | `k8s_pod_security_admission.md` | enforcement level: `enforce` / `audit` / `warn` |
+ | `k8s_secret.md` | mount mode: environment variable vs file in volume |
+ | Liveness/Readiness/Startup Probes *(step 4)* | probe type: liveness vs readiness vs startup |
+ | Volumes *(step 4)* | volume type: emptyDir vs configMap vs persistentVolumeClaim |
+ | `k8s_node_pressure_eviction.md` (Node-pressure Eviction) | resource under pressure: memory vs disk vs inodes |
+
+ **Authoring rule:** The condition must be named in the question
+ stem, not implied. The expected answer must change materially if the
+ condition flips. Example: "How is a Secret mounted as a volume
+ versus consumed as an environment variable?" is a valid
+ `simple_w_condition`; "How is a Secret mounted?" is `simple`.
+
+ ### comparison (3–4 slots)
+
+ Questions whose answer explicitly compares two K8s concepts documented
+ on 2 different pages.
+
+ | Page pair | Concept compared |
+ |---|---|
+ | Deployment vs StatefulSet *(step 4)* | stateless vs stateful workload semantics |
+ | Deployment vs DaemonSet *(step 4)* | replica-count vs one-per-node scheduling |
+ | ConfigMap vs Secret | non-confidential vs confidential data, mount parity |
+ | Service vs Ingress *(step 4)* | L4 vs L7 exposure |
+ | Taints/Tolerations vs Node Affinity *(step 4)* | opt-out vs opt-in placement |
+ | Liveness vs Readiness probes *(step 4)* | restart vs traffic-routing semantics |
+
+ **Authoring rule:** The question must force retrieval from both
+ pages. Reranker stress is intentional: the target is questions where
+ BM25 would find one side but miss the other. Expected sources:
+ 2 pages minimum.
+
+ ### multi_hop (5–6 slots)
+
+ Questions whose answer synthesizes 2–4 pages. These are the primary
+ reranker stressors.
+
+ | Page set (example) | Hop path |
+ |---|---|
+ | Pod + Service + Ingress *(partial step 4)* | How external traffic reaches a Pod through Ingress → Service |
+ | Deployment + ReplicaSet + Pod | How a Deployment rollout changes the underlying ReplicaSet and Pod set |
+ | ConfigMap + Deployment | How a ConfigMap update propagates to Pods via env vars or mounted volume |
+ | HPA + Deployment + Metrics Server *(partial step 4)* | How HPA reads metrics and scales a Deployment |
+ | NetworkPolicy + Pod + Namespace *(partial step 4)* | How NetworkPolicy selectors resolve across namespaces |
+ | Job + Pod + Container lifecycle *(partial step 4)* | How a Job's completions and parallelism interact with Pod restart policy |
+
+ **Authoring rule:** Expected sources ≥ 2 pages. The question must
+ not be answerable from any single page alone. `source_chunk_ids`
+ must list at least one chunk from each expected page; partial
+ credit is granted in the evaluator if at least one expected chunk is
+ cited (see `agent_bench/evaluation/harness.py`).
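The mechanical half of this rule (at least two expected pages, and at least one `source_chunk_ids` entry per expected page) can be linted before a question ever reaches the evaluator. A Python sketch, assuming a hypothetical `chunk_to_page` mapping from chunk ID to source file exported from the ingested index; `DraftQuestion` covers only the schema fields this rule touches, and the chunk IDs below are placeholders, not real content hashes.

```python
from dataclasses import dataclass, field


@dataclass
class DraftQuestion:
    # Subset of the golden-question schema relevant to the multi_hop rule.
    id: str
    question_type: str
    expected_sources: list[str]
    source_chunk_ids: list[str] = field(default_factory=list)


def multi_hop_problems(q: DraftQuestion, chunk_to_page: dict[str, str]) -> list[str]:
    """Check the multi_hop authoring rule for one draft question."""
    if q.question_type != "multi_hop":
        return []
    problems = []
    if len(set(q.expected_sources)) < 2:
        problems.append("needs >= 2 expected source pages")
    # Pages actually covered by the cited chunks (unknown IDs map to None).
    covered = {chunk_to_page.get(cid) for cid in q.source_chunk_ids}
    for page in q.expected_sources:
        if page not in covered:
            problems.append(f"no source_chunk_id from {page}")
    return problems


chunk_to_page = {"a1": "k8s_deployment.md", "b2": "k8s_replicaset.md"}  # placeholder IDs
q = DraftQuestion("k8s_010", "multi_hop",
                  ["k8s_deployment.md", "k8s_replicaset.md", "k8s_pods.md"],
                  ["a1", "b2"])
print(multi_hop_problems(q, chunk_to_page))  # ['no source_chunk_id from k8s_pods.md']
```

The other half of the rule, that no single page answers the question on its own, still needs a human read of the draft.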
+
+ ### false_premise (3–4 slots)
+
+ Questions whose premise is wrong, split across two flavors:
+
+ **Flavor A — pure refusal** (at least 1 slot):
+ - Premise targets a capability that does not exist in the K8s corpus
+ (not in any pulled page).
+ - Example seed: "How do I configure Claude API rate limits in a
+ Kubernetes Deployment?" (wrong domain — Claude API is not a K8s
+ concept)
+ - Schema: `category: "out_of_scope"`, `question_type: "false_premise"`,
+ `expected_sources: []`, `source_chunk_ids: []`, `source_snippets: []`.
+ - Evaluator expectation: answer contains refusal phrasing AND cites
+ zero sources.
+
+ **Flavor B — documented negative** (at least 1 slot, ideally 2):
+ - Corpus contains an explicit negative statement (e.g. the
+ NetworkPolicy "Anything TLS related" limitation at chunk 63 of
+ `k8s_network_policies.md`).
+ - Example already in pilot: `k8s_pilot_005` (NetworkPolicy mTLS).
+ - Schema: `category: "retrieval"`, `question_type: "false_premise"`,
+ `expected_sources: [<negative-answer page>]`,
+ `source_snippets: [<verbatim negative statement>]`.
+ - Evaluator expectation: answer reports the documented negative
+ with citation and does NOT open with "the documentation does not
+ provide instructions" phrasing (per the pilot_005 Fix 1 + Fix 2
+ revert analysis).
+
+ **Other flavor-B candidate pages for authoring:**
+ - Pod Security Standards — explicit statements about what each
+ profile does NOT permit
+ - RBAC Authorization — explicit statements about what RBAC does NOT
+ provide (e.g. no deny rules)
+ - NetworkPolicy — additional negative clauses beyond the pilot_005
+ mTLS one
+
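The two flavors differ mostly in schema, so a draft can be checked against them mechanically. A Python sketch, assuming draft questions are plain dicts using the field names from the authoring checklist; `false_premise_problems` and `opens_with_boilerplate_refusal` are hypothetical helpers, and the opening-phrase check covers only the single phrase named in the flavor B evaluator expectation.

```python
BOILERPLATE_OPENING = "the documentation does not provide instructions"


def false_premise_problems(q: dict) -> list[str]:
    """Check a false_premise draft against the flavor A / flavor B schema rules."""
    if q.get("question_type") != "false_premise":
        return []
    problems = []
    if q.get("category") == "out_of_scope":  # flavor A: pure refusal
        for key in ("expected_sources", "source_chunk_ids", "source_snippets"):
            if q.get(key):
                problems.append(f"flavor A must leave {key} empty")
    elif q.get("category") == "retrieval":  # flavor B: documented negative
        for key in ("expected_sources", "source_snippets"):
            if not q.get(key):
                problems.append(f"flavor B needs a non-empty {key}")
    else:
        problems.append("category must be 'out_of_scope' (A) or 'retrieval' (B)")
    return problems


def opens_with_boilerplate_refusal(answer: str) -> bool:
    """Flavor B answers should not lead with the generic 'not documented' refusal."""
    return answer.strip().lower().startswith(BOILERPLATE_OPENING)


flavor_a = {"question_type": "false_premise", "category": "out_of_scope",
            "expected_sources": [], "source_chunk_ids": [], "source_snippets": []}
flavor_b = {"question_type": "false_premise", "category": "retrieval",
            "expected_sources": ["k8s_network_policies.md"],
            "source_snippets": ["Anything TLS related (use a service mesh or "
                                "ingress controller for this)"]}
print(false_premise_problems(flavor_a), false_premise_problems(flavor_b))  # [] []
print(opens_with_boilerplate_refusal(
    "The documentation does not provide instructions for mTLS."))  # True
```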
+ ### set / aggregation / post_processing_heavy (0–3 slots)
+
+ Include only if a K8s page naturally supports the pattern:
+
+ - `set`: "Which `type` values can a Service have?" (answer is a set
+ drawn from the Service page: ClusterIP, NodePort, LoadBalancer,
+ ExternalName). Include 0–1 of this type if a clean example emerges;
+ otherwise leave the slot empty.
+ - `aggregation`: Unlikely to fit K8s docs (docs describe concepts,
+ not tabular data). Likely leave empty.
+ - `post_processing_heavy`: Unlikely to fit K8s docs. Likely leave
+ empty.
+
+ **Default:** Leave this row at **0**. Only author these if a question
+ emerges organically during step 5. Do not force-author to hit a
+ target count; the plan explicitly says "0–3, included only where
+ corpus content naturally supports". Note that the upper bounds of the
+ five required rows sum to 6 + 4 + 4 + 6 + 4 = 24, so reaching 25
+ requires either one optional-type question or stretching one required
+ row past its stated range.
+
+ ---
+
+ ## `time_sensitive` flag placement (2–3 questions)
+
+ Flag questions whose correct answer depends on K8s version state:
+
+ | Candidate | Why time-sensitive |
+ |---|---|
+ | HPA API version | `autoscaling/v1` vs `autoscaling/v2` — v2 stable since 1.23 |
+ | Pod Security Admission stability | "stable as of v1.25" — feature state in the page |
+ | PodSecurityPolicy removal | PSP removed in 1.25; migration path to PSA |
+
+ **Authoring rule:** Set `time_sensitive: true` on exactly 2–3
+ questions. Distribute across ≥2 different CRAG types (e.g. one
+ `simple`, one `simple_w_condition`) so the flag is not concentrated
+ in a single type. Each `time_sensitive` question must cite a
+ specific K8s version or feature state in the source snippet,
+ otherwise the flag is not load-bearing.
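All three parts of this rule can be linted over the full question list. A Python sketch; the regular expression is a heuristic assumption for what counts as "a specific K8s version or feature state" (a `1.xx` version, a `group/vN` API version, or a feature-state word), and the snippets in the usage example are placeholders, not text quoted from kubernetes.io.

```python
import re

# Heuristic only: "v1.25" / "1.23", an API version such as "autoscaling/v2",
# or a feature-state keyword.
VERSION_OR_STATE = re.compile(
    r"\bv?1\.\d{1,2}\b"
    r"|\b[a-z0-9.]+/v\d+(?:alpha\d+|beta\d+)?\b"
    r"|\b(?:alpha|beta|stable|deprecated|removed)\b",
    re.IGNORECASE,
)


def time_sensitive_problems(questions: list[dict]) -> list[str]:
    """Check the count, type-spread, and snippet rules for time_sensitive flags."""
    flagged = [q for q in questions if q.get("time_sensitive")]
    problems = []
    if not 2 <= len(flagged) <= 3:
        problems.append(f"{len(flagged)} flagged question(s); expected 2-3")
    if len({q["question_type"] for q in flagged}) < 2:
        problems.append("flag concentrated in a single question_type")
    for q in flagged:
        if not any(VERSION_OR_STATE.search(s) for s in q.get("source_snippets", [])):
            problems.append(f"{q['id']}: no version or feature state in snippets")
    return problems


questions = [
    {"id": "k8s_004", "question_type": "simple", "time_sensitive": True,
     "source_snippets": ["placeholder snippet citing v1.25"]},
    {"id": "k8s_012", "question_type": "simple_w_condition", "time_sensitive": True,
     "source_snippets": ["placeholder snippet citing autoscaling/v2"]},
    {"id": "k8s_013", "question_type": "simple", "time_sensitive": False,
     "source_snippets": []},
]
print(time_sensitive_problems(questions))  # []
```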
+
+ ---
+
+ ## Difficulty distribution
+
+ Loose guidance, not a hard constraint:
+
+ - `easy`: 8–10 questions — mostly `simple` and single-page
+ `simple_w_condition`
+ - `medium`: 10–12 questions — `comparison`, most `multi_hop`,
+ straightforward `false_premise`
+ - `hard`: 4–6 questions — deep `multi_hop`, flavor-B `false_premise`,
+ `time_sensitive` + `multi_hop` combinations
+
+ The pilot's 6-question set is all `easy`/`medium`. Step 5 should add
+ the `hard` tier.
+
+ ---
+
+ ## Authoring checklist (per question)
+
+ For each of the 25 questions, the step 5 author must fill:
+
+ | Field | Required | Notes |
+ |---|---|---|
+ | `id` | yes | `k8s_<NNN>` zero-padded (e.g. `k8s_001`) |
+ | `question` | yes | Natural-language question in the voice of a recruiter or developer |
+ | `expected_answer_keywords` | yes | 3–6 keywords that MUST appear in a correct answer; drives `keyword_hit_rate` |
+ | `expected_sources` | yes | List of `.md` filenames from `SOURCES.md`; ≥1 for scoped questions, `[]` for flavor-A false-premise |
+ | `category` | yes | `retrieval` / `calculation` / `out_of_scope` |
+ | `difficulty` | yes | `easy` / `medium` / `hard` |
+ | `requires_calculator` | yes | `false` for all K8s questions (no calc tool use expected) |
+ | `reference_answer` | yes | 1–3 sentence answer used by the optional LLM judge |
+ | `question_type` | yes | CRAG taxonomy value (exactly one of the 8 canonical strings) |
+ | `time_sensitive` | yes | `bool`; `true` on exactly 2–3 questions |
+ | `source_chunk_ids` | yes | Content-hashed chunk IDs (stable across reindex); must be `[]` for flavor-A false-premise |
+ | `source_snippets` | yes | ~20 words verbatim per chunk; drift-detection field |
+ | `source_pages` | yes | Human-readable page anchor (e.g. `"concepts/workloads/pods"`) |
+ | `source_sections` | yes | Deepest heading containing the snippet |
+
+ **Deprecation note:** The pilot schema has `is_multi_hop: bool`.
+ Step 5 may retire this field in favor of `question_type == "multi_hop"`,
+ but only after confirming the evaluator's partial-credit logic
+ (`agent_bench/evaluation/harness.py:38`) is updated to read from
+ `question_type`. Do NOT remove `is_multi_hop` without the
+ corresponding harness update, or existing pilot questions will
+ break partial-credit scoring.
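While both fields coexist, a transitional reader could keep pilot questions scoring correctly. A Python sketch, not the current `harness.py` code (which, per the deprecation note, still reads `is_multi_hop`): it prefers `question_type` when present, falls back to the legacy flag otherwise, and fails loudly if the two disagree. `REQUIRED_FIELDS` transcribes the 14 rows of the checklist table; the question IDs in the usage lines are hypothetical.

```python
REQUIRED_FIELDS = (
    "id", "question", "expected_answer_keywords", "expected_sources",
    "category", "difficulty", "requires_calculator", "reference_answer",
    "question_type", "time_sensitive", "source_chunk_ids",
    "source_snippets", "source_pages", "source_sections",
)


def missing_fields(q: dict) -> list[str]:
    """Checklist fields absent from a draft question, in checklist order."""
    return [name for name in REQUIRED_FIELDS if name not in q]


def is_multi_hop(q: dict) -> bool:
    """Prefer question_type; fall back to the legacy is_multi_hop flag."""
    legacy = q.get("is_multi_hop")
    if "question_type" in q:
        derived = q["question_type"] == "multi_hop"
        if legacy is not None and legacy != derived:
            raise ValueError(f"{q.get('id')}: is_multi_hop contradicts question_type")
        return derived
    return bool(legacy)


pilot_style = {"id": "pilot_example", "is_multi_hop": True}  # legacy-only record
step5_style = {"id": "k8s_010", "question_type": "multi_hop"}
print(len(REQUIRED_FIELDS))                                  # 14
print(is_multi_hop(pilot_style), is_multi_hop(step5_style))  # True True
```

Raising on disagreement, rather than silently picking one field, surfaces half-migrated records during the pilot re-run instead of letting them skew partial-credit scores.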
+
+ ---
+
+ ## Pilot-first validation before step 5 authoring
+
+ Before writing the 25 questions, the step 5 author must:
+
+ 1. Confirm the 20 new pages from step 4 are ingested and reachable
+ via the pipeline (smoke-query test per `SOURCES.md`'s post-ingest
+ validation).
+ 2. Re-run `make evaluate` on the existing 6-question pilot dataset
+ against the newly expanded corpus. The pilot's existing questions
+ must still pass their per-question gates — if adding 20 new
+ pages drops pilot P@5 materially, investigate before adding more
+ questions on top.
+ 3. Hand-draft 2–3 questions first, run them through the pipeline,
+ and confirm retrieval surfaces the expected chunks. This is the
+ final pilot-first checkpoint before bulk authoring.
+
+ Only after these three checks pass does the step 5 author proceed
+ to the full 25-question authoring session.
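Gate 2 hinges on what "drops pilot P@5 materially" means. A Python sketch under two assumptions: P@5 is computed per question as the share of the top 5 retrieved chunks whose source file is in `expected_sources`, and "materially" is a per-question drop larger than a `max_drop` of 0.05 (a placeholder threshold, not one the plan sets). The scores and question IDs in the usage lines are made up for illustration, and `k8s_service.md` is a hypothetical name for a step 4 page.

```python
def precision_at_k(retrieved: list[str], expected: set[str], k: int = 5) -> float:
    """Share of the top-k retrieved chunks whose source file is expected.

    Divides by k, so fewer than k results count the missing slots as misses.
    """
    top = retrieved[:k]
    return sum(1 for source in top if source in expected) / k


def pilot_regressions(before: dict[str, float], after: dict[str, float],
                      max_drop: float = 0.05) -> list[str]:
    """Pilot question IDs whose P@5 fell by more than max_drop after re-ingest."""
    return sorted(qid for qid, score in before.items()
                  if score - after.get(qid, 0.0) > max_drop)


retrieved = ["k8s_pods.md", "k8s_pods.md", "k8s_service.md",
             "k8s_pods.md", "k8s_deployment.md"]
print(precision_at_k(retrieved, {"k8s_pods.md"}))  # 0.6

before = {"pilot_q1": 0.8, "pilot_q2": 0.6}  # made-up scores
after = {"pilot_q1": 0.8, "pilot_q2": 0.4}
print(pilot_regressions(before, after))  # ['pilot_q2']
```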
data/k8s_docs/SOURCES.md CHANGED
@@ -1,25 +1,38 @@
  # Kubernetes Corpus Sources
 
- **Status:** Placeholder curation scheduled as a separate work session
- outside the multi-corpus refactor.
-
- **Target:** ~30–40 markdown files from kubernetes.io/docs covering the
- concepts a technical reviewer would naturally type into the demo —
- not comprehensive K8s coverage.
 
  ## Scope
 
  **Include:**
 
- - Core workload concepts: Pod, Deployment, StatefulSet, DaemonSet, Job,
- CronJob, ReplicaSet
- - Networking: Service, Ingress, NetworkPolicy, EndpointSlice
- - Config + state: ConfigMap, Secret, Volume, PersistentVolume, Namespace
- - Access control: RBAC (Role, RoleBinding, ServiceAccount)
- - Cross-referencing overview pages: "Connecting Applications with
- Services", "Workload Resources", "Services, Load Balancing, and
- Networking" these stress the reranker because relevance spreads
- across multiple chunks per query
 
  **Exclude:**
 
@@ -36,27 +49,114 @@ This corpus targets **recruiter-likely questions**, not coverage. A
  question about etcd raft internals will be correctly refused — the
  refusal mechanism is part of the demo story, not a failure mode.
 
- Each ingested file below must have:
 
- - A URL (source of truth, for re-scraping if content drifts)
- - A date pulled (provenance, for audit)
  - A one-line rationale (why this page is in scope)
-
- | URL | Date pulled | Rationale |
- |-----|------------|-----------|
- | _TBD_ | _TBD_ | _TBD_ |
-
- See `docs/plans/2026-04-12-multi-corpus-refactor-design.md` section
- "Corpus Curation — Kubernetes" for the full policy.
 
  ## Ingestion
 
- Once curated files are in place, run:
 
  ```bash
  make ingest-k8s
  ```
 
- This populates `.cache/store_k8s/` with embeddings + BM25 index matching
- the FastAPI corpus's chunker settings (recursive, 512-token chunks,
- 64-token overlap).
  # Kubernetes Corpus Sources
 
+ **Status:** Locked at the category level (v1.1 Week 1 step 2). Per-page
+ URL verification and pull dates are deferred to step 4 ingestion per
+ pilot-first discipline — committing to 25 specific kubernetes.io URLs
+ in this session without a verification pass would invert the
+ "draft small, validate, then bulk" rule documented in the plan's
+ cross-cutting #8.
+
+ **Target:** ~25–30 markdown files from kubernetes.io/docs — enough to
+ support 25 golden questions at ~1 question per page with headroom for
+ multi-hop questions that draw on 2–4 pages each.
+
+ **Content license:** All kubernetes.io/docs content is licensed under
+ [CC BY 4.0](https://git.k8s.io/website/LICENSE). License verification
+ happens per page at step 4 pull time; any page whose license terms
+ differ from the site default is flagged in the table below and
+ reassessed against the honest-evaluation brand's licensing discipline
+ (same pattern the v1.1 plan uses for Lynx/HaluBench CC BY-NC).
 
  ## Scope
 
  **Include:**
 
+ - Core workload concepts: Pod, Deployment, StatefulSet, DaemonSet,
+ Job, CronJob, ReplicaSet, Init Containers, Pod Lifecycle
+ - Networking: Service, Ingress, NetworkPolicy, EndpointSlice, DNS
+ - Config + state: ConfigMap, Secret, Volumes, PersistentVolumes,
+ Namespaces
+ - Scheduling + resources: Resource Management, Node Assignment,
+ Taints and Tolerations, Node-pressure Eviction
+ - Access control: RBAC Authorization
+ - Health + autoscaling: Liveness/Readiness/Startup Probes,
+ Horizontal Pod Autoscaling
+ - Security: Pod Security Admission, Pod Security Standards
 
  **Exclude:**
 
  question about etcd raft internals will be correctly refused — the
  refusal mechanism is part of the demo story, not a failure mode.
 
+ Each ingested page below must have:
 
+ - A canonical kubernetes.io/docs URL (source of truth, for re-scraping
+ if content drifts)
+ - A date pulled (provenance, for audit; verified at step 4)
  - A one-line rationale (why this page is in scope)
+ - License confirmation (default CC BY 4.0 unless a per-page notice says
+ otherwise)
+
+ ## Locked category breakdown
+
+ | Category | Target pages | Rationale |
+ |---|---|---|
+ | Core workloads | 9 | Pod, Pod Lifecycle, Deployment, ReplicaSet, StatefulSet, DaemonSet, Job, CronJob, Init Containers. The reranker-stressing multi-hop questions will draw on 2–4 of these per question. |
+ | Networking | 5 | Service, Ingress, NetworkPolicy, EndpointSlice, DNS for Services and Pods. NetworkPolicy is already validated as the pilot_005 flavor-B false_premise target. |
+ | Config + state | 5 | ConfigMap, Secret, Volumes, Persistent Volumes, Namespaces. Supports `simple_w_condition` questions where the answer depends on configuration context (volume type, secret mount mode, namespace scoping). |
+ | Scheduling + resources | 4 | Resource Management for Pods and Containers, Assigning Pods to Nodes, Taints and Tolerations, Node-pressure Eviction (already pulled). Good source for `comparison` questions (e.g. taints vs affinity) and `time_sensitive` questions (feature-state-bound scheduler behavior). |
+ | Access control | 1 | RBAC Authorization. Single page supports 1–2 `simple` questions about RBAC primitives. Not the reranker-stressing category. |
+ | Health + autoscaling | 2 | Liveness/Readiness/Startup Probes, Horizontal Pod Autoscaling. HPA is a `time_sensitive` candidate (autoscaling/v2 stable state). |
+ | Security | 2 | Pod Security Admission (already pulled), Pod Security Standards. Pod Security Admission is the `simple_w_condition` stressor where answer depends on enforcement level (enforce / audit / warn). |
+ | **Total** | **28** | Supports 25 questions with 3 pages of headroom for multi-hop fan-out. |
+
+ ## Already-pulled pages (8 from the pilot corpus)
+
+ These were pulled during the pilot work and are the empirical grounding
+ for the threshold calibration at 0.015 and the flavor-B discipline for
+ pilot_005. No re-pull is required unless content drift is detected at
+ step 4 verification.
+
+ | File | Category | Best-known URL | Pilot evidence |
+ |---|---|---|---|
+ | `k8s_configmap.md` | Config + state | `https://kubernetes.io/docs/concepts/configuration/configmap/` | — |
+ | `k8s_deployment.md` | Core workloads | `https://kubernetes.io/docs/concepts/workloads/controllers/deployment/` | — |
+ | `k8s_network_policies.md` | Networking | `https://kubernetes.io/docs/concepts/services-networking/network-policies/` | **pilot_005 flavor-B target** — contains "Anything TLS related (use a service mesh or ingress controller for this)" at chunk_index 63 |
+ | `k8s_node_pressure_eviction.md` | Scheduling + resources | `https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/` | — |
+ | `k8s_pod_security_admission.md` | Security | `https://kubernetes.io/docs/concepts/security/pod-security-admission/` | — |
+ | `k8s_pods.md` | Core workloads | `https://kubernetes.io/docs/concepts/workloads/pods/` | pilot_001 target (Pod IP + localhost communication) |
+ | `k8s_replicaset.md` | Core workloads | `https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/` | — |
+ | `k8s_secret.md` | Config + state | `https://kubernetes.io/docs/concepts/configuration/secret/` | — |
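`source_snippets` is the drift-detection field, so "content drift is detected" can be made concrete: a snippet has drifted when it no longer appears in the freshly pulled page. A Python sketch, assuming whitespace and markdown emphasis changes should not count as drift (so both sides are normalized first); `drifted_snippets` is a hypothetical helper, the page text below is a two-line excerpt rather than the full file, and the second snippet is deliberately invented to show a miss.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, drop markdown emphasis/backticks, and collapse whitespace."""
    text = re.sub(r"[*_`]", "", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def drifted_snippets(snippets: list[str], page_text: str) -> list[str]:
    """Snippets that no longer occur, after normalization, in the current page text."""
    page = normalize(page_text)
    return [s for s in snippets if normalize(s) not in page]


page_excerpt = """- Anything TLS related (use a service mesh
  or ingress controller for this)"""
snippets = ["Anything TLS related (use a service mesh or ingress controller for this)",
            "NetworkPolicy encrypts Pod-to-Pod traffic"]
print(drifted_snippets(snippets, page_excerpt))  # ['NetworkPolicy encrypts Pod-to-Pod traffic']
```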
+
+ ## Pages to pull at step 4 (20 remaining)
+
+ **Core workloads (6 to add):**
+ - Pod Lifecycle
+ - StatefulSet
+ - DaemonSet
+ - Job
+ - CronJob
+ - Init Containers
+
+ **Networking (4 to add):**
+ - Service
+ - Ingress
+ - EndpointSlice
+ - DNS for Services and Pods
+
+ **Config + state (3 to add):**
+ - Volumes
+ - Persistent Volumes
+ - Namespaces
+
+ **Scheduling + resources (3 to add):**
+ - Resource Management for Pods and Containers
+ - Assigning Pods to Nodes
+ - Taints and Tolerations
+
+ **Access control (1 to add):**
+ - RBAC Authorization
+
+ **Health + autoscaling (2 to add):**
+ - Configure Liveness, Readiness and Startup Probes
+ - Horizontal Pod Autoscaling
+
+ **Security (1 to add):**
+ - Pod Security Standards
+
+ **Step 4 checklist per page:**
+ 1. Resolve the kubernetes.io/docs URL. For the 8 already-pulled pages,
+ start from the best-known path in the table above; for the 20 new
+ pages, start from the matching kubernetes.io/docs concept or task
+ page. Confirm the page loads at that path; if it redirects, update
+ SOURCES.md with the final URL and a one-line note explaining the
+ redirect.
+ 2. Confirm CC BY 4.0 licensing (default); flag any exception.
+ 3. Pull content using the same scraper used for the pilot 8 pages
+ (matching format with inline markdown links and structured
+ headings).
+ 4. Record the pull date in a "Date pulled" column (the page tables
+ above do not have one yet).
+ 5. Verify the one-line rationale still holds after reading the
+ page — if the page content doesn't support any planned
+ question (see `QUESTION_PLAN.md`), flag for replacement with a
+ reasoned alternative.
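The checklist maps naturally onto one record per page. A Python sketch of such a record and a validator for the mechanical parts of steps 1, 2, 4 and 5 (step 3, the pull itself, is out of scope, and step 5's "does the rationale still hold" needs a human read). The `PageRecord` shape, the file name `k8s_statefulset.md`, and the StatefulSet URL are assumptions to be confirmed at step 4, not values this commit has verified.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

DOCS_PREFIX = "https://kubernetes.io/docs/"
DEFAULT_LICENSE = "CC BY 4.0"


@dataclass
class PageRecord:
    """One SOURCES.md row as it should look after step 4 (hypothetical shape)."""
    file: str
    category: str
    starting_url: str
    final_url: str
    rationale: str
    date_pulled: Optional[date] = None
    license: str = DEFAULT_LICENSE
    redirect_note: str = ""


def step4_problems(rec: PageRecord) -> list[str]:
    """Mechanical checks only; whether the rationale still holds needs a human read."""
    problems = []
    if not rec.final_url.startswith(DOCS_PREFIX):
        problems.append("final URL is not under kubernetes.io/docs")
    if rec.final_url != rec.starting_url and not rec.redirect_note:
        problems.append("redirected URL without a one-line redirect note")
    if rec.license != DEFAULT_LICENSE:
        problems.append(f"license exception to flag: {rec.license}")
    if rec.date_pulled is None:
        problems.append("missing date pulled")
    if not rec.rationale.strip():
        problems.append("missing one-line rationale")
    return problems


url = "https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/"  # to verify
rec = PageRecord("k8s_statefulset.md", "Core workloads", url, url,
                 "StatefulSet identity guarantees; Deployment comparison")
print(step4_problems(rec))  # ['missing date pulled']
rec.date_pulled = date.today()
print(step4_problems(rec))  # []
```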
 
  ## Ingestion
 
+ Once all 28 files are in `data/k8s_docs/`, run:
 
  ```bash
  make ingest-k8s
  ```
 
+ This populates `.cache/store_k8s/` with embeddings + BM25 index
+ matching the FastAPI corpus's chunker settings (recursive, 512-token
+ chunks, 64-token overlap).
+
+ **Post-ingest validation (pilot-first):** Before authoring the full
+ 25-question golden set, run 2–3 smoke queries against the ingested
+ store (e.g. `"what is a StatefulSet"`, `"how does HPA scale
+ replicas"`, `"what happens when a Pod is evicted"`) and confirm that
+ the retrieval returns sensible chunks from the expected pages. Any
+ query that surfaces irrelevant chunks or hits the refusal gate
+ indicates a chunk-boundary or content-coverage issue that should be
+ debugged before the golden-set authoring session.
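The smoke-query gate can be scripted against whatever retrieval entry point the pipeline exposes. A Python sketch that assumes a hypothetical `retrieve(query)` callable returning ranked `(source_file, chunk_text)` pairs; the expected file names for the step 4 pages (`k8s_statefulset.md`, `k8s_hpa.md`) are placeholders until those pages are pulled, and `fake_retrieve` stands in for the real store.

```python
from typing import Callable

# Hypothetical retriever signature: query -> ranked list of (source_file, chunk_text).
Retriever = Callable[[str], list[tuple[str, str]]]

SMOKE_QUERIES = {
    "what is a StatefulSet": "k8s_statefulset.md",          # placeholder file name
    "how does HPA scale replicas": "k8s_hpa.md",            # placeholder file name
    "what happens when a Pod is evicted": "k8s_node_pressure_eviction.md",
}


def smoke_failures(retrieve: Retriever, queries: dict[str, str], k: int = 5) -> list[str]:
    """Queries whose top-k results contain no chunk from the expected page."""
    failures = []
    for query, expected_page in queries.items():
        top_sources = [source for source, _ in retrieve(query)[:k]]
        if expected_page not in top_sources:
            failures.append(query)
    return failures


def fake_retrieve(query: str) -> list[tuple[str, str]]:
    # Stand-in for the real store; an empty list models the refusal gate firing.
    if "evicted" in query:
        return [("k8s_node_pressure_eviction.md", "...")]
    if "StatefulSet" in query:
        return [("k8s_statefulset.md", "...")]
    return []


print(smoke_failures(fake_retrieve, SMOKE_QUERIES))  # ['how does HPA scale replicas']
```

Treating an empty result as a failure matches the rule above: a smoke query that hits the refusal gate is a coverage problem to debug, not a pass.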