Generated 2026-06-13T04:20:49.182286+00:00. Data source: local GEPA artifacts and 12B external validation runs.

What This Shows

The optimizer artifacts contain at most one completed proposal step per GEPA run (the 12B twelve-row init-only run has none). Where a proposal completed, the visible trajectory therefore runs from candidate 0, the seed or inherited prompt, to candidate 1, the reflected prompt produced by GEPA.

These are not label-count reward graphs. The score comes from the localpager classifier scoring function, which penalizes false positives, false negatives, and over-labeling. Extra random labels cannot improve the score unless they match the gold labels in ds4.jsonl, and false positives are weighted as the more costly error.
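
The exact localpager scoring formula is not reproduced in this report, so the following is only a minimal sketch of a score with the properties described above. The function name `row_score` and the weights `fp_weight` and `fn_weight` are hypothetical; over-labeling is assumed to be covered by the heavier false-positive weight.

```python
def row_score(predicted, gold, fp_weight=2.0, fn_weight=1.0):
    """Hypothetical weighted-overlap score in [0, 1]; NOT the localpager formula.

    False positives cost more than false negatives (fp_weight > fn_weight),
    so adding a label can only help when it matches a gold label.
    """
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)   # correct labels
    fp = len(predicted - gold)   # extra labels (over-labeling)
    fn = len(gold - predicted)   # missed labels
    if tp + fp + fn == 0:
        return 1.0               # nothing expected, nothing predicted
    return tp / (tp + fp_weight * fp + fn_weight * fn)


gold = ["queueing", "docs"]
baseline = row_score(["docs"], gold)              # one gold label missed
padded = row_score(["docs", "gateway"], gold)     # same miss plus a random extra label
matched = row_score(["docs", "queueing"], gold)   # extra label that matches gold
print(baseline, padded, matched)
```

Under these assumed weights the padded prediction scores below the baseline, and only the prediction whose extra label matches gold scores higher, which is the behavior the paragraph above describes.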

| Run | Score (candidate 0 -> best) | Best candidate | Delta |
| --- | --- | --- | --- |
| E4B smoke | 0.250 -> 0.625 | 1 | 0.375 |
| 12B six selected | 0.286 -> 0.556 | 1 | 0.270 |
| 12B twelve init | 0.424 -> 0.424 | 0 | 0.000 |
| 12B twelve continuation | 0.537 -> 0.610 | 1 | 0.073 |

Internal GEPA Validation Score By Candidate
+
This is the direct score-over-iteration view from each run's gepa-result.json. Candidate 0 is the baseline/inherited prompt; candidate 1 is the GEPA-reflected candidate when a proposal completed.

Numbers

| Run | Candidate | Validation score | Best? | Discovery eval count |
| --- | --- | --- | --- | --- |
| E4B smoke GEPA | 0 | 0.2500 | | 0 |
| E4B smoke GEPA | 1 | 0.6250 | yes | 4 |
| 12B six-row GEPA (selected) | 0 | 0.2857 | | 0 |
| 12B six-row GEPA (selected) | 1 | 0.5556 | yes | 10 |
| 12B twelve-row init-only | 0 | 0.4244 | yes | 0 |
| 12B twelve-row continuation GEPA | 0 | 0.5375 | | 0 |
| 12B twelve-row continuation GEPA | 1 | 0.6101 | yes | 20 |
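
The best-candidate and delta figures in the summary can be re-derived from the per-candidate validation scores. The sketch below assumes the scores have already been read out of each run's gepa-result.json into plain lists; the schema of that file is not shown here, so the `runs` dictionary and the `summarize` helper are illustrative only.

```python
def summarize(scores):
    """Return (best_index, delta) for a list of per-candidate validation scores.

    delta is the best score minus the candidate 0 score. max() keeps the
    earliest candidate on ties, so an init-only run reports candidate 0.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best] - scores[0]


# Scores copied from the Numbers table; the dict layout is an assumption.
runs = {
    "E4B smoke": [0.2500, 0.6250],
    "12B six selected": [0.2857, 0.5556],
    "12B twelve init": [0.4244],
    "12B twelve continuation": [0.5375, 0.6101],
}

for name, scores in runs.items():
    best, delta = summarize(scores)
    print(f"{name}: best candidate {best}; delta {delta:.3f}")
```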

GEPA Subsample Before/After Scores

This uses run_log.json. It shows the score on the exact subsample GEPA reflected on, before and after the proposed prompt.

Numbers

| Run | Iteration | Candidate transition | Subsample ids | Before mean | After mean | Delta |
| --- | --- | --- | --- | --- | --- | --- |
| E4B smoke GEPA | 0 | 0 -> 1 | [0] | 0.1429 | 1.0000 | 0.8571 |
| 12B six-row GEPA (selected) | 0 | 0 -> 1 | [4, 1] | 0.2679 | 1.0000 | 0.7321 |
| 12B twelve-row continuation GEPA | 0 | 0 -> 1 | [1, 10, 9, 5] | 0.6750 | 0.8750 | 0.2000 |

External Cumulative Validation Checkpoints

This is the scientific sanity check: the selected 12B-six candidate compared against the v9.1 seed as more of Shaun's 60-row set is accumulated. This graph is outside the GEPA training loop.

Numbers

| Rows added | Checkpoint rows | Seed segment | GEPA segment | Segment delta | Seed cumulative | GEPA cumulative | Cumulative delta |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1-6 | 6 | 0.5214 | 0.5833 | 0.0619 | 0.5214 | 0.5833 | 0.0619 |
| 7-12 | 12 | 0.4833 | 0.4917 | 0.0083 | 0.5024 | 0.5375 | 0.0351 |
| 13-18 | 18 | 0.5000 | 0.5833 | 0.0833 | 0.5016 | 0.5528 | 0.0512 |
| 19-30 | 30 | 0.3946 | 0.4552 | 0.0605 | 0.4588 | 0.5137 | 0.0549 |
| 31-60 | 60 | 0.4056 | 0.4687 | 0.0630 | 0.4322 | 0.4912 | 0.0590 |
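
The cumulative columns are consistent with a row-weighted running mean of the segment means. The sketch below shows that relationship under the assumption that every row counts equally; the `segments` list and `cumulative_means` helper are illustrative names, and the inputs are the rounded figures from the table, so results agree only to about four decimal places.

```python
# (rows in segment, seed segment mean, GEPA segment mean), copied from the table.
segments = [
    (6, 0.5214, 0.5833),
    (6, 0.4833, 0.4917),
    (6, 0.5000, 0.5833),
    (12, 0.3946, 0.4552),
    (30, 0.4056, 0.4687),
]


def cumulative_means(segments):
    """Return (checkpoint_rows, seed_cumulative, gepa_cumulative) per checkpoint."""
    rows = 0
    seed_sum = 0.0
    gepa_sum = 0.0
    out = []
    for n, seed_mean, gepa_mean in segments:
        rows += n
        seed_sum += n * seed_mean   # weight each segment mean by its row count
        gepa_sum += n * gepa_mean
        out.append((rows, seed_sum / rows, gepa_sum / rows))
    return out


for rows, seed_cum, gepa_cum in cumulative_means(segments):
    print(f"{rows:>2} rows: seed {seed_cum:.4f}, GEPA {gepa_cum:.4f}, delta {gepa_cum - seed_cum:.4f}")
```

Weighting by row count matters from the fourth checkpoint onward, where the segments stop being six rows each.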

Per-Validation-Row Candidate Heatmaps

These show which validation rows changed inside the GEPA run, not just the aggregate mean.
The selected artifact is GEPA 12B Six Candidate. It is more precise, has fewer false positives, and eliminated structural failures. The main caveat is one additional false negative.

| Row | Delta | Title | Gold | Seed | GEPA |
| --- | --- | --- | --- | --- | --- |
| #59878 | -0.7500 | Session lane stuck in 'running' after run dies - sessions.abort + gateway restart fail to clear stale state | sessions, reliability | sessions, reliability | sessions, gateway |
| #46552 | +0.5000 | docs(queue): clarify steer behavior with partial streaming and tool boundaries | queueing, docs | docs | docs, queueing |
| #48580 | +0.3000 | Bug: acpx codex sessions exit immediately after creation - stdin is not a terminal | acpx, codex, sessions | acp, sessions | codex, sessions |
| #83863 | +0.3000 | ACP/Codex child tasks can be marked succeeded with progress-only output and no final deliverable | acp, codex, agent_runtime | acp, reliability | acp, codex |

Policy: add model, network, and MCP conformance checks
Score: 0.2000 -> 1.0000 (+0.8000)
Gold: mcp_tooling, config, security
Before: mcp_tooling, local_model_providers (FP: local_model_providers; FN: config, security)
After: mcp_tooling, security, config (FP: none; FN: none)

Per-Topic TP/FP/FN and F1
+
Cells in TP/FP/FN order. Rows are sorted by F1 delta, worst first.

| Topic | Seed TP/FP/FN | Seed P | Seed R | Seed F1 | GEPA TP/FP/FN | GEPA P | GEPA R | GEPA F1 | ΔF1 | ΔFP | ΔFN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| local_models | 2/1/1 | 0.6667 | 0.6667 | 0.6667 | 0/0/3 | 0.0000 | 0.0000 | 0.0000 | -0.6667 | -1 | +2 |
| telemetry_usage | 1/1/0 | 0.5000 | 1.0000 | 0.6667 | 0/0/1 | 0.0000 | 0.0000 | 0.0000 | -0.6667 | -1 | +1 |
| tool_calling | 2/0/0 | 1.0000 | 1.0000 | 1.0000 | 1/0/2 | 1.0000 | 0.3333 | 0.5000 | -0.5000 | +0 | +2 |
| reliability | 9/2/6 | 0.8182 | 0.6000 | 0.6923 | 5/1/11 | 0.8333 | 0.3125 | 0.4545 | -0.2378 | -1 | +5 |
| agent_runtime | 1/3/4 | 0.2500 | 0.2000 | 0.2222 | 0/2/5 | 0.0000 | 0.0000 | 0.0000 | -0.2222 | -1 | +1 |
| ui_tui | 1/1/0 | 0.5000 | 1.0000 | 0.6667 | 1/2/0 | 0.3333 | 1.0000 | 0.5000 | -0.1667 | +1 | +0 |
| gateway | 4/2/1 | 0.6667 | 0.8000 | 0.7273 | 4/4/1 | 0.5000 | 0.8000 | 0.6154 | -0.1119 | +2 | +0 |
| sessions | 14/2/4 | 0.8750 | 0.7778 | 0.8235 | 12/1/7 | 0.9231 | 0.6316 | 0.7500 | -0.0735 | -1 | +3 |
| acp | 13/1/0 | 0.9286 | 1.0000 | 0.9630 | 11/0/2 | 1.0000 | 0.8462 | 0.9167 | -0.0463 | -1 | +2 |
| local_model_providers | 0/1/3 | 0.0000 | 0.0000 | 0.0000 | 0/0/3 | 0.0000 | 0.0000 | 0.0000 | +0.0000 | -1 | +0 |
| model_releases | 0/1/0 | 0.0000 | 0.0000 | 0.0000 | 0/0/0 | 0.0000 | 0.0000 | 0.0000 | +0.0000 | -1 | +0 |
| notifications | 0/0/1 | 0.0000 | 0.0000 | 0.0000 | 0/1/1 | 0.0000 | 0.0000 | 0.0000 | +0.0000 | +1 | +0 |
| api_surface | 0/0/2 | 0.0000 | 0.0000 | 0.0000 | 0/0/2 | 0.0000 | 0.0000 | 0.0000 | +0.0000 | +0 | +0 |
| approvals | 1/0/1 | 1.0000 | 0.5000 | 0.6667 | 1/0/1 | 1.0000 | 0.5000 | 0.6667 | +0.0000 | +0 | +0 |
| browser_automation | 1/0/1 | 1.0000 | 0.5000 | 0.6667 | 1/0/1 | 1.0000 | 0.5000 | 0.6667 | +0.0000 | +0 | +0 |
| coding_agents | 0/0/6 | 0.0000 | 0.0000 | 0.0000 | 0/0/6 | 0.0000 | 0.0000 | 0.0000 | +0.0000 | +0 | +0 |
| docs | 2/0/0 | 1.0000 | 1.0000 | 1.0000 | 2/0/0 | 1.0000 | 1.0000 | 1.0000 | +0.0000 | +0 | +0 |
| mcp_tooling | 3/0/0 | 1.0000 | 1.0000 | 1.0000 | 3/0/0 | 1.0000 | 1.0000 | 1.0000 | +0.0000 | +0 | +0 |
| open_weight_models | 1/0/0 | 1.0000 | 1.0000 | 1.0000 | 1/0/0 | 1.0000 | 1.0000 | 1.0000 | +0.0000 | +0 | +0 |
| sandboxing | 2/0/1 | 1.0000 | 0.6667 | 0.8000 | 2/0/1 | 1.0000 | 0.6667 | 0.8000 | +0.0000 | +0 | +0 |
| self_hosted_inference | 1/0/2 | 1.0000 | 0.3333 | 0.5000 | 1/0/2 | 1.0000 | 0.3333 | 0.5000 | +0.0000 | +0 | +0 |
| chat_integrations | 4/3/0 | 0.5714 | 1.0000 | 0.7273 | 3/0/1 | 1.0000 | 0.7500 | 0.8571 | +0.1299 | -3 | +1 |
| memory | 3/0/2 | 1.0000 | 0.6000 | 0.7500 | 4/0/1 | 1.0000 | 0.8000 | 0.8889 | +0.1389 | +0 | -1 |
| exec_tools | 3/0/1 | 1.0000 | 0.7500 | 0.8571 | 4/0/0 | 1.0000 | 1.0000 | 1.0000 | +0.1429 | +0 | -1 |
| acpx | 2/0/5 | 1.0000 | 0.2857 | 0.4444 | 3/0/4 | 1.0000 | 0.4286 | 0.6000 | +0.1556 | +0 | -1 |
| security | 3/4/2 | 0.4286 | 0.6000 | 0.5000 | 6/3/1 | 0.6667 | 0.8571 | 0.7500 | +0.2500 | -1 | -1 |
| model_serving | 4/3/0 | 0.5714 | 1.0000 | 0.7273 | 4/0/0 | 1.0000 | 1.0000 | 1.0000 | +0.2727 | -3 | +0 |
| codex | 3/0/4 | 1.0000 | 0.4286 | 0.6000 | 6/0/1 | 1.0000 | 0.8571 | 0.9231 | +0.3231 | +0 | -3 |
| auth_identity | 1/0/1 | 1.0000 | 0.5000 | 0.6667 | 3/0/0 | 1.0000 | 1.0000 | 1.0000 | +0.3333 | +0 | -1 |
| queueing | 1/0/1 | 1.0000 | 0.5000 | 0.6667 | 2/0/0 | 1.0000 | 1.0000 | 1.0000 | +0.3333 | +0 | -1 |
| cron_automation | 1/1/3 | 0.5000 | 0.2500 | 0.3333 | 3/1/1 | 0.7500 | 0.7500 | 0.7500 | +0.4167 | +0 | -2 |
| hooks | 0/0/1 | 0.0000 | 0.0000 | 0.0000 | 1/1/1 | 0.5000 | 0.5000 | 0.5000 | +0.5000 | +1 | +0 |
| packaging_deployment | 0/1/2 | 0.0000 | 0.0000 | 0.0000 | 1/1/1 | 0.5000 | 0.5000 | 0.5000 | +0.5000 | +0 | -1 |
| skills_plugins | 0/2/2 | 0.0000 | 0.0000 | 0.0000 | 2/2/1 | 0.5000 | 0.6667 | 0.5714 | +0.5714 | +0 | -1 |
| config | 0/0/3 | 0.0000 | 0.0000 | 0.0000 | 3/4/0 | 0.4286 | 1.0000 | 0.6000 | +0.6000 | +4 | -3 |
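
The P, R, and F1 columns follow the standard precision, recall, and F1 definitions applied to each TP/FP/FN cell. The helper below is a minimal sketch, not the report generator's code; the name `prf` is hypothetical, and returning 0.0 for an empty denominator is an assumption chosen to match rows such as model_releases (0/0/0), which the table reports as 0.0000.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts; empty denominators yield 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# reliability row: seed 9/2/6 versus GEPA 5/1/11
seed_p, seed_r, seed_f1 = prf(9, 2, 6)
gepa_p, gepa_r, gepa_f1 = prf(5, 1, 11)
print(f"seed F1 {seed_f1:.4f}, GEPA F1 {gepa_f1:.4f}, delta {gepa_f1 - seed_f1:+.4f}")
```

The reliability row illustrates the table's main trade-off: precision rises slightly while recall drops, so F1 falls even though there is one fewer false positive.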

Structural Failures

| Candidate | Row | Title | Error |
| --- | --- | --- | --- |
| Seed | openclaw-openclaw-63826 | security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification | classifier exit 2: prompt: /home/bob/.local/state/localpager/classifier/prompts/20260612T201105Z-3532713.md; schema: /home/bob/.local/state/localpager/classifier/schemas/20260612T201105Z-3532713.json; session: /home/bob/.local/state/localpager/classifier/sessions/20260612T201105Z-3532713; localpager-agent: final_json was not called; no structured output was captured |
| Seed | openclaw-openclaw-84752 | fix: self-heal lane wedges + restore openai-codex OAuth on embedded path | classifier exit 2: prompt: /home/bob/.local/state/localpager/classifier/prompts/20260612T201357Z-3534749.md; schema: /home/bob/.local/state/localpager/classifier/schemas/20260612T201357Z-3534749.json; session: /home/bob/.local/state/localpager/classifier/sessions/20260612T201357Z-3534749; localpager-agent: final_json was not called; no structured output was captured |
| Seed | openclaw-openclaw-43246 | fix(message): deny same-provider cross-context sends by default [AI-assisted] | classifier exit 2: prompt: /home/bob/.local/state/localpager/classifier/prompts/20260612T204409Z-3555993.md; schema: /home/bob/.local/state/localpager/classifier/schemas/20260612T204409Z-3555993.json; session: /home/bob/.local/state/localpager/classifier/sessions/20260612T204409Z-3555993; localpager-agent: final_json was not called; no structured output was captured |

Live properly budgeted 12B GEPA run continuing from the best candidate found so far. Runtime settings: row_limit=18, concurrency=2, max_tokens=1536, max_metric_calls=240. This page fetches run_log.txt every 30 seconds and redraws the graph.
The promoted v3 candidate clears the strict gates and keeps mean predicted labels at the v9.1 baseline. The main caveat is scientific, not mechanical: the manual repair used mistakes from this same 60-row set, so a fresh holdout is still needed before treating it as deployment evidence.
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/best.prompt.md b/gepa-12b-multi-from-six-20260613T051216Z/best.prompt.md
new file mode 100644
index 0000000000000000000000000000000000000000..4f89987fcdafb644be0d19f7adc676a9b1b80997
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/best.prompt.md
@@ -0,0 +1,165 @@
+# OpenClaw Routing Classifier
+
+Classify one OpenClaw GitHub issue or pull request for maintainer notification
+routing, not code search. Return only the final structured JSON required by the
+schema. No prose, markdown, analysis, or extra fields.
+
+Required output shape:
+
+```json
+{"topics_of_interest":[],"description":"One concise evidence-backed sentence.","caveats":[]}
+```
+
+## Inner Monologue
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+## Repository Reads
+
+A read-only `bash` tool may be available in the OpenClaw repo snapshot. Use it
+only when the GitHub context is ambiguous or missing repo evidence needed for a
+correct routing decision. Prefer short commands such as `pwd`, `ls`, `find`,
+`rg`, `grep`, `sed -n`, `cat`, `head`, `tail`, `wc -l`,
+`git show --name-only`, `git ls-files`, or `git grep`.
+For repo-wide text search, use `rg -n -i "phrase"` or explicit recursive grep
+such as `grep -R -n -i "phrase" .`. For file discovery, use
+`rg --files -g "*.ts"` or `git ls-files src`.
+Do not call `bash` when the provided GitHub context is enough.
+
+## Allowed Topics
+
+```json
+__ALLOWED_TOPICS_JSON__
+```
+
+Topic definitions and cue words:
+
+__TOPIC_DESCRIPTIONS__
+
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+- Return only final JSON using exact allowed topic ids, for example:
+ {"topics_of_interest":["queueing","docs"]}
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output only JSON.
+
+High-signal title patterns:
+- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, `test(...)`, or `policy(...)` can indicate the kind of change.
+- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.
+- Do not ignore `test(...)` scopes when the title is about landing or enforcing a behavior contract. The tested contract can be the central subject.
+- Do not blindly label every word in the title. Confirm the word names the subject, not just a path, symptom, or context.
+
+Domain rules and corrections:
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.
+- Do not add `tool_calling` just because the title says “tool boundaries” unless tool-call behavior itself is the central feature or bug.
+
+Queueing:
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+
+Tool calling:
+- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+ACP, gateway, and runtime:
+- ACP-related work routes to `acp` when ACP is named centrally.
+- ACPX-related sandbox or workflow issues route to `acpx` when ACPX is named centrally.
+- Gateway-owned behavior routes to `gateway` only when gateway is explicitly the owner or subject.
+- Runtime work routes to `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Codex and plugins:
+- Codex-related behavior routes to `codex` when Codex is named centrally.
+- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.
+- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` => `acpx`, `codex`, `skills_plugins`.
+- Do not drop `skills_plugins` when plugins are the requested feature.
+
+Notifications and chat integrations:
+- Slack, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations`.
+- Announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery route to `notifications`.
+- Do not add `cron_automation` merely because the notification path mentions `cron --announce`; cron is context unless scheduling, force-run behavior, cron lifecycle, or cron execution is itself broken.
+- Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes` => `notifications`, `chat_integrations`.
+
+Cron:
+- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Example: `cron force-run deadlock` => `cron_automation`.
+
+Exec, sandboxing, and approvals:
+- Exec command/tool behavior routes to `exec_tools`.
+- Exec PATH fallback is `exec_tools`.
+- Exec v2 contract follow-through or contract enforcement can centrally include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.
+- Example: `test(exec): land exec v2 contract follow-through` => `exec_tools`, `sandboxing`, `approvals`.
+- Do not replace sandboxing or approvals with `security` unless the title is actually about a security policy, vulnerability, network restriction, credential boundary, or allowed/blocked security behavior.
+
+Browser automation:
+- Browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues route to `browser_automation`.
+- Example: `layered browser diagnostics` => `browser_automation`.
+- Do not add `gateway` for browser diagnostics unless the gateway itself is explicitly the subject.
+
+Memory and inference:
+- Memory or embeddings provider work routes to `memory` when the provider exists for memory/embeddings.
+- Self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.
+- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`.
+- Do not add `model_serving` merely because the title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI.
+
+Model serving:
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Policy/config:
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Example: `Policy: add model, network, and MCP conformance checks` => `mcp_tooling`, `config`, `security`, not `model_serving`.
+
+Composite fixes:
+- If a title lists several independent fixes, classify each central fix up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Do not substitute a broad infrastructure topic like `gateway` unless it is explicitly one of the listed user-visible subjects.
+
+Final suppression checks:
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, or “security”, verify that the topic is actually the subject.
+- Prefer the narrow central topic over broad fallback labels.
+- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.## Target
+
+`__TARGET__`
+
+## GitHub Context
+
+__GITHUB_CONTEXT__
+
+Use this context as source of truth. If important sections are missing,
+unavailable, selected, or truncated, classify from what is available and mention
+material limits in `caveats`.
+
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/best.routing_policy.md b/gepa-12b-multi-from-six-20260613T051216Z/best.routing_policy.md
new file mode 100644
index 0000000000000000000000000000000000000000..d6672073ec12922d43d68be740cb5dc79d4be781
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/best.routing_policy.md
@@ -0,0 +1,110 @@
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+- Return only final JSON using exact allowed topic ids, for example:
+ {"topics_of_interest":["queueing","docs"]}
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output only JSON.
+
+High-signal title patterns:
+- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, `test(...)`, or `policy(...)` can indicate the kind of change.
+- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.
+- Do not ignore `test(...)` scopes when the title is about landing or enforcing a behavior contract. The tested contract can be the central subject.
+- Do not blindly label every word in the title. Confirm the word names the subject, not just a path, symptom, or context.
+
+Domain rules and corrections:
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.
+- Do not add `tool_calling` just because the title says “tool boundaries” unless tool-call behavior itself is the central feature or bug.
+
+Queueing:
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+
+Tool calling:
+- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+ACP, gateway, and runtime:
+- ACP-related work routes to `acp` when ACP is named centrally.
+- ACPX-related sandbox or workflow issues route to `acpx` when ACPX is named centrally.
+- Gateway-owned behavior routes to `gateway` only when gateway is explicitly the owner or subject.
+- Runtime work routes to `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Codex and plugins:
+- Codex-related behavior routes to `codex` when Codex is named centrally.
+- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.
+- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` => `acpx`, `codex`, `skills_plugins`.
+- Do not drop `skills_plugins` when plugins are the requested feature.
+
+Notifications and chat integrations:
+- Slack, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations`.
+- Announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery route to `notifications`.
+- Do not add `cron_automation` merely because the notification path mentions `cron --announce`; cron is context unless scheduling, force-run behavior, cron lifecycle, or cron execution is itself broken.
+- Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes` => `notifications`, `chat_integrations`.
+
+Cron:
+- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Example: `cron force-run deadlock` => `cron_automation`.
+
+Exec, sandboxing, and approvals:
+- Exec command/tool behavior routes to `exec_tools`.
+- Exec PATH fallback is `exec_tools`.
+- Exec v2 contract follow-through or contract enforcement can centrally include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.
+- Example: `test(exec): land exec v2 contract follow-through` => `exec_tools`, `sandboxing`, `approvals`.
+- Do not replace sandboxing or approvals with `security` unless the title is actually about a security policy, vulnerability, network restriction, credential boundary, or allowed/blocked security behavior.
+
+Browser automation:
+- Browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues route to `browser_automation`.
+- Example: `layered browser diagnostics` => `browser_automation`.
+- Do not add `gateway` for browser diagnostics unless the gateway itself is explicitly the subject.
+
+Memory and inference:
+- Memory or embeddings provider work routes to `memory` when the provider exists for memory/embeddings.
+- Self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.
+- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`.
+- Do not add `model_serving` merely because the title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI.
+
+Model serving:
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Policy/config:
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Example: `Policy: add model, network, and MCP conformance checks` => `mcp_tooling`, `config`, `security`, not `model_serving`.
+
+Composite fixes:
+- If a title lists several independent fixes, classify each central fix up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Do not substitute a broad infrastructure topic like `gateway` unless it is explicitly one of the listed user-visible subjects.
+
+Final suppression checks:
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, or “security”, verify that the topic is actually the subject.
+- Prefer the narrow central topic over broad fallback labels.
+- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/candidate_tree.html b/gepa-12b-multi-from-six-20260613T051216Z/candidate_tree.html
new file mode 100644
index 0000000000000000000000000000000000000000..81a42482277c2ab64edaaade866d02863235ec18
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/candidate_tree.html
@@ -0,0 +1,179 @@
+
+
+
+
+
+GEPA Candidate Tree
+
+
+
+
+
GEPA Candidate Tree
+
+
Best
+
Pareto Front
+
Other
+
+
+
Loading graph…
+
+
+
+
+
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/candidates.json b/gepa-12b-multi-from-six-20260613T051216Z/candidates.json
new file mode 100644
index 0000000000000000000000000000000000000000..b97fd0d7e6d00b402c829c2d26ad026198d889b3
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/candidates.json
@@ -0,0 +1,14 @@
+[
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nThis is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.\n\nProcess:\n\n1. Read the title first.\n2. Identify the main user-visible problem, feature, or policy change.\n3. Pick one primary topic.\n4. Read only the first clear body summary if needed to disambiguate.\n5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.\n7. Return only exact allowed topic ids.\n\nDo not over-label from keywords.\n\nImportant domain rules:\n\n- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n- Example: \u201cOpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)\u201d is only `model_serving`. 
The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.\n- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.\n\nPolicy/config rules:\n\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map the word \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d should be `mcp_tooling`, `config`, and `security`, not `model_serving`.\n\nCardinality guidance:\n\n- Use 0 topics when no allowed topic is central.\n- Use 1 topic for a single-focus item.\n- Use 2 topics for normal cross-topic items.\n- Use 3 topics only when the title or first clear summary explicitly has three central facets.\n- Use 4+ topics only for explicit multi-system coordination.\n\nFinal suppression checks before output:\n\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, or \u201cchunk\u201d, verify that the topic is actually the subject, not just context.\n- Prefer the narrower central topic over a broad fallback.\n- Never invent topic ids.\n- Output only the final JSON with the selected topic ids."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nInput format:\n- You may receive a GitHub target URL, a title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n- Return only final JSON using exact allowed topic ids, for example:\n {\"topics_of_interest\":[\"queueing\",\"docs\"]}\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second concern.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, or policy change.\n3. Pick one primary topic.\n4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output only JSON.\n\nHigh-signal title patterns:\n- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, or `policy(...)` can indicate the kind of change.\n- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.\n- Do not blindly label every word in the title. 
Confirm the word names the subject, not just context.\n\nDomain rules and corrections:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n - Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.\n - Do not add `tool_calling` just because the title says \u201ctool boundaries\u201d unless tool calling behavior itself is the central feature or bug.\n\n- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.\n\n- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n - Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\n- ACPX-related sandbox or workflow issues route to `acpx` when ACPX is named centrally.\n- Codex-related behavior routes to `codex` when Codex is named centrally.\n- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.\n - Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. 
Superpowers)` => `acpx`, `codex`, `skills_plugins`.\n - Do not drop `skills_plugins` when plugins are the requested feature.\n\n- Memory or embeddings provider work routes to `memory` when the provider exists for memory/embeddings.\n- Self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.\n - Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`.\n - Do not add `model_serving` merely because the title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI.\n\n- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n - OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n - Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n - Example: \u201cOpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)\u201d => `model_serving` only.\n\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n\nPolicy/config rules:\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel 
conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d => `mcp_tooling`, `config`, `security`, not `model_serving`.\n\nFinal suppression checks:\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, or \u201cchunk\u201d, verify that the topic is actually the subject.\n- Prefer the narrow central topic over broad fallback labels.\n- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n- Return only final JSON using exact allowed topic ids, for example:\n {\"topics_of_interest\":[\"queueing\",\"docs\"]}\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output only JSON.\n\nHigh-signal title patterns:\n- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, `test(...)`, or `policy(...)` can indicate the kind of change.\n- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.\n- Do not ignore `test(...)` scopes when the title is about landing or enforcing a behavior contract. The tested contract can be the central subject.\n- Do not blindly label every word in the title. 
Confirm the word names the subject, not just a path, symptom, or context.\n\nDomain rules and corrections:\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.\n- Do not add `tool_calling` just because the title says \u201ctool boundaries\u201d unless tool-call behavior itself is the central feature or bug.\n\nQueueing:\n- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.\n\nTool calling:\n- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nACP, gateway, and runtime:\n- ACP-related work routes to `acp` when ACP is named centrally.\n- ACPX-related sandbox or workflow issues route to `acpx` when ACPX is named centrally.\n- Gateway-owned behavior routes to `gateway` only when gateway is explicitly the owner or subject.\n- Runtime work routes to `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nCodex and plugins:\n- Codex-related behavior routes to `codex` when Codex is named centrally.\n- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.\n- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. 
Superpowers)` => `acpx`, `codex`, `skills_plugins`.\n- Do not drop `skills_plugins` when plugins are the requested feature.\n\nNotifications and chat integrations:\n- Slack, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations`.\n- Announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery route to `notifications`.\n- Do not add `cron_automation` merely because the notification path mentions `cron --announce`; cron is context unless scheduling, force-run behavior, cron lifecycle, or cron execution is itself broken.\n- Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes` => `notifications`, `chat_integrations`.\n\nCron:\n- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Example: `cron force-run deadlock` => `cron_automation`.\n\nExec, sandboxing, and approvals:\n- Exec command/tool behavior routes to `exec_tools`.\n- Exec PATH fallback is `exec_tools`.\n- Exec v2 contract follow-through or contract enforcement can centrally include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.\n- Example: `test(exec): land exec v2 contract follow-through` => `exec_tools`, `sandboxing`, `approvals`.\n- Do not replace sandboxing or approvals with `security` unless the title is actually about a security policy, vulnerability, network restriction, credential boundary, or allowed/blocked security behavior.\n\nBrowser automation:\n- Browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues route to `browser_automation`.\n- Example: `layered browser diagnostics` => `browser_automation`.\n- Do not add `gateway` for browser diagnostics unless the gateway itself is explicitly the subject.\n\nMemory and inference:\n- Memory or embeddings provider work routes to 
`memory` when the provider exists for memory/embeddings.\n- Self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.\n- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`.\n- Do not add `model_serving` merely because the title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI.\n\nModel serving:\n- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n\nPolicy/config:\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or 
model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: `Policy: add model, network, and MCP conformance checks` => `mcp_tooling`, `config`, `security`, not `model_serving`.\n\nComposite fixes:\n- If a title lists several independent fixes, classify each central fix up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Do not substitute a broad infrastructure topic like `gateway` unless it is explicitly one of the listed user-visible subjects.\n\nFinal suppression checks:\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, or \u201csecurity\u201d, verify that the topic is actually the subject.\n- Prefer the narrow central topic over broad fallback labels.\n- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.\n- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "Add these routing corrections to the classifier instructions:\n\n- Treat compound titles as lists of central user-visible fixes. Classify each central item, but do not add labels for every noun.\n- `skills_plugins` is label spam unless the plugin system itself is the requested feature or bug: user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or skill/plugin availability.\n- In titles like `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth`, keep `codex` because Codex behavior is central, but do not add `skills_plugins` for \u201cstartup plugins\u201d unless the plugin lifecycle is the actual subject.\n- WhatsApp, Slack, chat history, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations` when central.\n- ACP session permission-mode work can require all three topics: `acp`, `approvals`, and `acpx`.\n- Specifically, titles mentioning per-binding or per-agent `permissionMode` for ACP sessions should include `acp`, `approvals`, and `acpx`. 
`permissionMode` is an approval/permission contract, and ACPX owns the ACP session/binding workflow concern.\n- Add `local_models` when the title centrally names local model apps or local model providers such as LM Studio.\n- LM Studio issues involving Responses API behavior, thinking blocks, streaming, request/response compatibility, or visibility of model output should usually include both `model_serving` and `local_models`.\n- Do not replace `local_models` with `self_hosted_inference` when the named subject is LM Studio or another local-model product/app rather than a generic inference server integration.\n- `Responses API`, invisible thinking blocks, OpenAI-compatible behavior, streaming lifecycle, request/response protocol handling, and model-output protocol bugs route to `model_serving`.\n\nAdditional suppression checks:\n- If `skills_plugins` was added only because the title contains \u201cplugins\u201d inside a broader Codex startup or OAuth fix, remove it unless plugin installation/discovery/inheritance/availability is the central user-visible bug.\n- If a chat product name such as WhatsApp appears as a central listed fix, include `chat_integrations`.\n- If ACP + `permissionMode` + per-binding/per-agent/session language appears, include `acpx` in addition to `acp` and `approvals`.\n- If LM Studio appears as a central subject, include `local_models`."
+ }
+]
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..419fcd3af0a7aa24bfd48eae393ee9fd0a664c28
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=(), description='', caveats=(), error='classifier exit 217: prompt: /home/bob/.local/state/localpager/classifier/prompts/20260613T051216Z-3827873.md\\nschema: /home/bob/.local/state/localpager/classifier/schemas/20260613T051216Z-3827873.json\\nsession: /home/bob/.local/state/localpager/classifier/sessions/20260613T051216Z-3827873')"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..2b8b67b65d7409b73c98d205f7c1b4df3a87b014
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('mcp_tooling', 'security', 'config'), description='This PR implements configuration-governed conformance checks for MCP servers (mcp_tooling), network security (security), and general policy configurations (config).', caveats=('The \"model\" aspect refers to policy enforcement on provider lists rather than serving protocols, thus it is handled by config/mcp_tooling logic.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..077f883428d719a813a2277a17cec7ab70e33ac4
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('exec_tools', 'security'), description=\"The PR implements security hardening and contract enforcement for the 'exec' tool, specifically regarding safe binary path validation and allowlist logic.\", caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..4c366505549412feb6e455f697f8e048b6e81bf2
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acpx', 'codex', 'skills_plugins'), description='This feature request addresses plugin inheritance for the ACPX Codex sandbox, involving core components like acpx protocol handling, codex runtime environments, and skill/plugin integration.', caveats=('The issue explicitly involves the ACPX infrastructure, Codex runtime environment, and the functionality of user-installed skills/plugins.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_12/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_12/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..6c13c044275fa8dbbc315a576bc9218f90673c93
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_12/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('memory', 'self_hosted_inference'), description='Adds an openai-compatible embedding provider to support various self-hosted inference engines like Ollama and vLLM within the memory system.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_13/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_13/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..8f1c7332095b5f27ad63eedfffebf6ffdf917c31
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_13/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=(), description='', caveats=(), error='classifier exit 2: prompt: /home/bob/.local/state/localpager/classifier/prompts/20260613T051752Z-3832136.md\\nschema: /home/bob/.local/state/localpager/classifier/schemas/20260613T051752Z-3832136.json\\nsession: /home/bob/.local/state/localpager/classifier/sessions/20260613T051752Z-3832136\\nlocalpager-agent: final_json was not called; no structured output was captured')"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_14/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_14/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..6ac66b9f35a3aa1029479c98bf4273c8dae34224
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_14/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('docs',), description='The pull request exclusively modifies documentation by adding a YouTube tutorial link and embed to the showcase page.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_15/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_15/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..40ce9a06706ee0b2a31dd018529554eefc858fa3
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_15/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('open_weight_models', 'model_serving'), description='This PR corrects context window metadata for open-weight models served via Mantle to fix premature truncation issues.', caveats=('The pull request specifically targets the inclusion of known context windows for \"open-weight\" models on the Mantle provider.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_16/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_16/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..dba663c0e0f9ec0152157606f43ae5f26ccc51b8
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_16/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('cron_automation', 'chat_integrations'), description='Issue reports that per-agent identity settings are ignored during cron announcements and heartbeats sent to Slack, affecting both automation logic and chat integration behavior.', caveats=('The bug specifically involves the failure of agent identity overlays in automated notification paths (cron/heartbeat) for Slack.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_17/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_17/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..4ce50ec35386ab7269ea043925d27f870df0dff6
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_17/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('model_serving',), description='The PR fixes a bug in processing OpenAI-compatible streaming responses, specifically handling reasoning content from providers like LM Studio.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..56090233996d9b7bd1a5110b1ab7e9ff77060d84
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('exec_tools', 'gateway', 'cron_automation'), description='This multi-faceted PR fixes critical issues across shell tool execution paths, gateway-level browser status diagnostics, and prevents deadlocks in concurrent cron operations.', caveats=('The PR addresses three distinct systems: tool execution logic (exec_tools), gateway-level service reporting and lane management (gateway), and cron job concurrency handling (cron_automation).',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..e353f74c98e9115eb5a33973e7c87ef73af4f72b
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('codex', 'gateway'), description='The PR fixes gateway startup logic for Codex plugins and implements a new history feature for WhatsApp.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..5023d7701c265eac9a5b17f112a82f0b99931af0
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('model_serving',), description='The issue describes an OpenAI-compatible streaming bug where usage data is missed due to incorrect stream termination, falling under model_serving.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..a5f5fde5f2564418e2d6206a11ab4ef4ee2a1378
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'approvals'), description='The issue proposes granular per-binding and per-agent permission modes for ACP sessions, directly impacting the core ACP protocol and approval flows.', caveats=('None.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..7e9a7ac7f6ea521d3ec34c7d6d05796a5146bdaa
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('gateway', 'hooks'), description=\"The PR enhances the gateway's ability to correctly process and report outcomes from guarded hook actions during outbound message sending.\", caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..edfd12855bd2bb7f3340f8fc9a9dd5f452d44051
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('memory', 'reliability'), description='This PR fixes a deadlock in the active-memory recall subagent by isolating it onto its own dedicated lane.', caveats=(\"Included 'reliability' due to the specific fix for a concurrent execution deadlock.\",), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..1c1accc25078a74d18945f28fa4592b815e4d39e
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('ui_tui', 'gateway'), description='The PR adds a new gateway RPC to fetch full message content for the web-based chat sidebar UI.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..1245dd3c42ae32556231660639343607f4d512a8
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=(), description='', caveats=(), error='classifier exit 2: prompt: /home/bob/.local/state/localpager/classifier/prompts/20260613T051540Z-3830567.md\\nschema: /home/bob/.local/state/localpager/classifier/schemas/20260613T051540Z-3830567.json\\nsession: /home/bob/.local/state/localpager/classifier/sessions/20260613T051540Z-3830567\\nlocalpager-agent: final_json was not called; no structured output was captured')"
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/gepa-result.json b/gepa-12b-multi-from-six-20260613T051216Z/gepa-result.json
new file mode 100644
index 0000000000000000000000000000000000000000..2ceb20ce0a6192ccad73fe3394ec6de2a4cc1f7e
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/gepa-result.json
@@ -0,0 +1,223 @@
+{
+ "_str_candidate_key": null,
+ "best_idx": 2,
+ "best_outputs_valset": null,
+ "candidates": [
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nThis is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.\n\nProcess:\n\n1. Read the title first.\n2. Identify the main user-visible problem, feature, or policy change.\n3. Pick one primary topic.\n4. Read only the first clear body summary if needed to disambiguate.\n5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.\n7. Return only exact allowed topic ids.\n\nDo not over-label from keywords.\n\nImportant domain rules:\n\n- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n- Example: \u201cOpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)\u201d is only `model_serving`. 
The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.\n- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.\n\nPolicy/config rules:\n\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map the word \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d should be `mcp_tooling`, `config`, and `security`, not `model_serving`.\n\nCardinality guidance:\n\n- Use 0 topics when no allowed topic is central.\n- Use 1 topic for a single-focus item.\n- Use 2 topics for normal cross-topic items.\n- Use 3 topics only when the title or first clear summary explicitly has three central facets.\n- Use 4+ topics only for explicit multi-system coordination.\n\nFinal suppression checks before output:\n\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, or \u201cchunk\u201d, verify that the topic is actually the subject, not just context.\n- Prefer the narrower central topic over a broad fallback.\n- Never invent topic ids.\n- Output only the final JSON with the selected topic ids."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nInput format:\n- You may receive a GitHub target URL, a title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n- Return only final JSON using exact allowed topic ids, for example:\n {\"topics_of_interest\":[\"queueing\",\"docs\"]}\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second concern.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, or policy change.\n3. Pick one primary topic.\n4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output only JSON.\n\nHigh-signal title patterns:\n- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, or `policy(...)` can indicate the kind of change.\n- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.\n- Do not blindly label every word in the title. 
Confirm the word names the subject, not just context.\n\nDomain rules and corrections:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n - Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.\n - Do not add `tool_calling` just because the title says \u201ctool boundaries\u201d unless tool calling behavior itself is the central feature or bug.\n\n- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.\n\n- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n - Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\n- ACPX-related sandbox or workflow issues route to `acpx` when ACPX is named centrally.\n- Codex-related behavior routes to `codex` when Codex is named centrally.\n- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.\n - Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. 
Superpowers)` => `acpx`, `codex`, `skills_plugins`.\n - Do not drop `skills_plugins` when plugins are the requested feature.\n\n- Memory or embeddings provider work routes to `memory` when the provider exists for memory/embeddings.\n- Self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.\n - Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`.\n - Do not add `model_serving` merely because the title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI.\n\n- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n - OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n - Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n - Example: \u201cOpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)\u201d => `model_serving` only.\n\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n\nPolicy/config rules:\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel 
conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d => `mcp_tooling`, `config`, `security`, not `model_serving`.\n\nFinal suppression checks:\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, or \u201cchunk\u201d, verify that the topic is actually the subject.\n- Prefer the narrow central topic over broad fallback labels.\n- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n- Return only final JSON using exact allowed topic ids, for example:\n {\"topics_of_interest\":[\"queueing\",\"docs\"]}\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output only JSON.\n\nHigh-signal title patterns:\n- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, `test(...)`, or `policy(...)` can indicate the kind of change.\n- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.\n- Do not ignore `test(...)` scopes when the title is about landing or enforcing a behavior contract. The tested contract can be the central subject.\n- Do not blindly label every word in the title. 
Confirm the word names the subject, not just a path, symptom, or context.\n\nDomain rules and corrections:\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.\n- Do not add `tool_calling` just because the title says \u201ctool boundaries\u201d unless tool-call behavior itself is the central feature or bug.\n\nQueueing:\n- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.\n\nTool calling:\n- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nACP, gateway, and runtime:\n- ACP-related work routes to `acp` when ACP is named centrally.\n- ACPX-related sandbox or workflow issues route to `acpx` when ACPX is named centrally.\n- Gateway-owned behavior routes to `gateway` only when gateway is explicitly the owner or subject.\n- Runtime work routes to `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nCodex and plugins:\n- Codex-related behavior routes to `codex` when Codex is named centrally.\n- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.\n- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. 
Superpowers)` => `acpx`, `codex`, `skills_plugins`.\n- Do not drop `skills_plugins` when plugins are the requested feature.\n\nNotifications and chat integrations:\n- Slack, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations`.\n- Announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery route to `notifications`.\n- Do not add `cron_automation` merely because the notification path mentions `cron --announce`; cron is context unless scheduling, force-run behavior, cron lifecycle, or cron execution is itself broken.\n- Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes` => `notifications`, `chat_integrations`.\n\nCron:\n- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Example: `cron force-run deadlock` => `cron_automation`.\n\nExec, sandboxing, and approvals:\n- Exec command/tool behavior routes to `exec_tools`.\n- Exec PATH fallback is `exec_tools`.\n- Exec v2 contract follow-through or contract enforcement can centrally include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.\n- Example: `test(exec): land exec v2 contract follow-through` => `exec_tools`, `sandboxing`, `approvals`.\n- Do not replace sandboxing or approvals with `security` unless the title is actually about a security policy, vulnerability, network restriction, credential boundary, or allowed/blocked security behavior.\n\nBrowser automation:\n- Browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues route to `browser_automation`.\n- Example: `layered browser diagnostics` => `browser_automation`.\n- Do not add `gateway` for browser diagnostics unless the gateway itself is explicitly the subject.\n\nMemory and inference:\n- Memory or embeddings provider work routes to 
`memory` when the provider exists for memory/embeddings.\n- Self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.\n- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`.\n- Do not add `model_serving` merely because the title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI.\n\nModel serving:\n- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n\nPolicy/config:\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or 
model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: `Policy: add model, network, and MCP conformance checks` => `mcp_tooling`, `config`, `security`, not `model_serving`.\n\nComposite fixes:\n- If a title lists several independent fixes, classify each central fix up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Do not substitute a broad infrastructure topic like `gateway` unless it is explicitly one of the listed user-visible subjects.\n\nFinal suppression checks:\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, or \u201csecurity\u201d, verify that the topic is actually the subject.\n- Prefer the narrow central topic over broad fallback labels.\n- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.\n- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "Add these routing corrections to the classifier instructions:\n\n- Treat compound titles as lists of central user-visible fixes. Classify each central item, but do not add labels for every noun.\n- `skills_plugins` is label spam unless the plugin system itself is the requested feature or bug: user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or skill/plugin availability.\n- In titles like `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth`, keep `codex` because Codex behavior is central, but do not add `skills_plugins` for \u201cstartup plugins\u201d unless the plugin lifecycle is the actual subject.\n- WhatsApp, Slack, chat history, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations` when central.\n- ACP session permission-mode work can require all three topics: `acp`, `approvals`, and `acpx`.\n- Specifically, titles mentioning per-binding or per-agent `permissionMode` for ACP sessions should include `acp`, `approvals`, and `acpx`. 
`permissionMode` is an approval/permission contract, and ACPX owns the ACP session/binding workflow concern.\n- Add `local_models` when the title centrally names local model apps or local model providers such as LM Studio.\n- LM Studio issues involving Responses API behavior, thinking blocks, streaming, request/response compatibility, or visibility of model output should usually include both `model_serving` and `local_models`.\n- Do not replace `local_models` with `self_hosted_inference` when the named subject is LM Studio or another local-model product/app rather than a generic inference server integration.\n- `Responses API`, invisible thinking blocks, OpenAI-compatible behavior, streaming lifecycle, request/response protocol handling, and model-output protocol bugs route to `model_serving`.\n\nAdditional suppression checks:\n- If `skills_plugins` was added only because the title contains \u201cplugins\u201d inside a broader Codex startup or OAuth fix, remove it unless plugin installation/discovery/inheritance/availability is the central user-visible bug.\n- If a chat product name such as WhatsApp appears as a central listed fix, include `chat_integrations`.\n- If ACP + `permissionMode` + per-binding/per-agent/session language appears, include `acpx` in addition to `acp` and `approvals`.\n- If LM Studio appears as a central subject, include `local_models`."
+ }
+ ],
+ "discovery_eval_counts": [
+ 0,
+ 26,
+ 52,
+ 78
+ ],
+ "num_full_val_evals": 4,
+ "objective_pareto_front": {
+ "weighted_score": 0.7361111111111112
+ },
+ "parents": [
+ [
+ null
+ ],
+ [
+ 0
+ ],
+ [
+ 1
+ ],
+ [
+ 2
+ ]
+ ],
+ "per_objective_best_candidates": {
+ "weighted_score": [
+ 2
+ ]
+ },
+ "per_val_instance_best_candidates": {
+ "0": [
+ 2
+ ],
+ "1": [
+ 0,
+ 2
+ ],
+ "2": [
+ 2
+ ],
+ "3": [
+ 3
+ ],
+ "4": [
+ 0,
+ 1,
+ 2
+ ],
+ "5": [
+ 3
+ ],
+ "6": [
+ 1
+ ],
+ "7": [
+ 0
+ ],
+ "8": [
+ 0
+ ],
+ "9": [
+ 1,
+ 2,
+ 3
+ ],
+ "10": [
+ 2
+ ],
+ "11": [
+ 0,
+ 1,
+ 2,
+ 3
+ ],
+ "12": [
+ 0,
+ 1,
+ 2
+ ],
+ "13": [
+ 2
+ ],
+ "14": [
+ 0,
+ 2,
+ 3
+ ],
+ "15": [
+ 1,
+ 2
+ ],
+ "16": [
+ 2
+ ],
+ "17": [
+ 3
+ ]
+ },
+ "run_dir": "prompt-optimizer/out/gepa-12b-multi-from-six-20260613T051216Z",
+ "seed": 0,
+ "total_metric_calls": 96,
+ "val_aggregate_scores": [
+ 0.4972222222222222,
+ 0.5380952380952381,
+ 0.7361111111111112,
+ 0.5088929588929588
+ ],
+ "val_aggregate_subscores": [
+ {
+ "weighted_score": 0.4972222222222222
+ },
+ {
+ "weighted_score": 0.5380952380952381
+ },
+ {
+ "weighted_score": 0.7361111111111112
+ },
+ {
+ "weighted_score": 0.5088929588929589
+ }
+ ],
+ "val_subscores": [
+ {
+ "0": 0.0,
+ "1": 1.0,
+ "2": 0.25,
+ "3": 0.25,
+ "4": 1.0,
+ "5": 0.5,
+ "6": 0.25,
+ "7": 1.0,
+ "8": 0.5,
+ "9": 0.0,
+ "10": 0.2,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 0.0,
+ "14": 1.0,
+ "15": 0.25,
+ "16": 0.25,
+ "17": 0.5
+ },
+ {
+ "0": 0.5,
+ "1": 0.0,
+ "2": 0.5,
+ "3": 0.25,
+ "4": 1.0,
+ "5": 0.5,
+ "6": 1.0,
+ "7": 0.5,
+ "8": 0.25,
+ "9": 1.0,
+ "10": 0.2,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 0.5,
+ "14": 0.2857142857142857,
+ "15": 0.5,
+ "16": 0.2,
+ "17": 0.5
+ },
+ {
+ "0": 1.0,
+ "1": 1.0,
+ "2": 1.0,
+ "3": 0.0,
+ "4": 1.0,
+ "5": 0.5,
+ "6": 0.5,
+ "7": 0.5,
+ "8": 0.25,
+ "9": 1.0,
+ "10": 0.5,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 1.0,
+ "17": 0.5
+ },
+ {
+ "0": 0.5,
+ "1": 0.25,
+ "2": 0.15384615384615385,
+ "3": 1.0,
+ "4": 0.16666666666666666,
+ "5": 1.0,
+ "6": 0.2857142857142857,
+ "7": 0.5,
+ "8": 0.25,
+ "9": 1.0,
+ "10": 0.2,
+ "11": 1.0,
+ "12": 0.15384615384615385,
+ "13": 0.2,
+ "14": 1.0,
+ "15": 0.25,
+ "16": 0.25,
+ "17": 1.0
+ }
+ ],
+ "validation_schema_version": 2
+}
\ No newline at end of file
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/gepa_state.bin b/gepa-12b-multi-from-six-20260613T051216Z/gepa_state.bin
new file mode 100644
index 0000000000000000000000000000000000000000..d2a078203e0a8bfd5d4edc6168202b9d055ba4d2
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/gepa_state.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b92f2612215b619430254c1627adcc22c907e17753a4bb9b4f615f4f48b940c2
+size 22761
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/optimize.stderr.log b/gepa-12b-multi-from-six-20260613T051216Z/optimize.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/optimize.stdout.json b/gepa-12b-multi-from-six-20260613T051216Z/optimize.stdout.json
new file mode 100644
index 0000000000000000000000000000000000000000..52de493f2674b99a1852f5a091bd6788135aa166
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/optimize.stdout.json
@@ -0,0 +1,275 @@
+Iteration 0: Base program full valset score: 0.4972222222222222 over 18 / 18 examples
+Iteration 1: Selected program 0 score: 0.4972222222222222
+Iteration 1: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Input format:
+- You may receive a GitHub target URL, a title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+- Return only final JSON using exact allowed topic ids, for example:
+ {"topics_of_interest":["queueing","docs"]}
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second concern.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, or policy change.
+3. Pick one primary topic.
+4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output only JSON.
+
+High-signal title patterns:
+- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, or `policy(...)` can indicate the kind of change.
+- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.
+- Do not blindly label every word in the title. Confirm the word names the subject, not just context.
+
+Domain rules and corrections:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+ - Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.
+ - Do not add `tool_calling` just because the title says “tool boundaries” unless tool calling behavior itself is the central feature or bug.
+
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+
+- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+ - Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+- ACPX-related sandbox or workflow issues route to `acpx` when ACPX is named centrally.
+- Codex-related behavior routes to `codex` when Codex is named centrally.
+- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.
+ - Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` => `acpx`, `codex`, `skills_plugins`.
+ - Do not drop `skills_plugins` when plugins are the requested feature.
+
+- Memory or embeddings provider work routes to `memory` when the provider exists for memory/embeddings.
+- Self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.
+ - Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`.
+ - Do not add `model_serving` merely because the title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI.
+
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+ - OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+ - Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+ - Example: “OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)” => `model_serving` only.
+
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Policy/config rules:
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Example: “Policy: add model, network, and MCP conformance checks” => `mcp_tooling`, `config`, `security`, not `model_serving`.
+
+Final suppression checks:
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, or “chunk”, verify that the topic is actually the subject.
+- Prefer the narrow central topic over broad fallback labels.
+- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.
+Iteration 1: New subsample score 4.0 is better than old score 2.0357142857142856. Continue to full eval and add to candidate pool.
+Iteration 1: Found a better program on the valset with score 0.5380952380952381.
+Iteration 1: Valset score for new program: 0.5380952380952381 (coverage 18 / 18)
+Iteration 1: Val aggregate for new program: 0.5380952380952381
+Iteration 1: Individual valset scores for new program: {0: 0.5, 1: 0.0, 2: 0.5, 3: 0.25, 4: 1.0, 5: 0.5, 6: 1.0, 7: 0.5, 8: 0.25, 9: 1.0, 10: 0.2, 11: 1.0, 12: 1.0, 13: 0.5, 14: 0.2857142857142857, 15: 0.5, 16: 0.2, 17: 0.5}
+Iteration 1: Objective aggregate scores for new program: {'weighted_score': 0.5380952380952381}
+Iteration 1: New valset pareto front scores: {0: 0.5, 1: 1.0, 2: 0.5, 3: 0.25, 4: 1.0, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 0.2, 11: 1.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.5}
+Iteration 1: Objective pareto front scores: {'weighted_score': 0.5380952380952381}
+Iteration 1: Valset pareto front aggregate score: 0.6777777777777777
+Iteration 1: Updated valset pareto front programs: {0: {1}, 1: {0}, 2: {1}, 3: {0, 1}, 4: {0, 1}, 5: {0, 1}, 6: {1}, 7: {0}, 8: {0}, 9: {1}, 10: {0, 1}, 11: {0, 1}, 12: {0, 1}, 13: {1}, 14: {0}, 15: {1}, 16: {0}, 17: {0, 1}}
+Iteration 1: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 1: Best valset aggregate score so far: 0.5380952380952381
+Iteration 1: Best program as per aggregate score on valset: 1
+Iteration 1: Best score on valset: 0.5380952380952381
+Iteration 1: Linear pareto front program index: 1
+Iteration 1: New program candidate index: 1
+Iteration 2: Selected program 1 score: 0.5380952380952381
+Iteration 2: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+- Return only final JSON using exact allowed topic ids, for example:
+ {"topics_of_interest":["queueing","docs"]}
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output only JSON.
+
+High-signal title patterns:
+- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, `test(...)`, or `policy(...)` can indicate the kind of change.
+- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.
+- Do not ignore `test(...)` scopes when the title is about landing or enforcing a behavior contract. The tested contract can be the central subject.
+- Do not blindly label every word in the title. Confirm the word names the subject, not just a path, symptom, or context.
+
+Domain rules and corrections:
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.
+- Do not add `tool_calling` just because the title says “tool boundaries” unless tool-call behavior itself is the central feature or bug.
+
+Queueing:
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+
+Tool calling:
+- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+ACP, gateway, and runtime:
+- ACP-related work routes to `acp` when ACP is named centrally.
+- ACPX-related sandbox or workflow issues route to `acpx` when ACPX is named centrally.
+- Gateway-owned behavior routes to `gateway` only when gateway is explicitly the owner or subject.
+- Runtime work routes to `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Codex and plugins:
+- Codex-related behavior routes to `codex` when Codex is named centrally.
+- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.
+- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` => `acpx`, `codex`, `skills_plugins`.
+- Do not drop `skills_plugins` when plugins are the requested feature.
+
+Notifications and chat integrations:
+- Slack, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations`.
+- Announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery route to `notifications`.
+- Do not add `cron_automation` merely because the notification path mentions `cron --announce`; cron is context unless scheduling, force-run behavior, cron lifecycle, or cron execution is itself broken.
+- Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes` => `notifications`, `chat_integrations`.
+
+Cron:
+- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Example: `cron force-run deadlock` => `cron_automation`.
+
+Exec, sandboxing, and approvals:
+- Exec command/tool behavior routes to `exec_tools`.
+- Exec PATH fallback is `exec_tools`.
+- Exec v2 contract follow-through or contract enforcement can centrally include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.
+- Example: `test(exec): land exec v2 contract follow-through` => `exec_tools`, `sandboxing`, `approvals`.
+- Do not replace sandboxing or approvals with `security` unless the title is actually about a security policy, vulnerability, network restriction, credential boundary, or allowed/blocked security behavior.
+
+Browser automation:
+- Browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues route to `browser_automation`.
+- Example: `layered browser diagnostics` => `browser_automation`.
+- Do not add `gateway` for browser diagnostics unless the gateway itself is explicitly the subject.
+
+Memory and inference:
+- Memory or embeddings provider work routes to `memory` when the provider exists for memory/embeddings.
+- Self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.
+- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`.
+- Do not add `model_serving` merely because the title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI.
+
+Model serving:
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Policy/config:
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Example: `Policy: add model, network, and MCP conformance checks` => `mcp_tooling`, `config`, `security`, not `model_serving`.
+
+Composite fixes:
+- If a title lists several independent fixes, classify each central fix up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Do not substitute a broad infrastructure topic like `gateway` unless it is explicitly one of the listed user-visible subjects.
+
+Final suppression checks:
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, or “security”, verify that the topic is actually the subject.
+- Prefer the narrow central topic over broad fallback labels.
+- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.
+Iteration 2: New subsample score 3.2 is better than old score 1.2. Continue to full eval and add to candidate pool.
+Iteration 2: Found a better program on the valset with score 0.7361111111111112.
+Iteration 2: Valset score for new program: 0.7361111111111112 (coverage 18 / 18)
+Iteration 2: Val aggregate for new program: 0.7361111111111112
+Iteration 2: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.0, 4: 1.0, 5: 0.5, 6: 0.5, 7: 0.5, 8: 0.25, 9: 1.0, 10: 0.5, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 0.5}
+Iteration 2: Objective aggregate scores for new program: {'weighted_score': 0.7361111111111112}
+Iteration 2: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.25, 4: 1.0, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 0.5, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 0.5}
+Iteration 2: Objective pareto front scores: {'weighted_score': 0.7361111111111112}
+Iteration 2: Valset pareto front aggregate score: 0.8194444444444444
+Iteration 2: Updated valset pareto front programs: {0: {2}, 1: {0, 2}, 2: {2}, 3: {0, 1}, 4: {0, 1, 2}, 5: {0, 1, 2}, 6: {1}, 7: {0}, 8: {0}, 9: {1, 2}, 10: {2}, 11: {0, 1, 2}, 12: {0, 1, 2}, 13: {2}, 14: {0, 2}, 15: {1, 2}, 16: {2}, 17: {0, 1, 2}}
+Iteration 2: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 2: Best valset aggregate score so far: 0.7361111111111112
+Iteration 2: Best program as per aggregate score on valset: 2
+Iteration 2: Best score on valset: 0.7361111111111112
+Iteration 2: Linear pareto front program index: 2
+Iteration 2: New program candidate index: 2
+Iteration 3: Selected program 2 score: 0.7361111111111112
+Iteration 3: Proposed new text for routing_policy: Add these routing corrections to the classifier instructions:
+
+- Treat compound titles as lists of central user-visible fixes. Classify each central item, but do not add labels for every noun.
+- `skills_plugins` is label spam unless the plugin system itself is the requested feature or bug: user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or skill/plugin availability.
+- In titles like `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth`, keep `codex` because Codex behavior is central, but do not add `skills_plugins` for “startup plugins” unless the plugin lifecycle is the actual subject.
+- WhatsApp, Slack, chat history, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations` when central.
+- ACP session permission-mode work can require all three topics: `acp`, `approvals`, and `acpx`.
+- Specifically, titles mentioning per-binding or per-agent `permissionMode` for ACP sessions should include `acp`, `approvals`, and `acpx`. `permissionMode` is an approval/permission contract, and ACPX owns the ACP session/binding workflow concern.
+- Add `local_models` when the title centrally names local model apps or local model providers such as LM Studio.
+- LM Studio issues involving Responses API behavior, thinking blocks, streaming, request/response compatibility, or visibility of model output should usually include both `model_serving` and `local_models`.
+- Do not replace `local_models` with `self_hosted_inference` when the named subject is LM Studio or another local-model product/app rather than a generic inference server integration.
+- `Responses API`, invisible thinking blocks, OpenAI-compatible behavior, streaming lifecycle, request/response protocol handling, and model-output protocol bugs route to `model_serving`.
+
+Additional suppression checks:
+- If `skills_plugins` was added only because the title contains “plugins” inside a broader Codex startup or OAuth fix, remove it unless plugin installation/discovery/inheritance/availability is the central user-visible bug.
+- If a chat product name such as WhatsApp appears as a central listed fix, include `chat_integrations`.
+- If ACP + `permissionMode` + per-binding/per-agent/session language appears, include `acpx` in addition to `acp` and `approvals`.
+- If LM Studio appears as a central subject, include `local_models`.
+Iteration 3: New subsample score 2.571428571428571 is better than old score 2.25. Continue to full eval and add to candidate pool.
+Iteration 3: Valset score for new program: 0.5088929588929588 (coverage 18 / 18)
+Iteration 3: Val aggregate for new program: 0.5088929588929588
+Iteration 3: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.15384615384615385, 3: 1.0, 4: 0.16666666666666666, 5: 1.0, 6: 0.2857142857142857, 7: 0.5, 8: 0.25, 9: 1.0, 10: 0.2, 11: 1.0, 12: 0.15384615384615385, 13: 0.2, 14: 1.0, 15: 0.25, 16: 0.25, 17: 1.0}
+Iteration 3: Objective aggregate scores for new program: {'weighted_score': 0.5088929588929589}
+Iteration 3: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 0.5, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0}
+Iteration 3: Objective pareto front scores: {'weighted_score': 0.7361111111111112}
+Iteration 3: Valset pareto front aggregate score: 0.9166666666666666
+Iteration 3: Updated valset pareto front programs: {0: {2}, 1: {0, 2}, 2: {2}, 3: {3}, 4: {0, 1, 2}, 5: {3}, 6: {1}, 7: {0}, 8: {0}, 9: {1, 2, 3}, 10: {2}, 11: {0, 1, 2, 3}, 12: {0, 1, 2}, 13: {2}, 14: {0, 2, 3}, 15: {1, 2}, 16: {2}, 17: {3}}
+Iteration 3: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 3: Best valset aggregate score so far: 0.7361111111111112
+Iteration 3: Best program as per aggregate score on valset: 2
+Iteration 3: Best score on valset: 0.7361111111111112
+Iteration 3: Linear pareto front program index: 2
+Iteration 3: New program candidate index: 3
+{
+ "best_idx": 2,
+ "best_prompt_path": "prompt-optimizer/out/gepa-12b-multi-from-six-20260613T051216Z/best.prompt.md",
+ "best_routing_policy_path": "prompt-optimizer/out/gepa-12b-multi-from-six-20260613T051216Z/best.routing_policy.md",
+ "best_score": 0.7361111111111112,
+ "config": {
+ "harness": {
+ "base_url": null,
+ "concurrency": 2,
+ "context_window": null,
+ "max_tokens": 1536,
+ "model": "gemma-12b-q4km-reason",
+ "state_dir": null,
+ "timeout_ms": 900000
+ },
+ "max_metric_calls": 96,
+ "output_dir": "prompt-optimizer/out/gepa-12b-multi-from-six-20260613T051216Z",
+ "reflection_minibatch_size": 4,
+ "row_limit": 18,
+ "seed": 0,
+ "seed_routing_policy_chars": 3224,
+ "seed_routing_policy_sha256": "f4b161bb9bbaf366f1d4f1841243d73544bbd3c553ca6be5eb2818e757007187"
+ },
+ "created_at": "2026-06-13T05:55:33.484027+00:00",
+ "num_candidates": 4,
+ "num_full_val_evals": 4,
+ "result_path": "prompt-optimizer/out/gepa-12b-multi-from-six-20260613T051216Z/gepa-result.json",
+ "total_metric_calls": 96
+}
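The Pareto-front lines in the log above look consistent with a per-example maximum over every candidate evaluated so far, averaged over the 18 validation rows. The following is a minimal Python sketch under that assumption; `update_front` and the variable names are hypothetical and are not GEPA's API. It replays the iteration 3 update using only values copied from the log.

```python
from statistics import mean

# Per-example Pareto-front scores logged after iteration 2 (candidates 0-2).
front_after_iter2 = [1.0, 1.0, 1.0, 0.25, 1.0, 0.5, 1.0, 1.0, 0.5,
                     1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 0.5]

# Individual valset scores logged for candidate 3 in iteration 3.
candidate3 = [0.5, 0.25, 0.15384615384615385, 1.0, 0.16666666666666666, 1.0,
              0.2857142857142857, 0.5, 0.25, 1.0, 0.2, 1.0,
              0.15384615384615385, 0.2, 1.0, 0.25, 0.25, 1.0]


def update_front(front, candidate):
    """Assumed rule: keep the best score seen so far for each validation row."""
    return [max(f, c) for f, c in zip(front, candidate)]


front_after_iter3 = update_front(front_after_iter2, candidate3)

# Mean of the per-row best scores (compare with the logged front aggregate).
front_aggregate = mean(front_after_iter3)

# Mean of candidate 3's own scores (compare with its logged valset score).
candidate3_aggregate = mean(candidate3)

# Rows where candidate 3 strictly beats the previous front.
improved_rows = [i for i, (f, c) in enumerate(zip(front_after_iter2, candidate3))
                 if c > f]

print(front_aggregate, candidate3_aggregate, improved_rows)
```

Under this reading, candidate 3 raises the front on a few rows even though its own aggregate is below candidate 2's, which would explain why the front aggregate rises while `best_idx` stays at 2.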
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/run_log.json b/gepa-12b-multi-from-six-20260613T051216Z/run_log.json
new file mode 100644
index 0000000000000000000000000000000000000000..05f08c661928509a951e7000ebf4244cbdb67a06
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/run_log.json
@@ -0,0 +1,131 @@
+[
+ {
+ "i": 0,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 9,
+ 11,
+ 14,
+ 12
+ ],
+ "subsample_scores": [
+ 0.25,
+ 0.5,
+ 1.0,
+ 0.2857142857142857
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 1,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17
+ ]
+ },
+ {
+ "i": 1,
+ "selected_program_candidate": 1,
+ "subsample_ids": [
+ 0,
+ 16,
+ 10,
+ 2
+ ],
+ "subsample_scores": [
+ 0.5,
+ 0.25,
+ 0.2,
+ 0.25
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.2,
+ 1.0
+ ],
+ "new_program_idx": 2,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17
+ ]
+ },
+ {
+ "i": 2,
+ "selected_program_candidate": 2,
+ "subsample_ids": [
+ 3,
+ 5,
+ 17,
+ 4
+ ],
+ "subsample_scores": [
+ 0.25,
+ 0.5,
+ 0.5,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 0.2857142857142857,
+ 1.0,
+ 0.2857142857142857
+ ],
+ "new_program_idx": 3,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17
+ ]
+ }
+]
\ No newline at end of file
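The "New subsample score ... is better than old score ..." messages in the text logs appear to compare the sum of `new_subsample_scores` against the sum of `subsample_scores` for the same entry. The Python sketch below assumes that reading; `subsample_gate` is a hypothetical helper, not part of GEPA, and only the two score fields are copied from run_log.json above.

```python
# Score fields copied from the three run_log.json entries above.
run_log = [
    {"i": 0,
     "subsample_scores": [0.25, 0.5, 1.0, 0.2857142857142857],
     "new_subsample_scores": [1.0, 1.0, 1.0, 1.0]},
    {"i": 1,
     "subsample_scores": [0.5, 0.25, 0.2, 0.25],
     "new_subsample_scores": [1.0, 1.0, 0.2, 1.0]},
    {"i": 2,
     "subsample_scores": [0.25, 0.5, 0.5, 1.0],
     "new_subsample_scores": [1.0, 0.2857142857142857, 1.0, 0.2857142857142857]},
]


def subsample_gate(entry):
    """Assumed gate: a proposal advances to full eval if its summed score improves."""
    old_total = sum(entry["subsample_scores"])
    new_total = sum(entry["new_subsample_scores"])
    return old_total, new_total, new_total > old_total


results = [subsample_gate(entry) for entry in run_log]

for entry, (old_total, new_total, passed) in zip(run_log, results):
    # Entry "i" is zero-based; the text log numbers the same step as Iteration i + 1.
    print(f"Iteration {entry['i'] + 1}: old={old_total:.4f} new={new_total:.4f} passed={passed}")
```

Note that the gate looks only at the four-row minibatch: in the third entry the summed score improves even though two of the four rows get worse.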
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/run_log.txt b/gepa-12b-multi-from-six-20260613T051216Z/run_log.txt
new file mode 100644
index 0000000000000000000000000000000000000000..4b08251411e6d15a687e1668e22951a1b41fd138
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/run_log.txt
@@ -0,0 +1,246 @@
+Iteration 0: Base program full valset score: 0.4972222222222222 over 18 / 18 examples
+Iteration 1: Selected program 0 score: 0.4972222222222222
+Iteration 1: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Input format:
+- You may receive a GitHub target URL, a title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+- Return only final JSON using exact allowed topic ids, for example:
+ {"topics_of_interest":["queueing","docs"]}
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second concern.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, or policy change.
+3. Pick one primary topic.
+4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output only JSON.
+
+High-signal title patterns:
+- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, or `policy(...)` can indicate the kind of change.
+- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.
+- Do not blindly label every word in the title. Confirm the word names the subject, not just context.
+
+Domain rules and corrections:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+ - Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.
+ - Do not add `tool_calling` just because the title says “tool boundaries” unless tool calling behavior itself is the central feature or bug.
+
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+
+- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+ - Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+- ACPX-related sandbox or workflow issues route to `acpx` when ACPX is named centrally.
+- Codex-related behavior routes to `codex` when Codex is named centrally.
+- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.
+ - Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` => `acpx`, `codex`, `skills_plugins`.
+ - Do not drop `skills_plugins` when plugins are the requested feature.
+
+- Memory or embeddings provider work routes to `memory` when the provider exists for memory/embeddings.
+- Self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.
+ - Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`.
+ - Do not add `model_serving` merely because the title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI.
+
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+ - OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+ - Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+ - Example: “OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)” => `model_serving` only.
+
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Policy/config rules:
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Example: “Policy: add model, network, and MCP conformance checks” => `mcp_tooling`, `config`, `security`, not `model_serving`.
+
+Final suppression checks:
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, or “chunk”, verify that the topic is actually the subject.
+- Prefer the narrow central topic over broad fallback labels.
+- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.
+Iteration 1: New subsample score 4.0 is better than old score 2.0357142857142856. Continue to full eval and add to candidate pool.
+Iteration 1: Found a better program on the valset with score 0.5380952380952381.
+Iteration 1: Valset score for new program: 0.5380952380952381 (coverage 18 / 18)
+Iteration 1: Val aggregate for new program: 0.5380952380952381
+Iteration 1: Individual valset scores for new program: {0: 0.5, 1: 0.0, 2: 0.5, 3: 0.25, 4: 1.0, 5: 0.5, 6: 1.0, 7: 0.5, 8: 0.25, 9: 1.0, 10: 0.2, 11: 1.0, 12: 1.0, 13: 0.5, 14: 0.2857142857142857, 15: 0.5, 16: 0.2, 17: 0.5}
+Iteration 1: Objective aggregate scores for new program: {'weighted_score': 0.5380952380952381}
+Iteration 1: New valset pareto front scores: {0: 0.5, 1: 1.0, 2: 0.5, 3: 0.25, 4: 1.0, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 0.2, 11: 1.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.5}
+Iteration 1: Objective pareto front scores: {'weighted_score': 0.5380952380952381}
+Iteration 1: Valset pareto front aggregate score: 0.6777777777777777
+Iteration 1: Updated valset pareto front programs: {0: {1}, 1: {0}, 2: {1}, 3: {0, 1}, 4: {0, 1}, 5: {0, 1}, 6: {1}, 7: {0}, 8: {0}, 9: {1}, 10: {0, 1}, 11: {0, 1}, 12: {0, 1}, 13: {1}, 14: {0}, 15: {1}, 16: {0}, 17: {0, 1}}
+Iteration 1: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 1: Best valset aggregate score so far: 0.5380952380952381
+Iteration 1: Best program as per aggregate score on valset: 1
+Iteration 1: Best score on valset: 0.5380952380952381
+Iteration 1: Linear pareto front program index: 1
+Iteration 1: New program candidate index: 1
+Iteration 2: Selected program 1 score: 0.5380952380952381
+Iteration 2: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+- Return only final JSON using exact allowed topic ids, for example:
+ {"topics_of_interest":["queueing","docs"]}
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output only JSON.
+
+High-signal title patterns:
+- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, `test(...)`, or `policy(...)` can indicate the kind of change.
+- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.
+- Do not ignore `test(...)` scopes when the title is about landing or enforcing a behavior contract. The tested contract can be the central subject.
+- Do not blindly label every word in the title. Confirm the word names the subject, not just a path, symptom, or context.
+
+Domain rules and corrections:
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.
+- Do not add `tool_calling` just because the title says “tool boundaries” unless tool-call behavior itself is the central feature or bug.
+
+Queueing:
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+
+Tool calling:
+- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+ACP, gateway, and runtime:
+- ACP-related work routes to `acp` when ACP is named centrally.
+- ACPX-related sandbox or workflow issues route to `acpx` when ACPX is named centrally.
+- Gateway-owned behavior routes to `gateway` only when gateway is explicitly the owner or subject.
+- Runtime work routes to `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Codex and plugins:
+- Codex-related behavior routes to `codex` when Codex is named centrally.
+- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.
+- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` => `acpx`, `codex`, `skills_plugins`.
+- Do not drop `skills_plugins` when plugins are the requested feature.
+
+Notifications and chat integrations:
+- Slack, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations`.
+- Announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery route to `notifications`.
+- Do not add `cron_automation` merely because the notification path mentions `cron --announce`; cron is context unless scheduling, force-run behavior, cron lifecycle, or cron execution is itself broken.
+- Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes` => `notifications`, `chat_integrations`.
+
+Cron:
+- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Example: `cron force-run deadlock` => `cron_automation`.
+
+Exec, sandboxing, and approvals:
+- Exec command/tool behavior routes to `exec_tools`.
+- Exec PATH fallback is `exec_tools`.
+- Exec v2 contract follow-through or contract enforcement can centrally include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.
+- Example: `test(exec): land exec v2 contract follow-through` => `exec_tools`, `sandboxing`, `approvals`.
+- Do not replace sandboxing or approvals with `security` unless the title is actually about a security policy, vulnerability, network restriction, credential boundary, or allowed/blocked security behavior.
+
+Browser automation:
+- Browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues route to `browser_automation`.
+- Example: `layered browser diagnostics` => `browser_automation`.
+- Do not add `gateway` for browser diagnostics unless the gateway itself is explicitly the subject.
+
+Memory and inference:
+- Memory or embeddings provider work routes to `memory` when the provider exists for memory/embeddings.
+- Self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.
+- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`.
+- Do not add `model_serving` merely because the title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI.
+
+Model serving:
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Policy/config:
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Example: `Policy: add model, network, and MCP conformance checks` => `mcp_tooling`, `config`, `security`, not `model_serving`.
+
+Composite fixes:
+- If a title lists several independent fixes, classify each central fix up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Do not substitute a broad infrastructure topic like `gateway` unless it is explicitly one of the listed user-visible subjects.
+
+Final suppression checks:
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, or “security”, verify that the topic is actually the subject.
+- Prefer the narrow central topic over broad fallback labels.
+- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.
+Iteration 2: New subsample score 3.2 is better than old score 1.2. Continue to full eval and add to candidate pool.
+Iteration 2: Found a better program on the valset with score 0.7361111111111112.
+Iteration 2: Valset score for new program: 0.7361111111111112 (coverage 18 / 18)
+Iteration 2: Val aggregate for new program: 0.7361111111111112
+Iteration 2: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.0, 4: 1.0, 5: 0.5, 6: 0.5, 7: 0.5, 8: 0.25, 9: 1.0, 10: 0.5, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 0.5}
+Iteration 2: Objective aggregate scores for new program: {'weighted_score': 0.7361111111111112}
+Iteration 2: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.25, 4: 1.0, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 0.5, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 0.5}
+Iteration 2: Objective pareto front scores: {'weighted_score': 0.7361111111111112}
+Iteration 2: Valset pareto front aggregate score: 0.8194444444444444
+Iteration 2: Updated valset pareto front programs: {0: {2}, 1: {0, 2}, 2: {2}, 3: {0, 1}, 4: {0, 1, 2}, 5: {0, 1, 2}, 6: {1}, 7: {0}, 8: {0}, 9: {1, 2}, 10: {2}, 11: {0, 1, 2}, 12: {0, 1, 2}, 13: {2}, 14: {0, 2}, 15: {1, 2}, 16: {2}, 17: {0, 1, 2}}
+Iteration 2: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 2: Best valset aggregate score so far: 0.7361111111111112
+Iteration 2: Best program as per aggregate score on valset: 2
+Iteration 2: Best score on valset: 0.7361111111111112
+Iteration 2: Linear pareto front program index: 2
+Iteration 2: New program candidate index: 2
+Iteration 3: Selected program 2 score: 0.7361111111111112
+Iteration 3: Proposed new text for routing_policy: Add these routing corrections to the classifier instructions:
+
+- Treat compound titles as lists of central user-visible fixes. Classify each central item, but do not add labels for every noun.
+- `skills_plugins` is label spam unless the plugin system itself is the requested feature or bug: user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or skill/plugin availability.
+- In titles like `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth`, keep `codex` because Codex behavior is central, but do not add `skills_plugins` for “startup plugins” unless the plugin lifecycle is the actual subject.
+- WhatsApp, Slack, chat history, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations` when central.
+- ACP session permission-mode work can require all three topics: `acp`, `approvals`, and `acpx`.
+- Specifically, titles mentioning per-binding or per-agent `permissionMode` for ACP sessions should include `acp`, `approvals`, and `acpx`. `permissionMode` is an approval/permission contract, and ACPX owns the ACP session/binding workflow concern.
+- Add `local_models` when the title centrally names local model apps or local model providers such as LM Studio.
+- LM Studio issues involving Responses API behavior, thinking blocks, streaming, request/response compatibility, or visibility of model output should usually include both `model_serving` and `local_models`.
+- Do not replace `local_models` with `self_hosted_inference` when the named subject is LM Studio or another local-model product/app rather than a generic inference server integration.
+- `Responses API`, invisible thinking blocks, OpenAI-compatible behavior, streaming lifecycle, request/response protocol handling, and model-output protocol bugs route to `model_serving`.
+
+Additional suppression checks:
+- If `skills_plugins` was added only because the title contains “plugins” inside a broader Codex startup or OAuth fix, remove it unless plugin installation/discovery/inheritance/availability is the central user-visible bug.
+- If a chat product name such as WhatsApp appears as a central listed fix, include `chat_integrations`.
+- If ACP + `permissionMode` + per-binding/per-agent/session language appears, include `acpx` in addition to `acp` and `approvals`.
+- If LM Studio appears as a central subject, include `local_models`.
+Iteration 3: New subsample score 2.571428571428571 is better than old score 2.25. Continue to full eval and add to candidate pool.
+Iteration 3: Valset score for new program: 0.5088929588929588 (coverage 18 / 18)
+Iteration 3: Val aggregate for new program: 0.5088929588929588
+Iteration 3: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.15384615384615385, 3: 1.0, 4: 0.16666666666666666, 5: 1.0, 6: 0.2857142857142857, 7: 0.5, 8: 0.25, 9: 1.0, 10: 0.2, 11: 1.0, 12: 0.15384615384615385, 13: 0.2, 14: 1.0, 15: 0.25, 16: 0.25, 17: 1.0}
+Iteration 3: Objective aggregate scores for new program: {'weighted_score': 0.5088929588929589}
+Iteration 3: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 0.5, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0}
+Iteration 3: Objective pareto front scores: {'weighted_score': 0.7361111111111112}
+Iteration 3: Valset pareto front aggregate score: 0.9166666666666666
+Iteration 3: Updated valset pareto front programs: {0: {2}, 1: {0, 2}, 2: {2}, 3: {3}, 4: {0, 1, 2}, 5: {3}, 6: {1}, 7: {0}, 8: {0}, 9: {1, 2, 3}, 10: {2}, 11: {0, 1, 2, 3}, 12: {0, 1, 2}, 13: {2}, 14: {0, 2, 3}, 15: {1, 2}, 16: {2}, 17: {3}}
+Iteration 3: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 3: Best valset aggregate score so far: 0.7361111111111112
+Iteration 3: Best program as per aggregate score on valset: 2
+Iteration 3: Best score on valset: 0.7361111111111112
+Iteration 3: Linear pareto front program index: 2
+Iteration 3: New program candidate index: 3
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/run_log_stderr.txt b/gepa-12b-multi-from-six-20260613T051216Z/run_log_stderr.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-multi-from-six-20260613T051216Z/summary.json b/gepa-12b-multi-from-six-20260613T051216Z/summary.json
new file mode 100644
index 0000000000000000000000000000000000000000..4c8fc98c86fb7235ef4f3def15fc062b524b6a2f
--- /dev/null
+++ b/gepa-12b-multi-from-six-20260613T051216Z/summary.json
@@ -0,0 +1,29 @@
+{
+ "best_idx": 2,
+ "best_prompt_path": "prompt-optimizer/out/gepa-12b-multi-from-six-20260613T051216Z/best.prompt.md",
+ "best_routing_policy_path": "prompt-optimizer/out/gepa-12b-multi-from-six-20260613T051216Z/best.routing_policy.md",
+ "best_score": 0.7361111111111112,
+ "config": {
+ "harness": {
+ "base_url": null,
+ "concurrency": 2,
+ "context_window": null,
+ "max_tokens": 1536,
+ "model": "gemma-12b-q4km-reason",
+ "state_dir": null,
+ "timeout_ms": 900000
+ },
+ "max_metric_calls": 96,
+ "output_dir": "prompt-optimizer/out/gepa-12b-multi-from-six-20260613T051216Z",
+ "reflection_minibatch_size": 4,
+ "row_limit": 18,
+ "seed": 0,
+ "seed_routing_policy_chars": 3224,
+ "seed_routing_policy_sha256": "f4b161bb9bbaf366f1d4f1841243d73544bbd3c553ca6be5eb2818e757007187"
+ },
+ "created_at": "2026-06-13T05:55:33.484027+00:00",
+ "num_candidates": 4,
+ "num_full_val_evals": 4,
+ "result_path": "prompt-optimizer/out/gepa-12b-multi-from-six-20260613T051216Z/gepa-result.json",
+ "total_metric_calls": 96
+}
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055657Z/optimize.stderr.log b/gepa-12b-proper-from-best-20260613T055657Z/optimize.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-proper-from-best-20260613T055657Z/optimize.stdout.json b/gepa-12b-proper-from-best-20260613T055657Z/optimize.stdout.json
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-proper-from-best-20260613T055657Z/run_log.txt b/gepa-12b-proper-from-best-20260613T055657Z/run_log.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-proper-from-best-20260613T055657Z/run_log_stderr.txt b/gepa-12b-proper-from-best-20260613T055657Z/run_log_stderr.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-proper-from-best-20260613T055809Z/optimize.stderr.log b/gepa-12b-proper-from-best-20260613T055809Z/optimize.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-proper-from-best-20260613T055809Z/optimize.stdout.json b/gepa-12b-proper-from-best-20260613T055809Z/optimize.stdout.json
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-proper-from-best-20260613T055809Z/optimizer.pid b/gepa-12b-proper-from-best-20260613T055809Z/optimizer.pid
new file mode 100644
index 0000000000000000000000000000000000000000..bc15c1bcd6b7f03f54c6017502db290bfcef09b9
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055809Z/optimizer.pid
@@ -0,0 +1 @@
+3859822
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/best.prompt.md b/gepa-12b-proper-from-best-20260613T055906Z/best.prompt.md
new file mode 100644
index 0000000000000000000000000000000000000000..173d1302fa1c9541656d4b7e35181feb7e77b54b
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/best.prompt.md
@@ -0,0 +1,206 @@
+# OpenClaw Routing Classifier
+
+Classify one OpenClaw GitHub issue or pull request for maintainer notification
+routing, not code search. Return only the final structured JSON required by the
+schema. No prose, markdown, analysis, or extra fields.
+
+Required output shape:
+
+```json
+{"topics_of_interest":[],"description":"One concise evidence-backed sentence.","caveats":[]}
+```
+
+## Inner Monologue
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+## Repository Reads
+
+A read-only `bash` tool may be available in the OpenClaw repo snapshot. Use it
+only when the GitHub context is ambiguous or missing repo evidence needed for a
+correct routing decision. Prefer short commands such as `pwd`, `ls`, `find`,
+`rg`, `grep`, `sed -n`, `cat`, `head`, `tail`, `wc -l`,
+`git show --name-only`, `git ls-files`, or `git grep`.
+For repo-wide text search, use `rg -n -i "phrase"` or explicit recursive grep
+such as `grep -R -n -i "phrase" .`. For file discovery, use
+`rg --files -g "*.ts"` or `git ls-files src`.
+Do not call `bash` when the provided GitHub context is enough.
+
+## Allowed Topics
+
+```json
+__ALLOWED_TOPICS_JSON__
+```
+
+Topic definitions and cue words:
+
+__TOPIC_DESCRIPTIONS__
+
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about “cron --announce and heartbeat target-channel Slack pushes” is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself the central subject.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:
+ - `exec_tools`
+ - `sandboxing`
+ - `approvals`
+- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Open-weight and local model providers:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+ACP, ACPX, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Model serving:
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Output only JSON.## Target
+
+`__TARGET__`
+
+## GitHub Context
+
+__GITHUB_CONTEXT__
+
+Use this context as source of truth. If important sections are missing,
+unavailable, selected, or truncated, classify from what is available and mention
+material limits in `caveats`.
+
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/best.routing_policy.md b/gepa-12b-proper-from-best-20260613T055906Z/best.routing_policy.md
new file mode 100644
index 0000000000000000000000000000000000000000..f59c8c3a0e0e65d3cd98518dc690aed4cb5bfde8
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/best.routing_policy.md
@@ -0,0 +1,151 @@
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about “cron --announce and heartbeat target-channel Slack pushes” is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself the central subject.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:
+ - `exec_tools`
+ - `sandboxing`
+ - `approvals`
+- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Open-weight and local model providers:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+ACP, ACPX, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Model serving:
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Output only JSON.
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/candidate_tree.html b/gepa-12b-proper-from-best-20260613T055906Z/candidate_tree.html
new file mode 100644
index 0000000000000000000000000000000000000000..a0ec2514880db7381dec53915376dd6013ee9b6c
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/candidate_tree.html
@@ -0,0 +1,194 @@
+
+
+
+
+
+GEPA Candidate Tree
+
+
+
+
+
GEPA Candidate Tree
+
+
Best
+
Pareto Front
+
Other
+
+
+
Loading graph…
+
+
+
+
+
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/candidates.json b/gepa-12b-proper-from-best-20260613T055906Z/candidates.json
new file mode 100644
index 0000000000000000000000000000000000000000..45fc333b1525b2752b5b356b2f2d74f2271e02c2
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/candidates.json
@@ -0,0 +1,29 @@
+[
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n- Return only final JSON using exact allowed topic ids, for example:\n {\"topics_of_interest\":[\"queueing\",\"docs\"]}\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output only JSON.\n\nHigh-signal title patterns:\n- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, `test(...)`, or `policy(...)` can indicate the kind of change.\n- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.\n- Do not ignore `test(...)` scopes when the title is about landing or enforcing a behavior contract. The tested contract can be the central subject.\n- Do not blindly label every word in the title. 
Confirm the word names the subject, not just a path, symptom, or context.\n\nDomain rules and corrections:\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.\n- Do not add `tool_calling` just because the title says \u201ctool boundaries\u201d unless tool-call behavior itself is the central feature or bug.\n\nQueueing:\n- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.\n\nTool calling:\n- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nACP, gateway, and runtime:\n- ACP-related work routes to `acp` when ACP is named centrally.\n- ACPX-related sandbox or workflow issues route to `acpx` when ACPX is named centrally.\n- Gateway-owned behavior routes to `gateway` only when gateway is explicitly the owner or subject.\n- Runtime work routes to `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nCodex and plugins:\n- Codex-related behavior routes to `codex` when Codex is named centrally.\n- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.\n- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. 
Superpowers)` => `acpx`, `codex`, `skills_plugins`.\n- Do not drop `skills_plugins` when plugins are the requested feature.\n\nNotifications and chat integrations:\n- Slack, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations`.\n- Announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery route to `notifications`.\n- Do not add `cron_automation` merely because the notification path mentions `cron --announce`; cron is context unless scheduling, force-run behavior, cron lifecycle, or cron execution is itself broken.\n- Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes` => `notifications`, `chat_integrations`.\n\nCron:\n- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Example: `cron force-run deadlock` => `cron_automation`.\n\nExec, sandboxing, and approvals:\n- Exec command/tool behavior routes to `exec_tools`.\n- Exec PATH fallback is `exec_tools`.\n- Exec v2 contract follow-through or contract enforcement can centrally include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.\n- Example: `test(exec): land exec v2 contract follow-through` => `exec_tools`, `sandboxing`, `approvals`.\n- Do not replace sandboxing or approvals with `security` unless the title is actually about a security policy, vulnerability, network restriction, credential boundary, or allowed/blocked security behavior.\n\nBrowser automation:\n- Browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues route to `browser_automation`.\n- Example: `layered browser diagnostics` => `browser_automation`.\n- Do not add `gateway` for browser diagnostics unless the gateway itself is explicitly the subject.\n\nMemory and inference:\n- Memory or embeddings provider work routes to 
`memory` when the provider exists for memory/embeddings.\n- Self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.\n- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`.\n- Do not add `model_serving` merely because the title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI.\n\nModel serving:\n- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n\nPolicy/config:\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or 
model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: `Policy: add model, network, and MCP conformance checks` => `mcp_tooling`, `config`, `security`, not `model_serving`.\n\nComposite fixes:\n- If a title lists several independent fixes, classify each central fix up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Do not substitute a broad infrastructure topic like `gateway` unless it is explicitly one of the listed user-visible subjects.\n\nFinal suppression checks:\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, or \u201csecurity\u201d, verify that the topic is actually the subject.\n- Prefer the narrow central topic over broad fallback labels.\n- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.\n- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn only final JSON, for example:\n{\"topics_of_interest\":[\"queueing\",\"docs\"]}\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they name explicit central subjects that a maintainer must see.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output only JSON.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm the word is the subject, not a path, symptom, implementation detail, example, or broad ownership area. 
In particular, do not add `gateway` unless gateway behavior or ownership is explicitly the subject.\n\nTopic rules and corrections:\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- A title like `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` includes `codex`.\n- Do not add `gateway` for Codex startup, OAuth, Docker, or plugin startup unless gateway is explicitly the user-visible subject.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add `skills_plugins` merely because a Codex fix mentions \u201cstartup plugins\u201d unless plugin availability or user-installed plugin behavior is central.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- WhatsApp history is `chat_integrations`.\n- In a composite title, preserve chat topics when a chat platform or chat history is one of the listed fixes.\n- Do not replace chat topics with `gateway` unless the gateway itself is the subject.\n\nNotifications:\n- Use `notifications` for announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`; cron is context unless scheduling, force-run behavior, cron lifecycle, or cron execution is itself broken.\n\nACP, ACPX, approvals, and sessions:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or 
permissionMode policy is central.\n- A title like `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`: ACP sessions are central, permissionMode maps to approvals, and per-binding/per-agent ACP session permission behavior belongs in the ACPX workflow bucket.\n- Do not drop `acpx` when ACP session permissions are framed around per-binding or per-agent behavior.\n\nGateway and runtime:\n- Use `gateway` only when gateway-owned behavior, gateway routing, or gateway ownership is explicitly the subject.\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nModel serving:\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.\n\nLocal models:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- A title like `fix: LM Studio thinking blocks invisible with Responses API` should be `model_serving` and `local_models`: LM Studio is the local model provider, and Responses API/thinking block visibility is model-serving protocol behavior.\n\nSelf-hosted inference:\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- For memory or embeddings providers, combine with `memory` when memory/embeddings is central.\n- 
Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nMemory:\n- Use `memory` for memory, embeddings, vector stores, embedding providers, or memory provider behavior.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nExec, sandboxing, and approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement can include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.\n- Do not replace sandboxing or approvals with `security` unless the title is actually about security policy, vulnerability, network restriction, credential boundary, or allowed/blocked security behavior.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nCron:\n- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n\nPolicy/config/security/MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d 
in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- A `docs(queue): ...` title usually includes both `docs` and `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nQueueing:\n- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.\n\nTool calling:\n- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression checks:\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, or \u201csecurity\u201d, verify that the topic is actually the subject.\n- Prefer narrow central topics over broad fallback labels.\n- Remove labels that come only from symptoms, implementation 
details, tests, examples, files changed, or incidental words.\n- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn only final JSON, for example:\n{\"topics_of_interest\":[\"queueing\",\"docs\"]}\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they name explicit central subjects that a maintainer must see.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output only JSON.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm the word is the subject, not a path, symptom, implementation detail, example, or broad ownership area.\n\nImportant corrections:\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing. Use `notifications` only when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, or notification delivery.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central. 
\u201coutbound session identity\u201d is `sessions`.\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, before_prompt_build, or internal execution path. Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be the user-visible subject.\n- A title like `active-memory recall subagent can deadlock on the main lane inside before_prompt_build` should be `memory` and `reliability`.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n\nOpen-weight and local model providers:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when the item is about provider-specific local/open-weight model integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support such as `amazon-bedrock-mantle`.\n- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with a bracketed agent/source marker like `[codex]`. 
Treat bracketed `[codex]` as metadata unless the actual subject is Codex behavior.\n- A title like `[codex] fix webchat full-message reader for truncated history` should not include `codex`.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- A title like `[codex] fix webchat full-message reader for truncated history` should be `gateway`, `api_surface`, and `ui_tui` when the reader is gateway-backed or gateway-owned.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- WhatsApp history is `chat_integrations`.\n- In a composite title, preserve chat topics when a chat platform or chat history is one of the listed fixes.\n- Do not replace chat topics with `gateway` unless the gateway itself is the subject.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add `skills_plugins` merely because a Codex fix mentions \u201cstartup plugins\u201d unless plugin availability or user-installed plugin behavior is central.\n\nACP, ACPX, approvals, and sessions:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.\n- A title like `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` 
should be `acp`, `approvals`, and `acpx`.\n\nGateway and runtime:\n- Use `gateway` only when gateway-owned behavior, gateway routing, or gateway ownership is explicitly the subject.\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nModel serving:\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nExec, sandboxing, and approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract 
enforcement can include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nCron:\n- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nPolicy/config/security/MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- A `docs(queue): ...` title usually includes both `docs` and `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nQueueing:\n- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.\n- Suppress `queueing` for incidental words like lane, deadlock, main lane, worker, subagent, or internal hook names unless queue lifecycle or queued execution is the actual subject.\n\nTool calling:\n- Use `tool_calling` only for tool-call execution, tool-call 
APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression checks:\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, \u201csend\u201d, \u201clane\u201d, \u201cdeadlock\u201d, \u201cCodex\u201d, or \u201csecurity\u201d, verify that the topic is actually the subject.\n- Prefer narrow central topics over broad fallback labels.\n- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, bracketed metadata, or incidental words.\n- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n\nOutput format:\n- Return only final JSON using exact allowed topic ids, for example:\n {\"topics_of_interest\":[\"queueing\",\"docs\"]}\n- If a structured-output tool such as `final_json` is available, you MUST call it with the JSON object. Do not merely print JSON.\n- Always return a valid object with `topics_of_interest` as an array. Use an empty array only when no allowed topic is central.\n- Never invent topic ids.\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. 
Suppress labels that come only from symptoms, examples, implementation details, tests, files changed, or incidental words.\n\nProvider catalogs and local model providers:\n- Use `local_model_providers` when the central subject is a provider catalog, provider-specific model availability, provider registry, model list, provider configuration, or missing/unsupported model entry.\n- Provider prefixes such as `google-vertex:` are strong signals for `local_model_providers` when the title is about provider catalog behavior.\n- Model names such as Gemini variants, including `gemini-3.1-flash-lite`, are not enough for `model_serving`; classify the provider/catalog issue instead.\n- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.\n\nReliability:\n- Use `reliability` when the central user-visible problem is silent failure, swallowed errors, failure instead of a clear error, unexpected no-op, crash, hang, deadlock, or robustness of failure handling.\n- If a missing provider catalog entry causes a silent failure instead of reporting an error, include `reliability`.\n- Do not add `reliability` for every bug; the title or summary must make failure handling, robustness, or degraded correctness central.\n\nModel serving:\n- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- Do not use `model_serving` merely because the title mentions a model name, provider name, catalog, Gemini, Google Vertex, OpenAI-compatible provider, llama.cpp, Ollama, vLLM, TGI, or LocalAI.\n- A provider catalog/model availability bug is `local_model_providers`, not `model_serving`, unless endpoint protocol or serving runtime behavior is explicitly central.\n\nFinal suppression checks:\n- If a 
topic was added only because of a word like \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, \u201cusage\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, or \u201csecurity\u201d, verify that the topic is actually the subject.\n- Prefer the narrow central maintainer bucket over broad fallback labels.\n- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn only final JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Do not label every noun. Prefer the narrow central subject over broad or adjacent labels.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area, if that area is explicitly central.\n- `docs(queue): clarify steer behavior...` => `docs`, `queueing`.\n- Tutorial, guide, docs, README, or showcase catalog additions are usually `docs` only unless the title clearly names a second central product area.\n- Example: `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` => `docs`.\n- Do not add `agent_demos` just because the title mentions a tutorial, showcase, example, or educational resource. 
Treat that as label spam unless the actual subject is an agent demo feature, demo runtime, demo app behavior, or a maintainer-owned demo surface.\n\nQueueing:\n- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.\n\nTool calling:\n- Use `tool_calling` only for tool-call execution, APIs, selection, schema handling, or runtime behavior.\n- Do not add it for \u201ctool boundaries\u201d in docs about another system unless tool-call behavior is central.\n\nACP, gateway, and runtime:\n- ACP-related work routes to `acp` when ACP is named centrally.\n- ACPX sandbox or workflow issues route to `acpx` when ACPX is named centrally.\n- Gateway-owned behavior routes to `gateway` only when gateway is explicitly the owner or subject.\n- Runtime work routes to `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n\nCodex and plugins:\n- Codex-related behavior routes to `codex` when Codex is named centrally.\n- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.\n\nNotifications, chat, and cron:\n- Slack, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations`.\n- Announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery route to `notifications`.\n- Use `cron_automation` only when cron scheduling, force-run, lifecycle, execution, or deadlock is central.\n\nExec, sandboxing, and approvals:\n- Exec command/tool behavior routes to `exec_tools`.\n- Exec PATH fallback is `exec_tools`.\n- Exec v2 contract enforcement can include `exec_tools`, `sandboxing`, and `approvals` when all are central.\n- Do not replace sandboxing or approvals with `security` unless the title is about a security policy, vulnerability, network restriction, credential boundary, or 
allowed/blocked security behavior.\n\nBrowser automation:\n- Browser diagnostics, automation layers, runtime behavior, and browser tooling route to `browser_automation`.\n- Do not add `gateway` unless gateway itself is explicitly central.\n\nMemory, inference, and model serving:\n- Memory or embeddings provider work routes to `memory`.\n- Self-hosted inference servers like llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.\n- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n\nPolicy/config:\n- Policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when central.\n- Network policy, access restrictions, outbound rules, or boundary checks can be `security`.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d or \u201cmodel conformance\u201d to `model_serving` unless serving endpoints or protocol behavior are central.\n\nFinal suppression checks:\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, \u201csecurity\u201d, 
\u201cshowcase\u201d, \u201ctutorial\u201d, or \u201cdemo\u201d, verify that the topic is actually the subject.\n- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, educational resources, or incidental words.\n- Keep required central second and third topics only when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn only final JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about \u201ccron --announce and heartbeat target-channel Slack pushes\u201d is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself the central subject.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:\n - `exec_tools`\n - `sandboxing`\n - `approvals`\n- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a 
deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nOpen-weight and local model providers:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like 
`[codex]`.\n- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nACP, ACPX, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nModel serving:\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server 
compatibility.\n- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, 
streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- A `docs(queue): ...` title usually includes both `docs` and `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nTool calling:\n- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, or contract. Keep it only if that topic is actually a central maintainer-owned subject.\n\nOutput only JSON."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n\nOutput format:\n- Return only final JSON using exact allowed topic ids, for example:\n {\"topics_of_interest\":[\"queueing\",\"docs\"]}\n- If a structured-output tool such as `final_json` is available, you MUST call it with the JSON object. Do not merely print JSON.\n- Always return a valid object with `topics_of_interest` as an array.\n- Use an empty array only when no allowed topic is central.\n- Never invent topic ids.\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. 
Suppress labels that come only from symptoms, examples, implementation details, tests, files changed, or incidental words.\n\nSpecific routing rules:\n\nCodex:\n- Use `codex` when the title centrally concerns Codex startup, Codex OAuth, Codex runtime behavior, Codex configuration, Dockerized Codex, or Codex-specific user workflows.\n- A title with multiple Codex-related facets, such as startup behavior plus Docker Codex OAuth, usually still needs only `codex` unless another non-Codex product area is also central.\n\nChat integrations:\n- Use `chat_integrations` when the title centrally names WhatsApp, chat history, chat import/export, chat connectors, messaging platforms, or conversation integration behavior.\n- \u201cWhatsApp history\u201d is a strong signal for `chat_integrations`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n\nSkills and plugins:\n- Do not add `skills_plugins` merely because the word \u201cplugins\u201d appears.\n- Treat \u201cCodex startup plugins\u201d as part of the Codex startup surface unless the title is centrally about the plugin system, plugin APIs, plugin installation, plugin lifecycle, skill/plugin registry, or skills behavior.\n- `skills_plugins` is label spam when plugins are only one detail inside a broader Codex fix.\n\nGateway:\n- Do not add `gateway` merely because the title mentions Docker, OAuth, startup, routing, service setup, or Codex infrastructure.\n- Use `gateway` only when the gateway itself, gateway routing, gateway API behavior, or gateway service contract is the central subject.\n\nOpen-weight models:\n- Use `open_weight_models` when the title centrally concerns open-weight model families, open-weight model metadata, context windows for open-weight models, or model support for open-weight models.\n- Mantle open-weight models are a strong signal for `open_weight_models`.\n\nProvider catalogs and local model providers:\n- Use `local_model_providers` 
when the central subject is a provider catalog, provider-specific model availability, provider registry, model list, provider configuration, model metadata for a provider, context-window metadata in a provider catalog, or missing/unsupported provider model entry.\n- Provider prefixes/scopes such as `amazon-bedrock-mantle`, `amazon-bedrock`, `google-vertex:`, and similar provider-qualified names are strong signals for `local_model_providers` when the title is about provider catalog behavior or provider-specific model metadata.\n- Adding known context windows for models under a provider is both provider-catalog work and model-metadata work.\n- Example: `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` => `open_weight_models`, `local_model_providers`.\n\nReliability:\n- Use `reliability` when the central user-visible problem is silent failure, swallowed errors, failure instead of a clear error, unexpected no-op, crash, hang, deadlock, or robustness of failure handling.\n- If a missing provider catalog entry causes a silent failure instead of reporting an error, include `reliability`.\n- Do not add `reliability` for every bug; the title or summary must make failure handling, robustness, or degraded correctness central.\n\nModel serving:\n- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- Do not use `model_serving` merely because the title mentions a model name, provider name, catalog, Gemini, Google Vertex, OpenAI-compatible provider, llama.cpp, Ollama, vLLM, TGI, LocalAI, context windows, or model metadata.\n- A provider catalog/model availability bug is `local_model_providers`, not `model_serving`, unless endpoint protocol or serving runtime behavior is explicitly central.\n\nFinal suppression checks:\n- If a topic was 
added only because of a word like \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, \u201cusage\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, \u201cDocker\u201d, \u201cOAuth\u201d, or \u201csecurity\u201d, verify that the topic is actually the subject.\n- Prefer the narrow central maintainer bucket over broad fallback labels.\n- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n- Return only final JSON using exact allowed topic ids, for example:\n {\"topics_of_interest\":[\"queueing\",\"docs\"]}\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output only JSON.\n\nHigh-signal title patterns:\n- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, `test(...)`, or `policy(...)` can indicate the kind of change.\n- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.\n- Do not ignore `test(...)` scopes when the title is about landing or enforcing a behavior contract. The tested contract can be the central subject.\n- Prefixes naming providers or integrations, such as `google-vertex:`, `lm-studio:`, or similar, are often central maintainer-routing signals.\n- Do not blindly label every word in the title. 
Confirm the word names the subject, not just a path, symptom, or context.\n\nAdditional domain rules learned from examples:\n\nModel provider catalogs and provider integrations:\n- Missing models, provider catalogs, model registries, provider capability lists, provider metadata, or provider-specific model availability route to `local_model_providers` when central.\n- This includes cloud or hosted provider names when the issue is about the OpenClaw provider integration/catalog, such as `google-vertex` and Gemini model catalog entries.\n- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.\n- Do not classify provider catalog/model availability bugs as `model_serving` unless the central subject is serving endpoints, API protocol behavior, streaming lifecycle, endpoint compatibility, base URL behavior, or request routing.\n\nReliability:\n- Use `reliability` when the title centrally describes silent failure, missing errors, false success, hangs, deadlocks, crashes, data loss, or failure modes that make the system unreliable.\n- Add `reliability` as a secondary topic when an explicit reliability failure mode is part of the user-visible problem.\n- Do not add `reliability` to every bug fix. 
It must be explicit, such as \u201csilent failure instead of error.\u201d\n\nLocal models:\n- Use `local_models` when a local model product/runtime/provider is central, such as LM Studio, Ollama, llama.cpp, LocalAI, or similar local model environments.\n- If the title names LM Studio as the affected system, include `local_models` even when the API/protocol issue also routes elsewhere.\n- Example: `fix: LM Studio thinking blocks invisible with Responses API` => `model_serving`, `local_models`.\n\nModel serving:\n- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible or Responses API request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or protocol-specific content handling.\n- Responses API behavior can be `model_serving` when the bug is about how served model responses are represented or consumed.\n- Example: `fix: LM Studio thinking blocks invisible with Responses API` => `model_serving`, `local_models`.\n- Do not add `model_serving` merely because a title mentions a model name, provider, provider catalog, OpenAI-compatible provider, Gemini, Vertex, LM Studio, llama.cpp, Ollama, vLLM, TGI, or LocalAI. 
Confirm the serving/API protocol itself is central.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.\n- Do not add `tool_calling` just because the title says \u201ctool boundaries\u201d unless tool-call behavior itself is the central feature or bug.\n\nQueueing:\n- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.\n\nTool calling:\n- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nFinal suppression checks:\n- If a topic was added only because of a word like \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, \u201cusage\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, or \u201csecurity\u201d, verify that the topic is actually the subject.\n- Prefer narrow central topics over broad fallback labels.\n- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.\n- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn only final JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput:\nYou may receive a target URL, title, body, summary, labels, examples, tests, files changed, or prior discussion. The title is the primary signal. Use the first clear body summary only when the title is ambiguous.\n\nDecision process:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, docs change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, example, test fixture, implementation detail, internal hook, backend used to reproduce a bug, broad ownership area, or label-spam keyword.\n\nImportant example:\n- `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving` only.\n Do not add `local_models` just because `llama.cpp` appears. 
In this title, llama.cpp is the affected backend/example; the central issue is OpenAI-compatible streaming protocol behavior and the final usage chunk lifecycle.\n\nTopic rules:\n\nNotifications:\nUse `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, or notification delivery.\nDo not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\nUse `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\nSlack target-channel pushes and WhatsApp history are `chat_integrations`.\n\nCron:\nUse `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\nDo not add it merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\nUse `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\nExec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.\nDo not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\nUse `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\nDo not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n\nMemory:\nUse `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\nActive-memory recall deadlocks are usually `memory` plus `reliability`.\n\nGateway and sessions:\nUse `gateway` when gateway-owned 
behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\nUse `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n\u201cOutbound session identity\u201d is `sessions`.\nA title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` is `gateway` and `sessions`, not `notifications`.\n\nOpen-weight and local model providers:\nUse `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\nUse `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\nKnown context windows for open-weight Mantle models should include both `open_weight_models` and `local_model_providers`.\n\nCodex:\nUse `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\nDo not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n\nAPI surface and UI/TUI:\nUse `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\nUse `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\nWebchat full-message reader behavior is both `api_surface` and `ui_tui`. 
If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\nUse `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\nDo not add it merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nACP, ACPX, approvals:\nUse `acp` when ACP is named centrally.\nUse `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.\nUse `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.\n`[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, `acpx`.\n`ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nModel serving:\nUse `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\nThinking blocks, Responses API visibility, OpenAI-compatible response formatting, streaming protocol behavior, and final usage chunk handling are `model_serving`.\nDo not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nLocal models and self-hosted inference:\nUse `local_models` when a local model app/provider/runtime is itself central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\nUse `self_hosted_inference` when using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers is itself the feature or bug.\nSuppress both `local_models` and `self_hosted_inference` when the provider name is 
only an affected backend, reproduction environment, compatibility example, or implementation detail for a different central subject such as OpenAI-compatible streaming.\n\nTelemetry and usage:\nUse `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n\nBrowser automation:\nUse `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\nDo not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security, MCP:\nUse `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\nUse `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\nUse `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\nDo not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nDocumentation:\nDocumentation-only PRs should usually include `docs` plus the central documented area.\nA `docs(queue): ...` title usually includes both `docs` and `queueing`.\nDo not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nQueueing:\nUse `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\nSuppress it for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nTool calling:\nUse `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or 
tool-call runtime behavior.\nMentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nComposite titles:\nIf a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\nDo not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, llama.cpp, Ollama, vLLM, TGI, LocalAI, or LM Studio. Keep it only if that topic is actually a central maintainer-owned subject."
+ }
+]
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..da9f4ed7e6a313a8b9e3a66d215475ec28bc46cf
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'gateway', 'agent_runtime'), description='The PR introduces a gateway-owned node-backed runtime for the ACP protocol, involving core protocol changes, gateway-managed state, and agent execution.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..7f83bddaa0bd3ef6f11006b0fcb6c862fcfaad96
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('mcp_tooling', 'security', 'config'), description='This PR expands the Policy system to include conformance checks for model providers, network SSRF protection, and MCP servers.', caveats=('The PR specifically addresses \"model\" and \"network\" as part of a \"Policy\" conformance layer, which aligns with config and security respectively.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..5b5b2afcf006e83fed50e312dfd2e834cc020529
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('exec_tools', 'sandboxing', 'approvals'), description=\"This PR implements 'Exec V2' contract refinements involving safe-bin logic, allowlist path/glob matching, and hardening of command execution permissions.\", caveats=('The title and content explicitly match the \"Exec v2 contract follow-through\" pattern described in the guidelines.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..65b1ab4df260544a92e4575d9925f269b72691c3
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acpx', 'codex', 'skills_plugins'), description='The issue requests that the ACPX Codex sandbox inherit user-installed plugins, involving core features of acpx, codex runtime, and skills_plugins.', caveats=('Title and body explicitly mention all three components: ACPX context, Codex integration, and user-installed plugins.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_12/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_12/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..b9d15d885ffb41de2c22c33e3be34a6059e1dc2a
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_12/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('memory', 'self_hosted_inference'), description='Adds a new openai-compatible embedding provider to support self-hosted inference servers like llama.cpp and Ollama within the memory system.', caveats=(\"The title explicitly mentions 'memory/embeddings' and lists specific self-hosted tools (llama.cpp, Ollama, vLLM) as targets for the new adapter.\",), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_13/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_13/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..b5124fbdd184c432d7c0b47317254b6927cb0f57
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_13/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('local_model_providers', 'agent_runtime'), description=\"The issue describes a missing model in the provider catalog (local_model_providers) and a corresponding failure in the fallback logic of the agent's runtime (agent_runtime).\", caveats=('The issue involves both a missing entry in the provider catalog and a logic flaw in the fallback engine during agent execution.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_14/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_14/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..ced50af46ed6598687fd2725a16df63293e6c5ed
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_14/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('docs',), description='The PR exclusively updates the markdown files to include a new tutorial video link and embed, making it a documentation-only change.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_15/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_15/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..a58d4d89d6d127d397a23fde16e22e9537233331
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_15/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('open_weight_models',), description='The PR adds correct context window metadata for a specific set of open-weight models on the Mantle infrastructure.', caveats=('The PR specifically addresses metadata for open-weight models on the Mantle platform.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_16/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_16/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..0e85faeb293018fb5c4964ee27af81a5cfd5a67d
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_16/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('notifications', 'chat_integrations'), description='The issue concerns a failure to apply per-agent identity overlays during outbound Slack notifications on both cron and heartbeat paths.', caveats=(\"The bug specifically affects the 'cron --announce' and 'heartbeat' paths, but the underlying failure is in notification identity mapping rather than cron scheduling logic.\",), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_17/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_17/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..f6b9d6664605e1b00bf43f87b9d27676d9cd0a71
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_17/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('model_serving',), description=\"The PR fixes an issue where thinking blocks are missing when using specific OpenAI-compatible providers (like LM Studio) by correctly parsing the 'reasoning' data in the Responses API stream.\", caveats=('The focus is on the handling of reasoning data within the OpenAI-compatible response streaming protocol.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..0d6daa631bba121465b949ffd3c3d82a3d9dd0c4
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('exec_tools', 'browser_automation', 'cron_automation'), description='This pull request addresses concurrent issues in shell execution (exec), browser diagnostic reporting, and cron job lifecycle management.', caveats=('The PR is a composite fix addressing three distinct subsystems: shell execution, browser diagnostics, and cron management.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..e12bddebac07694ed427b2b50cf2c669386f6e1e
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('codex', 'skills_plugins'), description='The PR fixes Codex startup plugin discovery and integrates WhatsApp history with updated Docker/Codex authentication wiring.', caveats=('The PR addresses a combination of Codex-specific runtime issues, plugin resolution logic (skills_plugins), and WhatsApp integration infrastructure.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..755c76748e2c245a302422b790a85da08ff08f4e
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('model_serving',), description='The issue concerns OpenAI-compatible streaming, SSE chunk handling, and response protocol behavior for usage data, which falls under model_serving.', caveats=('None.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..0f95aec31fa620f728ba5db3fb8cc7d4d2ec6fda
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('approvals', 'acp'), description=\"The request proposes adding granular, per-binding and per-agent overrides for the 'permissionMode' setting within ACP sessions to avoid over-provisioning permissions.\", caveats=(\"The issue mentions both acpx and acp; since it refers to 'ACP sessions', 'acp' is included as the primary protocol category.\",), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..10bb4bedb757d6ba1bf5ba0ad32d67760138022c
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('gateway',), description=\"This PR improves the gateway's ability to pass session identity and report cancellation reasons when handling outbound message sending.\", caveats=('The term \"guarded\" refers to internal logic for hook-based guard plugins within the gateway architecture, not a top-level security policy change.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..30fe1e0d178ed6ec2e88f827e7b9b9b1fbd7d8fc
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('memory',), description=\"The PR fixes a deadlock in the 'active-memory' extension by isolating the recall subagent onto a dedicated lane.\", caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..9891069dff06cdf79e5a87717267d0a3264b47e8
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('codex', 'gateway', 'ui_tui'), description='Fixes a gateway RPC and UI logic to provide access to full-message content for truncated items in the webchat sidebar.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..0eef992d2e0b1129ec0b7871ffa2bd34d194d330
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('docs', 'queueing'), description='The PR updates documentation specifically regarding queue steering behavior and its interaction with streaming modes.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/gepa-result.json b/gepa-12b-proper-from-best-20260613T055906Z/gepa-result.json
new file mode 100644
index 0000000000000000000000000000000000000000..23df73ad64c22f17596b62ad28316faefd55376c
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/gepa-result.json
@@ -0,0 +1,434 @@
+{
+ "_str_candidate_key": null,
+ "best_idx": 5,
+ "best_outputs_valset": null,
+ "candidates": [
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n- Return only final JSON using exact allowed topic ids, for example:\n {\"topics_of_interest\":[\"queueing\",\"docs\"]}\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output only JSON.\n\nHigh-signal title patterns:\n- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, `test(...)`, or `policy(...)` can indicate the kind of change.\n- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.\n- Do not ignore `test(...)` scopes when the title is about landing or enforcing a behavior contract. The tested contract can be the central subject.\n- Do not blindly label every word in the title. 
Confirm the word names the subject, not just a path, symptom, or context.\n\nDomain rules and corrections:\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.\n- Do not add `tool_calling` just because the title says \u201ctool boundaries\u201d unless tool-call behavior itself is the central feature or bug.\n\nQueueing:\n- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.\n\nTool calling:\n- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nACP, gateway, and runtime:\n- ACP-related work routes to `acp` when ACP is named centrally.\n- ACPX-related sandbox or workflow issues route to `acpx` when ACPX is named centrally.\n- Gateway-owned behavior routes to `gateway` only when gateway is explicitly the owner or subject.\n- Runtime work routes to `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nCodex and plugins:\n- Codex-related behavior routes to `codex` when Codex is named centrally.\n- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.\n- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. 
Superpowers)` => `acpx`, `codex`, `skills_plugins`.\n- Do not drop `skills_plugins` when plugins are the requested feature.\n\nNotifications and chat integrations:\n- Slack, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations`.\n- Announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery route to `notifications`.\n- Do not add `cron_automation` merely because the notification path mentions `cron --announce`; cron is context unless scheduling, force-run behavior, cron lifecycle, or cron execution is itself broken.\n- Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes` => `notifications`, `chat_integrations`.\n\nCron:\n- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Example: `cron force-run deadlock` => `cron_automation`.\n\nExec, sandboxing, and approvals:\n- Exec command/tool behavior routes to `exec_tools`.\n- Exec PATH fallback is `exec_tools`.\n- Exec v2 contract follow-through or contract enforcement can centrally include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.\n- Example: `test(exec): land exec v2 contract follow-through` => `exec_tools`, `sandboxing`, `approvals`.\n- Do not replace sandboxing or approvals with `security` unless the title is actually about a security policy, vulnerability, network restriction, credential boundary, or allowed/blocked security behavior.\n\nBrowser automation:\n- Browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues route to `browser_automation`.\n- Example: `layered browser diagnostics` => `browser_automation`.\n- Do not add `gateway` for browser diagnostics unless the gateway itself is explicitly the subject.\n\nMemory and inference:\n- Memory or embeddings provider work routes to 
`memory` when the provider exists for memory/embeddings.\n- Self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.\n- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`.\n- Do not add `model_serving` merely because the title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI.\n\nModel serving:\n- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n\nPolicy/config:\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or 
model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: `Policy: add model, network, and MCP conformance checks` => `mcp_tooling`, `config`, `security`, not `model_serving`.\n\nComposite fixes:\n- If a title lists several independent fixes, classify each central fix up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Do not substitute a broad infrastructure topic like `gateway` unless it is explicitly one of the listed user-visible subjects.\n\nFinal suppression checks:\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, or \u201csecurity\u201d, verify that the topic is actually the subject.\n- Prefer the narrow central topic over broad fallback labels.\n- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.\n- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn only final JSON, for example:\n{\"topics_of_interest\":[\"queueing\",\"docs\"]}\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they name explicit central subjects that a maintainer must see.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output only JSON.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm the word is the subject, not a path, symptom, implementation detail, example, or broad ownership area. 
In particular, do not add `gateway` unless gateway behavior or ownership is explicitly the subject.\n\nTopic rules and corrections:\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- A title like `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` includes `codex`.\n- Do not add `gateway` for Codex startup, OAuth, Docker, or plugin startup unless gateway is explicitly the user-visible subject.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add `skills_plugins` merely because a Codex fix mentions \u201cstartup plugins\u201d unless plugin availability or user-installed plugin behavior is central.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- WhatsApp history is `chat_integrations`.\n- In a composite title, preserve chat topics when a chat platform or chat history is one of the listed fixes.\n- Do not replace chat topics with `gateway` unless the gateway itself is the subject.\n\nNotifications:\n- Use `notifications` for announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`; cron is context unless scheduling, force-run behavior, cron lifecycle, or cron execution is itself broken.\n\nACP, ACPX, approvals, and sessions:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or 
permissionMode policy is central.\n- A title like `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`: ACP sessions are central, permissionMode maps to approvals, and per-binding/per-agent ACP session permission behavior belongs in the ACPX workflow bucket.\n- Do not drop `acpx` when ACP session permissions are framed around per-binding or per-agent behavior.\n\nGateway and runtime:\n- Use `gateway` only when gateway-owned behavior, gateway routing, or gateway ownership is explicitly the subject.\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nModel serving:\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.\n\nLocal models:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- A title like `fix: LM Studio thinking blocks invisible with Responses API` should be `model_serving` and `local_models`: LM Studio is the local model provider, and Responses API/thinking block visibility is model-serving protocol behavior.\n\nSelf-hosted inference:\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- For memory or embeddings providers, combine with `memory` when memory/embeddings is central.\n- 
Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nMemory:\n- Use `memory` for memory, embeddings, vector stores, embedding providers, or memory provider behavior.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nExec, sandboxing, and approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement can include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.\n- Do not replace sandboxing or approvals with `security` unless the title is actually about security policy, vulnerability, network restriction, credential boundary, or allowed/blocked security behavior.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nCron:\n- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n\nPolicy/config/security/MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d 
in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- A `docs(queue): ...` title usually includes both `docs` and `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nQueueing:\n- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.\n\nTool calling:\n- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression checks:\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, or \u201csecurity\u201d, verify that the topic is actually the subject.\n- Prefer narrow central topics over broad fallback labels.\n- Remove labels that come only from symptoms, implementation 
details, tests, examples, files changed, or incidental words.\n- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn only final JSON, for example:\n{\"topics_of_interest\":[\"queueing\",\"docs\"]}\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they name explicit central subjects that a maintainer must see.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output only JSON.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm the word is the subject, not a path, symptom, implementation detail, example, or broad ownership area.\n\nImportant corrections:\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing. Use `notifications` only when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, or notification delivery.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central. 
\u201coutbound session identity\u201d is `sessions`.\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, before_prompt_build, or internal execution path. Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be the user-visible subject.\n- A title like `active-memory recall subagent can deadlock on the main lane inside before_prompt_build` should be `memory` and `reliability`.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n\nOpen-weight and local model providers:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when the item is about provider-specific local/open-weight model integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support such as `amazon-bedrock-mantle`.\n- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with a bracketed agent/source marker like `[codex]`. 
Treat bracketed `[codex]` as metadata unless the actual subject is Codex behavior.\n- A title like `[codex] fix webchat full-message reader for truncated history` should not include `codex`.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- A title like `[codex] fix webchat full-message reader for truncated history` should be `gateway`, `api_surface`, and `ui_tui` when the reader is gateway-backed or gateway-owned.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- WhatsApp history is `chat_integrations`.\n- In a composite title, preserve chat topics when a chat platform or chat history is one of the listed fixes.\n- Do not replace chat topics with `gateway` unless the gateway itself is the subject.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add `skills_plugins` merely because a Codex fix mentions \u201cstartup plugins\u201d unless plugin availability or user-installed plugin behavior is central.\n\nACP, ACPX, approvals, and sessions:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.\n- A title like `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` 
should be `acp`, `approvals`, and `acpx`.\n\nGateway and runtime:\n- Use `gateway` only when gateway-owned behavior, gateway routing, or gateway ownership is explicitly the subject.\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nModel serving:\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nExec, sandboxing, and approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract 
enforcement can include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nCron:\n- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nPolicy/config/security/MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- A `docs(queue): ...` title usually includes both `docs` and `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nQueueing:\n- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.\n- Suppress `queueing` for incidental words like lane, deadlock, main lane, worker, subagent, or internal hook names unless queue lifecycle or queued execution is the actual subject.\n\nTool calling:\n- Use `tool_calling` only for tool-call execution, tool-call 
APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression checks:\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, \u201csend\u201d, \u201clane\u201d, \u201cdeadlock\u201d, \u201cCodex\u201d, or \u201csecurity\u201d, verify that the topic is actually the subject.\n- Prefer narrow central topics over broad fallback labels.\n- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, bracketed metadata, or incidental words.\n- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n\nOutput format:\n- Return only final JSON using exact allowed topic ids, for example:\n {\"topics_of_interest\":[\"queueing\",\"docs\"]}\n- If a structured-output tool such as `final_json` is available, you MUST call it with the JSON object. Do not merely print JSON.\n- Always return a valid object with `topics_of_interest` as an array. Use an empty array only when no allowed topic is central.\n- Never invent topic ids.\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. 
Suppress labels that come only from symptoms, examples, implementation details, tests, files changed, or incidental words.\n\nProvider catalogs and local model providers:\n- Use `local_model_providers` when the central subject is a provider catalog, provider-specific model availability, provider registry, model list, provider configuration, or missing/unsupported model entry.\n- Provider prefixes such as `google-vertex:` are strong signals for `local_model_providers` when the title is about provider catalog behavior.\n- Model names such as Gemini variants, including `gemini-3.1-flash-lite`, are not enough for `model_serving`; classify the provider/catalog issue instead.\n- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.\n\nReliability:\n- Use `reliability` when the central user-visible problem is silent failure, swallowed errors, failure instead of a clear error, unexpected no-op, crash, hang, deadlock, or robustness of failure handling.\n- If a missing provider catalog entry causes a silent failure instead of reporting an error, include `reliability`.\n- Do not add `reliability` for every bug; the title or summary must make failure handling, robustness, or degraded correctness central.\n\nModel serving:\n- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- Do not use `model_serving` merely because the title mentions a model name, provider name, catalog, Gemini, Google Vertex, OpenAI-compatible provider, llama.cpp, Ollama, vLLM, TGI, or LocalAI.\n- A provider catalog/model availability bug is `local_model_providers`, not `model_serving`, unless endpoint protocol or serving runtime behavior is explicitly central.\n\nFinal suppression checks:\n- If a 
topic was added only because of a word like \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, \u201cusage\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, or \u201csecurity\u201d, verify that the topic is actually the subject.\n- Prefer the narrow central maintainer bucket over broad fallback labels.\n- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn only final JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Do not label every noun. Prefer the narrow central subject over broad or adjacent labels.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area, if that area is explicitly central.\n- `docs(queue): clarify steer behavior...` => `docs`, `queueing`.\n- Tutorial, guide, docs, README, or showcase catalog additions are usually `docs` only unless the title clearly names a second central product area.\n- Example: `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` => `docs`.\n- Do not add `agent_demos` just because the title mentions a tutorial, showcase, example, or educational resource. 
Treat that as label spam unless the actual subject is an agent demo feature, demo runtime, demo app behavior, or a maintainer-owned demo surface.\n\nQueueing:\n- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.\n\nTool calling:\n- Use `tool_calling` only for tool-call execution, APIs, selection, schema handling, or runtime behavior.\n- Do not add it for \u201ctool boundaries\u201d in docs about another system unless tool-call behavior is central.\n\nACP, gateway, and runtime:\n- ACP-related work routes to `acp` when ACP is named centrally.\n- ACPX sandbox or workflow issues route to `acpx` when ACPX is named centrally.\n- Gateway-owned behavior routes to `gateway` only when gateway is explicitly the owner or subject.\n- Runtime work routes to `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n\nCodex and plugins:\n- Codex-related behavior routes to `codex` when Codex is named centrally.\n- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.\n\nNotifications, chat, and cron:\n- Slack, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations`.\n- Announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery route to `notifications`.\n- Use `cron_automation` only when cron scheduling, force-run, lifecycle, execution, or deadlock is central.\n\nExec, sandboxing, and approvals:\n- Exec command/tool behavior routes to `exec_tools`.\n- Exec PATH fallback is `exec_tools`.\n- Exec v2 contract enforcement can include `exec_tools`, `sandboxing`, and `approvals` when all are central.\n- Do not replace sandboxing or approvals with `security` unless the title is about a security policy, vulnerability, network restriction, credential boundary, or 
allowed/blocked security behavior.\n\nBrowser automation:\n- Browser diagnostics, automation layers, runtime behavior, and browser tooling route to `browser_automation`.\n- Do not add `gateway` unless gateway itself is explicitly central.\n\nMemory, inference, and model serving:\n- Memory or embeddings provider work routes to `memory`.\n- Self-hosted inference servers like llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.\n- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n\nPolicy/config:\n- Policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when central.\n- Network policy, access restrictions, outbound rules, or boundary checks can be `security`.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d or \u201cmodel conformance\u201d to `model_serving` unless serving endpoints or protocol behavior are central.\n\nFinal suppression checks:\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, \u201csecurity\u201d, 
\u201cshowcase\u201d, \u201ctutorial\u201d, or \u201cdemo\u201d, verify that the topic is actually the subject.\n- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, educational resources, or incidental words.\n- Keep required central second and third topics only when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn only final JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about \u201ccron --announce and heartbeat target-channel Slack pushes\u201d is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself the central subject.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:\n - `exec_tools`\n - `sandboxing`\n - `approvals`\n- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a 
deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nOpen-weight and local model providers:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like 
`[codex]`.\n- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nACP, ACPX, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nModel serving:\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server 
compatibility.\n- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, 
streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- A `docs(queue): ...` title usually includes both `docs` and `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nTool calling:\n- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, or contract. Keep it only if that topic is actually a central maintainer-owned subject.\n\nOutput only JSON."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n\nOutput format:\n- Return only final JSON using exact allowed topic ids, for example:\n {\"topics_of_interest\":[\"queueing\",\"docs\"]}\n- If a structured-output tool such as `final_json` is available, you MUST call it with the JSON object. Do not merely print JSON.\n- Always return a valid object with `topics_of_interest` as an array.\n- Use an empty array only when no allowed topic is central.\n- Never invent topic ids.\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. 
Suppress labels that come only from symptoms, examples, implementation details, tests, files changed, or incidental words.\n\nSpecific routing rules:\n\nCodex:\n- Use `codex` when the title centrally concerns Codex startup, Codex OAuth, Codex runtime behavior, Codex configuration, Dockerized Codex, or Codex-specific user workflows.\n- A title with multiple Codex-related facets, such as startup behavior plus Docker Codex OAuth, usually still needs only `codex` unless another non-Codex product area is also central.\n\nChat integrations:\n- Use `chat_integrations` when the title centrally names WhatsApp, chat history, chat import/export, chat connectors, messaging platforms, or conversation integration behavior.\n- \u201cWhatsApp history\u201d is a strong signal for `chat_integrations`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n\nSkills and plugins:\n- Do not add `skills_plugins` merely because the word \u201cplugins\u201d appears.\n- Treat \u201cCodex startup plugins\u201d as part of the Codex startup surface unless the title is centrally about the plugin system, plugin APIs, plugin installation, plugin lifecycle, skill/plugin registry, or skills behavior.\n- `skills_plugins` is label spam when plugins are only one detail inside a broader Codex fix.\n\nGateway:\n- Do not add `gateway` merely because the title mentions Docker, OAuth, startup, routing, service setup, or Codex infrastructure.\n- Use `gateway` only when the gateway itself, gateway routing, gateway API behavior, or gateway service contract is the central subject.\n\nOpen-weight models:\n- Use `open_weight_models` when the title centrally concerns open-weight model families, open-weight model metadata, context windows for open-weight models, or model support for open-weight models.\n- Mantle open-weight models are a strong signal for `open_weight_models`.\n\nProvider catalogs and local model providers:\n- Use `local_model_providers` 
when the central subject is a provider catalog, provider-specific model availability, provider registry, model list, provider configuration, model metadata for a provider, context-window metadata in a provider catalog, or missing/unsupported provider model entry.\n- Provider prefixes/scopes such as `amazon-bedrock-mantle`, `amazon-bedrock`, `google-vertex:`, and similar provider-qualified names are strong signals for `local_model_providers` when the title is about provider catalog behavior or provider-specific model metadata.\n- Adding known context windows for models under a provider is both provider-catalog work and model-metadata work.\n- Example: `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` => `open_weight_models`, `local_model_providers`.\n\nReliability:\n- Use `reliability` when the central user-visible problem is silent failure, swallowed errors, failure instead of a clear error, unexpected no-op, crash, hang, deadlock, or robustness of failure handling.\n- If a missing provider catalog entry causes a silent failure instead of reporting an error, include `reliability`.\n- Do not add `reliability` for every bug; the title or summary must make failure handling, robustness, or degraded correctness central.\n\nModel serving:\n- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- Do not use `model_serving` merely because the title mentions a model name, provider name, catalog, Gemini, Google Vertex, OpenAI-compatible provider, llama.cpp, Ollama, vLLM, TGI, LocalAI, context windows, or model metadata.\n- A provider catalog/model availability bug is `local_model_providers`, not `model_serving`, unless endpoint protocol or serving runtime behavior is explicitly central.\n\nFinal suppression checks:\n- If a topic was 
added only because of a word like \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, \u201cusage\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, \u201cDocker\u201d, \u201cOAuth\u201d, or \u201csecurity\u201d, verify that the topic is actually the subject.\n- Prefer the narrow central maintainer bucket over broad fallback labels.\n- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.\n- Return only final JSON using exact allowed topic ids, for example:\n {\"topics_of_interest\":[\"queueing\",\"docs\"]}\n\nTask:\nChoose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nGeneral process:\n1. Read the title first.\n2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output only JSON.\n\nHigh-signal title patterns:\n- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, `test(...)`, or `policy(...)` can indicate the kind of change.\n- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.\n- Do not ignore `test(...)` scopes when the title is about landing or enforcing a behavior contract. The tested contract can be the central subject.\n- Prefixes naming providers or integrations, such as `google-vertex:`, `lm-studio:`, or similar, are often central maintainer-routing signals.\n- Do not blindly label every word in the title. 
Confirm the word names the subject, not just a path, symptom, or context.\n\nAdditional domain rules learned from examples:\n\nModel provider catalogs and provider integrations:\n- Missing models, provider catalogs, model registries, provider capability lists, provider metadata, or provider-specific model availability route to `local_model_providers` when central.\n- This includes cloud or hosted provider names when the issue is about the OpenClaw provider integration/catalog, such as `google-vertex` and Gemini model catalog entries.\n- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.\n- Do not classify provider catalog/model availability bugs as `model_serving` unless the central subject is serving endpoints, API protocol behavior, streaming lifecycle, endpoint compatibility, base URL behavior, or request routing.\n\nReliability:\n- Use `reliability` when the title centrally describes silent failure, missing errors, false success, hangs, deadlocks, crashes, data loss, or failure modes that make the system unreliable.\n- Add `reliability` as a secondary topic when an explicit reliability failure mode is part of the user-visible problem.\n- Do not add `reliability` to every bug fix. 
It must be explicit, such as \u201csilent failure instead of error.\u201d\n\nLocal models:\n- Use `local_models` when a local model product/runtime/provider is central, such as LM Studio, Ollama, llama.cpp, LocalAI, or similar local model environments.\n- If the title names LM Studio as the affected system, include `local_models` even when the API/protocol issue also routes elsewhere.\n- Example: `fix: LM Studio thinking blocks invisible with Responses API` => `model_serving`, `local_models`.\n\nModel serving:\n- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible or Responses API request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or protocol-specific content handling.\n- Responses API behavior can be `model_serving` when the bug is about how served model responses are represented or consumed.\n- Example: `fix: LM Studio thinking blocks invisible with Responses API` => `model_serving`, `local_models`.\n- Do not add `model_serving` merely because a title mentions a model name, provider, provider catalog, OpenAI-compatible provider, Gemini, Vertex, LM Studio, llama.cpp, Ollama, vLLM, TGI, or LocalAI. 
Confirm the serving/API protocol itself is central.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.\n- Do not add `tool_calling` just because the title says \u201ctool boundaries\u201d unless tool-call behavior itself is the central feature or bug.\n\nQueueing:\n- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.\n\nTool calling:\n- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nFinal suppression checks:\n- If a topic was added only because of a word like \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, \u201cusage\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201ctool\u201d, \u201cplugin\u201d, \u201cchunk\u201d, \u201ccron\u201d, \u201cgateway\u201d, or \u201csecurity\u201d, verify that the topic is actually the subject.\n- Prefer narrow central topics over broad fallback labels.\n- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.\n- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn only final JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput:\nYou may receive a target URL, title, body, summary, labels, examples, tests, files changed, or prior discussion. The title is the primary signal. Use the first clear body summary only when the title is ambiguous.\n\nDecision process:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, docs change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, example, test fixture, implementation detail, internal hook, backend used to reproduce a bug, broad ownership area, or label-spam keyword.\n\nImportant example:\n- `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving` only.\n Do not add `local_models` just because `llama.cpp` appears. 
In this title, llama.cpp is the affected backend/example; the central issue is OpenAI-compatible streaming protocol behavior and the final usage chunk lifecycle.\n\nTopic rules:\n\nNotifications:\nUse `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, or notification delivery.\nDo not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\nUse `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\nSlack target-channel pushes and WhatsApp history are `chat_integrations`.\n\nCron:\nUse `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\nDo not add it merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\nUse `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\nExec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.\nDo not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\nUse `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\nDo not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n\nMemory:\nUse `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\nActive-memory recall deadlocks are usually `memory` plus `reliability`.\n\nGateway and sessions:\nUse `gateway` when gateway-owned 
behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\nUse `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n\u201cOutbound session identity\u201d is `sessions`.\nA title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` is `gateway` and `sessions`, not `notifications`.\n\nOpen-weight and local model providers:\nUse `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\nUse `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\nKnown context windows for open-weight Mantle models should include both `open_weight_models` and `local_model_providers`.\n\nCodex:\nUse `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\nDo not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n\nAPI surface and UI/TUI:\nUse `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\nUse `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\nWebchat full-message reader behavior is both `api_surface` and `ui_tui`. 
If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\nUse `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\nDo not add it merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nACP, ACPX, approvals:\nUse `acp` when ACP is named centrally.\nUse `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.\nUse `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.\n`[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, `acpx`.\n`ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nModel serving:\nUse `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\nThinking blocks, Responses API visibility, OpenAI-compatible response formatting, streaming protocol behavior, and final usage chunk handling are `model_serving`.\nDo not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nLocal models and self-hosted inference:\nUse `local_models` when a local model app/provider/runtime is itself central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\nUse `self_hosted_inference` when using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers is itself the feature or bug.\nSuppress both `local_models` and `self_hosted_inference` when the provider name is 
only an affected backend, reproduction environment, compatibility example, or implementation detail for a different central subject such as OpenAI-compatible streaming.\n\nTelemetry and usage:\nUse `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n\nBrowser automation:\nUse `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\nDo not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security, MCP:\nUse `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\nUse `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\nUse `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\nDo not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nDocumentation:\nDocumentation-only PRs should usually include `docs` plus the central documented area.\nA `docs(queue): ...` title usually includes both `docs` and `queueing`.\nDo not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nQueueing:\nUse `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\nSuppress it for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nTool calling:\nUse `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or 
tool-call runtime behavior.\nMentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nComposite titles:\nIf a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\nDo not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, llama.cpp, Ollama, vLLM, TGI, LocalAI, or LM Studio. Keep it only if that topic is actually a central maintainer-owned subject."
+ }
+ ],
+ "discovery_eval_counts": [
+ 0,
+ 38,
+ 64,
+ 90,
+ 124,
+ 150,
+ 180,
+ 206,
+ 232
+ ],
+ "num_full_val_evals": 9,
+ "objective_pareto_front": {
+ "weighted_score": 0.875
+ },
+ "parents": [
+ [
+ null
+ ],
+ [
+ 0
+ ],
+ [
+ 1
+ ],
+ [
+ 0
+ ],
+ [
+ 0
+ ],
+ [
+ 2
+ ],
+ [
+ 3
+ ],
+ [
+ 0
+ ],
+ [
+ 5
+ ]
+ ],
+ "per_objective_best_candidates": {
+ "weighted_score": [
+ 5
+ ]
+ },
+ "per_val_instance_best_candidates": {
+ "0": [
+ 0,
+ 1,
+ 2,
+ 4,
+ 5,
+ 8
+ ],
+ "1": [
+ 0,
+ 8
+ ],
+ "2": [
+ 0,
+ 1,
+ 4
+ ],
+ "3": [
+ 1,
+ 4,
+ 5,
+ 6,
+ 8
+ ],
+ "4": [
+ 0,
+ 3,
+ 4,
+ 5,
+ 6,
+ 8
+ ],
+ "5": [
+ 8,
+ 1,
+ 2,
+ 5
+ ],
+ "6": [
+ 1,
+ 2,
+ 3,
+ 5,
+ 7,
+ 8
+ ],
+ "7": [
+ 2,
+ 3,
+ 5,
+ 6,
+ 7,
+ 8
+ ],
+ "8": [
+ 8,
+ 2,
+ 5
+ ],
+ "9": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8
+ ],
+ "10": [
+ 0,
+ 8,
+ 5
+ ],
+ "11": [
+ 0,
+ 1,
+ 4,
+ 5
+ ],
+ "12": [
+ 0,
+ 1,
+ 2,
+ 4,
+ 5,
+ 6,
+ 8
+ ],
+ "13": [
+ 3,
+ 6,
+ 7
+ ],
+ "14": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8
+ ],
+ "15": [
+ 2,
+ 5,
+ 6,
+ 7,
+ 8
+ ],
+ "16": [
+ 0,
+ 5
+ ],
+ "17": [
+ 1,
+ 2,
+ 5,
+ 7
+ ]
+ },
+ "run_dir": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-proper-from-best-20260613T055906Z",
+ "seed": 0,
+ "total_metric_calls": 250,
+ "val_aggregate_scores": [
+ 0.7361111111111112,
+ 0.6845238095238095,
+ 0.652336860670194,
+ 0.5527777777777777,
+ 0.6435185185185186,
+ 0.875,
+ 0.6499999999999999,
+ 0.5964285714285714,
+ 0.8055555555555556
+ ],
+ "val_aggregate_subscores": [
+ {
+ "weighted_score": 0.7361111111111112
+ },
+ {
+ "weighted_score": 0.6845238095238095
+ },
+ {
+ "weighted_score": 0.652336860670194
+ },
+ {
+ "weighted_score": 0.5527777777777777
+ },
+ {
+ "weighted_score": 0.6435185185185185
+ },
+ {
+ "weighted_score": 0.875
+ },
+ {
+ "weighted_score": 0.6499999999999999
+ },
+ {
+ "weighted_score": 0.5964285714285714
+ },
+ {
+ "weighted_score": 0.8055555555555556
+ }
+ ],
+ "val_subscores": [
+ {
+ "0": 1.0,
+ "1": 1.0,
+ "2": 1.0,
+ "3": 0.25,
+ "4": 1.0,
+ "5": 0.5,
+ "6": 0.5,
+ "7": 0.5,
+ "8": 0.25,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 0.25,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 1.0,
+ "17": 0.5
+ },
+ {
+ "0": 1.0,
+ "1": 0.25,
+ "2": 1.0,
+ "3": 1.0,
+ "4": 0.2857142857142857,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 0.5,
+ "8": 0.25,
+ "9": 1.0,
+ "10": 0.25,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 0.25,
+ "14": 1.0,
+ "15": 0.25,
+ "16": 0.2857142857142857,
+ "17": 1.0
+ },
+ {
+ "0": 1.0,
+ "1": 0.25,
+ "2": 0.1111111111111111,
+ "3": 0.2857142857142857,
+ "4": 0.16666666666666666,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 0.14285714285714285,
+ "11": 0.2857142857142857,
+ "12": 1.0,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 0.0,
+ "17": 1.0
+ },
+ {
+ "0": 0.5,
+ "1": 0.25,
+ "2": 0.25,
+ "3": 0.25,
+ "4": 1.0,
+ "5": 0.5,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 0.5,
+ "9": 1.0,
+ "10": 0.2,
+ "11": 0.25,
+ "12": 0.25,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.25,
+ "17": 0.25
+ },
+ {
+ "0": 1.0,
+ "1": 0.0,
+ "2": 1.0,
+ "3": 1.0,
+ "4": 1.0,
+ "5": 0.5,
+ "6": 0.5,
+ "7": 0.5,
+ "8": 0.25,
+ "9": 1.0,
+ "10": 0.3333333333333333,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 0.25,
+ "16": 0.25,
+ "17": 0.5
+ },
+ {
+ "0": 1.0,
+ "1": 0.25,
+ "2": 0.0,
+ "3": 1.0,
+ "4": 1.0,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 1.0,
+ "17": 1.0
+ },
+ {
+ "0": 0.5,
+ "1": 0.25,
+ "2": 0.25,
+ "3": 1.0,
+ "4": 1.0,
+ "5": 0.5,
+ "6": 0.5,
+ "7": 1.0,
+ "8": 0.25,
+ "9": 1.0,
+ "10": 0.2,
+ "11": 0.5,
+ "12": 1.0,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 0.25,
+ "17": 0.5
+ },
+ {
+ "0": 0.5,
+ "1": 0.25,
+ "2": 0.25,
+ "3": 0.25,
+ "4": 0.2857142857142857,
+ "5": 0.5,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 0.5,
+ "9": 1.0,
+ "10": 0.2,
+ "11": 0.5,
+ "12": 0.25,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 0.25,
+ "17": 1.0
+ },
+ {
+ "0": 1.0,
+ "1": 1.0,
+ "2": 0.25,
+ "3": 1.0,
+ "4": 1.0,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 0.0,
+ "12": 1.0,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 0.25,
+ "17": 0.5
+ }
+ ],
+ "validation_schema_version": 2
+}
\ No newline at end of file
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/gepa_state.bin b/gepa-12b-proper-from-best-20260613T055906Z/gepa_state.bin
new file mode 100644
index 0000000000000000000000000000000000000000..1d3dc78fc29b9eb8c090b7a7bdd2422701f2b281
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/gepa_state.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:892fe4c22d5c36a48089d0c76f9c0c2b63047d5a7f9e1b0772e014851df7e045
+size 77345
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/launcher.log b/gepa-12b-proper-from-best-20260613T055906Z/launcher.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/optimize.stderr.log b/gepa-12b-proper-from-best-20260613T055906Z/optimize.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/optimize.stdout.json b/gepa-12b-proper-from-best-20260613T055906Z/optimize.stdout.json
new file mode 100644
index 0000000000000000000000000000000000000000..fd69083c1ce6c98c32ddfb4b856c76743d24e1fb
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/optimize.stdout.json
@@ -0,0 +1,1140 @@
+Iteration 0: Base program full valset score: 0.7361111111111112 over 18 / 18 examples
+Iteration 1: Selected program 0 score: 0.7361111111111112
+Iteration 1: All subsample scores perfect. Skipping.
+Iteration 1: Reflective mutation did not propose a new candidate
+Iteration 2: Selected program 0 score: 0.7361111111111112
+Iteration 2: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON in this exact shape:
+{"topics_of_interest":["topic_id_1","topic_id_2"]}
+
+Core task:
+Choose the minimum set of allowed topic ids that routes the issue or pull request to every maintainer bucket that must see it. The title is the primary signal. Use the first clear body summary only when the title is ambiguous.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible change, contract, bug, feature, documentation area, or policy.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicitly central to the title or first clear summary.
+5. Use 3 topics when the title or first clear summary names a contract or change that centrally spans three areas.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual subject.
+
+Important contract rule:
+- Do not treat `test(...)` as meaning “tests only”.
+- When a test PR is about landing, enforcing, or locking a behavior contract, classify the contract being tested.
+- A contract title may require every area covered by that contract, even if one area is not repeated elsewhere in the title.
+
+Exec v2 contract rule:
+- `test(exec): land exec v2 contract follow-through` must classify as:
+ `exec_tools`, `sandboxing`, `approvals`
+- Reason: the exec v2 contract centrally covers exec tool behavior, sandbox behavior, and approval behavior.
+- Do not drop `sandboxing` from exec v2 contract follow-through items.
+- Do not classify this as only `exec_tools` plus `approvals`; sandboxing is a required central facet of the contract.
+
+Exec, sandboxing, and approvals:
+- Use `exec_tools` for exec command/tool behavior, exec APIs, exec PATH fallback, or exec contract behavior.
+- Use `sandboxing` when sandbox permissions, sandbox constraints, sandbox modes, filesystem/network sandbox behavior, or the exec contract’s sandbox behavior is central.
+- Use `approvals` when approval policy, escalation approval, approval prompts, approval routing, or the exec contract’s approval behavior is central.
+- Exec v2 contract follow-through or contract enforcement can centrally include all three:
+ `exec_tools`, `sandboxing`, `approvals`.
+
+Documentation:
+- Documentation-only PRs usually include `docs` plus the central documented area.
+- `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`
+- Do not add `tool_calling` just because the title says “tool boundaries” unless tool-call behavior itself is central.
+
+Queueing:
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+ACP, gateway, and runtime:
+- ACP-related work routes to `acp` when ACP is named centrally.
+- ACPX sandbox or workflow issues route to `acpx` when ACPX is named centrally.
+- Gateway-owned behavior routes to `gateway` only when gateway is explicitly the owner or subject.
+- Runtime work routes to `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Codex and plugins:
+- Codex-related behavior routes to `codex` when Codex is named centrally.
+- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.
+- `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` => `acpx`, `codex`, `skills_plugins`.
+- Do not drop `skills_plugins` when plugins are the requested feature.
+
+Notifications and chat integrations:
+- Slack, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations`.
+- Announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery route to `notifications`.
+- Do not add `cron_automation` merely because the notification path mentions `cron --announce`; cron is context unless scheduling, force-run behavior, cron lifecycle, or cron execution is central.
+- `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes` => `notifications`, `chat_integrations`.
+
+Cron:
+- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- `cron force-run deadlock` => `cron_automation`.
+
+Browser automation:
+- Browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues route to `browser_automation`.
+- `layered browser diagnostics` => `browser_automation`.
+- Do not add `gateway` for browser diagnostics unless the gateway itself is explicitly the subject.
+
+Memory and inference:
+- Memory or embeddings provider work routes to `memory` when the provider exists for memory/embeddings.
+- Self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.
+- `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`.
+- Do not add `model_serving` merely because the title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI.
+
+Model serving:
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+- `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Policy/config:
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- `Policy: add model, network, and MCP conformance checks` => `mcp_tooling`, `config`, `security`.
+
+Composite fixes:
+- If a title lists several independent fixes, classify each central fix up to the smallest complete set.
+- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Do not substitute a broad infrastructure topic like `gateway` unless it is explicitly one of the listed user-visible subjects.
+
+Final suppression checks:
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, or “security”, verify that the topic is actually the subject.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.
+- For exec v2 contract follow-through, explicitly verify all three required buckets are present: `exec_tools`, `sandboxing`, and `approvals`.
+Iteration 2: New subsample score 3.2857142857142856 is not better than old score 3.5, skipping
+Iteration 3: Selected program 0 score: 0.7361111111111112
+Iteration 3: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON, for example:
+{"topics_of_interest":["queueing","docs"]}
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they name explicit central subjects that a maintainer must see.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output only JSON.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the subject, not a path, symptom, implementation detail, example, or broad ownership area. In particular, do not add `gateway` unless gateway behavior or ownership is explicitly the subject.
+
+Topic rules and corrections:
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- A title like `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` includes `codex`.
+- Do not add `gateway` for Codex startup, OAuth, Docker, or plugin startup unless gateway is explicitly the user-visible subject.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions “startup plugins” unless plugin availability or user-installed plugin behavior is central.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- WhatsApp history is `chat_integrations`.
+- In a composite title, preserve chat topics when a chat platform or chat history is one of the listed fixes.
+- Do not replace chat topics with `gateway` unless the gateway itself is the subject.
+
+Notifications:
+- Use `notifications` for announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`; cron is context unless scheduling, force-run behavior, cron lifecycle, or cron execution is itself broken.
+
+ACP, ACPX, approvals, and sessions:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- A title like `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`: ACP sessions are central, permissionMode maps to approvals, and per-binding/per-agent ACP session permission behavior belongs in the ACPX workflow bucket.
+- Do not drop `acpx` when ACP session permissions are framed around per-binding or per-agent behavior.
+
+Gateway and runtime:
+- Use `gateway` only when gateway-owned behavior, gateway routing, or gateway ownership is explicitly the subject.
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Model serving:
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
+
+Local models:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- A title like `fix: LM Studio thinking blocks invisible with Responses API` should be `model_serving` and `local_models`: LM Studio is the local model provider, and Responses API/thinking block visibility is model-serving protocol behavior.
+
+Self-hosted inference:
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- For memory or embeddings providers, combine with `memory` when memory/embeddings is central.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Memory:
+- Use `memory` for memory, embeddings, vector stores, embedding providers, or memory provider behavior.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Exec, sandboxing, and approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement can include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.
+- Do not replace sandboxing or approvals with `security` unless the title is actually about security policy, vulnerability, network restriction, credential boundary, or allowed/blocked security behavior.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Cron:
+- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+
+Policy/config/security/MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+Queueing:
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression checks:
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, or “security”, verify that the topic is actually the subject.
+- Prefer narrow central topics over broad fallback labels.
+- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.
+Iteration 3: New subsample score 3.2857142857142856 is better than old score 2.25. Continue to full eval and add to candidate pool.
+Iteration 3: Valset score for new program: 0.6845238095238095 (coverage 18 / 18)
+Iteration 3: Val aggregate for new program: 0.6845238095238095
+Iteration 3: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 0.5, 8: 0.25, 9: 1.0, 10: 0.25, 11: 1.0, 12: 1.0, 13: 0.25, 14: 1.0, 15: 0.25, 16: 0.2857142857142857, 17: 1.0}
+Iteration 3: Objective aggregate scores for new program: {'weighted_score': 0.6845238095238095}
+Iteration 3: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 0.5, 8: 0.25, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 0.25, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0}
+Iteration 3: Objective pareto front scores: {'weighted_score': 0.7361111111111112}
+Iteration 3: Valset pareto front aggregate score: 0.8611111111111112
+Iteration 3: Updated valset pareto front programs: {0: {0, 1}, 1: {0}, 2: {0, 1}, 3: {1}, 4: {0}, 5: {1}, 6: {1}, 7: {0, 1}, 8: {0, 1}, 9: {0, 1}, 10: {0}, 11: {0, 1}, 12: {0, 1}, 13: {0, 1}, 14: {0, 1}, 15: {0}, 16: {0}, 17: {1}}
+Iteration 3: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 3: Best valset aggregate score so far: 0.7361111111111112
+Iteration 3: Best program as per aggregate score on valset: 0
+Iteration 3: Best score on valset: 0.7361111111111112
+Iteration 3: Linear pareto front program index: 0
+Iteration 3: New program candidate index: 1
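The `Valset score` logged for Iteration 3 appears to be the unweighted mean of the per-instance scores printed right after it. A minimal Python check, assuming plain averaging (the log does not state the aggregation rule); the dictionary is copied from the log and the variable names are illustrative only:

```python
# Per-instance validation scores for the Iteration 3 candidate, copied from the log.
individual = {
    0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 1.0,
    6: 1.0, 7: 0.5, 8: 0.25, 9: 1.0, 10: 0.25, 11: 1.0,
    12: 1.0, 13: 0.25, 14: 1.0, 15: 0.25, 16: 0.2857142857142857, 17: 1.0,
}

# Assumption: the aggregate is an unweighted mean over the 18 validation rows.
aggregate = sum(individual.values()) / len(individual)

# Value reported by the log line "Valset score for new program".
logged = 0.6845238095238095
print(abs(aggregate - logged) < 1e-9)
```

The same averaging applied to the `New valset pareto front scores` line reproduces the logged front aggregate, which suggests both numbers use the same rule.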
+Iteration 4: Selected program 1 score: 0.6845238095238095
+Iteration 4: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON, for example:
+{"topics_of_interest":["queueing","docs"]}
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they name explicit central subjects that a maintainer must see.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output only JSON.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the subject, not a path, symptom, implementation detail, example, or broad ownership area.
+
+Important corrections:
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing. Use `notifications` only when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, or notification delivery.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central. “outbound session identity” is `sessions`.
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, before_prompt_build, or internal execution path. Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be the user-visible subject.
+- A title like `active-memory recall subagent can deadlock on the main lane inside before_prompt_build` should be `memory` and `reliability`.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+
+Open-weight and local model providers:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when the item is about provider-specific local/open-weight model integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support such as `amazon-bedrock-mantle`.
+- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with a bracketed agent/source marker like `[codex]`. Treat bracketed `[codex]` as metadata unless the actual subject is Codex behavior.
+- A title like `[codex] fix webchat full-message reader for truncated history` should not include `codex`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- A title like `[codex] fix webchat full-message reader for truncated history` should be `gateway`, `api_surface`, and `ui_tui` when the reader is gateway-backed or gateway-owned.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- WhatsApp history is `chat_integrations`.
+- In a composite title, preserve chat topics when a chat platform or chat history is one of the listed fixes.
+- Do not replace chat topics with `gateway` unless the gateway itself is the subject.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions “startup plugins” unless plugin availability or user-installed plugin behavior is central.
+
+ACP, ACPX, approvals, and sessions:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- A title like `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.
+
+Gateway and runtime:
+- Use `gateway` only when gateway-owned behavior, gateway routing, or gateway ownership is explicitly the subject.
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Model serving:
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Exec, sandboxing, and approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement can include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Cron:
+- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Policy/config/security/MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+Queueing:
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+- Suppress `queueing` for incidental words like lane, deadlock, main lane, worker, subagent, or internal hook names unless queue lifecycle or queued execution is the actual subject.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression checks:
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, “send”, “lane”, “deadlock”, “Codex”, or “security”, verify that the topic is actually the subject.
+- Prefer narrow central topics over broad fallback labels.
+- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, bracketed metadata, or incidental words.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.
+Iteration 4: New subsample score 4.0 is better than old score 1.25. Continue to full eval and add to candidate pool.
+Iteration 4: Valset score for new program: 0.652336860670194 (coverage 18 / 18)
+Iteration 4: Val aggregate for new program: 0.652336860670194
+Iteration 4: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 0.1111111111111111, 3: 0.2857142857142857, 4: 0.16666666666666666, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 0.14285714285714285, 11: 0.2857142857142857, 12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 0.0, 17: 1.0}
+Iteration 4: Objective aggregate scores for new program: {'weighted_score': 0.652336860670194}
+Iteration 4: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 4: Objective pareto front scores: {'weighted_score': 0.7361111111111112}
+Iteration 4: Valset pareto front aggregate score: 0.9722222222222222
+Iteration 4: Updated valset pareto front programs: {0: {0, 1, 2}, 1: {0}, 2: {0, 1}, 3: {1}, 4: {0}, 5: {1, 2}, 6: {1, 2}, 7: {2}, 8: {2}, 9: {0, 1, 2}, 10: {0}, 11: {0, 1}, 12: {0, 1, 2}, 13: {2}, 14: {0, 1, 2}, 15: {2}, 16: {0}, 17: {1, 2}}
+Iteration 4: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 4: Best valset aggregate score so far: 0.7361111111111112
+Iteration 4: Best program as per aggregate score on valset: 0
+Iteration 4: Best score on valset: 0.7361111111111112
+Iteration 4: Linear pareto front program index: 0
+Iteration 4: New program candidate index: 2
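The per-instance Pareto front lines for Iteration 4 are consistent with a simple rule: a new candidate replaces the front on a validation row when it scores strictly higher, and joins the front when it ties. The sketch below is inferred from the logged numbers, not taken from GEPA's source; `update_front` and its argument names are hypothetical. It starts from the front logged at the end of Iteration 3 and folds in candidate 2's scores.

```python
def update_front(front_scores, front_programs, new_scores, new_index):
    """Return updated per-row best scores and the set of programs achieving them."""
    scores = dict(front_scores)
    programs = {row: set(members) for row, members in front_programs.items()}
    for row, score in new_scores.items():
        if score > scores[row]:        # strictly better: new candidate owns the row
            scores[row] = score
            programs[row] = {new_index}
        elif score == scores[row]:     # tie: new candidate joins the existing set
            programs[row].add(new_index)
    return scores, programs

# Front after Iteration 3, copied from the log.
front_scores = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 0.5,
                8: 0.25, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 0.25, 14: 1.0,
                15: 0.5, 16: 1.0, 17: 1.0}
front_programs = {0: {0, 1}, 1: {0}, 2: {0, 1}, 3: {1}, 4: {0}, 5: {1}, 6: {1},
                  7: {0, 1}, 8: {0, 1}, 9: {0, 1}, 10: {0}, 11: {0, 1}, 12: {0, 1},
                  13: {0, 1}, 14: {0, 1}, 15: {0}, 16: {0}, 17: {1}}

# Candidate 2's per-row scores from Iteration 4, copied from the log.
candidate_2 = {0: 1.0, 1: 0.25, 2: 0.1111111111111111, 3: 0.2857142857142857,
               4: 0.16666666666666666, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0,
               10: 0.14285714285714285, 11: 0.2857142857142857, 12: 1.0, 13: 0.5,
               14: 1.0, 15: 1.0, 16: 0.0, 17: 1.0}

new_scores, new_programs = update_front(front_scores, front_programs, candidate_2, 2)
front_aggregate = sum(new_scores.values()) / len(new_scores)
print(new_programs[7], new_programs[13], round(front_aggregate, 4))
```

Under this rule the front aggregate can rise (here to the logged 0.9722) even though candidate 0 stays the best single program at 0.7361, because the front takes the best score per row across all candidates.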
+Iteration 5: Selected program 0 score: 0.7361111111111112
+Iteration 5: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+
+Output format:
+- Return only final JSON using exact allowed topic ids, for example:
+ {"topics_of_interest":["queueing","docs"]}
+- If a structured-output tool such as `final_json` is available, you MUST call it with the JSON object. Do not merely print JSON.
+- Always return a valid object with `topics_of_interest` as an array. Use an empty array only when no allowed topic is central.
+- Never invent topic ids.
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Suppress labels that come only from symptoms, examples, implementation details, tests, files changed, or incidental words.
+
+Provider catalogs and local model providers:
+- Use `local_model_providers` when the central subject is a provider catalog, provider-specific model availability, provider registry, model list, provider configuration, or missing/unsupported model entry.
+- Provider prefixes such as `google-vertex:` are strong signals for `local_model_providers` when the title is about provider catalog behavior.
+- Model names such as Gemini variants, including `gemini-3.1-flash-lite`, are not enough for `model_serving`; classify the provider/catalog issue instead.
+- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.
+
+Reliability:
+- Use `reliability` when the central user-visible problem is silent failure, swallowed errors, failure instead of a clear error, unexpected no-op, crash, hang, deadlock, or robustness of failure handling.
+- If a missing provider catalog entry causes a silent failure instead of reporting an error, include `reliability`.
+- Do not add `reliability` for every bug; the title or summary must make failure handling, robustness, or degraded correctness central.
+
+Model serving:
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Do not use `model_serving` merely because the title mentions a model name, provider name, catalog, Gemini, Google Vertex, OpenAI-compatible provider, llama.cpp, Ollama, vLLM, TGI, or LocalAI.
+- A provider catalog/model availability bug is `local_model_providers`, not `model_serving`, unless endpoint protocol or serving runtime behavior is explicitly central.
+
+Final suppression checks:
+- If a topic was added only because of a word like “model”, “provider”, “catalog”, “usage”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, or “security”, verify that the topic is actually the subject.
+- Prefer the narrow central maintainer bucket over broad fallback labels.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.
+Iteration 5: New subsample score 2.5 is better than old score 2.2. Continue to full eval and add to candidate pool.
+Iteration 5: Valset score for new program: 0.5527777777777777 (coverage 18 / 18)
+Iteration 5: Val aggregate for new program: 0.5527777777777777
+Iteration 5: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.25, 4: 1.0, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 0.2, 11: 0.25, 12: 0.25, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.25}
+Iteration 5: Objective aggregate scores for new program: {'weighted_score': 0.5527777777777777}
+Iteration 5: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 5: Objective pareto front scores: {'weighted_score': 0.7361111111111112}
+Iteration 5: Valset pareto front aggregate score: 1.0
+Iteration 5: Updated valset pareto front programs: {0: {0, 1, 2}, 1: {0}, 2: {0, 1}, 3: {1}, 4: {0, 3}, 5: {1, 2}, 6: {1, 2, 3}, 7: {2, 3}, 8: {2}, 9: {0, 1, 2, 3}, 10: {0}, 11: {0, 1}, 12: {0, 1, 2}, 13: {3}, 14: {0, 1, 2, 3}, 15: {2}, 16: {0}, 17: {1, 2}}
+Iteration 5: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 5: Best valset aggregate score so far: 0.7361111111111112
+Iteration 5: Best program as per aggregate score on valset: 0
+Iteration 5: Best score on valset: 0.7361111111111112
+Iteration 5: Linear pareto front program index: 0
+Iteration 5: New program candidate index: 3
+Iteration 6: Selected program 2 score: 0.652336860670194
+Iteration 6: Proposed new text for routing_policy: Classify GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON:
+{"topics_of_interest":["topic_id"]}
+
+Input format:
+- target URL
+- title
+- sometimes body or summary
+- sometimes expected/gold topics in evaluation data
+
+Use the title as the primary signal. Use the body or summary only when the title is ambiguous.
+
+Allowed topic ids include:
+queueing, docs, notifications, sessions, gateway, reliability, memory, open_weight_models, local_model_providers, codex, api_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime, model_serving, local_models, self_hosted_inference, telemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation, config, security, mcp_tooling, tool_calling.
+
+Core rule:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern. Never add a topic just because a related word appears.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, policy, documentation change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central subjects.
+5. Use three topics only when the title or first clear summary explicitly names three central facets.
+6. Use zero topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output only JSON.
+
+Important suppression rules:
+- Ignore examples, tests, files changed, implementation details, incidental keywords, broad impact, and path names.
+- Do not classify from a single keyword unless that keyword is actually the subject.
+- Words like “model”, “network”, “policy”, “status”, “usage”, “tool”, “plugin”, “cron”, “gateway”, “send”, “lane”, “deadlock”, “Codex”, or “security” require confirmation from the actual title meaning.
+
+Policy/config/security/MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- A title like `Policy: add model, network, and MCP conformance checks` should be `config`, `security`, and `mcp_tooling`.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `local_model_providers`, `open_weight_models`, or `model_serving` unless the item is actually about provider-specific models, open-weight model metadata, serving endpoints, streaming, endpoint lifecycle, request routing, or model-server compatibility.
+- In policy/conformance titles, “model” is usually part of a policy/checking surface, not a model provider topic.
+
+Provider/model topics:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight compatibility are central.
+- Use `local_model_providers` only when provider-specific local/open-weight model integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- Treat `local_model_providers` as label spam when the title is about policy, conformance, or checks rather than provider behavior.
+
+Final check:
+Before returning, remove any topic that was added only because of an incidental word. Keep only topics that name central maintainer-owned subjects.
+Iteration 6: New subsample score 2.7 is not better than old score 3.25, skipping
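The "is better than" / "is not better than" lines in Iterations 3 to 6 read like a gate on the summed subsample score: a proposal proceeds to full validation only if it beats the parent prompt on the reflected subsample. A minimal sketch, assuming strict improvement is required (the log shows no tie, so the behavior on equal scores is a guess); `passes_subsample_gate` is a hypothetical name:

```python
def passes_subsample_gate(new_score: float, old_score: float) -> bool:
    # Assumption: a tie does not pass; the log only shows strict wins and one loss.
    return new_score > old_score

# (iteration, new subsample score, old subsample score), copied from the log.
logged = [
    (3, 3.2857142857142856, 2.25),
    (4, 4.0, 1.25),
    (5, 2.5, 2.2),
    (6, 2.7, 3.25),
]

decisions = {it: passes_subsample_gate(new, old) for it, new, old in logged}
print(decisions)
```

Passing the gate says nothing about the full validation score: the candidates from Iterations 3, 4, and 5 all passed and still scored below candidate 0's 0.7361 on the 18-row validation set.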
+Iteration 7: Selected program 0 score: 0.7361111111111112
+Iteration 7: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON:
+{"topics_of_interest":["topic_id"]}
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Do not label every noun. Prefer the narrow central subject over broad or adjacent labels.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area, if that area is explicitly central.
+- `docs(queue): clarify steer behavior...` => `docs`, `queueing`.
+- Tutorial, guide, docs, README, or showcase catalog additions are usually `docs` only unless the title clearly names a second central product area.
+- Example: `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` => `docs`.
+- Do not add `agent_demos` just because the title mentions a tutorial, showcase, example, or educational resource. Treat that as label spam unless the actual subject is an agent demo feature, demo runtime, demo app behavior, or a maintainer-owned demo surface.
+
+Queueing:
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, APIs, selection, schema handling, or runtime behavior.
+- Do not add it for “tool boundaries” in docs about another system unless tool-call behavior is central.
+
+ACP, gateway, and runtime:
+- ACP-related work routes to `acp` when ACP is named centrally.
+- ACPX sandbox or workflow issues route to `acpx` when ACPX is named centrally.
+- Gateway-owned behavior routes to `gateway` only when gateway is explicitly the owner or subject.
+- Runtime work routes to `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+
+Codex and plugins:
+- Codex-related behavior routes to `codex` when Codex is named centrally.
+- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.
+
+Notifications, chat, and cron:
+- Slack, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations`.
+- Announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery route to `notifications`.
+- Use `cron_automation` only when cron scheduling, force-run, lifecycle, execution, or deadlock is central.
+
+Exec, sandboxing, and approvals:
+- Exec command/tool behavior routes to `exec_tools`.
+- Exec PATH fallback is `exec_tools`.
+- Exec v2 contract enforcement can include `exec_tools`, `sandboxing`, and `approvals` when all are central.
+- Do not replace sandboxing or approvals with `security` unless the title is about a security policy, vulnerability, network restriction, credential boundary, or allowed/blocked security behavior.
+
+Browser automation:
+- Browser diagnostics, automation layers, runtime behavior, and browser tooling route to `browser_automation`.
+- Do not add `gateway` unless gateway itself is explicitly central.
+
+Memory, inference, and model serving:
+- Memory or embeddings provider work routes to `memory`.
+- Self-hosted inference servers like llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Policy/config:
+- Policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when central.
+- Network policy, access restrictions, outbound rules, or boundary checks can be `security`.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Do not map “model” in “model policy” or “model conformance” to `model_serving` unless serving endpoints or protocol behavior are central.
+
+Final suppression checks:
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, “security”, “showcase”, “tutorial”, or “demo”, verify that the topic is actually the subject.
+- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, educational resources, or incidental words.
+- Keep required central second and third topics only when dropping them would hide the item from a maintainer who owns that area.
+Iteration 7: New subsample score 4.0 is better than old score 3.2857142857142856. Continue to full eval and add to candidate pool.
+Iteration 7: Valset score for new program: 0.6435185185185186 (coverage 18 / 18)
+Iteration 7: Val aggregate for new program: 0.6435185185185186
+Iteration 7: Individual valset scores for new program: {0: 1.0, 1: 0.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 0.5, 6: 0.5, 7: 0.5, 8: 0.25, 9: 1.0, 10: 0.3333333333333333, 11: 1.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 0.25, 16: 0.25, 17: 0.5}
+Iteration 7: Objective aggregate scores for new program: {'weighted_score': 0.6435185185185185}
+Iteration 7: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 7: Objective pareto front scores: {'weighted_score': 0.7361111111111112}
+Iteration 7: Valset pareto front aggregate score: 1.0
+Iteration 7: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {0}, 2: {0, 1, 4}, 3: {1, 4}, 4: {0, 3, 4}, 5: {1, 2}, 6: {1, 2, 3}, 7: {2, 3}, 8: {2}, 9: {0, 1, 2, 3, 4}, 10: {0}, 11: {0, 1, 4}, 12: {0, 1, 2, 4}, 13: {3}, 14: {0, 1, 2, 3, 4}, 15: {2}, 16: {0}, 17: {1, 2}}
+Iteration 7: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 7: Best valset aggregate score so far: 0.7361111111111112
+Iteration 7: Best program as per aggregate score on valset: 0
+Iteration 7: Best score on valset: 0.7361111111111112
+Iteration 7: Linear pareto front program index: 0
+Iteration 7: New program candidate index: 4
+Iteration 8: Selected program 2 score: 0.652336860670194
+Iteration 8: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about “cron --announce and heartbeat target-channel Slack pushes” is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself the central subject.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:
+ - `exec_tools`
+ - `sandboxing`
+ - `approvals`
+- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Open-weight and local model providers:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+ACP, ACPX, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Model serving:
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Output only JSON.
+Iteration 8: New subsample score 3.0 is better than old score 2.45. Continue to full eval and add to candidate pool.
+Iteration 8: Found a better program on the valset with score 0.875.
+Iteration 8: Valset score for new program: 0.875 (coverage 18 / 18)
+Iteration 8: Val aggregate for new program: 0.875
+Iteration 8: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 0.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 8: Objective aggregate scores for new program: {'weighted_score': 0.875}
+Iteration 8: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 8: Objective pareto front scores: {'weighted_score': 0.875}
+Iteration 8: Valset pareto front aggregate score: 1.0
+Iteration 8: Updated valset pareto front programs: {0: {0, 1, 2, 4, 5}, 1: {0}, 2: {0, 1, 4}, 3: {1, 4, 5}, 4: {0, 3, 4, 5}, 5: {1, 2, 5}, 6: {1, 2, 3, 5}, 7: {2, 3, 5}, 8: {2, 5}, 9: {0, 1, 2, 3, 4, 5}, 10: {0, 5}, 11: {0, 1, 4, 5}, 12: {0, 1, 2, 4, 5}, 13: {3}, 14: {0, 1, 2, 3, 4, 5}, 15: {2, 5}, 16: {0, 5}, 17: {1, 2, 5}}
+Iteration 8: Updated objective pareto front programs: {'weighted_score': {5}}
+Iteration 8: Best valset aggregate score so far: 0.875
+Iteration 8: Best program as per aggregate score on valset: 5
+Iteration 8: Best score on valset: 0.875
+Iteration 8: Linear pareto front program index: 5
+Iteration 8: New program candidate index: 5
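The Iteration 8 lines above can be cross-checked by hand. The following is a minimal sketch, not GEPA's actual implementation: it assumes the valset aggregate is the plain mean of the per-example scores, and that a candidate joins an example's Pareto set when it ties the best score seen for that example (replacing the set only when it strictly beats it). `update_front` is a hypothetical helper name; the data is copied from the Iteration 7 and Iteration 8 log lines.

```python
from statistics import mean

# Pareto-front membership after Iteration 7, copied from the log.
front_programs = {
    0: {0, 1, 2, 4}, 1: {0}, 2: {0, 1, 4}, 3: {1, 4}, 4: {0, 3, 4},
    5: {1, 2}, 6: {1, 2, 3}, 7: {2, 3}, 8: {2}, 9: {0, 1, 2, 3, 4},
    10: {0}, 11: {0, 1, 4}, 12: {0, 1, 2, 4}, 13: {3},
    14: {0, 1, 2, 3, 4}, 15: {2}, 16: {0}, 17: {1, 2},
}
# Best score seen so far per validation example; the log reports 1.0 for every id.
front_scores = {example_id: 1.0 for example_id in range(18)}

# Per-example validation scores for candidate 5, copied from the Iteration 8 log.
new_scores = {
    0: 1.0, 1: 0.25, 2: 0.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0,
    8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 0.5, 14: 1.0,
    15: 1.0, 16: 1.0, 17: 1.0,
}


def update_front(front_scores, front_programs, new_scores, new_index):
    """Hypothetical helper: fold one candidate into the per-example front."""
    for example_id, score in new_scores.items():
        best = front_scores[example_id]
        if score > best:
            # Strictly better: the candidate becomes the only front member.
            front_scores[example_id] = score
            front_programs[example_id] = {new_index}
        elif score == best:
            # Tie with the current best: the candidate joins the front.
            front_programs[example_id].add(new_index)


aggregate = mean(new_scores.values())
update_front(front_scores, front_programs, new_scores, new_index=5)

print(aggregate)               # valset aggregate for candidate 5
print(front_programs[13])      # candidate 5 scored 0.5 here, so it is not added
print(mean(front_scores.values()))  # aggregate of the per-example front
```

Under these assumptions the sketch reproduces the logged aggregate of 0.875 and the "Updated valset pareto front programs" line for Iteration 8, which is some evidence that the log can be read this way.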
+Iteration 9: Selected program 5 score: 0.875
+Iteration 9: All subsample scores perfect. Skipping.
+Iteration 9: Reflective mutation did not propose a new candidate
+Iteration 10: Selected program 3 score: 0.5527777777777777
+Iteration 10: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+
+Output format:
+- Return only final JSON using exact allowed topic ids, for example:
+ {"topics_of_interest":["queueing","docs"]}
+- If a structured-output tool such as `final_json` is available, you MUST call it with the JSON object. Do not merely print JSON.
+- Always return a valid object with `topics_of_interest` as an array.
+- Use an empty array only when no allowed topic is central.
+- Never invent topic ids.
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Suppress labels that come only from symptoms, examples, implementation details, tests, files changed, or incidental words.
+
+Specific routing rules:
+
+Codex:
+- Use `codex` when the title centrally concerns Codex startup, Codex OAuth, Codex runtime behavior, Codex configuration, Dockerized Codex, or Codex-specific user workflows.
+- A title with multiple Codex-related facets, such as startup behavior plus Docker Codex OAuth, usually still needs only `codex` unless another non-Codex product area is also central.
+
+Chat integrations:
+- Use `chat_integrations` when the title centrally names WhatsApp, chat history, chat import/export, chat connectors, messaging platforms, or conversation integration behavior.
+- “WhatsApp history” is a strong signal for `chat_integrations`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+
+Skills and plugins:
+- Do not add `skills_plugins` merely because the word “plugins” appears.
+- Treat “Codex startup plugins” as part of the Codex startup surface unless the title is centrally about the plugin system, plugin APIs, plugin installation, plugin lifecycle, skill/plugin registry, or skills behavior.
+- `skills_plugins` is label spam when plugins are only one detail inside a broader Codex fix.
+
+Gateway:
+- Do not add `gateway` merely because the title mentions Docker, OAuth, startup, routing, service setup, or Codex infrastructure.
+- Use `gateway` only when the gateway itself, gateway routing, gateway API behavior, or gateway service contract is the central subject.
+
+Open-weight models:
+- Use `open_weight_models` when the title centrally concerns open-weight model families, open-weight model metadata, context windows for open-weight models, or model support for open-weight models.
+- Mantle open-weight models are a strong signal for `open_weight_models`.
+
+Provider catalogs and local model providers:
+- Use `local_model_providers` when the central subject is a provider catalog, provider-specific model availability, provider registry, model list, provider configuration, model metadata for a provider, context-window metadata in a provider catalog, or missing/unsupported provider model entry.
+- Provider prefixes/scopes such as `amazon-bedrock-mantle`, `amazon-bedrock`, `google-vertex:`, and similar provider-qualified names are strong signals for `local_model_providers` when the title is about provider catalog behavior or provider-specific model metadata.
+- Adding known context windows for models under a provider is both provider-catalog work and model-metadata work.
+- Example: `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` => `open_weight_models`, `local_model_providers`.
+
+Reliability:
+- Use `reliability` when the central user-visible problem is silent failure, swallowed errors, failure instead of a clear error, unexpected no-op, crash, hang, deadlock, or robustness of failure handling.
+- If a missing provider catalog entry causes a silent failure instead of reporting an error, include `reliability`.
+- Do not add `reliability` for every bug; the title or summary must make failure handling, robustness, or degraded correctness central.
+
+Model serving:
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Do not use `model_serving` merely because the title mentions a model name, provider name, catalog, Gemini, Google Vertex, OpenAI-compatible provider, llama.cpp, Ollama, vLLM, TGI, LocalAI, context windows, or model metadata.
+- A provider catalog/model availability bug is `local_model_providers`, not `model_serving`, unless endpoint protocol or serving runtime behavior is explicitly central.
+
+Final suppression checks:
+- If a topic was added only because of a word like “model”, “provider”, “catalog”, “usage”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, “Docker”, “OAuth”, or “security”, verify that the topic is actually the subject.
+- Prefer the narrow central maintainer bucket over broad fallback labels.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.
+Iteration 10: New subsample score 4.0 is better than old score 1.9038461538461537. Continue to full eval and add to candidate pool.
+Iteration 10: Valset score for new program: 0.6499999999999999 (coverage 18 / 18)
+Iteration 10: Val aggregate for new program: 0.6499999999999999
+Iteration 10: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 1.0, 4: 1.0, 5: 0.5, 6: 0.5, 7: 1.0, 8: 0.25, 9: 1.0, 10: 0.2, 11: 0.5, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 0.25, 17: 0.5}
+Iteration 10: Objective aggregate scores for new program: {'weighted_score': 0.6499999999999999}
+Iteration 10: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 10: Objective pareto front scores: {'weighted_score': 0.875}
+Iteration 10: Valset pareto front aggregate score: 1.0
+Iteration 10: Updated valset pareto front programs: {0: {0, 1, 2, 4, 5}, 1: {0}, 2: {0, 1, 4}, 3: {1, 4, 5, 6}, 4: {0, 3, 4, 5, 6}, 5: {1, 2, 5}, 6: {1, 2, 3, 5}, 7: {2, 3, 5, 6}, 8: {2, 5}, 9: {0, 1, 2, 3, 4, 5, 6}, 10: {0, 5}, 11: {0, 1, 4, 5}, 12: {0, 1, 2, 4, 5, 6}, 13: {3, 6}, 14: {0, 1, 2, 3, 4, 5, 6}, 15: {2, 5, 6}, 16: {0, 5}, 17: {1, 2, 5}}
+Iteration 10: Updated objective pareto front programs: {'weighted_score': {5}}
+Iteration 10: Best valset aggregate score so far: 0.875
+Iteration 10: Best program as per aggregate score on valset: 5
+Iteration 10: Best score on valset: 0.875
+Iteration 10: Linear pareto front program index: 5
+Iteration 10: New program candidate index: 6
+Iteration 11: Selected program 0 score: 0.7361111111111112
+Iteration 11: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+- Return only final JSON using exact allowed topic ids, for example:
+ {"topics_of_interest":["queueing","docs"]}
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output only JSON.
+
+High-signal title patterns:
+- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, `test(...)`, or `policy(...)` can indicate the kind of change.
+- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.
+- Do not ignore `test(...)` scopes when the title is about landing or enforcing a behavior contract. The tested contract can be the central subject.
+- Prefixes naming providers or integrations, such as `google-vertex:`, `lm-studio:`, or similar, are often central maintainer-routing signals.
+- Do not blindly label every word in the title. Confirm the word names the subject, not just a path, symptom, or context.
+
+Additional domain rules learned from examples:
+
+Model provider catalogs and provider integrations:
+- Missing models, provider catalogs, model registries, provider capability lists, provider metadata, or provider-specific model availability route to `local_model_providers` when central.
+- This includes cloud or hosted provider names when the issue is about the OpenClaw provider integration/catalog, such as `google-vertex` and Gemini model catalog entries.
+- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.
+- Do not classify provider catalog/model availability bugs as `model_serving` unless the central subject is serving endpoints, API protocol behavior, streaming lifecycle, endpoint compatibility, base URL behavior, or request routing.
+
+Reliability:
+- Use `reliability` when the title centrally describes silent failure, missing errors, false success, hangs, deadlocks, crashes, data loss, or failure modes that make the system unreliable.
+- Add `reliability` as a secondary topic when an explicit reliability failure mode is part of the user-visible problem.
+- Do not add `reliability` to every bug fix. It must be explicit, such as “silent failure instead of error.”
+
+Local models:
+- Use `local_models` when a local model product/runtime/provider is central, such as LM Studio, Ollama, llama.cpp, LocalAI, or similar local model environments.
+- If the title names LM Studio as the affected system, include `local_models` even when the API/protocol issue also routes elsewhere.
+- Example: `fix: LM Studio thinking blocks invisible with Responses API` => `model_serving`, `local_models`.
+
+Model serving:
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible or Responses API request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or protocol-specific content handling.
+- Responses API behavior can be `model_serving` when the bug is about how served model responses are represented or consumed.
+- Example: `fix: LM Studio thinking blocks invisible with Responses API` => `model_serving`, `local_models`.
+- Do not add `model_serving` merely because a title mentions a model name, provider, provider catalog, OpenAI-compatible provider, Gemini, Vertex, LM Studio, llama.cpp, Ollama, vLLM, TGI, or LocalAI. Confirm the serving/API protocol itself is central.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.
+- Do not add `tool_calling` just because the title says “tool boundaries” unless tool-call behavior itself is the central feature or bug.
+
+Queueing:
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+
+Tool calling:
+- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+Final suppression checks:
+- If a topic was added only because of a word like “model”, “provider”, “catalog”, “usage”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, or “security”, verify that the topic is actually the subject.
+- Prefer narrow central topics over broad fallback labels.
+- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.
+Iteration 11: New subsample score 2.75 is better than old score 2.7. Continue to full eval and add to candidate pool.
+Iteration 11: Valset score for new program: 0.5964285714285714 (coverage 18 / 18)
+Iteration 11: Val aggregate for new program: 0.5964285714285714
+Iteration 11: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.25, 4: 0.2857142857142857, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 0.2, 11: 0.5, 12: 0.25, 13: 1.0, 14: 1.0, 15: 1.0, 16: 0.25, 17: 1.0}
+Iteration 11: Objective aggregate scores for new program: {'weighted_score': 0.5964285714285714}
+Iteration 11: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 11: Objective pareto front scores: {'weighted_score': 0.875}
+Iteration 11: Valset pareto front aggregate score: 1.0
+Iteration 11: Updated valset pareto front programs: {0: {0, 1, 2, 4, 5}, 1: {0}, 2: {0, 1, 4}, 3: {1, 4, 5, 6}, 4: {0, 3, 4, 5, 6}, 5: {1, 2, 5}, 6: {1, 2, 3, 5, 7}, 7: {2, 3, 5, 6, 7}, 8: {2, 5}, 9: {0, 1, 2, 3, 4, 5, 6, 7}, 10: {0, 5}, 11: {0, 1, 4, 5}, 12: {0, 1, 2, 4, 5, 6}, 13: {3, 6, 7}, 14: {0, 1, 2, 3, 4, 5, 6, 7}, 15: {2, 5, 6, 7}, 16: {0, 5}, 17: {1, 2, 5, 7}}
+Iteration 11: Updated objective pareto front programs: {'weighted_score': {5}}
+Iteration 11: Best valset aggregate score so far: 0.875
+Iteration 11: Best program as per aggregate score on valset: 5
+Iteration 11: Best score on valset: 0.875
+Iteration 11: Linear pareto front program index: 5
+Iteration 11: New program candidate index: 7
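Iterations 7 through 11 show a two-stage pattern: a proposal is first compared on its reflection subsample, and only then scored on the full valset, where it may or may not displace the best program. The sketch below replays that pattern with the numbers from the log. It is an interpretation of the log messages, not GEPA's code: the strict `>` comparisons and the names `passes_subsample_gate` and `replay` are assumptions made for illustration.

```python
# (iteration, new candidate index, old subsample sum, new subsample sum,
#  valset aggregate of the new candidate), all copied from the log.
LOGGED_STEPS = [
    (7, 4, 3.2857142857142856, 4.0, 0.6435185185185186),
    (8, 5, 2.45, 3.0, 0.875),
    (10, 6, 1.9038461538461537, 4.0, 0.6499999999999999),
    (11, 7, 2.7, 2.75, 0.5964285714285714),
]


def passes_subsample_gate(old_sum, new_sum):
    """Assumed rule: go to full eval only if the subsample sum improves."""
    return new_sum > old_sum


def replay(steps, best_index, best_score):
    """Replay the logged steps and track the best valset aggregate."""
    pool = []
    for _iteration, index, old_sum, new_sum, val_score in steps:
        if not passes_subsample_gate(old_sum, new_sum):
            continue  # proposal would be discarded before full eval
        pool.append(index)  # candidate joins the pool regardless of rank
        if val_score > best_score:
            best_index, best_score = index, val_score
    return pool, best_index, best_score


# Before Iteration 7 the log reports program 0 as best at 0.7361111111111112.
pool, best_index, best_score = replay(
    LOGGED_STEPS, best_index=0, best_score=0.7361111111111112
)
print(pool, best_index, best_score)
```

Read this way, candidates 4, 6, and 7 enter the pool because they improved on their subsamples, yet program 5 stays the best program on the valset throughout. Iteration 9 is absent from the replay because the log reports no proposal for it (all subsample scores were already perfect).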
+Iteration 12: Selected program 5 score: 0.875
+Iteration 12: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input:
+You may receive a target URL, title, body, summary, labels, examples, tests, files changed, or prior discussion. The title is the primary signal. Use the first clear body summary only when the title is ambiguous.
+
+Decision process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, docs change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, example, test fixture, implementation detail, internal hook, backend used to reproduce a bug, broad ownership area, or label-spam keyword.
+
+Important example:
+- `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving` only.
+ Do not add `local_models` just because `llama.cpp` appears. In this title, llama.cpp is the affected backend/example; the central issue is OpenAI-compatible streaming protocol behavior and the final usage chunk lifecycle.
+
+Topic rules:
+
+Notifications:
+Use `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, or notification delivery.
+Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+
+Cron:
+Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+Do not add it merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+
+Memory:
+Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+Active-memory recall deadlocks are usually `memory` plus `reliability`.
+
+Gateway and sessions:
+Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+“Outbound session identity” is `sessions`.
+A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` is `gateway` and `sessions`, not `notifications`.
+
+Open-weight and local model providers:
+Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+Known context windows for open-weight Mantle models should include both `open_weight_models` and `local_model_providers`.
+
+Codex:
+Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+
+API surface and UI/TUI:
+Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+Webchat full-message reader behavior is both `api_surface` and `ui_tui`. If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+Do not add it merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+ACP, ACPX, approvals:
+Use `acp` when ACP is named centrally.
+Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
+Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+`[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, `acpx`.
+`ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Model serving:
+Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, streaming protocol behavior, and final usage chunk handling are `model_serving`.
+Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Local models and self-hosted inference:
+Use `local_models` when a local model app/provider/runtime is itself central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+Use `self_hosted_inference` when using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers is itself the feature or bug.
+Suppress both `local_models` and `self_hosted_inference` when the provider name is only an affected backend, reproduction environment, compatibility example, or implementation detail for a different central subject such as OpenAI-compatible streaming.
+
+Telemetry and usage:
+Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Browser automation:
+Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security, MCP:
+Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Documentation:
+Documentation-only PRs should usually include `docs` plus the central documented area.
+A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+Queueing:
+Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+Suppress it for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Tool calling:
+Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+Composite titles:
+If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, llama.cpp, Ollama, vLLM, TGI, LocalAI, or LM Studio. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 12: New subsample score 4.0 is better than old score 3.2857142857142856. Continue to full eval and add to candidate pool.
+Iteration 12: Valset score for new program: 0.8055555555555556 (coverage 18 / 18)
+Iteration 12: Val aggregate for new program: 0.8055555555555556
+Iteration 12: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 0.25, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 0.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 0.25, 17: 0.5}
+Iteration 12: Objective aggregate scores for new program: {'weighted_score': 0.8055555555555556}
+Iteration 12: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 12: Objective pareto front scores: {'weighted_score': 0.875}
+Iteration 12: Valset pareto front aggregate score: 1.0
+Iteration 12: Updated valset pareto front programs: {0: {0, 1, 2, 4, 5, 8}, 1: {0, 8}, 2: {0, 1, 4}, 3: {1, 4, 5, 6, 8}, 4: {0, 3, 4, 5, 6, 8}, 5: {8, 1, 2, 5}, 6: {1, 2, 3, 5, 7, 8}, 7: {2, 3, 5, 6, 7, 8}, 8: {8, 2, 5}, 9: {0, 1, 2, 3, 4, 5, 6, 7, 8}, 10: {0, 8, 5}, 11: {0, 1, 4, 5}, 12: {0, 1, 2, 4, 5, 6, 8}, 13: {3, 6, 7}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8}, 15: {2, 5, 6, 7, 8}, 16: {0, 5}, 17: {1, 2, 5, 7}}
+Iteration 12: Updated objective pareto front programs: {'weighted_score': {5}}
+Iteration 12: Best valset aggregate score so far: 0.875
+Iteration 12: Best program as per aggregate score on valset: 5
+Iteration 12: Best score on valset: 0.875
+Iteration 12: Linear pareto front program index: 5
+Iteration 12: New program candidate index: 8
+{
+ "best_idx": 5,
+ "best_prompt_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-proper-from-best-20260613T055906Z/best.prompt.md",
+ "best_routing_policy_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-proper-from-best-20260613T055906Z/best.routing_policy.md",
+ "best_score": 0.875,
+ "config": {
+ "harness": {
+ "base_url": null,
+ "concurrency": 2,
+ "context_window": null,
+ "max_tokens": 1536,
+ "model": "gemma-12b-q4km-reason",
+ "state_dir": null,
+ "timeout_ms": 900000
+ },
+ "max_metric_calls": 240,
+ "output_dir": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-proper-from-best-20260613T055906Z",
+ "reflection_minibatch_size": 4,
+ "row_limit": 18,
+ "seed": 0,
+ "seed_routing_policy_chars": 8699,
+ "seed_routing_policy_sha256": "c77e7e9a0590e1c551d5e91a1c27ce5d361f8e32b4ae36f1900acdeee9e64f5f"
+ },
+ "created_at": "2026-06-13T07:50:51.337002+00:00",
+ "num_candidates": 9,
+ "num_full_val_evals": 9,
+ "result_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-proper-from-best-20260613T055906Z/gepa-result.json",
+ "total_metric_calls": 250
+}
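The aggregate figures in the Iteration 12 log lines above can be re-derived from the per-example scores. The sketch below assumes that the "Valset score" is the plain mean of the 18 per-example scores and that the "Valset pareto front aggregate score" is the mean of the per-example best scores across all candidates. Both assumptions are consistent with the logged numbers, but they are inferred from the log rather than taken from GEPA's source. The variable and function names are illustrative only.

```python
from statistics import mean

# Per-example valset scores for candidate 8, copied from the
# "Iteration 12: Individual valset scores for new program" log line.
candidate_8 = {
    0: 1.0, 1: 1.0, 2: 0.25, 3: 1.0, 4: 1.0, 5: 1.0,
    6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 0.0,
    12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 0.25, 17: 0.5,
}

# Per-example Pareto-front scores from the same iteration: every
# validation row is solved perfectly by at least one candidate.
pareto_front = {row: 1.0 for row in range(18)}


def aggregate(scores: dict[int, float]) -> float:
    """Assumed aggregation: unweighted mean over validation rows."""
    return mean(scores.values())


candidate_8_score = aggregate(candidate_8)
front_score = aggregate(pareto_front)

print(f"candidate 8 valset score: {candidate_8_score:.16f}")
print(f"pareto front aggregate:   {front_score:.1f}")
```

Read this way, the front aggregate of 1.0 does not mean any single prompt is perfect. The best single candidate (index 5) scores 0.875, and the 1.0 only says that each row is solved by some candidate in the pool.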
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/optimizer.pid b/gepa-12b-proper-from-best-20260613T055906Z/optimizer.pid
new file mode 100644
index 0000000000000000000000000000000000000000..b9fe7270555e73f7382d7611e4141a40353b3224
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/optimizer.pid
@@ -0,0 +1 @@
+3860243
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/run_log.json b/gepa-12b-proper-from-best-20260613T055906Z/run_log.json
new file mode 100644
index 0000000000000000000000000000000000000000..3fed0134aeba1cc0862667db93e220d443cd5f37
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/run_log.json
@@ -0,0 +1,422 @@
+[
+ {
+ "i": 0,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 9,
+ 11,
+ 14,
+ 12
+ ],
+ "subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ]
+ },
+ {
+ "i": 1,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 0,
+ 16,
+ 10,
+ 2
+ ],
+ "subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.5,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 0.2857142857142857,
+ 1.0,
+ 1.0
+ ]
+ },
+ {
+ "i": 2,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 3,
+ 5,
+ 17,
+ 4
+ ],
+ "subsample_scores": [
+ 0.25,
+ 0.5,
+ 0.5,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 0.2857142857142857
+ ],
+ "new_program_idx": 1,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17
+ ]
+ },
+ {
+ "i": 3,
+ "selected_program_candidate": 1,
+ "subsample_ids": [
+ 6,
+ 7,
+ 15,
+ 8
+ ],
+ "subsample_scores": [
+ 0.25,
+ 0.25,
+ 0.5,
+ 0.25
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 2,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17
+ ]
+ },
+ {
+ "i": 4,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 1,
+ 13,
+ 13,
+ 1
+ ],
+ "subsample_scores": [
+ 1.0,
+ 0.0,
+ 0.2,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 0.25,
+ 1.0,
+ 1.0,
+ 0.25
+ ],
+ "new_program_idx": 3,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17
+ ]
+ },
+ {
+ "i": 5,
+ "selected_program_candidate": 2,
+ "subsample_ids": [
+ 1,
+ 17,
+ 13,
+ 0
+ ],
+ "subsample_scores": [
+ 0.25,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 0.2,
+ 1.0,
+ 0.5
+ ]
+ },
+ {
+ "i": 6,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 2,
+ 4,
+ 14,
+ 12
+ ],
+ "subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.2857142857142857,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 4,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17
+ ]
+ },
+ {
+ "i": 7,
+ "selected_program_candidate": 2,
+ "subsample_ids": [
+ 7,
+ 8,
+ 16,
+ 10
+ ],
+ "subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.25,
+ 0.2
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.0,
+ 1.0
+ ],
+ "new_program_idx": 5,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17
+ ]
+ },
+ {
+ "i": 8,
+ "selected_program_candidate": 5,
+ "subsample_ids": [
+ 9,
+ 5,
+ 6,
+ 11
+ ],
+ "subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ]
+ },
+ {
+ "i": 9,
+ "selected_program_candidate": 3,
+ "subsample_ids": [
+ 3,
+ 15,
+ 15,
+ 3
+ ],
+ "subsample_scores": [
+ 0.25,
+ 1.0,
+ 0.5,
+ 0.15384615384615385
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 6,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17
+ ]
+ },
+ {
+ "i": 10,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 13,
+ 2,
+ 0,
+ 17
+ ],
+ "subsample_scores": [
+ 0.2,
+ 1.0,
+ 1.0,
+ 0.5
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 0.25,
+ 0.5,
+ 1.0
+ ],
+ "new_program_idx": 7,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17
+ ]
+ },
+ {
+ "i": 11,
+ "selected_program_candidate": 5,
+ "subsample_ids": [
+ 4,
+ 12,
+ 6,
+ 8
+ ],
+ "subsample_scores": [
+ 0.2857142857142857,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 8,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17
+ ]
+ }
+]
\ No newline at end of file
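The entries in run_log.json above follow a pattern that can be checked mechanically. An entry with no `new_subsample_scores` was skipped because its subsample was already perfect. An entry gets a `new_program_idx` only when the summed new subsample scores exceed the summed old ones. The sketch below encodes that reading as an assumption. It matches every entry shown in this file, but the strict "greater than" rule is inferred from the logs, and the function name `classify_step` is hypothetical. Note that the `i` field appears to be one lower than the "Iteration N" number in run_log.txt.

```python
# A few entries copied from run_log.json (fields trimmed to the ones used here).
run_log = [
    {"i": 0, "subsample_scores": [1.0, 1.0, 1.0, 1.0]},
    {"i": 1, "subsample_scores": [1.0, 1.0, 0.5, 1.0],
     "new_subsample_scores": [1.0, 0.2857142857142857, 1.0, 1.0]},
    {"i": 2, "subsample_scores": [0.25, 0.5, 0.5, 1.0],
     "new_subsample_scores": [1.0, 1.0, 1.0, 0.2857142857142857],
     "new_program_idx": 1},
    {"i": 11, "subsample_scores": [0.2857142857142857, 1.0, 1.0, 1.0],
     "new_subsample_scores": [1.0, 1.0, 1.0, 1.0],
     "new_program_idx": 8},
]


def classify_step(entry: dict) -> str:
    """Label one step as 'skipped', 'rejected', or 'accepted'.

    Assumed rule: a proposal is accepted only if the sum of its
    subsample scores is strictly greater than the parent's sum.
    """
    new_scores = entry.get("new_subsample_scores")
    if new_scores is None:
        return "skipped"  # no proposal was evaluated for this step
    old_total = sum(entry["subsample_scores"])
    new_total = sum(new_scores)
    return "accepted" if new_total > old_total else "rejected"


outcomes = {entry["i"]: classify_step(entry) for entry in run_log}
for step, outcome in outcomes.items():
    print(step, outcome)

# Sanity check: accepted steps are exactly the ones with a new_program_idx.
for entry in run_log:
    accepted = classify_step(entry) == "accepted"
    assert accepted == ("new_program_idx" in entry)
```

For step `i = 1` the sums are 3.5 before and about 3.2857 after, which corresponds to the "is not better than old score 3.5, skipping" line in run_log.txt.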
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/run_log.txt b/gepa-12b-proper-from-best-20260613T055906Z/run_log.txt
new file mode 100644
index 0000000000000000000000000000000000000000..291d8275ef81f42d963db1c6de4b71704037e374
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/run_log.txt
@@ -0,0 +1,1111 @@
+Iteration 0: Base program full valset score: 0.7361111111111112 over 18 / 18 examples
+Iteration 1: Selected program 0 score: 0.7361111111111112
+Iteration 1: All subsample scores perfect. Skipping.
+Iteration 1: Reflective mutation did not propose a new candidate
+Iteration 2: Selected program 0 score: 0.7361111111111112
+Iteration 2: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON in this exact shape:
+{"topics_of_interest":["topic_id_1","topic_id_2"]}
+
+Core task:
+Choose the minimum set of allowed topic ids that routes the issue or pull request to every maintainer bucket that must see it. The title is the primary signal. Use the first clear body summary only when the title is ambiguous.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible change, contract, bug, feature, documentation area, or policy.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicitly central to the title or first clear summary.
+5. Use 3 topics when the title or first clear summary names a contract or change that centrally spans three areas.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual subject.
+
+Important contract rule:
+- Do not treat `test(...)` as meaning “tests only”.
+- When a test PR is about landing, enforcing, or locking a behavior contract, classify the contract being tested.
+- A contract title may require every area covered by that contract, even if one area is not repeated elsewhere in the title.
+
+Exec v2 contract rule:
+- `test(exec): land exec v2 contract follow-through` must classify as:
+ `exec_tools`, `sandboxing`, `approvals`
+- Reason: the exec v2 contract centrally covers exec tool behavior, sandbox behavior, and approval behavior.
+- Do not drop `sandboxing` from exec v2 contract follow-through items.
+- Do not classify this as only `exec_tools` plus `approvals`; sandboxing is a required central facet of the contract.
+
+Exec, sandboxing, and approvals:
+- Use `exec_tools` for exec command/tool behavior, exec APIs, exec PATH fallback, or exec contract behavior.
+- Use `sandboxing` when sandbox permissions, sandbox constraints, sandbox modes, filesystem/network sandbox behavior, or the exec contract’s sandbox behavior is central.
+- Use `approvals` when approval policy, escalation approval, approval prompts, approval routing, or the exec contract’s approval behavior is central.
+- Exec v2 contract follow-through or contract enforcement can centrally include all three:
+ `exec_tools`, `sandboxing`, `approvals`.
+
+Documentation:
+- Documentation-only PRs usually include `docs` plus the central documented area.
+- `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`
+- Do not add `tool_calling` just because the title says “tool boundaries” unless tool-call behavior itself is central.
+
+Queueing:
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+ACP, gateway, and runtime:
+- ACP-related work routes to `acp` when ACP is named centrally.
+- ACPX sandbox or workflow issues route to `acpx` when ACPX is named centrally.
+- Gateway-owned behavior routes to `gateway` only when gateway is explicitly the owner or subject.
+- Runtime work routes to `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Codex and plugins:
+- Codex-related behavior routes to `codex` when Codex is named centrally.
+- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.
+- `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` => `acpx`, `codex`, `skills_plugins`.
+- Do not drop `skills_plugins` when plugins are the requested feature.
+
+Notifications and chat integrations:
+- Slack, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations`.
+- Announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery route to `notifications`.
+- Do not add `cron_automation` merely because the notification path mentions `cron --announce`; cron is context unless scheduling, force-run behavior, cron lifecycle, or cron execution is central.
+- `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes` => `notifications`, `chat_integrations`.
+
+Cron:
+- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- `cron force-run deadlock` => `cron_automation`.
+
+Browser automation:
+- Browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues route to `browser_automation`.
+- `layered browser diagnostics` => `browser_automation`.
+- Do not add `gateway` for browser diagnostics unless the gateway itself is explicitly the subject.
+
+Memory and inference:
+- Memory or embeddings provider work routes to `memory` when the provider exists for memory/embeddings.
+- Self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.
+- `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`.
+- Do not add `model_serving` merely because the title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI.
+
+Model serving:
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+- `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Policy/config:
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- `Policy: add model, network, and MCP conformance checks` => `mcp_tooling`, `config`, `security`.
+
+Composite fixes:
+- If a title lists several independent fixes, classify each central fix up to the smallest complete set.
+- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Do not substitute a broad infrastructure topic like `gateway` unless it is explicitly one of the listed user-visible subjects.
+
+Final suppression checks:
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, or “security”, verify that the topic is actually the subject.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.
+- For exec v2 contract follow-through, explicitly verify all three required buckets are present: `exec_tools`, `sandboxing`, and `approvals`.
+Iteration 2: New subsample score 3.2857142857142856 is not better than old score 3.5, skipping
+Iteration 3: Selected program 0 score: 0.7361111111111112
+Iteration 3: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON, for example:
+{"topics_of_interest":["queueing","docs"]}
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they name explicit central subjects that a maintainer must see.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output only JSON.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the subject, not a path, symptom, implementation detail, example, or broad ownership area. In particular, do not add `gateway` unless gateway behavior or ownership is explicitly the subject.
+
+Topic rules and corrections:
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- A title like `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` includes `codex`.
+- Do not add `gateway` for Codex startup, OAuth, Docker, or plugin startup unless gateway is explicitly the user-visible subject.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions “startup plugins” unless plugin availability or user-installed plugin behavior is central.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- WhatsApp history is `chat_integrations`.
+- In a composite title, preserve chat topics when a chat platform or chat history is one of the listed fixes.
+- Do not replace chat topics with `gateway` unless the gateway itself is the subject.
+
+Notifications:
+- Use `notifications` for announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`; cron is context unless scheduling, force-run behavior, cron lifecycle, or cron execution is itself broken.
+
+ACP, ACPX, approvals, and sessions:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- A title like `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`: ACP sessions are central, permissionMode maps to approvals, and per-binding/per-agent ACP session permission behavior belongs in the ACPX workflow bucket.
+- Do not drop `acpx` when ACP session permissions are framed around per-binding or per-agent behavior.
+
+Gateway and runtime:
+- Use `gateway` only when gateway-owned behavior, gateway routing, or gateway ownership is explicitly the subject.
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Model serving:
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
+
+Local models:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- A title like `fix: LM Studio thinking blocks invisible with Responses API` should be `model_serving` and `local_models`: LM Studio is the local model provider, and Responses API/thinking block visibility is model-serving protocol behavior.
+
+Self-hosted inference:
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- For memory or embeddings providers, combine with `memory` when memory/embeddings is central.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Memory:
+- Use `memory` for memory, embeddings, vector stores, embedding providers, or memory provider behavior.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Exec, sandboxing, and approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement can include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.
+- Do not replace sandboxing or approvals with `security` unless the title is actually about security policy, vulnerability, network restriction, credential boundary, or allowed/blocked security behavior.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Cron:
+- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+
+Policy/config/security/MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+Queueing:
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression checks:
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, or “security”, verify that the topic is actually the subject.
+- Prefer narrow central topics over broad fallback labels.
+- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.
+Iteration 3: New subsample score 3.2857142857142856 is better than old score 2.25. Continue to full eval and add to candidate pool.
+Iteration 3: Valset score for new program: 0.6845238095238095 (coverage 18 / 18)
+Iteration 3: Val aggregate for new program: 0.6845238095238095
+Iteration 3: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 0.5, 8: 0.25, 9: 1.0, 10: 0.25, 11: 1.0, 12: 1.0, 13: 0.25, 14: 1.0, 15: 0.25, 16: 0.2857142857142857, 17: 1.0}
+Iteration 3: Objective aggregate scores for new program: {'weighted_score': 0.6845238095238095}
+Iteration 3: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 0.5, 8: 0.25, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 0.25, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0}
+Iteration 3: Objective pareto front scores: {'weighted_score': 0.7361111111111112}
+Iteration 3: Valset pareto front aggregate score: 0.8611111111111112
+Iteration 3: Updated valset pareto front programs: {0: {0, 1}, 1: {0}, 2: {0, 1}, 3: {1}, 4: {0}, 5: {1}, 6: {1}, 7: {0, 1}, 8: {0, 1}, 9: {0, 1}, 10: {0}, 11: {0, 1}, 12: {0, 1}, 13: {0, 1}, 14: {0, 1}, 15: {0}, 16: {0}, 17: {1}}
+Iteration 3: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 3: Best valset aggregate score so far: 0.7361111111111112
+Iteration 3: Best program as per aggregate score on valset: 0
+Iteration 3: Best score on valset: 0.7361111111111112
+Iteration 3: Linear pareto front program index: 0
+Iteration 3: New program candidate index: 1
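The two aggregates in the Iteration 3 lines above appear to be plain means over the 18 validation examples. The following is a minimal sketch, assuming that reading of the log: the dictionaries are copied from the log lines, while the helper name `aggregate` is hypothetical and not part of GEPA.

```python
from statistics import fmean

# Per-example scores copied from "Individual valset scores for new program" (Iteration 3).
new_program_scores = {
    0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 1.0,
    6: 1.0, 7: 0.5, 8: 0.25, 9: 1.0, 10: 0.25, 11: 1.0,
    12: 1.0, 13: 0.25, 14: 1.0, 15: 0.25, 16: 0.2857142857142857, 17: 1.0,
}

# Best score per example across all candidates so far,
# copied from "New valset pareto front scores" (Iteration 3).
pareto_front_scores = {
    0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0,
    6: 1.0, 7: 0.5, 8: 0.25, 9: 1.0, 10: 1.0, 11: 1.0,
    12: 1.0, 13: 0.25, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0,
}


def aggregate(scores):
    """Hypothetical helper: unweighted mean of the per-example scores."""
    return fmean(scores.values())


valset_score = aggregate(new_program_scores)   # compare with "Valset score for new program"
front_score = aggregate(pareto_front_scores)   # compare with "Valset pareto front aggregate score"
print(f"{valset_score:.4f} {front_score:.4f}")
```

Under this reading, the front aggregate can exceed every single candidate's score because it takes the best candidate separately for each example.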
+Iteration 4: Selected program 1 score: 0.6845238095238095
+Iteration 4: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON, for example:
+{"topics_of_interest":["queueing","docs"]}
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they name explicit central subjects that a maintainer must see.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output only JSON.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the subject, not a path, symptom, implementation detail, example, or broad ownership area.
+
+Important corrections:
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing. Use `notifications` only when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, or notification delivery.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central. “outbound session identity” is `sessions`.
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, before_prompt_build, or internal execution path. Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be the user-visible subject.
+- A title like `active-memory recall subagent can deadlock on the main lane inside before_prompt_build` should be `memory` and `reliability`.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+
+Open-weight and local model providers:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when the item is about provider-specific local/open-weight model integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support such as `amazon-bedrock-mantle`.
+- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with a bracketed agent/source marker like `[codex]`. Treat bracketed `[codex]` as metadata unless the actual subject is Codex behavior.
+- A title like `[codex] fix webchat full-message reader for truncated history` should not include `codex`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- A title like `[codex] fix webchat full-message reader for truncated history` should be `gateway`, `api_surface`, and `ui_tui` when the reader is gateway-backed or gateway-owned.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- WhatsApp history is `chat_integrations`.
+- In a composite title, preserve chat topics when a chat platform or chat history is one of the listed fixes.
+- Do not replace chat topics with `gateway` unless the gateway itself is the subject.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions “startup plugins” unless plugin availability or user-installed plugin behavior is central.
+
+ACP, ACPX, approvals, and sessions:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- A title like `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.
+
+Gateway and runtime:
+- Use `gateway` only when gateway-owned behavior, gateway routing, or gateway ownership is explicitly the subject.
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Model serving:
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Exec, sandboxing, and approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement can include `exec_tools`, `sandboxing`, and `approvals` when the contract covers sandbox and approval behavior.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Cron:
+- Use `cron_automation` when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Policy/config/security/MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+Queueing:
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+- Suppress `queueing` for incidental words like lane, deadlock, main lane, worker, subagent, or internal hook names unless queue lifecycle or queued execution is the actual subject.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression checks:
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, “send”, “lane”, “deadlock”, “Codex”, or “security”, verify that the topic is actually the subject.
+- Prefer narrow central topics over broad fallback labels.
+- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, bracketed metadata, or incidental words.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.
+Iteration 4: New subsample score 4.0 is better than old score 1.25. Continue to full eval and add to candidate pool.
+Iteration 4: Valset score for new program: 0.652336860670194 (coverage 18 / 18)
+Iteration 4: Val aggregate for new program: 0.652336860670194
+Iteration 4: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 0.1111111111111111, 3: 0.2857142857142857, 4: 0.16666666666666666, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 0.14285714285714285, 11: 0.2857142857142857, 12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 0.0, 17: 1.0}
+Iteration 4: Objective aggregate scores for new program: {'weighted_score': 0.652336860670194}
+Iteration 4: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 4: Objective pareto front scores: {'weighted_score': 0.7361111111111112}
+Iteration 4: Valset pareto front aggregate score: 0.9722222222222222
+Iteration 4: Updated valset pareto front programs: {0: {0, 1, 2}, 1: {0}, 2: {0, 1}, 3: {1}, 4: {0}, 5: {1, 2}, 6: {1, 2}, 7: {2}, 8: {2}, 9: {0, 1, 2}, 10: {0}, 11: {0, 1}, 12: {0, 1, 2}, 13: {2}, 14: {0, 1, 2}, 15: {2}, 16: {0}, 17: {1, 2}}
+Iteration 4: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 4: Best valset aggregate score so far: 0.7361111111111112
+Iteration 4: Best program as per aggregate score on valset: 0
+Iteration 4: Best score on valset: 0.7361111111111112
+Iteration 4: Linear pareto front program index: 0
+Iteration 4: New program candidate index: 2
+Iteration 5: Selected program 0 score: 0.7361111111111112
+Iteration 5: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+
+Output format:
+- Return only final JSON using exact allowed topic ids, for example:
+ {"topics_of_interest":["queueing","docs"]}
+- If a structured-output tool such as `final_json` is available, you MUST call it with the JSON object. Do not merely print JSON.
+- Always return a valid object with `topics_of_interest` as an array. Use an empty array only when no allowed topic is central.
+- Never invent topic ids.
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Suppress labels that come only from symptoms, examples, implementation details, tests, files changed, or incidental words.
+
+Provider catalogs and local model providers:
+- Use `local_model_providers` when the central subject is a provider catalog, provider-specific model availability, provider registry, model list, provider configuration, or missing/unsupported model entry.
+- Provider prefixes such as `google-vertex:` are strong signals for `local_model_providers` when the title is about provider catalog behavior.
+- Model names such as Gemini variants, including `gemini-3.1-flash-lite`, are not enough for `model_serving`; classify the provider/catalog issue instead.
+- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.
+
+Reliability:
+- Use `reliability` when the central user-visible problem is silent failure, swallowed errors, failure instead of a clear error, unexpected no-op, crash, hang, deadlock, or robustness of failure handling.
+- If a missing provider catalog entry causes a silent failure instead of reporting an error, include `reliability`.
+- Do not add `reliability` for every bug; the title or summary must make failure handling, robustness, or degraded correctness central.
+
+Model serving:
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Do not use `model_serving` merely because the title mentions a model name, provider name, catalog, Gemini, Google Vertex, OpenAI-compatible provider, llama.cpp, Ollama, vLLM, TGI, or LocalAI.
+- A provider catalog/model availability bug is `local_model_providers`, not `model_serving`, unless endpoint protocol or serving runtime behavior is explicitly central.
+
+Final suppression checks:
+- If a topic was added only because of a word like “model”, “provider”, “catalog”, “usage”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, or “security”, verify that the topic is actually the subject.
+- Prefer the narrow central maintainer bucket over broad fallback labels.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.
+Iteration 5: New subsample score 2.5 is better than old score 2.2. Continue to full eval and add to candidate pool.
+Iteration 5: Valset score for new program: 0.5527777777777777 (coverage 18 / 18)
+Iteration 5: Val aggregate for new program: 0.5527777777777777
+Iteration 5: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.25, 4: 1.0, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 0.2, 11: 0.25, 12: 0.25, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.25}
+Iteration 5: Objective aggregate scores for new program: {'weighted_score': 0.5527777777777777}
+Iteration 5: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 5: Objective pareto front scores: {'weighted_score': 0.7361111111111112}
+Iteration 5: Valset pareto front aggregate score: 1.0
+Iteration 5: Updated valset pareto front programs: {0: {0, 1, 2}, 1: {0}, 2: {0, 1}, 3: {1}, 4: {0, 3}, 5: {1, 2}, 6: {1, 2, 3}, 7: {2, 3}, 8: {2}, 9: {0, 1, 2, 3}, 10: {0}, 11: {0, 1}, 12: {0, 1, 2}, 13: {3}, 14: {0, 1, 2, 3}, 15: {2}, 16: {0}, 17: {1, 2}}
+Iteration 5: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 5: Best valset aggregate score so far: 0.7361111111111112
+Iteration 5: Best program as per aggregate score on valset: 0
+Iteration 5: Best score on valset: 0.7361111111111112
+Iteration 5: Linear pareto front program index: 0
+Iteration 5: New program candidate index: 3
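The "Best program as per aggregate score on valset: 0" line above follows from comparing the aggregate validation scores of the candidates in the pool. The following is a minimal sketch, assuming the best program is simply the candidate with the highest aggregate; the scores are copied from the log, and the variable names are hypothetical.

```python
# Aggregate validation scores per candidate index, copied from the log:
# candidate 0 from "Selected program 0 score", candidates 1-3 from the
# "Valset score for new program" lines of Iterations 3, 4, and 5.
candidate_aggregates = {
    0: 0.7361111111111112,
    1: 0.6845238095238095,
    2: 0.652336860670194,
    3: 0.5527777777777777,
}

# Assumption: the "best program" is the candidate with the highest aggregate score.
best_index = max(candidate_aggregates, key=candidate_aggregates.get)
best_score = candidate_aggregates[best_index]
print(best_index, best_score)
```

This is why the best-score lines stay at 0.7361111111111112 through Iteration 5 even though three new candidates joined the pool: none of them beat candidate 0 on the full validation set.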
+Iteration 6: Selected program 2 score: 0.652336860670194
+Iteration 6: Proposed new text for routing_policy: Classify GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON:
+{"topics_of_interest":["topic_id"]}
+
+Input format:
+- target URL
+- title
+- sometimes body or summary
+- sometimes expected/gold topics in evaluation data
+
+Use the title as the primary signal. Use the body or summary only when the title is ambiguous.
+
+Allowed topic ids include:
+queueing, docs, notifications, sessions, gateway, reliability, memory, open_weight_models, local_model_providers, codex, api_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime, model_serving, local_models, self_hosted_inference, telemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation, config, security, mcp_tooling, tool_calling.
+
+Core rule:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern. Never add a topic just because a related word appears.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, policy, documentation change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central subjects.
+5. Use three topics only when the title or first clear summary explicitly names three central facets.
+6. Use zero topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output only JSON.
+
+Important suppression rules:
+- Ignore examples, tests, files changed, implementation details, incidental keywords, broad impact, and path names.
+- Do not classify from a single keyword unless that keyword is actually the subject.
+- Words like “model”, “network”, “policy”, “status”, “usage”, “tool”, “plugin”, “cron”, “gateway”, “send”, “lane”, “deadlock”, “Codex”, or “security” require confirmation from the actual title meaning.
+
+Policy/config/security/MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- A title like `Policy: add model, network, and MCP conformance checks` should be `config`, `security`, and `mcp_tooling`.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `local_model_providers`, `open_weight_models`, or `model_serving` unless the item is actually about provider-specific models, open-weight model metadata, serving endpoints, streaming, endpoint lifecycle, request routing, or model-server compatibility.
+- In policy/conformance titles, “model” is usually part of a policy/checking surface, not a model provider topic.
+
+Provider/model topics:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight compatibility are central.
+- Use `local_model_providers` only when provider-specific local/open-weight model integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- Treat `local_model_providers` as label spam when the title is about policy, conformance, or checks rather than provider behavior.
+
+Final check:
+Before returning, remove any topic that was added only because of an incidental word. Keep only topics that name central maintainer-owned subjects.
+Iteration 6: New subsample score 2.7 is not better than old score 3.25, skipping
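The "Continue to full eval" and "skipping" messages above suggest a gate on the subsample score before a candidate is evaluated on the full validation set. The following is a minimal sketch of that gate, assuming a strict greater-than comparison; the function name `should_run_full_eval` is hypothetical, and how GEPA handles ties is not visible in this log.

```python
def should_run_full_eval(new_subsample_score: float, old_subsample_score: float) -> bool:
    """Hypothetical gate: run the full validation pass only when the proposed
    prompt beats the parent prompt on the reflected subsample.

    Assumption: ties are rejected (strict comparison).
    """
    return new_subsample_score > old_subsample_score


# Subsample score pairs (new, old) copied from the log lines of Iterations 3-7.
logged = {
    3: (3.2857142857142856, 2.25),
    4: (4.0, 1.25),
    5: (2.5, 2.2),
    6: (2.7, 3.25),
    7: (4.0, 3.2857142857142856),
}

decisions = {it: should_run_full_eval(new, old) for it, (new, old) in logged.items()}
print(decisions)
```

Only Iteration 6 fails the gate, which matches the log: it is the one iteration that emits no validation scores and adds no candidate to the pool.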
+Iteration 7: Selected program 0 score: 0.7361111111111112
+Iteration 7: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON:
+{"topics_of_interest":["topic_id"]}
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Do not label every noun. Prefer the narrow central subject over broad or adjacent labels.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area, if that area is explicitly central.
+- `docs(queue): clarify steer behavior...` => `docs`, `queueing`.
+- Tutorial, guide, docs, README, or showcase catalog additions are usually `docs` only unless the title clearly names a second central product area.
+- Example: `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` => `docs`.
+- Do not add `agent_demos` just because the title mentions a tutorial, showcase, example, or educational resource. Treat that as label spam unless the actual subject is an agent demo feature, demo runtime, demo app behavior, or a maintainer-owned demo surface.
+
+Queueing:
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, APIs, selection, schema handling, or runtime behavior.
+- Do not add it for “tool boundaries” in docs about another system unless tool-call behavior is central.
+
+ACP, gateway, and runtime:
+- ACP-related work routes to `acp` when ACP is named centrally.
+- ACPX sandbox or workflow issues route to `acpx` when ACPX is named centrally.
+- Gateway-owned behavior routes to `gateway` only when gateway is explicitly the owner or subject.
+- Runtime work routes to `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+
+Codex and plugins:
+- Codex-related behavior routes to `codex` when Codex is named centrally.
+- User-installed plugins, plugin inheritance, Superpowers, skills, plugin discovery, plugin installation, or skill/plugin availability route to `skills_plugins`.
+
+Notifications, chat, and cron:
+- Slack, chat app delivery, chat target channels, and chat push behavior route to `chat_integrations`.
+- Announce messages, heartbeat pushes, target-channel pushes, identity overlays on pushed messages, and notification delivery route to `notifications`.
+- Use `cron_automation` only when cron scheduling, force-run, lifecycle, execution, or deadlock is central.
+
+Exec, sandboxing, and approvals:
+- Exec command/tool behavior routes to `exec_tools`.
+- Exec PATH fallback is `exec_tools`.
+- Exec v2 contract enforcement can include `exec_tools`, `sandboxing`, and `approvals` when all are central.
+- Do not replace sandboxing or approvals with `security` unless the title is about a security policy, vulnerability, network restriction, credential boundary, or allowed/blocked security behavior.
+
+Browser automation:
+- Browser diagnostics, automation layers, runtime behavior, and browser tooling route to `browser_automation`.
+- Do not add `gateway` unless gateway itself is explicitly central.
+
+Memory, inference, and model serving:
+- Memory or embeddings provider work routes to `memory`.
+- Self-hosted inference servers like llama.cpp, Ollama, vLLM, TGI, and LocalAI route to `self_hosted_inference` when the item is about using those servers as inference providers.
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Policy/config:
+- Policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when central.
+- Network policy, access restrictions, outbound rules, or boundary checks can be `security`.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Do not map “model” in “model policy” or “model conformance” to `model_serving` unless serving endpoints or protocol behavior are central.
+
+Final suppression checks:
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, “security”, “showcase”, “tutorial”, or “demo”, verify that the topic is actually the subject.
+- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, educational resources, or incidental words.
+- Keep required central second and third topics only when dropping them would hide the item from a maintainer who owns that area.
+Iteration 7: New subsample score 4.0 is better than old score 3.2857142857142856. Continue to full eval and add to candidate pool.
+Iteration 7: Valset score for new program: 0.6435185185185186 (coverage 18 / 18)
+Iteration 7: Val aggregate for new program: 0.6435185185185186
+Iteration 7: Individual valset scores for new program: {0: 1.0, 1: 0.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 0.5, 6: 0.5, 7: 0.5, 8: 0.25, 9: 1.0, 10: 0.3333333333333333, 11: 1.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 0.25, 16: 0.25, 17: 0.5}
+Iteration 7: Objective aggregate scores for new program: {'weighted_score': 0.6435185185185185}
+Iteration 7: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 7: Objective pareto front scores: {'weighted_score': 0.7361111111111112}
+Iteration 7: Valset pareto front aggregate score: 1.0
+Iteration 7: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {0}, 2: {0, 1, 4}, 3: {1, 4}, 4: {0, 3, 4}, 5: {1, 2}, 6: {1, 2, 3}, 7: {2, 3}, 8: {2}, 9: {0, 1, 2, 3, 4}, 10: {0}, 11: {0, 1, 4}, 12: {0, 1, 2, 4}, 13: {3}, 14: {0, 1, 2, 3, 4}, 15: {2}, 16: {0}, 17: {1, 2}}
+Iteration 7: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 7: Best valset aggregate score so far: 0.7361111111111112
+Iteration 7: Best program as per aggregate score on valset: 0
+Iteration 7: Best score on valset: 0.7361111111111112
+Iteration 7: Linear pareto front program index: 0
+Iteration 7: New program candidate index: 4
+Iteration 8: Selected program 2 score: 0.652336860670194
+Iteration 8: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about “cron --announce and heartbeat target-channel Slack pushes” is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself the central subject.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:
+ - `exec_tools`
+ - `sandboxing`
+ - `approvals`
+- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Open-weight and local model providers:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+ACP, ACPX, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Model serving:
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Output only JSON.
+Iteration 8: New subsample score 3.0 is better than old score 2.45. Continue to full eval and add to candidate pool.
+Iteration 8: Found a better program on the valset with score 0.875.
+Iteration 8: Valset score for new program: 0.875 (coverage 18 / 18)
+Iteration 8: Val aggregate for new program: 0.875
+Iteration 8: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 0.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 8: Objective aggregate scores for new program: {'weighted_score': 0.875}
+Iteration 8: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 8: Objective pareto front scores: {'weighted_score': 0.875}
+Iteration 8: Valset pareto front aggregate score: 1.0
+Iteration 8: Updated valset pareto front programs: {0: {0, 1, 2, 4, 5}, 1: {0}, 2: {0, 1, 4}, 3: {1, 4, 5}, 4: {0, 3, 4, 5}, 5: {1, 2, 5}, 6: {1, 2, 3, 5}, 7: {2, 3, 5}, 8: {2, 5}, 9: {0, 1, 2, 3, 4, 5}, 10: {0, 5}, 11: {0, 1, 4, 5}, 12: {0, 1, 2, 4, 5}, 13: {3}, 14: {0, 1, 2, 3, 4, 5}, 15: {2, 5}, 16: {0, 5}, 17: {1, 2, 5}}
+Iteration 8: Updated objective pareto front programs: {'weighted_score': {5}}
+Iteration 8: Best valset aggregate score so far: 0.875
+Iteration 8: Best program as per aggregate score on valset: 5
+Iteration 8: Best score on valset: 0.875
+Iteration 8: Linear pareto front program index: 5
+Iteration 8: New program candidate index: 5
+Iteration 9: Selected program 5 score: 0.875
+Iteration 9: All subsample scores perfect. Skipping.
+Iteration 9: Reflective mutation did not propose a new candidate
+Iteration 10: Selected program 3 score: 0.5527777777777777
+Iteration 10: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+
+Output format:
+- Return only final JSON using exact allowed topic ids, for example:
+ {"topics_of_interest":["queueing","docs"]}
+- If a structured-output tool such as `final_json` is available, you MUST call it with the JSON object. Do not merely print JSON.
+- Always return a valid object with `topics_of_interest` as an array.
+- Use an empty array only when no allowed topic is central.
+- Never invent topic ids.
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Suppress labels that come only from symptoms, examples, implementation details, tests, files changed, or incidental words.
+
+Specific routing rules:
+
+Codex:
+- Use `codex` when the title centrally concerns Codex startup, Codex OAuth, Codex runtime behavior, Codex configuration, Dockerized Codex, or Codex-specific user workflows.
+- A title with multiple Codex-related facets, such as startup behavior plus Docker Codex OAuth, usually still needs only `codex` unless another non-Codex product area is also central.
+
+Chat integrations:
+- Use `chat_integrations` when the title centrally names WhatsApp, chat history, chat import/export, chat connectors, messaging platforms, or conversation integration behavior.
+- “WhatsApp history” is a strong signal for `chat_integrations`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+
+Skills and plugins:
+- Do not add `skills_plugins` merely because the word “plugins” appears.
+- Treat “Codex startup plugins” as part of the Codex startup surface unless the title is centrally about the plugin system, plugin APIs, plugin installation, plugin lifecycle, skill/plugin registry, or skills behavior.
+- `skills_plugins` is label spam when plugins are only one detail inside a broader Codex fix.
+
+Gateway:
+- Do not add `gateway` merely because the title mentions Docker, OAuth, startup, routing, service setup, or Codex infrastructure.
+- Use `gateway` only when the gateway itself, gateway routing, gateway API behavior, or gateway service contract is the central subject.
+
+Open-weight models:
+- Use `open_weight_models` when the title centrally concerns open-weight model families, open-weight model metadata, context windows for open-weight models, or model support for open-weight models.
+- Mantle open-weight models are a strong signal for `open_weight_models`.
+
+Provider catalogs and local model providers:
+- Use `local_model_providers` when the central subject is a provider catalog, provider-specific model availability, provider registry, model list, provider configuration, model metadata for a provider, context-window metadata in a provider catalog, or missing/unsupported provider model entry.
+- Provider prefixes/scopes such as `amazon-bedrock-mantle`, `amazon-bedrock`, `google-vertex:`, and similar provider-qualified names are strong signals for `local_model_providers` when the title is about provider catalog behavior or provider-specific model metadata.
+- Adding known context windows for models under a provider is both provider-catalog work and model-metadata work.
+- Example: `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` => `open_weight_models`, `local_model_providers`.
+
+Reliability:
+- Use `reliability` when the central user-visible problem is silent failure, swallowed errors, failure instead of a clear error, unexpected no-op, crash, hang, deadlock, or robustness of failure handling.
+- If a missing provider catalog entry causes a silent failure instead of reporting an error, include `reliability`.
+- Do not add `reliability` for every bug; the title or summary must make failure handling, robustness, or degraded correctness central.
+
+Model serving:
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Do not use `model_serving` merely because the title mentions a model name, provider name, catalog, Gemini, Google Vertex, OpenAI-compatible provider, llama.cpp, Ollama, vLLM, TGI, LocalAI, context windows, or model metadata.
+- A provider catalog/model availability bug is `local_model_providers`, not `model_serving`, unless endpoint protocol or serving runtime behavior is explicitly central.
+
+Final suppression checks:
+- If a topic was added only because of a word like “model”, “provider”, “catalog”, “usage”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, “Docker”, “OAuth”, or “security”, verify that the topic is actually the subject.
+- Prefer the narrow central maintainer bucket over broad fallback labels.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.
+Iteration 10: New subsample score 4.0 is better than old score 1.9038461538461537. Continue to full eval and add to candidate pool.
+Iteration 10: Valset score for new program: 0.6499999999999999 (coverage 18 / 18)
+Iteration 10: Val aggregate for new program: 0.6499999999999999
+Iteration 10: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 1.0, 4: 1.0, 5: 0.5, 6: 0.5, 7: 1.0, 8: 0.25, 9: 1.0, 10: 0.2, 11: 0.5, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 0.25, 17: 0.5}
+Iteration 10: Objective aggregate scores for new program: {'weighted_score': 0.6499999999999999}
+Iteration 10: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 10: Objective pareto front scores: {'weighted_score': 0.875}
+Iteration 10: Valset pareto front aggregate score: 1.0
+Iteration 10: Updated valset pareto front programs: {0: {0, 1, 2, 4, 5}, 1: {0}, 2: {0, 1, 4}, 3: {1, 4, 5, 6}, 4: {0, 3, 4, 5, 6}, 5: {1, 2, 5}, 6: {1, 2, 3, 5}, 7: {2, 3, 5, 6}, 8: {2, 5}, 9: {0, 1, 2, 3, 4, 5, 6}, 10: {0, 5}, 11: {0, 1, 4, 5}, 12: {0, 1, 2, 4, 5, 6}, 13: {3, 6}, 14: {0, 1, 2, 3, 4, 5, 6}, 15: {2, 5, 6}, 16: {0, 5}, 17: {1, 2, 5}}
+Iteration 10: Updated objective pareto front programs: {'weighted_score': {5}}
+Iteration 10: Best valset aggregate score so far: 0.875
+Iteration 10: Best program as per aggregate score on valset: 5
+Iteration 10: Best score on valset: 0.875
+Iteration 10: Linear pareto front program index: 5
+Iteration 10: New program candidate index: 6
+Iteration 11: Selected program 0 score: 0.7361111111111112
+Iteration 11: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, implementation details, incidental keywords, and broad impact unless they are the actual user-visible subject.
+- Return only final JSON using exact allowed topic ids, for example:
+ {"topics_of_interest":["queueing","docs"]}
+
+Task:
+Choose the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+General process:
+1. Read the title first.
+2. Identify the main user-visible problem, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output only JSON.
+
+High-signal title patterns:
+- A Conventional Commit type like `docs(...)`, `feat(...)`, `fix(...)`, `test(...)`, or `policy(...)` can indicate the kind of change.
+- A scope inside parentheses is often central. For example, `docs(queue): ...` usually includes both `docs` and `queueing`.
+- Do not ignore `test(...)` scopes when the title is about landing or enforcing a behavior contract. The tested contract can be the central subject.
+- Prefixes naming providers or integrations, such as `google-vertex:`, `lm-studio:`, or similar, are often central maintainer-routing signals.
+- Do not blindly label every word in the title. Confirm the word names the subject, not just a path, symptom, or context.
+
+Additional domain rules learned from examples:
+
+Model provider catalogs and provider integrations:
+- Missing models, provider catalogs, model registries, provider capability lists, provider metadata, or provider-specific model availability route to `local_model_providers` when central.
+- This includes cloud or hosted provider names when the issue is about the OpenClaw provider integration/catalog, such as `google-vertex` and Gemini model catalog entries.
+- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.
+- Do not classify provider catalog/model availability bugs as `model_serving` unless the central subject is serving endpoints, API protocol behavior, streaming lifecycle, endpoint compatibility, base URL behavior, or request routing.
+
+Reliability:
+- Use `reliability` when the title centrally describes silent failure, missing errors, false success, hangs, deadlocks, crashes, data loss, or failure modes that make the system unreliable.
+- Add `reliability` as a secondary topic when an explicit reliability failure mode is part of the user-visible problem.
+- Do not add `reliability` to every bug fix. It must be explicit, such as “silent failure instead of error.”
+
+Local models:
+- Use `local_models` when a local model product/runtime/provider is central, such as LM Studio, Ollama, llama.cpp, LocalAI, or similar local model environments.
+- If the title names LM Studio as the affected system, include `local_models` even when the API/protocol issue also routes elsewhere.
+- Example: `fix: LM Studio thinking blocks invisible with Responses API` => `model_serving`, `local_models`.
+
+Model serving:
+- Use `model_serving` only when the central subject is serving endpoints, OpenAI-compatible or Responses API request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or protocol-specific content handling.
+- Responses API behavior can be `model_serving` when the bug is about how served model responses are represented or consumed.
+- Example: `fix: LM Studio thinking blocks invisible with Responses API` => `model_serving`, `local_models`.
+- Do not add `model_serving` merely because a title mentions a model name, provider, provider catalog, OpenAI-compatible provider, Gemini, Vertex, LM Studio, llama.cpp, Ollama, vLLM, TGI, or LocalAI. Confirm the serving/API protocol itself is central.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- Example: `docs(queue): clarify steer behavior with partial streaming and tool boundaries` => `docs`, `queueing`.
+- Do not add `tool_calling` just because the title says “tool boundaries” unless tool-call behavior itself is the central feature or bug.
+
+Queueing:
+- Queue, queueing, queued execution, steer behavior in queues, or queue lifecycle route to `queueing` when central.
+
+Tool calling:
+- `tool_calling` is only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+Final suppression checks:
+- If a topic was added only because of a word like “model”, “provider”, “catalog”, “usage”, “network”, “test”, “policy”, “status”, “tool”, “plugin”, “chunk”, “cron”, “gateway”, or “security”, verify that the topic is actually the subject.
+- Prefer narrow central topics over broad fallback labels.
+- Remove labels that come only from symptoms, implementation details, tests, examples, files changed, or incidental words.
+- Keep required central second and third topics when dropping them would hide the item from a maintainer who owns that area.
+Iteration 11: New subsample score 2.75 is better than old score 2.7. Continue to full eval and add to candidate pool.
+Iteration 11: Valset score for new program: 0.5964285714285714 (coverage 18 / 18)
+Iteration 11: Val aggregate for new program: 0.5964285714285714
+Iteration 11: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.25, 4: 0.2857142857142857, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 0.2, 11: 0.5, 12: 0.25, 13: 1.0, 14: 1.0, 15: 1.0, 16: 0.25, 17: 1.0}
+Iteration 11: Objective aggregate scores for new program: {'weighted_score': 0.5964285714285714}
+Iteration 11: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 11: Objective pareto front scores: {'weighted_score': 0.875}
+Iteration 11: Valset pareto front aggregate score: 1.0
+Iteration 11: Updated valset pareto front programs: {0: {0, 1, 2, 4, 5}, 1: {0}, 2: {0, 1, 4}, 3: {1, 4, 5, 6}, 4: {0, 3, 4, 5, 6}, 5: {1, 2, 5}, 6: {1, 2, 3, 5, 7}, 7: {2, 3, 5, 6, 7}, 8: {2, 5}, 9: {0, 1, 2, 3, 4, 5, 6, 7}, 10: {0, 5}, 11: {0, 1, 4, 5}, 12: {0, 1, 2, 4, 5, 6}, 13: {3, 6, 7}, 14: {0, 1, 2, 3, 4, 5, 6, 7}, 15: {2, 5, 6, 7}, 16: {0, 5}, 17: {1, 2, 5, 7}}
+Iteration 11: Updated objective pareto front programs: {'weighted_score': {5}}
+Iteration 11: Best valset aggregate score so far: 0.875
+Iteration 11: Best program as per aggregate score on valset: 5
+Iteration 11: Best score on valset: 0.875
+Iteration 11: Linear pareto front program index: 5
+Iteration 11: New program candidate index: 7
+Iteration 12: Selected program 5 score: 0.875
+Iteration 12: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input:
+You may receive a target URL, title, body, summary, labels, examples, tests, files changed, or prior discussion. The title is the primary signal. Use the first clear body summary only when the title is ambiguous.
+
+Decision process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, docs change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, example, test fixture, implementation detail, internal hook, backend used to reproduce a bug, broad ownership area, or label-spam keyword.
+
+Important example:
+- `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving` only.
+ Do not add `local_models` just because `llama.cpp` appears. In this title, llama.cpp is the affected backend/example; the central issue is OpenAI-compatible streaming protocol behavior and the final usage chunk lifecycle.
+
+Topic rules:
+
+Notifications:
+Use `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, or notification delivery.
+Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+
+Cron:
+Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+Do not add it merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+
+Memory:
+Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+Active-memory recall deadlocks are usually `memory` plus `reliability`.
+
+Gateway and sessions:
+Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+“Outbound session identity” is `sessions`.
+A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` is `gateway` and `sessions`, not `notifications`.
+
+Open-weight and local model providers:
+Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+Known context windows for open-weight Mantle models should include both `open_weight_models` and `local_model_providers`.
+
+Codex:
+Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+
+API surface and UI/TUI:
+Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+Webchat full-message reader behavior is both `api_surface` and `ui_tui`. If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+Do not add it merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+ACP, ACPX, approvals:
+Use `acp` when ACP is named centrally.
+Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
+Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+`[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, `acpx`.
+`ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Model serving:
+Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, streaming protocol behavior, and final usage chunk handling are `model_serving`.
+Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Local models and self-hosted inference:
+Use `local_models` when a local model app/provider/runtime is itself central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+Use `self_hosted_inference` when using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers is itself the feature or bug.
+Suppress both `local_models` and `self_hosted_inference` when the provider name is only an affected backend, reproduction environment, compatibility example, or implementation detail for a different central subject such as OpenAI-compatible streaming.
+
+Telemetry and usage:
+Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+
+Browser automation:
+Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security, MCP:
+Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Documentation:
+Documentation-only PRs should usually include `docs` plus the central documented area.
+A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+Queueing:
+Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+Suppress it for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Tool calling:
+Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+Composite titles:
+If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, llama.cpp, Ollama, vLLM, TGI, LocalAI, or LM Studio. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 12: New subsample score 4.0 is better than old score 3.2857142857142856. Continue to full eval and add to candidate pool.
+Iteration 12: Valset score for new program: 0.8055555555555556 (coverage 18 / 18)
+Iteration 12: Val aggregate for new program: 0.8055555555555556
+Iteration 12: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 0.25, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 0.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 0.25, 17: 0.5}
+Iteration 12: Objective aggregate scores for new program: {'weighted_score': 0.8055555555555556}
+Iteration 12: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0}
+Iteration 12: Objective pareto front scores: {'weighted_score': 0.875}
+Iteration 12: Valset pareto front aggregate score: 1.0
+Iteration 12: Updated valset pareto front programs: {0: {0, 1, 2, 4, 5, 8}, 1: {0, 8}, 2: {0, 1, 4}, 3: {1, 4, 5, 6, 8}, 4: {0, 3, 4, 5, 6, 8}, 5: {8, 1, 2, 5}, 6: {1, 2, 3, 5, 7, 8}, 7: {2, 3, 5, 6, 7, 8}, 8: {8, 2, 5}, 9: {0, 1, 2, 3, 4, 5, 6, 7, 8}, 10: {0, 8, 5}, 11: {0, 1, 4, 5}, 12: {0, 1, 2, 4, 5, 6, 8}, 13: {3, 6, 7}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8}, 15: {2, 5, 6, 7, 8}, 16: {0, 5}, 17: {1, 2, 5, 7}}
+Iteration 12: Updated objective pareto front programs: {'weighted_score': {5}}
+Iteration 12: Best valset aggregate score so far: 0.875
+Iteration 12: Best program as per aggregate score on valset: 5
+Iteration 12: Best score on valset: 0.875
+Iteration 12: Linear pareto front program index: 5
+Iteration 12: New program candidate index: 8
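Note on the Iteration 12 log lines above: the logged valset aggregate can be re-derived from the per-example scores. The sketch below is a minimal check, assuming the aggregate is the unweighted mean of the 18 individual valset scores (this matches the logged 0.8055555555555556, but the optimizer's actual aggregation code is not shown in this diff). The names `candidate_8_scores` and `is_new_best` are illustrative, not part of the optimizer.

```python
from statistics import fmean

# Per-example valset scores for the new program (candidate 8),
# copied from the "Individual valset scores" log line of Iteration 12.
candidate_8_scores = {
    0: 1.0, 1: 1.0, 2: 0.25, 3: 1.0, 4: 1.0, 5: 1.0,
    6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 0.0,
    12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 0.25, 17: 0.5,
}

# Assumption: the aggregate is a plain mean over all valset rows.
aggregate = fmean(candidate_8_scores.values())

# Best aggregate so far, as logged for candidate 5.
best_so_far = 0.875


def is_new_best(candidate_score: float, incumbent_score: float) -> bool:
    """Hypothetical helper: a candidate replaces the incumbent only if strictly better."""
    return candidate_score > incumbent_score


print(f"aggregate={aggregate:.4f} new_best={is_new_best(aggregate, best_so_far)}")
```

Under that assumption, candidate 8 joins the candidate pool but does not displace candidate 5, which is consistent with the "Best program as per aggregate score on valset: 5" line.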
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/run_log_stderr.txt b/gepa-12b-proper-from-best-20260613T055906Z/run_log_stderr.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-proper-from-best-20260613T055906Z/summary.json b/gepa-12b-proper-from-best-20260613T055906Z/summary.json
new file mode 100644
index 0000000000000000000000000000000000000000..73a9d0890bfe80bae4b55fbc3e0ba1739f94cbe3
--- /dev/null
+++ b/gepa-12b-proper-from-best-20260613T055906Z/summary.json
@@ -0,0 +1,29 @@
+{
+ "best_idx": 5,
+ "best_prompt_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-proper-from-best-20260613T055906Z/best.prompt.md",
+ "best_routing_policy_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-proper-from-best-20260613T055906Z/best.routing_policy.md",
+ "best_score": 0.875,
+ "config": {
+ "harness": {
+ "base_url": null,
+ "concurrency": 2,
+ "context_window": null,
+ "max_tokens": 1536,
+ "model": "gemma-12b-q4km-reason",
+ "state_dir": null,
+ "timeout_ms": 900000
+ },
+ "max_metric_calls": 240,
+ "output_dir": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-proper-from-best-20260613T055906Z",
+ "reflection_minibatch_size": 4,
+ "row_limit": 18,
+ "seed": 0,
+ "seed_routing_policy_chars": 8699,
+ "seed_routing_policy_sha256": "c77e7e9a0590e1c551d5e91a1c27ce5d361f8e32b4ae36f1900acdeee9e64f5f"
+ },
+ "created_at": "2026-06-13T07:50:51.337002+00:00",
+ "num_candidates": 9,
+ "num_full_val_evals": 9,
+ "result_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-proper-from-best-20260613T055906Z/gepa-result.json",
+ "total_metric_calls": 250
+}
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214854Z/command.sh b/gepa-12b-row30-prop16-continuation-20260613T214854Z/command.sh
new file mode 100644
index 0000000000000000000000000000000000000000..067922956076c05d8a81aefe2ba6d16f994aa1e9
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214854Z/command.sh
@@ -0,0 +1,13 @@
+#!/usr/bin/env bash
+set -euo pipefail
+cd "/home/bob/repos/localpager"
+PYTHONPATH=prompt-optimizer/src "prompt-optimizer/.venv/bin/python" -m prompt_optimizer.cli optimize \
+ --model gemma-12b-q4km-reason \
+ --max-tokens 1536 \
+ --concurrency 2 \
+ --seed-routing-policy "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z/best.routing_policy.md" \
+ --row-limit 30 \
+ --reflection-minibatch-size 4 \
+ --max-candidate-proposals 16 \
+ --max-metric-calls 720 \
+ --output-dir "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214854Z"
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214854Z/optimize.pid b/gepa-12b-row30-prop16-continuation-20260613T214854Z/optimize.pid
new file mode 100644
index 0000000000000000000000000000000000000000..371ecc503bd5845fb72e2b24da5abd740db8298d
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214854Z/optimize.pid
@@ -0,0 +1 @@
+109292
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214854Z/optimize.stderr.log b/gepa-12b-row30-prop16-continuation-20260613T214854Z/optimize.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..e45928c44e84cf9751713b5b18761e3371fa56d8
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214854Z/optimize.stderr.log
@@ -0,0 +1 @@
+Terminated
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214854Z/optimize.stdout.json b/gepa-12b-row30-prop16-continuation-20260613T214854Z/optimize.stdout.json
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214854Z/run_log.txt b/gepa-12b-row30-prop16-continuation-20260613T214854Z/run_log.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214854Z/run_log_stderr.txt b/gepa-12b-row30-prop16-continuation-20260613T214854Z/run_log_stderr.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214854Z/score_report.html b/gepa-12b-row30-prop16-continuation-20260613T214854Z/score_report.html
new file mode 100644
index 0000000000000000000000000000000000000000..e0f793b32b2218bde1d70df9e796c4f05c6dea0b
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214854Z/score_report.html
@@ -0,0 +1,136 @@
+
+
+
+
+
+gepa-12b-row30-prop16-continuation-20260613T214854Z GEPA score report
+
+
+
+
+
Run directory: /home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214854Z
+
Base score
n/a
+
Proposal attempts
0
+
Accepted full evals
0
+
Rejected proposals
0
+
Candidates
n/a
+
Best candidate
n/a
+
Metric calls
n/a
+
Validation Score Over Iterations
No iteration scores found yet.
+
Proposal Subsample Delta
No proposal deltas found yet.
+
Final Candidate Scores
Final GEPA result is not available yet.
+
Proposal Events
No proposal events found yet.
+
+
+
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/best.prompt.md b/gepa-12b-row30-prop16-continuation-20260613T214937Z/best.prompt.md
new file mode 100644
index 0000000000000000000000000000000000000000..664cd1c7aec48213fa1be4152e2b87c7334f7709
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/best.prompt.md
@@ -0,0 +1,206 @@
+# OpenClaw Routing Classifier
+
+Classify one OpenClaw GitHub issue or pull request for maintainer notification
+routing, not code search. Return only the final structured JSON required by the
+schema. No prose, markdown, analysis, or extra fields.
+
+Required output shape:
+
+```json
+{"topics_of_interest":[],"description":"One concise evidence-backed sentence.","caveats":[]}
+```
+
+## Inner Monologue
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+## Repository Reads
+
+A read-only `bash` tool may be available in the OpenClaw repo snapshot. Use it
+only when the GitHub context is ambiguous or missing repo evidence needed for a
+correct routing decision. Prefer short commands such as `pwd`, `ls`, `find`,
+`rg`, `grep`, `sed -n`, `cat`, `head`, `tail`, `wc -l`,
+`git show --name-only`, `git ls-files`, or `git grep`.
+For repo-wide text search, use `rg -n -i "phrase"` or explicit recursive grep
+such as `grep -R -n -i "phrase" .`. For file discovery, use
+`rg --files -g "*.ts"` or `git ls-files src`.
+Do not call `bash` when the provided GitHub context is enough.
+
+## Allowed Topics
+
+```json
+__ALLOWED_TOPICS_JSON__
+```
+
+Topic definitions and cue words:
+
+__TOPIC_DESCRIPTIONS__
+
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- A title like `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` is only `docs`.
+- Do not add non-allowed or broad demo/showcase labels such as `agent_demos`.
+- Documentation-only PRs should usually include `docs` plus the central documented area only when that area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- A title like `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.
+- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.
+- A title like `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` is `acp` and `sessions`, not `acpx`.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and only `acpx` if ACPX is explicitly named or clearly the binding system in question.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- A title like `[Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` is `local_model_providers` and `model_serving`.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, or names a model unless serving/routing/protocol behavior is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about “cron --announce and heartbeat target-channel Slack pushes” is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.## Target
+
+`__TARGET__`
+
+## GitHub Context
+
+__GITHUB_CONTEXT__
+
+Use this context as source of truth. If important sections are missing,
+unavailable, selected, or truncated, classify from what is available and mention
+material limits in `caveats`.
+
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
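Note on the prompt above: rules 7 and 8 of its Process section (never invent topic ids, output JSON only) can be enforced mechanically on the model's reply. The sketch below is one possible check, not the harness's actual validation. The function name `validate_topics` and the `max_topics` cap are assumptions for illustration; the prompt only says to use 3 topics when three central facets are named, and does not state a hard limit.

```python
import json

# The 30 allowed topic ids, copied from the "Allowed topic ids" list in the prompt.
ALLOWED_TOPICS = frozenset({
    "queueing", "docs", "notifications", "sessions", "gateway", "reliability",
    "memory", "open_weight_models", "local_model_providers", "codex",
    "api_surface", "ui_tui", "chat_integrations", "skills_plugins", "acp",
    "acpx", "approvals", "agent_runtime", "model_serving", "local_models",
    "self_hosted_inference", "telemetry_usage", "exec_tools", "sandboxing",
    "browser_automation", "cron_automation", "config", "security",
    "mcp_tooling", "tool_calling",
})


def validate_topics(raw_reply: str, max_topics: int = 3) -> list[str]:
    """Hypothetical validator: parse the reply and reject invented or duplicate ids.

    max_topics is an assumed cap, not a limit stated by the prompt.
    """
    payload = json.loads(raw_reply)  # raises ValueError on non-JSON replies
    topics = payload.get("topics_of_interest")
    if not isinstance(topics, list):
        raise ValueError("topics_of_interest must be a list")
    unknown = [t for t in topics if t not in ALLOWED_TOPICS]
    if unknown:
        raise ValueError(f"topic ids outside the allowed list: {unknown}")
    if len(set(topics)) != len(topics):
        raise ValueError("duplicate topic ids")
    if len(topics) > max_topics:
        raise ValueError(f"more than {max_topics} topics")
    return topics


# Example from the prompt: an ACP configured-binding bug routes to acp and sessions.
print(validate_topics('{"topics_of_interest": ["acp", "sessions"]}'))
```

A reply containing `agent_demos`, which the Documentation rules name as a non-allowed label, would be rejected by this check.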
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/best.routing_policy.md b/gepa-12b-row30-prop16-continuation-20260613T214937Z/best.routing_policy.md
new file mode 100644
index 0000000000000000000000000000000000000000..b4c54767c58043c9158f5c8442211645cc79c114
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/best.routing_policy.md
@@ -0,0 +1,151 @@
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- A title like `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` is only `docs`.
+- Do not add non-allowed or broad demo/showcase labels such as `agent_demos`.
+- Documentation-only PRs should usually include `docs` plus the central documented area only when that area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- A title like `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.
+- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.
+- A title like `[Bug]: ACP configured binding uses parent channel ID for session key — all threads under same channel share one persistent Claude Code process` is `acp` and `sessions`, not `acpx`.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and only `acpx` if ACPX is explicitly named or clearly the binding system in question.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- A title like `[Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` is `local_model_providers` and `model_serving`.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, or names a model unless serving/routing/protocol behavior is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about “cron --announce and heartbeat target-channel Slack pushes” is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
\ No newline at end of file
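
Reviewer note: the prompt above fixes an output contract (a single JSON object with a `topics_of_interest` list drawn from the allowed topic ids). A minimal sketch of how a harness might check a reply against that contract follows. The function name `parse_topics` and the `MAX_TOPICS` cap are hypothetical; the cap is an assumption read from Process step 5, which never mentions more than three topics. This is not the localpager scoring function and does not reflect how the GEPA runs actually validate output.

```python
import json

# Allowed topic ids, copied from the "Allowed topic ids" list in the prompt above.
ALLOWED_TOPICS = frozenset("""
queueing docs notifications sessions gateway reliability memory
open_weight_models local_model_providers codex api_surface ui_tui
chat_integrations skills_plugins acp acpx approvals agent_runtime
model_serving local_models self_hosted_inference telemetry_usage
exec_tools sandboxing browser_automation cron_automation config
security mcp_tooling tool_calling
""".split())

# Assumption: the prompt implies at most three topics; it does not state a hard limit.
MAX_TOPICS = 3


def parse_topics(raw: str) -> list[str]:
    """Parse a model reply and return its topic list, or raise ValueError."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    # The contract is exactly one key, `topics_of_interest`.
    if not isinstance(payload, dict) or set(payload) != {"topics_of_interest"}:
        raise ValueError("reply must be an object with only 'topics_of_interest'")

    topics = payload["topics_of_interest"]
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("'topics_of_interest' must be a list of strings")

    unknown = [t for t in topics if t not in ALLOWED_TOPICS]
    if unknown:
        raise ValueError(f"topic ids outside the allowed list: {unknown}")
    if len(set(topics)) != len(topics):
        raise ValueError("duplicate topic ids")
    if len(topics) > MAX_TOPICS:
        raise ValueError(f"more than {MAX_TOPICS} topics")
    return topics
```

For example, `parse_topics('{"topics_of_interest":["docs","queueing"]}')` returns the two-topic list, while a reply containing `agent_demos` (named in the prompt as a non-allowed label) raises `ValueError`. An empty list is accepted, matching Process step 6.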
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/candidate_tree.html b/gepa-12b-row30-prop16-continuation-20260613T214937Z/candidate_tree.html
new file mode 100644
index 0000000000000000000000000000000000000000..446553f17468f143bd1ba1634f842a968f3ed16f
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/candidate_tree.html
@@ -0,0 +1,206 @@
+
+
+
+
+
+GEPA Candidate Tree
+
+
+
+
+
GEPA Candidate Tree
+
+
Best
+
Pareto Front
+
Other
+
+
+
Loading graph…
+
+
+
+
+
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/candidates.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/candidates.json
new file mode 100644
index 0000000000000000000000000000000000000000..0557c42dfc20d492ab75a1f83b513d9092ee9910
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/candidates.json
@@ -0,0 +1,41 @@
+[
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- A title like `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` is only `docs`.\n- Do not add non-allowed or broad demo/showcase labels such as `agent_demos`.\n- Documentation-only PRs should usually include `docs` plus the central documented area only when that area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- A title like `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.\n- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.\n- A title like `[Bug]: ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process` is `acp` and `sessions`, not `acpx`.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or 
permissionMode policy is central.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and only `acpx` if ACPX is explicitly named or clearly the binding system in question.\n\nOpen-weight, local provider catalogs, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- A title like `[Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` is `local_model_providers` and `model_serving`.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, or names a model unless serving/routing/protocol behavior is central.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is 
central.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about \u201ccron --announce and heartbeat target-channel Slack pushes\u201d is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n- Queueing requires queue, queued execution, steering in 
queues, queue lifecycle, or scheduling behavior to be user-visible and central.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add 
`skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible 
item up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nOutput only JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, coding_agents,\napi_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx,\napprovals, agent_runtime, model_serving, local_models,\nself_hosted_inference, telemetry_usage, exec_tools, sandboxing,\nbrowser_automation, cron_automation, config, security, mcp_tooling,\ntool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping explicitly central second or third concerns.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, broad impact, and incidental implementation details unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. 
Never output labels outside the allowed list.\n\nImportant domain rules:\n\nExec, sandboxing, and approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, exec v2 behavior, and exec contract behavior.\n- If the title says `exec v2 contract`, `exec contract`, or `exec v2 contract follow-through`, classify all contract areas that are part of that surface:\n `exec_tools`, `sandboxing`, `approvals`.\n- Do this even if the PR title starts with `test(...)`; tests can still be the vehicle for landing a user-visible contract.\n- Do not collapse exec contract work to only `exec_tools`.\n\nSandboxing, subagents, and runtimes:\n- Use `sandboxing` when sandbox modes, sandbox policy, sandbox behavior, or `sandbox.mode` is central.\n- Use `agent_runtime` when the issue is about agent execution runtimes, subagent initialization, runtime startup, runtime ownership, or process/runtime lifecycle.\n- Use `coding_agents` when coding-agent behavior is central, including subagents, sessions_spawn subagent initialization, Codex/Claude-style coding-agent orchestration, or agent worker startup.\n- A title like `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization` is:\n `coding_agents`, `sandboxing`, `agent_runtime`.\n- Do not add `sessions` just because a path or internal component contains `sessions_spawn`. 
Treat `sessions_spawn` as an implementation name unless session identity, lifecycle, routing, persistence, or state is the user-visible subject.\n\nSessions:\n- Use `sessions` only when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.\n- Suppress `sessions` for internal names such as `sessions_spawn` when the actual problem is subagent/runtime initialization.\n\nLocal models and model serving:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible protocol behavior, Responses API behavior, streaming lifecycle, response chunks, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- If a title combines a local model provider with the Responses API, OpenAI-compatible API behavior, streaming, or request/response protocol behavior, include both `local_models` and `model_serving`.\n- A title like `fix: LM Studio thinking blocks invisible with Responses API` is:\n `model_serving`, `local_models`.\n\nSuppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, session, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named or when ACP binding semantics are central. In OpenClaw, phrases like \u201cper-binding\u201d, \u201cconfigured binding\u201d, \u201cbinding ID\u201d, or per-agent/per-binding ACP policy usually indicate the ACPX binding layer, even if the title says only ACP.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.\n- Do not add `sessions` just because a title says \u201cACP sessions\u201d if the actual change is permission policy, binding policy, or per-agent configuration. Treat \u201csessions\u201d as label spam unless session identity, lifecycle, routing, state, persistent process identity, or session-specific behavior is the user-visible subject.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.\n\nAuth and identity:\n- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.\n- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains \u201copenai-codex\u201d, \u201cCodex OAuth\u201d, or a Codex-flavored product/path name. 
Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.\n- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.\n- `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` should include `reliability`, `auth_identity`, and `sessions`, not `queueing` or `codex`.\n\nReliability and queueing:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, or robustness failure.\n- \u201clane wedge\u201d, \u201clane wedged\u201d, \u201cself-heal lane wedges\u201d, \u201cworker stuck\u201d, or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.\n- Do not add `queueing` merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.\n- Use `queueing` only when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.\n- Do not add broad demo/showcase labels or `tool_calling` just because docs mention tool boundaries.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nOpen-weight, local provider catalogs, 
and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, or names a model unless serving/routing/protocol behavior is central.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nNotifications and chat integrations:\n- Use `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery 
plumbing.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nGateway and runtime:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage 
accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- Do not substitute broad infrastructure topics for the listed user-visible subjects.\n- In composite titles, each clause may contribute a different topic. For example, one clause can be `reliability` while another contributes `auth_identity` and `sessions`.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nImportant calibration rules:\n\nSecurity composites:\n- Use `security` for vulnerabilities, SSRF, access restrictions, token verification, credential boundaries, network security policy, or HIGH/CRITICAL vuln fixes.\n- If a security title lists specific affected subsystems, include each subsystem that is explicitly central.\n- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `hooks`, `skills_plugins`.\n- Use `hooks` when hook behavior, hook priority, hook execution, or hook policy is explicitly central.\n- Use `skills_plugins` when skill scanning, skill/plugin discovery, user-installed plugins, plugin inheritance, plugin installation, or plugin availability is explicitly central.\n- Do not collapse a composite security title to only `security` when the title explicitly names maintainer-owned subsystems like hooks or skill scanners.\n\nToken counts and usage freshness:\n- Use `telemetry_usage` when token counts, usage accounting, cost display, usage reporting, diagnostic counts, traces, or status reporting are themselves the bug or feature.\n- Use `sessions` when the freshness, identity, lifecycle, routing, or state of a session is central.\n- A title like `fix: respect totalTokensFresh flag to avoid showing stale token counts` is `sessions` and `telemetry_usage`.\n- Do not add `ui_tui` merely because the title says \u201cshowing\u201d or \u201cdisplaying\u201d token counts. 
UI is label spam unless the user-facing interface itself is central.\n\nModel serving, local models, and usage chunks:\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- A title like `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` is only `model_serving`.\n- Do not add `local_models` merely because a local runtime such as llama.cpp, Ollama, LM Studio, vLLM, TGI, or LocalAI is named. Add `local_models` only when the local model app/provider/runtime itself is the user-visible subject.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, or final usage chunks. If the issue is that the streaming protocol closes before the final usage chunk, classify it as `model_serving`.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include only `docs`.\n- Add the central documented area only when that area is explicitly central in the title, such as `docs(queue): ...` => `docs`, `queueing`.\n- Do not add broad demo/showcase labels.\n- Do not add `tool_calling` just because docs mention tool boundaries unless tool-call behavior itself is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.\n\nACP, ACPX, sessions, 
approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.\n- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.\n- `[Bug]: ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process` => `acp`, `sessions`.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, and only `acpx` if ACPX is explicitly named.\n\nNotifications and chat:\n- Use `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract 
topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`, not `notifications`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both 
`api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n- Do not add `ui_tui` merely because something is shown, displayed, or surfaced unless the UI itself is central.\n\nRuntime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, 
`chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, showing, display, llama.cpp, or hook. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nOutput only JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, coding_agents,\napi_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx,\napprovals, agent_runtime, model_serving, local_models,\nself_hosted_inference, telemetry_usage, exec_tools, sandboxing,\nbrowser_automation, cron_automation, config, security, mcp_tooling,\ntool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping explicitly central second or third concerns.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, broad impact, and incidental implementation details unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. 
Never output labels outside the allowed list.\n\nGeneral suppression:\n- Do not add a topic just because a related word appears.\n- Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n- Never output disallowed ids such as `packaging_deployment`.\n\nReliability:\n- Use `reliability` when the title describes a failure mode, hardening, deadlock, crash, race, takeover error, dropped work, broken delivery, or correctness issue that is central to the user-visible problem.\n- For cron/chat/session collision bugs such as `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`, include:\n `cron_automation`, `sessions`, `reliability`.\n- Do not add `reliability` for routine implementation cleanup unless the title is about making behavior more robust or fixing a failure.\n\nSecurity and CI:\n- Use `security` for supply-chain boundaries, dependency trust, GitHub Actions hardening, secret exposure, permissions, auth, or attack-surface reduction.\n- A title like `ci: harden GitHub Actions supply-chain boundaries` is only:\n `security`.\n- Do not add deployment, packaging, CI, or release topics unless an allowed topic is directly central. 
`packaging_deployment` is not allowed.\n\nExec, sandboxing, and approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, exec v2 behavior, and exec contract behavior.\n- If the title says `exec v2 contract`, `exec contract`, or `exec v2 contract follow-through`, classify all contract areas that are part of that surface:\n `exec_tools`, `sandboxing`, `approvals`.\n- Do this even if the PR title starts with `test(...)`; tests can still be the vehicle for landing a user-visible contract.\n- Do not collapse exec contract work to only `exec_tools`.\n\nSandboxing:\n- Use `sandboxing` when sandbox modes, sandbox policy, sandbox behavior, filesystem/network restrictions, approval boundaries, or `sandbox.mode` are central.\n- Do not use `sandboxing` merely because the title says `sandbox` as an environment name, container, ACPX sandbox, Codex sandbox, test sandbox, or runtime workspace unless the subject is the sandbox policy/permission behavior itself.\n- A title like `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` is not about sandbox policy. 
It is:\n `acpx`, `codex`, `skills_plugins`.\n\nSkills and plugins:\n- Use `skills_plugins` when user-installed plugins, plugin inheritance, skills, marketplace plugins, plugin loading, plugin availability, or plugin installation behavior is central.\n- If a title says a Codex or ACPX environment should inherit user-installed plugins, include `skills_plugins`.\n- Do not replace `skills_plugins` with `sandboxing` just because the plugins are being inherited into a sandboxed environment.\n\nACPX, ACP, and Codex:\n- Use `acpx` when ACPX itself, ACPX environments, ACPX protocol/workflow, or ACPX-specific behavior is central.\n- Use `acp` for ACP-specific protocol/workflow behavior.\n- Use `codex` when Codex is explicitly a central product/runtime/integration subject.\n- For an ACPX Codex feature involving user-installed plugins, include all central facets:\n `acpx`, `codex`, `skills_plugins`.\n\nSandboxing, subagents, and runtimes:\n- Use `agent_runtime` when the issue is about agent execution runtimes, subagent initialization, runtime startup, runtime ownership, or process/runtime lifecycle.\n- Use `coding_agents` when coding-agent behavior is central, including subagents, sessions_spawn subagent initialization, Codex/Claude-style coding-agent orchestration, or agent worker startup.\n- A title like `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization` is:\n `coding_agents`, `sandboxing`, `agent_runtime`.\n- Do not add `sessions` just because a path or internal component contains `sessions_spawn`. 
Treat `sessions_spawn` as an implementation name unless session identity, lifecycle, routing, persistence, or state is the user-visible subject.\n\nSessions:\n- Use `sessions` only when session identity, session lifecycle, session routing, session state, persistent process identity, takeover/ownership, active chat session conflicts, or session-specific behavior is central.\n- Suppress `sessions` for internal names such as `sessions_spawn` when the actual problem is subagent/runtime initialization.\n- Include `sessions` when the title describes an active chat/session takeover conflict.\n\nCron automation:\n- Use `cron_automation` when scheduled jobs, cron announcements, recurring automation, scheduled delivery, or cron-triggered behavior is central.\n- If cron delivery conflicts with active user chat/session ownership, include both `cron_automation` and `sessions`; add `reliability` when the title describes an error or broken behavior.\n\nLocal models and model serving:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible protocol behavior, Responses API behavior, streaming lifecycle, response chunks, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- If a title combines a local model provider with the Responses API, OpenAI-compatible API behavior, streaming, or request/response protocol behavior, include both `local_models` and `model_serving`.\n- A title like `fix: LM Studio thinking blocks invisible with Responses API` is:\n `model_serving`, `local_models`.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, 
plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, session, sandbox, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nOutput only JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, coding_agents,\napi_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx,\napprovals, agent_runtime, model_serving, local_models,\nself_hosted_inference, telemetry_usage, exec_tools, sandboxing,\nbrowser_automation, cron_automation, config, security, mcp_tooling,\ntool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping explicitly central second or third concerns.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, broad impact, and incidental implementation details unless they are the actual user-visible subject.\n\nCore process:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. 
Never output labels outside the allowed list.\n\nGeneral suppression:\n- Do not add a topic just because a related word appears.\n- Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n- Ignore words that appear only in internal function or event names, such as `sessions_yield` or `sessions_spawn`, unless the actual user-visible subject is session identity, lifecycle, routing, ownership, persistence, or active-session conflict.\n- Never output disallowed ids such as `packaging_deployment`.\n\nReliability:\n- Use `reliability` when the title describes a failure mode, hardening, deadlock, crash, race, takeover error, dropped work, broken delivery, or correctness issue that is central to the user-visible problem.\n- Do not add `reliability` merely because the title contains words like fail, failure, stale, broken, or error if the maintainer-owned subject is a protocol/workflow contract such as ACP/ACPX state visibility.\n- For `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield`, classify only:\n `acpx`, `acp`.\n Do not add `sessions` for `sessions_yield`, and do not add `reliability` for the failure/stale wording.\n\nSecurity:\n- Use `security` for supply-chain boundaries, dependency trust, GitHub Actions hardening, secret exposure, permissions, auth, or attack-surface reduction.\n- Do not add `security` merely because the title mentions SSRF defenses, proxy bypass, NO_PROXY, network policy, or a security-sounding mechanism when the user-visible problem is compatibility with local/self-hosted inference.\n- A title like `ci: harden GitHub Actions supply-chain boundaries` is only:\n `security`.\n\nExec, sandboxing, and approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, exec v2 behavior, and exec contract behavior.\n- If the title says `exec v2 contract`, `exec contract`, or `exec v2 
contract follow-through`, classify all contract areas that are part of that surface:\n `exec_tools`, `sandboxing`, `approvals`.\n- Do this even if the PR title starts with `test(...)`; tests can still be the vehicle for landing a user-visible contract.\n- Do not collapse exec contract work to only `exec_tools`.\n\nSandboxing:\n- Use `sandboxing` when sandbox modes, sandbox policy, sandbox behavior, filesystem/network restrictions, approval boundaries, or `sandbox.mode` are central.\n- Do not use `sandboxing` merely because the title says `sandbox` as an environment name, container, ACPX sandbox, Codex sandbox, test sandbox, or runtime workspace unless the subject is the sandbox policy/permission behavior itself.\n- A title like `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` is not about sandbox policy. It is:\n `acpx`, `codex`, `skills_plugins`.\n\nSkills and plugins:\n- Use `skills_plugins` when user-installed plugins, plugin inheritance, skills, marketplace plugins, plugin loading, plugin availability, or plugin installation behavior is central.\n- If a title says a Codex or ACPX environment should inherit user-installed plugins, include `skills_plugins`.\n- Do not replace `skills_plugins` with `sandboxing` just because the plugins are being inherited into a sandboxed environment.\n\nACPX, ACP, and Codex:\n- Use `acpx` when ACPX itself, ACPX environments, ACPX protocol/workflow, or ACPX-specific behavior is central.\n- Use `acp` for ACP-specific protocol/workflow behavior.\n- Use `codex` when Codex is explicitly a central product/runtime/integration subject.\n- For explicit ACP agent runs, embedded backend visibility, final JSON state, or ACP/ACPX agent-run contracts, prefer `acp` and/or `acpx` only unless another central subject is explicitly named.\n- For an ACPX Codex feature involving user-installed plugins, include all central facets:\n `acpx`, `codex`, `skills_plugins`.\n\nSandboxing, subagents, and runtimes:\n- Use 
`agent_runtime` when the issue is about agent execution runtimes, subagent initialization, runtime startup, runtime ownership, or process/runtime lifecycle.\n- Use `coding_agents` when coding-agent behavior is central, including subagents, sessions_spawn subagent initialization, Codex/Claude-style coding-agent orchestration, or agent worker startup.\n- A title like `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization` is:\n `coding_agents`, `sandboxing`, `agent_runtime`.\n- Do not add `sessions` just because a path or internal component contains `sessions_spawn`.\n\nSessions:\n- Use `sessions` only when session identity, session lifecycle, session routing, session state, persistent process identity, takeover/ownership, active chat session conflicts, or session-specific behavior is central.\n- Suppress `sessions` for internal names such as `sessions_spawn` and `sessions_yield` when the actual problem is subagent/runtime initialization or ACP/ACPX state handling.\n- Include `sessions` when the title describes an active chat/session takeover conflict.\n\nCron automation:\n- Use `cron_automation` when scheduled jobs, cron announcements, recurring automation, scheduled delivery, or cron-triggered behavior is central.\n- If cron delivery conflicts with active user chat/session ownership, include both `cron_automation` and `sessions`; add `reliability` when the title describes an error or broken behavior.\n\nLocal models, self-hosted inference, and model serving:\n- Use `local_models` when a local model app/provider/runtime is central, including Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted providers.\n- Ollama and LM Studio are strong signals for `local_models`.\n- Use `self_hosted_inference` when the central subject is running inference or embeddings through a local/self-hosted backend, especially with Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, localhost/LAN providers, proxy bypass, NO_PROXY, or 
self-hosted endpoint reachability.\n- For `Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)`, classify:\n `local_models`, `self_hosted_inference`.\n Do not add `model_serving` unless the title is about serving protocol behavior, endpoint compatibility, streaming lifecycle, response chunks, or request routing. Do not add `security` unless security hardening itself is the central user-visible change.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible protocol behavior, Responses API behavior, streaming lifecycle, response chunks, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- If a title combines a local model provider with the Responses API, OpenAI-compatible API behavior, streaming, or request/response protocol behavior, include both `local_models` and `model_serving`.\n- A title like `fix: LM Studio thinking blocks invisible with Responses API` is:\n `model_serving`, `local_models`.\n\nOpen-weight models and local model providers:\n- Use `open_weight_models` when the central subject is open-weight model families, model metadata, context windows, capabilities, catalogs, or known properties for open-weight models.\n- Use `local_model_providers` when the central subject is provider integration or provider catalog/metadata for open-weight/local-model providers.\n- In this taxonomy, Amazon Bedrock Mantle open-weight model metadata, including known context windows, routes to `local_model_providers` as well as `open_weight_models`.\n- For `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models`, classify:\n `open_weight_models`, `local_model_providers`.\n Do not add `model_serving`; context-window metadata for model catalog/provider support is not serving protocol behavior.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like 
usage, model, network, proxy, NO_PROXY, SSRF, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, session, sandbox, sessions_yield, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Conventional commit scopes in the title, such as `feat(memory/embeddings): ...`, are part of the title signal. Do not ignore them when they name a central domain.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, and incidental implementation details unless they are the actual user-visible subject.\n\nProcess:\n1. Read the full title, including any conventional commit type/scope prefix.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. 
Never invent topic ids. Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nImportant routing rules learned from prior errors:\n\nBrowser automation plus packaging/deployment:\n- Use `browser_automation` when the central subject is browser automation, browser diagnostics, Chrome session reuse, browser runtime behavior, profile attach behavior, browser tooling, or desktop browser integration.\n- Use `packaging_deployment` when the title centrally names packaging, deployment, install/runtime environment setup, missing bundled executables, missing npm/node binaries, PATH/binary resolution, launch packaging, or errors such as `spawn npx ENOENT`.\n- If a desktop/browser issue combines browser session/profile behavior with a missing executable or packaging/runtime availability failure, include both topics.\n- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` must be:\n `browser_automation`, `packaging_deployment`.\n The Chrome sessions, Google sign-in, and user profile attach are browser automation. 
The `spawn npx ENOENT` failure is a packaging/deployment/runtime environment concern and must not be dropped.\n\nMemory and self-hosted inference:\n- Use `memory` when memory, embeddings under memory, memory retrieval/storage, or a conventional commit scope such as `memory/embeddings` is central.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- If the title says `memory/embeddings` and adds an OpenAI-compatible provider for self-hosted servers, include both `memory` and `self_hosted_inference`.\n- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` must be:\n `memory`, `self_hosted_inference`.\n Do not drop `memory` just because the body of the title focuses on providers; the scope is a central maintainer-owned area.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named or when ACP binding semantics are central. 
In OpenClaw, phrases like \u201cper-binding\u201d, \u201cconfigured binding\u201d, \u201cbinding ID\u201d, or per-agent/per-binding ACP policy usually indicate the ACPX binding layer, even if the title says only ACP.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.\n- Do not add `sessions` just because a title says \u201cACP sessions\u201d if the actual change is permission policy, binding policy, or per-agent configuration.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.\n\nAuth and identity:\n- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.\n- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains \u201copenai-codex\u201d, \u201cCodex OAuth\u201d, or a Codex-flavored product/path name.\n- Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.\n- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.\n\nReliability and queueing:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, or robustness failure.\n- \u201clane wedge\u201d, \u201clane wedged\u201d, \u201cself-heal lane wedges\u201d, \u201cworker stuck\u201d, or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.\n- Do not add `queueing` merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.\n- Use `queueing` only when queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n\nMCP and tool calling:\n- Use 
`mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nOpen-weight, local provider catalogs, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, \u201copenai-compatible\u201d, or names llama.cpp/Ollama/vLLM/TGI/LocalAI unless serving protocol behavior is central.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, Chrome, npx, 
embeddings, provider, or openai-compatible. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment,\nhooks, coding_agents\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Conventional commit scopes in the title, such as `feat(memory/embeddings): ...`, are part of the title signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, and incidental implementation details unless they are the actual user-visible subject.\n\nProcess:\n1. Read the full title, including any conventional commit type/scope prefix.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. 
Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nImportant routing rules:\n\nACP, ACPX, sessions, and agent runs:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named or when ACP binding semantics, embedded ACP backends, explicit ACP agent runs, per-binding behavior, configured bindings, binding IDs, or ACP agent-run visibility/state contracts are central.\n- Do not add `sessions` merely because the title contains `sessions_yield`, `sessions_spawn`, \u201cACP sessions\u201d, or another session-shaped internal API name.\n- If the real subject is ACP/ACPX agent-run visibility, embedded backend behavior, stale final JSON state after a yield point, or binding-layer behavior, prefer `acp` + `acpx` and suppress `sessions` unless session lifecycle/restoration itself is the user-visible subject.\n- Example: `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield` routes to `acpx`, `acp`, not `sessions`.\n\nSecurity, hooks, and skills/plugins:\n- Use `security` when vulnerabilities, SSRF, token verification security, HIGH/CRITICAL vulns, exploitability, or security hardening are central.\n- Use `skills_plugins` when skills, plugins, skill scanner behavior, plugin loading, skill scanning, or skill/plugin security is a central named subject.\n- Use `hooks` when hooks, hook priority, hook execution order, hook policy, or hook behavior is a central named subject.\n- For a security PR whose title lists multiple vulnerable subsystems, include `security` plus the explicitly central subsystem topics, up to 3 total.\n- Do not add `auth_identity` just because token verification appears in a security 
vulnerability list, unless login, OAuth, identity propagation, account identity, or credential identity is the central user-visible subject.\n- Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` routes to `security`, `hooks`, `skills_plugins`.\n\nSandboxing, subagents, and agent runtime:\n- Use `sandboxing` when sandbox mode, sandbox policy, sandbox restrictions, or sandbox configuration are central.\n- Use `coding_agents` when subagents, coding agents, spawned agents, agent delegation, or multi-agent coding workflows are central.\n- Use `agent_runtime` when agent initialization, runtime lifecycle, backend startup, execution runtime, or agent process behavior is central.\n- Do not add `sessions` merely because the title contains `sessions_spawn`; treat it as an internal spawn mechanism unless session lifecycle is clearly the subject.\n- If a sandbox mode breaks subagent initialization, include `sandboxing`, `coding_agents`, and `agent_runtime`.\n- Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization` routes to `coding_agents`, `sandboxing`, `agent_runtime`.\n\nBrowser automation plus packaging/deployment:\n- Use `browser_automation` when the central subject is browser automation, browser diagnostics, Chrome session reuse, browser runtime behavior, profile attach behavior, browser tooling, or desktop browser integration.\n- Use `packaging_deployment` when the title centrally names packaging, deployment, install/runtime environment setup, missing bundled executables, missing npm/node binaries, PATH/binary resolution, launch packaging, or errors such as `spawn npx ENOENT`.\n- If a desktop/browser issue combines browser session/profile behavior with a missing executable or packaging/runtime availability failure, include both topics.\n\nMemory and self-hosted inference:\n- Use `memory` when memory, embeddings under memory, memory retrieval/storage, or a conventional 
commit scope such as `memory/embeddings` is central.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- If the title says `memory/embeddings` and adds an OpenAI-compatible provider for self-hosted servers, include both `memory` and `self_hosted_inference`.\n\nAuth and identity:\n- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.\n- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains \u201copenai-codex\u201d, \u201cCodex OAuth\u201d, or a Codex-flavored product/path name.\n- Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.\n- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.\n\nReliability and queueing:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, or robustness failure.\n- \u201clane wedge\u201d, \u201clane wedged\u201d, \u201cself-heal lane wedges\u201d, \u201cworker stuck\u201d, or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.\n- Do not add `queueing` merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.\n- Use `queueing` only when queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or 
tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nOpen-weight, local provider catalogs, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, \u201copenai-compatible\u201d, or names llama.cpp/Ollama/vLLM/TGI/LocalAI unless serving protocol behavior is central.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, Chrome, npx, embeddings, provider, openai-compatible, sessions_yield, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment,\nhooks, coding_agents\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Conventional commit scopes in the title, such as `feat(memory/embeddings): ...`, are part of the title signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, and incidental implementation details unless they are the actual user-visible subject.\n\nProcess:\n1. Read the full title, including any conventional commit type/scope prefix.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. 
Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nImportant routing rules:\n\nCron automation, sessions, and reliability:\n- Use `cron_automation` when cron jobs, scheduled work, cron announce, automated scheduled messages, or cron delivery behavior are central.\n- Use `sessions` when the issue is about active chat sessions, session ownership, session takeover, outbound session identity, embedded sessions, session restoration, or passing session identity across components.\n- Use `reliability` when the central failure is an exception, crash, stuck state, hang, deadlock, wedge, race, timeout, or robustness failure.\n- If cron delivery triggers a session takeover exception while a user is actively chatting, include all three: `cron_automation`, `sessions`, and `reliability`.\n- `EmbeddedAttemptSessionTakeoverError` is both a session concern and a reliability failure, not just cron automation.\n\nGateway and session identity:\n- Use `gateway` when gateway send behavior, gateway denial, gateway routing, gateway guardrails, or gateway message delivery is central.\n- Use `sessions` when the title names outbound session identity, session identity propagation, message sending from a session, embedded session paths, or session restoration.\n- If a PR passes outbound session identity into message sending and surfaces gateway send denial, route to `gateway` and `sessions`.\n- Do not add `hooks` merely because message sending, gateway denial, or guarded sends might involve hook-like policy internally. 
Add `hooks` only when hooks, hook priority, hook execution order, hook policy, or hook behavior are the named user-visible subject.\n\nLocal models, Ollama, embeddings, proxy, and SSRF wording:\n- Use `local_models` when the title is about local model behavior, local Ollama, local embeddings, local model execution, or local model runtime failures.\n- Use `self_hosted_inference` when the item is about Ollama, llama.cpp, vLLM, TGI, LocalAI, or OpenAI-compatible/self-hosted inference providers.\n- Local Ollama embeddings are a local model plus self-hosted inference concern.\n- Do not add `memory` just because the word `embeddings` appears. Use `memory` only when memory storage, memory retrieval, memory embeddings, or a title scope like `memory/embeddings` is central.\n- Do not add `security` merely because SSRF defenses, proxy bypass, or `NO_PROXY` appear in a title. Add `security` only when the item is centrally about a vulnerability, exploitability, security hardening, token verification security, or HIGH/CRITICAL security fixes.\n- If local Ollama embeddings fail when a proxy is enabled because SSRF defenses ignore `NO_PROXY`, route to `local_models` and `self_hosted_inference`, not `memory` or `security`.\n\nACP, ACPX, sessions, and agent runs:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named or when ACP binding semantics, embedded ACP backends, explicit ACP agent runs, per-binding behavior, configured bindings, binding IDs, or ACP agent-run visibility/state contracts are central.\n- Do not add `sessions` merely because the title contains `sessions_yield`, `sessions_spawn`, \u201cACP sessions\u201d, or another session-shaped internal API name.\n- If the real subject is ACP/ACPX agent-run visibility, embedded backend behavior, stale final JSON state after a yield point, or binding-layer behavior, prefer `acp` + `acpx` and suppress `sessions` unless session lifecycle/restoration itself is the user-visible subject.\n\nSecurity, 
hooks, and skills/plugins:\n- Use `security` when vulnerabilities, SSRF, token verification security, HIGH/CRITICAL vulns, exploitability, or security hardening are central.\n- Use `skills_plugins` when skills, plugins, skill scanner behavior, plugin loading, skill scanning, or skill/plugin security is a central named subject.\n- Use `hooks` when hooks, hook priority, hook execution order, hook policy, or hook behavior is a central named subject.\n- For a security PR whose title lists multiple vulnerable subsystems, include `security` plus the explicitly central subsystem topics, up to 3 total.\n- Do not add `auth_identity` just because token verification appears in a security vulnerability list, unless login, OAuth, identity propagation, account identity, or credential identity is the central user-visible subject.\n- Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` routes to `security`, `hooks`, `skills_plugins`.\n\nSandboxing, subagents, and agent runtime:\n- Use `sandboxing` when sandbox mode, sandbox policy, sandbox restrictions, or sandbox configuration are central.\n- Use `coding_agents` when subagents, coding agents, spawned agents, agent delegation, or multi-agent coding workflows are central.\n- Use `agent_runtime` when agent initialization, runtime lifecycle, backend startup, execution runtime, or agent process behavior is central.\n- Do not add `sessions` merely because the title contains `sessions_spawn`; treat it as an internal spawn mechanism unless session lifecycle is clearly the subject.\n- If a sandbox mode breaks subagent initialization, include `sandboxing`, `coding_agents`, and `agent_runtime`.\n\nBrowser automation plus packaging/deployment:\n- Use `browser_automation` when the central subject is browser automation, browser diagnostics, Chrome session reuse, browser runtime behavior, profile attach behavior, browser tooling, or desktop browser integration.\n- Use `packaging_deployment` 
when the title centrally names packaging, deployment, install/runtime environment setup, missing bundled executables, missing npm/node binaries, PATH/binary resolution, launch packaging, or errors such as `spawn npx ENOENT`.\n- If a desktop/browser issue combines browser session/profile behavior with a missing executable or packaging/runtime availability failure, include both topics.\n\nMemory and self-hosted inference:\n- Use `memory` when memory, embeddings under memory, memory retrieval/storage, or a conventional commit scope such as `memory/embeddings` is central.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- If the title says `memory/embeddings` and adds an OpenAI-compatible provider for self-hosted servers, include both `memory` and `self_hosted_inference`.\n\nAuth and identity:\n- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.\n- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains \u201copenai-codex\u201d, \u201cCodex OAuth\u201d, or a Codex-flavored product/path name.\n- Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.\n- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.\n\nReliability and queueing:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, exception, or robustness failure.\n- \u201clane wedge\u201d, \u201clane wedged\u201d, \u201cself-heal lane wedges\u201d, \u201cworker stuck\u201d, or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.\n- Do not add `queueing` 
merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.\n- Use `queueing` only when queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nOpen-weight, local provider catalogs, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, \u201copenai-compatible\u201d, or names llama.cpp/Ollama/vLLM/TGI/LocalAI unless serving protocol behavior is central.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.\n\nFinal suppression 
check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, Chrome, npx, embeddings, provider, openai-compatible, SSRF, NO_PROXY, sessions_yield, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nOutput only JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment,\nhooks, coding_agents\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the correct maintainer bucket. Prefer the title over all other signals. Use the body only if the title is ambiguous. Do not infer topics from labels, file paths, examples, tests, or incidental implementation details.\n\nCore rules:\n- Pick one primary topic.\n- Add secondary topics only when they are explicitly central user-visible concerns.\n- Use 3 topics only when the title clearly names three central facets.\n- Use 0 topics when no allowed topic is central.\n- Never invent topic ids.\n\nSuppression rule:\nDo not add a topic just because a related word appears. Confirm the word is the actual subject, not an implementation detail, label-spam keyword, symptom, internal API name, or broad ownership area.\n\nImportant calibrations from prior mistakes:\n\n1. 
Gateway is often label spam.\n- Do not add `gateway` merely because a bug involves sending, embedded paths, guarded delivery, or background mechanics.\n- Use `gateway` only when gateway send behavior, gateway routing, gateway denial, gateway guardrails, or gateway message delivery is the central user-visible subject.\n- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` should include `reliability`, `auth_identity`, and `sessions`; it should not include `gateway`.\n\n2. Embedded OAuth paths are auth plus sessions.\n- Use `auth_identity` for OAuth, login/auth restoration, credential identity, account identity, or token identity.\n- If OAuth/auth behavior is restored or broken on an embedded path, session path, session identity, or session restoration flow, also include `sessions`.\n- Do not replace `auth_identity` with `codex` just because the title says `openai-codex` or `Codex OAuth`.\n- Add `codex` only when Codex-specific runtime, startup, setup, or behavior is itself a central maintainer-owned subject.\n\n3. Lane wedges are reliability, not queueing by default.\n- Use `reliability` for wedges, stuck states, hangs, crashes, deadlocks, races, exceptions, timeouts, or self-healing stuck work.\n- Do not add `queueing` merely because the title mentions lanes, workers, or internal scheduling terms unless queue lifecycle or scheduling behavior is central.\n\n4. ACPX Codex plugin inheritance is not sandboxing just because \u201csandbox\u201d appears.\n- Use `acpx` when ACPX is explicitly named.\n- Use `codex` when Codex is explicitly a central environment/runtime/product concern.\n- Use `skills_plugins` when plugins, skills, plugin loading, or skill scanner behavior is central.\n- Do not add `sandboxing` merely because the title contains \u201csandbox\u201d as an environment name. 
Add `sandboxing` only when sandbox policy, sandbox restrictions, sandbox mode, or sandbox configuration are themselves the subject.\n- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` should be `acpx`, `codex`, `skills_plugins`.\n\n5. ACP binding bugs can be sessions without ACPX.\n- Use `acp` when ACP is named centrally.\n- Use `sessions` when the title is about session keys, session ownership, session restoration, session identity, active chat/session routing, or persistent process/session sharing.\n- Do not automatically add `acpx` for every ACP configured binding. Add `acpx` only when ACPX is explicitly named or the title is centrally about ACPX-specific binding semantics, ACPX backend behavior, or ACPX agent-run visibility/state contracts.\n- Example: `[Bug]: ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process` should be `acp` and `sessions`, not `acpx`.\n\nTopic reminders:\n- `docs`: documentation-only PRs, tutorials, README changes, guides, examples.\n- `cron_automation`: cron jobs, scheduled work, cron delivery, scheduled automated messages.\n- `sessions`: active chat sessions, session ownership, session takeover, embedded sessions, session keys, session restoration, session identity propagation.\n- `reliability`: crash, exception, stuck state, hang, deadlock, race, timeout, wedge, robustness failure.\n- `local_models`: local model behavior, local embeddings, local Ollama, local runtime failures.\n- `self_hosted_inference`: Ollama, llama.cpp, vLLM, TGI, LocalAI, OpenAI-compatible self-hosted inference providers.\n- `memory`: memory storage/retrieval/embeddings or title scope like `memory/embeddings`.\n- `security`: vulnerabilities, exploitability, SSRF as a vulnerability, security hardening, HIGH/CRITICAL fixes.\n- `hooks`: hook priority, hook execution order, hook policy, hook behavior.\n- `mcp_tooling`: MCP protocol, 
MCP conformance, MCP integrations, MCP tool behavior.\n- `tool_calling`: tool-call execution, schemas, parameter coercion, runtime behavior.\n- `browser_automation`: browser automation, Chrome/profile/session reuse, browser diagnostics/runtime.\n- `packaging_deployment`: packaging, install/runtime setup, bundled executable availability, PATH/binary resolution, `spawn npx ENOENT`.\n- `open_weight_models`: open-weight model metadata, catalogs, context windows, compatibility.\n- `local_model_providers`: provider-specific local/open-weight catalogs or known provider model metadata.\n- `model_serving`: serving endpoints, OpenAI-compatible protocol behavior, streaming lifecycle, request/response compatibility.\n- `coding_agents`: subagents, spawned coding agents, delegation, multi-agent coding workflows.\n- `agent_runtime`: agent initialization, lifecycle, backend startup, process/runtime behavior.\n- `sandboxing`: sandbox mode/policy/restrictions/configuration only when central.\n\nFinal check:\nBefore outputting, remove any topic that was added only because of a keyword such as gateway, sandbox, session, binding, lane, wedge, Codex, OAuth, plugin, hook, model, provider, SSRF, cron, tool, or security unless that concept is clearly a central maintainer-owned subject in the title."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore target URL path, examples, tests, files changed, labels, and incidental implementation details unless they are the user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. 
Output JSON only, or use the `final_json` tool if available.\n\nImportant calibration:\n- Use `telemetry_usage` when token counts, usage accounting, cost display, usage reporting, diagnostic counts, traces, or status reporting are themselves the bug or feature.\n- Use `sessions` when session freshness, session identity, session lifecycle, session routing, or session state is central.\n- A title like `fix: respect totalTokensFresh flag to avoid showing stale token counts` must be classified as:\n {\"topics_of_interest\":[\"sessions\",\"telemetry_usage\"]}\n- In that title, `totalTokensFresh` is a session freshness/state concern, and stale token counts are a telemetry/usage concern.\n- Do not classify that title as `ui_tui`. Words like \u201cshowing\u201d, \u201cdisplaying\u201d, or \u201csurfacing\u201d are label spam unless the UI, TUI, webchat, terminal interface, or reader view is itself the central subject.\n\nSuppression rules:\n- Do not add a topic just because a related word appears.\n- Confirm that the word is the subject, not a symptom, path, implementation detail, broad ownership area, or incidental UI phrasing.\n- Before outputting, remove any topic added only because of words like showing, display, usage, token, session, status, UI, model, tool, gateway, security, hook, cron, or plugin unless that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nOutput rule:\n- If a structured-output tool named `final_json` is available, you MUST call it exactly once.\n- Call it with exactly:\n {\"topics_of_interest\":[\"topic_id\"]}\n- If no such tool is available, return only this JSON object and no prose:\n {\"topics_of_interest\":[\"topic_id\"]}\n- Never omit the final output. Never explain your reasoning. Never output markdown.\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. 
Before final output, remove any topic added only from keyword matching.\n\nImportant calibration:\n\nModel serving, local models, and local providers:\n- Use `model_serving` for serving endpoints, OpenAI-compatible APIs, Responses API behavior, request/response protocol behavior, streaming lifecycle, final usage chunks, endpoint compatibility, base URL behavior, request routing, model-server compatibility, or automatic routing of model requests.\n- Use `local_models` when a local model app/runtime/provider is itself a user-visible subject, especially compatibility or behavior with named runtimes like LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar.\n- Do not add `local_models` merely because a local runtime is incidentally mentioned, but do add it when the title is about that runtime\u2019s visible behavior or compatibility.\n- `fix: LM Studio thinking blocks invisible with Responses API` => `model_serving`, `local_models`.\n- Use `local_model_providers` for provider catalogs, model catalogs, provider-specific model availability, provider metadata, provider setup, or provider catalog entries.\n- `Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` => `local_model_providers`, `model_serving`.\n\nPolicy, config, MCP, and security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, configuration-governed enforcement, or repo/project policy.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- `Policy: add model, network, and MCP conformance checks` => `config`, `security`, `mcp_tooling`.\n- Do not map \u201cmodel policy\u201d or \u201cmodel conformance\u201d to `model_serving` unless the item is 
actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nSecurity composites:\n- Use `security` for vulnerabilities, SSRF, access restrictions, token verification, credential boundaries, network security policy, or HIGH/CRITICAL vuln fixes.\n- If a security title lists specific affected subsystems, include each subsystem that is explicitly central.\n- Use `hooks` when hook behavior, hook priority, hook execution, or hook policy is explicitly central.\n- Use `skills_plugins` when skill scanning, skill/plugin discovery, user-installed plugins, plugin inheritance, plugin installation, or plugin availability is explicitly central.\n- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `hooks`, `skills_plugins`.\n\nToken counts and usage freshness:\n- Use `telemetry_usage` when token counts, usage accounting, cost display, usage reporting, diagnostic counts, traces, or status reporting are themselves the bug or feature.\n- Use `sessions` when freshness, identity, lifecycle, routing, or state of a session is central.\n- `fix: respect totalTokensFresh flag to avoid showing stale token counts` => `sessions`, `telemetry_usage`.\n- Do not add `ui_tui` merely because the title says \u201cshowing\u201d or \u201cdisplaying\u201d.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage chunks; streaming protocol lifecycle remains `model_serving`.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs usually include only `docs`.\n- Add the documented area only when that area is explicitly central in the title, such as `docs(queue): ...` => `docs`, `queueing`.\n\nMCP and tool calling:\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or 
tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` only when ACPX is explicitly named or clearly ACPX-specific.\n- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, or session-specific behavior is central.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.\n- `[Bug]: ACP configured binding uses parent channel ID for session key` => `acp`, `sessions`.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`.\n\nNotifications and chat:\n- Use `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, or notification delivery.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Do not add `notifications` merely because the title mentions message sending or delivery plumbing.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions lane, main lane, worker, subagent, or internal execution paths.\n- Active-memory recall deadlocks are usually `memory`, 
`reliability`.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n- `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nRuntime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add 
`cron_automation` merely because a notification path mentions `cron --announce`.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n\nFinal suppression check:\nRemove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, showing, display, llama.cpp, LM Studio, catalog, or hook. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nYou MUST produce structured output. If a `final_json` tool is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks, auth_identity,\npackaging_deployment\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nClassification process:\n1. Read the title first. The title is the primary signal.\n2. Use the first clear body summary only when the title is ambiguous.\n3. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n4. Pick one primary topic.\n5. Add secondary topics only when they are explicit central maintainer-owned subjects.\n6. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n7. Use 0 topics when no allowed topic is central.\n8. Never invent topic ids. Never output labels outside the allowed list.\n9. Ignore target URL path, labels, examples, tests, files changed, incidental implementation details, broad impact, and label-spam keywords.\n\nHard output rule:\nCalling `final_json` is part of the answer when the tool exists. Do not finish without calling it. 
Do not emit prose, markdown, explanations, or an empty unstructured response.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the user-visible subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nProvider catalogs:\n- Use `local_model_providers` for provider catalogs, model-provider integrations, missing provider models, provider routing, provider-specific model availability, and provider catalog errors.\n- A missing Google Vertex/Gemini provider catalog entry is `local_model_providers`.\n- Add `reliability` when the user-visible failure mode is silent failure, hang, crash, wedge, missing error, or other robustness failure.\n- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.\n\nLocal models and self-hosted inference:\n- Use `local_models` when the local model app/runtime/provider is the user-visible subject, including Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, local embeddings, or local runtime behavior.\n- Use `self_hosted_inference` when the issue concerns self-hosted inference endpoints, local inference services, local embeddings endpoints, proxying to local inference, or self-hosted model connectivity.\n- Ollama embeddings failures are usually both `local_models` and `self_hosted_inference`.\n- Do not add `security` merely because the title mentions SSRF, SSRF defenses, proxy bypasses, or NO_PROXY when the actual user-visible bug is that a local/self-hosted model service fails.\n- Do not add `config` merely because proxy environment variables or NO_PROXY are mentioned, unless configuration policy/enforcement is the central subject.\n- Example: `[Feature/Bug]: Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)` => `local_models`, `self_hosted_inference`.\n\nReliability, sessions, and 
auth identity:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedge, timeout, self-healing of wedged execution, silent failure, or robustness failure.\n- Use `sessions` when session identity, session lifecycle, session routing, embedded session paths, persistent process identity, session state, or session-specific behavior is central.\n- Use `auth_identity` when OAuth, sign-in, authentication identity, credential identity, token identity, account identity, or restored login/auth flow is central.\n- Do not add `queueing` merely because the title says lane, main lane, worker, subagent, wedge, or internal execution path. Queueing requires queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior to be the user-visible subject.\n- Do not add `codex` merely because the title says openai-codex or contains Codex-branded OAuth. If the central subject is OAuth/auth restoration, use `auth_identity`; if it happens on an embedded/session path, add `sessions`.\n- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.\n\nBrowser automation and packaging/deployment:\n- Use `browser_automation` for Chrome/Chromium sessions, browser automation, browser diagnostics, browser runtime behavior, browser profile attach, browser tooling, and browser sign-in flows.\n- Use `packaging_deployment` for desktop app packaging, bundled runtime issues, missing executable/path problems, spawn errors caused by missing installed tools, `npx`/Node availability in packaged apps, installer/distribution/runtime packaging, or deployment environment failures.\n- If a desktop browser automation issue also fails because a packaged or deployed binary cannot spawn `npx` or another required executable, include `packaging_deployment`.\n- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach 
fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.\n\nSecurity:\n- Use `security` for actual vulnerabilities, SSRF as the central vulnerability, access restrictions, token verification, credential boundaries, network security policy, or HIGH/CRITICAL vulnerability fixes.\n- If a security title lists specific affected subsystems, include each subsystem that is explicitly central.\n- Do not use `security` when security words describe an implementation mechanism but the user-visible subject is local model connectivity, embeddings, packaging, sessions, or provider behavior.\n\nConfig:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Do not use `config` merely for proxy variables, env vars, provider settings, or setup details unless the title is about configuration behavior itself.\n\nCodex:\n- Use `codex` when Codex itself is the central user-visible product/runtime/setup subject, including Codex startup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because `openai-codex` appears as an OAuth provider name or bracketed metadata.\n\nQueueing:\n- Use `queueing` only when queue, queueing, queued execution, queue lifecycle, queue scheduling, or steering in queues is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, before_prompt_build, internal path names, or wedge recovery.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include only `docs`.\n- Add the central documented area only when that area is explicitly central in the title.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, 
tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- Example: `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, showing, display, llama.cpp, Ollama, OAuth, proxy, NO_PROXY, npx, Chrome, or hook. Keep the topic only if that concept is actually a central maintainer-owned subject."
+ }
+]
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/command.sh b/gepa-12b-row30-prop16-continuation-20260613T214937Z/command.sh
new file mode 100644
index 0000000000000000000000000000000000000000..f0cb9a8b999af8756329b819087d62e718dca904
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/command.sh
@@ -0,0 +1,13 @@
+#!/usr/bin/env bash
+set -euo pipefail
+cd "/home/bob/repos/localpager"
+PYTHONPATH=prompt-optimizer/src "prompt-optimizer/.venv/bin/python" -m prompt_optimizer.cli optimize \
+ --model gemma-12b-q4km-reason \
+ --max-tokens 1536 \
+ --concurrency 2 \
+ --seed-routing-policy "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z/best.routing_policy.md" \
+ --row-limit 30 \
+ --reflection-minibatch-size 4 \
+ --max-candidate-proposals 17 \
+ --max-metric-calls 720 \
+ --output-dir "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214937Z"
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/final_run_report.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/final_run_report.json
new file mode 100644
index 0000000000000000000000000000000000000000..e29014a8feab8a2bc56d98822fc0e6226a9193b5
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/final_run_report.json
@@ -0,0 +1,274 @@
+{
+ "result": {
+ "best_idx": 0,
+ "num_candidates": 13,
+ "num_full_val_evals": 13,
+ "total_metric_calls": 518,
+ "val_aggregate_scores": [
+ 0.673452380952381,
+ 0.5525091575091575,
+ 0.6586202686202686,
+ 0.6679761904761905,
+ 0.4812393162393162,
+ 0.608452380952381,
+ 0.5296214896214896,
+ 0.5934920634920634,
+ 0.6601587301587302,
+ 0.5403174603174603,
+ 0.4648107448107448,
+ 0.6468512627723154,
+ 0.5279059829059829
+ ]
+ },
+ "run_dir": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214937Z",
+ "run_log": {
+ "accepted_full_eval_candidates": 12,
+ "base_score": 0.673452380952381,
+ "better_valset_events": [],
+ "byte_count": 141996,
+ "line_count": 1556,
+ "proposal_attempts": 15,
+ "proposal_events": [
+ {
+ "accepted_for_full_eval": true,
+ "delta": 1.4666666666666668,
+ "has_proposed_text": true,
+ "iteration": 1,
+ "new_subsample_sum": 3.5,
+ "old_subsample_sum": 2.033333333333333
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 1.625,
+ "has_proposed_text": true,
+ "iteration": 2,
+ "new_subsample_sum": 4.0,
+ "old_subsample_sum": 2.375
+ },
+ {
+ "accepted_for_full_eval": false,
+ "delta": 0.0,
+ "has_proposed_text": true,
+ "iteration": 3,
+ "new_subsample_sum": 3.5,
+ "old_subsample_sum": 3.5
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 1.680952380952381,
+ "has_proposed_text": true,
+ "iteration": 4,
+ "new_subsample_sum": 3.5,
+ "old_subsample_sum": 1.819047619047619
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 1.9642857142857144,
+ "has_proposed_text": true,
+ "iteration": 5,
+ "new_subsample_sum": 4.0,
+ "old_subsample_sum": 2.0357142857142856
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 1.9294871794871795,
+ "has_proposed_text": true,
+ "iteration": 6,
+ "new_subsample_sum": 3.5,
+ "old_subsample_sum": 1.5705128205128205
+ },
+ {
+ "accepted_for_full_eval": false,
+ "delta": -0.25,
+ "has_proposed_text": true,
+ "iteration": 7,
+ "new_subsample_sum": 3.25,
+ "old_subsample_sum": 3.5
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 1.5,
+ "has_proposed_text": true,
+ "iteration": 8,
+ "new_subsample_sum": 4.0,
+ "old_subsample_sum": 2.5
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 1.880952380952381,
+ "has_proposed_text": true,
+ "iteration": 9,
+ "new_subsample_sum": 4.0,
+ "old_subsample_sum": 2.119047619047619
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 1.5595238095238093,
+ "has_proposed_text": true,
+ "iteration": 10,
+ "new_subsample_sum": 3.2857142857142856,
+ "old_subsample_sum": 1.7261904761904763
+ },
+ {
+ "accepted_for_full_eval": false,
+ "delta": -0.8681318681318682,
+ "has_proposed_text": true,
+ "iteration": 11,
+ "new_subsample_sum": 2.2857142857142856,
+ "old_subsample_sum": 3.1538461538461537
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 1.75,
+ "has_proposed_text": true,
+ "iteration": 12,
+ "new_subsample_sum": 3.5,
+ "old_subsample_sum": 1.75
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 0.04999999999999982,
+ "has_proposed_text": true,
+ "iteration": 14,
+ "new_subsample_sum": 3.25,
+ "old_subsample_sum": 3.2
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 2.0,
+ "has_proposed_text": true,
+ "iteration": 15,
+ "new_subsample_sum": 4.0,
+ "old_subsample_sum": 2.0
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 2.732142857142857,
+ "has_proposed_text": true,
+ "iteration": 17,
+ "new_subsample_sum": 3.5,
+ "old_subsample_sum": 0.7678571428571428
+ }
+ ],
+ "proposal_texts_started": 15,
+ "rejected_candidates": 3,
+ "selected_events": [
+ {
+ "candidate_idx": 0,
+ "iteration": 1,
+ "score": 0.673452380952381
+ },
+ {
+ "candidate_idx": 0,
+ "iteration": 2,
+ "score": 0.673452380952381
+ },
+ {
+ "candidate_idx": 1,
+ "iteration": 3,
+ "score": 0.5525091575091575
+ },
+ {
+ "candidate_idx": 0,
+ "iteration": 4,
+ "score": 0.673452380952381
+ },
+ {
+ "candidate_idx": 1,
+ "iteration": 5,
+ "score": 0.5525091575091575
+ },
+ {
+ "candidate_idx": 4,
+ "iteration": 6,
+ "score": 0.4812393162393162
+ },
+ {
+ "candidate_idx": 2,
+ "iteration": 7,
+ "score": 0.6586202686202686
+ },
+ {
+ "candidate_idx": 2,
+ "iteration": 8,
+ "score": 0.6586202686202686
+ },
+ {
+ "candidate_idx": 6,
+ "iteration": 9,
+ "score": 0.5296214896214896
+ },
+ {
+ "candidate_idx": 7,
+ "iteration": 10,
+ "score": 0.5934920634920634
+ },
+ {
+ "candidate_idx": 8,
+ "iteration": 11,
+ "score": 0.6601587301587302
+ },
+ {
+ "candidate_idx": 8,
+ "iteration": 12,
+ "score": 0.6601587301587302
+ },
+ {
+ "candidate_idx": 0,
+ "iteration": 13,
+ "score": 0.673452380952381
+ },
+ {
+ "candidate_idx": 3,
+ "iteration": 14,
+ "score": 0.6679761904761905
+ },
+ {
+ "candidate_idx": 3,
+ "iteration": 15,
+ "score": 0.6679761904761905
+ },
+ {
+ "candidate_idx": 0,
+ "iteration": 16,
+ "score": 0.673452380952381
+ },
+ {
+ "candidate_idx": 3,
+ "iteration": 17,
+ "score": 0.6679761904761905
+ }
+ ],
+ "selected_iterations": 17
+ },
+ "summary": {
+ "best_idx": 0,
+ "best_prompt_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214937Z/best.prompt.md",
+ "best_routing_policy_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214937Z/best.routing_policy.md",
+ "best_score": 0.673452380952381,
+ "config": {
+ "harness": {
+ "base_url": null,
+ "concurrency": 2,
+ "context_window": null,
+ "max_tokens": 1536,
+ "model": "gemma-12b-q4km-reason",
+ "state_dir": null,
+ "timeout_ms": 900000
+ },
+ "max_candidate_proposals": 17,
+ "max_metric_calls": 720,
+ "output_dir": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214937Z",
+ "reflection_minibatch_size": 4,
+ "row_limit": 30,
+ "seed": 0,
+ "seed_routing_policy_chars": 12379,
+ "seed_routing_policy_sha256": "6389ec2152a1c1c9c8e7fd5c7a3b522f95eba44351fc274647d3fdbca5490b14"
+ },
+ "created_at": "2026-06-14T02:12:36.124710+00:00",
+ "num_candidates": 13,
+ "num_full_val_evals": 13,
+ "result_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214937Z/gepa-result.json",
+ "total_metric_calls": 518
+ }
+}
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..2e039a452e0da5c6af3aceaa321364234206315b
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'gateway', 'agent_runtime'), description='This PR introduces a gateway-owned, node-backed runtime for the ACP protocol, involving core changes to gateway ownership, ACP logic, and agent execution runtime.', caveats=('The title specifically mentions both ACP and gateway-owned runtime, while the body confirms it implements a node-backed runtime infrastructure.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..c2f185e8016ccbc7a0a33f1ba213ac8b64633bb8
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=(), description='', caveats=(), error='classifier exit 2: prompt: /home/bob/.local/state/localpager/classifier/prompts/20260613T214938Z-110066.md\\nschema: /home/bob/.local/state/localpager/classifier/schemas/20260613T214938Z-110066.json\\nsession: /home/bob/.local/state/localpager/classifier/sessions/20260613T214938Z-110066\\nlocalpager-agent: final_json was not called; no structured output was captured')"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..aed9f52694236a2aaeabeb69870adb48b39f0ffc
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('exec_tools', 'sandboxing', 'approvals'), description='This PR implements Exec V2 contract follow-through for shell command execution, including allowlist, safeBins, and policy hardening.', caveats=('The \"Exec V2\" terminology in the title and body specifically triggers the requirement to include all related contract areas (exec_tools, sandboxing, and approvals).',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..5879e03ffe5d93cc70ffa1f76d399a24722a6056
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acpx', 'codex', 'skills_plugins'), description='The request asks for user-installed plugins to be accessible in the ACPX Codex sandbox environment, involving core features of the ACPX protocol, the Codex runtime, and the plugin system.', caveats=('The issue explicitly mentions ACPX and Codex while requesting support for user-installed plugins (Superpowers) within those specific environments.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_12/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_12/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..83f6c0b6ed577e048f94c5328607ee958f95e464
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_12/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('memory', 'self_hosted_inference'), description=\"The pull request adds a new 'openai-compatible' embedding provider specifically to support self-hosted inference engines like Ollama, vLLM, and TGI within the memory system.\", caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_13/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_13/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..2bba9290fc9b5809c4312ea345c9cd8c8489cc6d
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_13/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('local_model_providers',), description='The issue identifies a missing model in the google-vertex provider catalog causing failures in fallback logic.', caveats=('The issue reports both a missing catalog entry and a silent failure logic bug, but the root cause is tied to provider catalog definitions.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_14/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_14/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..edab2bd54c3b4136761bd4031ebe7f8b4f52c58d
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_14/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('docs',), description=\"This is a purely documentation-focused pull request that adds an external video link and embed to the project's showcase page.\", caveats=('The PR is explicitly identified as a documentation-only change adding a tutorial to the showcase page.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_15/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_15/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..1c9b359a10d21d39495f6cd3089016ca9599196d
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_15/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('open_weight_models', 'local_model_providers'), description='The PR adds a lookup table for context windows specifically for open-weight models on the Mantle provider.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_16/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_16/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..fc54d061b6ff6feb928452ec0b5420f0abb89eb4
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_16/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('chat_integrations', 'cron_automation'), description='The issue describes a bug where per-agent identity information is lost during specific notification flows, specifically those involving Slack integrations triggered by cron jobs or heartbeats.', caveats=('Title explicitly identifies two distinct paths for the bug: the \"cron --announce\" path and the \"heartbeat\" push. Both are central to identifying where the identity overlay is dropped.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_17/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_17/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..49c38dad3c0d28a00881d1a17f6e3b2c10f3b961
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_17/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('local_models', 'model_serving'), description=\"Fixes visibility of thinking blocks for LM Studio by handling the specific 'response.reasoning_text.done' event in the OpenAI-compatible response stream.\", caveats=('The PR addresses an issue specifically affecting LM Studio (a local model provider) within the context of the OpenAI-compatible Responses API streaming protocol.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_18/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_18/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..53b2f90eb52dc470748bf61b528873e9d6c1c209
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_18/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('security',), description=\"This PR focuses on strengthening the CI pipeline's security posture by implementing OIDC trusted publishing and stricter GitHub Action permissions to secure the supply chain.\", caveats=('Title and body clearly indicate a focus on hardening supply-chain security, OIDC authentication, and permission scoping for CI workflows.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_19/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_19/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..9758185419882f4747696ddc9008a21c07699e4c
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_19/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('sandboxing', 'sessions'), description=\"A bug in 'sandbox.mode' causes failures during sub-agent initialization, affecting both session management and sandboxing infrastructure.\", caveats=('The issue involves both session lifecycle (sub-agents) and sandbox isolation logic.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..27c7d857e08ff93f73549702cd996612d81e7e3e
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('exec_tools', 'browser_automation', 'cron_automation'), description='The PR fixes issues across three distinct core systems: exec tool pathing, browser status diagnostics, and cron execution logic.', caveats=('Title explicitly mentions three distinct areas: exec path, browser diagnostics, and cron concurrency.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_20/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_20/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..bc10110b67af171f55b502776125453fefcea7fd
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_20/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('ui_tui',), description='This pull request fixes a bug where stale token counts are displayed in both the TUI and the web interface by respecting the totalTokensFresh flag.', caveats=('The PR addresses display bugs in both the TUI and Web UI components.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_21/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_21/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..da1c1984acb964245d0720af792a7a7824419a7c
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_21/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('mcp_tooling', 'tool_calling'), description='The PR addresses an issue where LLMs pass stringified JSON instead of objects to MCP tools, requiring logic in the materialization layer to coerce these values before execution.', caveats=('The title and body explicitly mention MCP tools and the coercion of parameters before tool calls, triggering both mcp_tooling and tool_calling as per the specific instructions for this combination.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_22/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_22/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..757cb0faed5b6b6089ecd19b0f65a92e5d3d12c4
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_22/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=(), description='', caveats=(), error='classifier exit 2: prompt: /home/bob/.local/state/localpager/classifier/prompts/20260613T220136Z-142554.md\\nschema: /home/bob/.local/state/localpager/classifier/schemas/20260613T220136Z-142554.json\\nsession: /home/bob/.local/state/localpager/classifier/sessions/20260613T220136Z-142554\\nlocalpager-agent: final_json was not called; no structured output was captured')"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_23/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_23/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..9432f3c71327b175e4e254b507ce5061d76d898a
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_23/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('memory', 'security'), description='The issue describes a failure in local Ollama embedding retrieval caused by SSRF security measures within the proxy system.', caveats=(\"Included 'memory' because it involves the core functionality of embedding providers and Ollama; included 'security' due to the specific mention of SSRF protection logic being the root cause.\",), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_24/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_24/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..38cdceb5eb2dd078559dbf25acec0f01ffcd3511
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_24/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('browser_automation',), description='The issue reports failures when connecting to existing Chrome sessions and using the EasyClaw extension, which are core components of browser automation.', caveats=('The issue involves both a browser integration failure and a packaging-specific missing binary, but the primary user-facing functionality being broken is browser automation via existing Chrome sessions.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_25/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_25/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..234a578ca84880dc87206f27983baafeb8d84a26
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_25/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('local_model_providers', 'model_serving'), description='The issue requests adding a new model to a provider catalog and implementing automated multimodal routing logic.', caveats=(\"The input title matches a specific example provided in the instruction documentation for 'local_model_providers' and 'model_serving'.\",), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_26/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_26/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..c685f6c9db29e77f149482e8d3e45e3268c8ee61
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_26/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'sessions'), description='This issue involves an ACP configuration bug where incorrect ID hashing causes multiple threads to share the same persistent session key.', caveats=('The issue describes a session-sharing bug caused by how ACP bindings hash conversation IDs, affecting unique identity across threads.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_27/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_27/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..ca73bc1113c8c712cec46275df728a02b4368c2f
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_27/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('queueing', 'codex'), description=\"The PR fixes issues where the gateway's per-lane queue gets stuck (requiring self-healing/recovery) and restores OAuth for OpenAI's Codex service.\", caveats=('The issue addresses both a core queuing mechanism (lane wedge recovery) and specific Codex provider authentication.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_28/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_28/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..b1e18507b0f4d6ed837bd95c4c8e712cade7c864
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_28/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('reliability', 'sessions', 'notifications'), description='The issue involves a concurrent write error in the notification/announcement system impacting shared session states, specifically during cron job execution.', caveats=('The issue describes a concurrency error (race condition) triggered by a cron-based announcement notification affecting active user sessions.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_29/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_29/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..606b054246be5af786525ada73a2fcd0637f3da5
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_29/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'sessions'), description='The issue involves failures in the ACP backend visibility and state reconciliation after session yielding.', caveats=('Title explicitly mentions ACP and sessions_yield, pointing to core protocol and session lifecycle issues.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..976b1acf26f6e1dcfe0498a1bdf077b6088fa15d
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('codex', 'chat_integrations'), description='This PR fixes Codex-related infrastructure (startup plugins and OAuth) and implements WhatsApp chat history features.', caveats=('Title and body explicitly mention both \"Codex\" (specifically regarding startup plugins/OAuth) and \"WhatsApp\" (specifically regarding message history).',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..3fb32f758ff8b7988cfc949a7368f625f38f5cad
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('model_serving', 'local_models'), description='The issue describes a failure to capture final usage chunks in an OpenAI-compatible stream, which is a core concern of model serving and local inference backends.', caveats=('The issue specifically addresses the handling of SSE streaming chunks for OpenAI-compatible backends like llama.cpp.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..9e396749007fe85d4bc27c1acf29e1662455371f
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'approvals'), description='This feature adds per-binding and per-agent scoped permissions for ACP sessions, addressing the lack of granular control over session-specific access levels.', caveats=('The issue specifically addresses the scoping of permissionMode across different bindings and agents, which falls directly into the acp and approvals categories as defined in the instructions.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..f3eedf5f5c4ce79fc86a58e462a8adc833b3233a
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('gateway', 'sessions'), description=\"The PR modifies how session identifiers are passed through the gateway's outbound message path to support agent-scoped decisions in guarded routes.\", caveats=('The issue addresses both the gateway routing logic and the underlying session identification for guarded messages.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..fdb319b8767890567d7ed60b1dd66851fafcf6b3
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('memory', 'reliability'), description='Fixes a deadlock in the active-memory recall subagent by isolating it to its own lane.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..4a149f15a0ac7edfdd6df1d4d3d1c99edacc1e40
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('gateway', 'api_surface', 'ui_tui'), description='This PR implements a new RPC for a full-message reader to resolve truncated text issues in the webchat UI.', caveats=('The issue addresses a full-message reader specifically within the webchat interface, which is described in documentation as a combination of gateway, api_surface, and ui_tui.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..9e34f0f6480fb82c38b4f2d26ed6cc3ad5d4d863
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('docs', 'queueing'), description='This is a documentation-focused update clarifying specific behaviors of the queueing system, specifically regarding steering and streaming interactions.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/gepa-result.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/gepa-result.json
new file mode 100644
index 0000000000000000000000000000000000000000..39820bee9513810930d910617399ae3b50586c31
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/gepa-result.json
@@ -0,0 +1,811 @@
+{
+ "_str_candidate_key": null,
+ "best_idx": 0,
+ "best_outputs_valset": null,
+ "candidates": [
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- A title like `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` is only `docs`.\n- Do not add non-allowed or broad demo/showcase labels such as `agent_demos`.\n- Documentation-only PRs should usually include `docs` plus the central documented area only when that area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- A title like `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.\n- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.\n- A title like `[Bug]: ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process` is `acp` and `sessions`, not `acpx`.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or 
permissionMode policy is central.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and only `acpx` if ACPX is explicitly named or clearly the binding system in question.\n\nOpen-weight, local provider catalogs, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- A title like `[Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` is `local_model_providers` and `model_serving`.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, or names a model unless serving/routing/protocol behavior is central.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is 
central.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about \u201ccron --announce and heartbeat target-channel Slack pushes\u201d is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n- Queueing requires queue, queued execution, steering in 
queues, queue lifecycle, or scheduling behavior to be user-visible and central.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add 
`skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible 
item up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nOutput only JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, coding_agents,\napi_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx,\napprovals, agent_runtime, model_serving, local_models,\nself_hosted_inference, telemetry_usage, exec_tools, sandboxing,\nbrowser_automation, cron_automation, config, security, mcp_tooling,\ntool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping explicitly central second or third concerns.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, broad impact, and incidental implementation details unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. 
Never output labels outside the allowed list.\n\nImportant domain rules:\n\nExec, sandboxing, and approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, exec v2 behavior, and exec contract behavior.\n- If the title says `exec v2 contract`, `exec contract`, or `exec v2 contract follow-through`, classify all contract areas that are part of that surface:\n `exec_tools`, `sandboxing`, `approvals`.\n- Do this even if the PR title starts with `test(...)`; tests can still be the vehicle for landing a user-visible contract.\n- Do not collapse exec contract work to only `exec_tools`.\n\nSandboxing, subagents, and runtimes:\n- Use `sandboxing` when sandbox modes, sandbox policy, sandbox behavior, or `sandbox.mode` is central.\n- Use `agent_runtime` when the issue is about agent execution runtimes, subagent initialization, runtime startup, runtime ownership, or process/runtime lifecycle.\n- Use `coding_agents` when coding-agent behavior is central, including subagents, sessions_spawn subagent initialization, Codex/Claude-style coding-agent orchestration, or agent worker startup.\n- A title like `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization` is:\n `coding_agents`, `sandboxing`, `agent_runtime`.\n- Do not add `sessions` just because a path or internal component contains `sessions_spawn`. 
Treat `sessions_spawn` as an implementation name unless session identity, lifecycle, routing, persistence, or state is the user-visible subject.\n\nSessions:\n- Use `sessions` only when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.\n- Suppress `sessions` for internal names such as `sessions_spawn` when the actual problem is subagent/runtime initialization.\n\nLocal models and model serving:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible protocol behavior, Responses API behavior, streaming lifecycle, response chunks, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- If a title combines a local model provider with the Responses API, OpenAI-compatible API behavior, streaming, or request/response protocol behavior, include both `local_models` and `model_serving`.\n- A title like `fix: LM Studio thinking blocks invisible with Responses API` is:\n `model_serving`, `local_models`.\n\nSuppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, session, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named or when ACP binding semantics are central. In OpenClaw, phrases like \u201cper-binding\u201d, \u201cconfigured binding\u201d, \u201cbinding ID\u201d, or per-agent/per-binding ACP policy usually indicate the ACPX binding layer, even if the title says only ACP.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.\n- Do not add `sessions` just because a title says \u201cACP sessions\u201d if the actual change is permission policy, binding policy, or per-agent configuration. Treat \u201csessions\u201d as label spam unless session identity, lifecycle, routing, state, persistent process identity, or session-specific behavior is the user-visible subject.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.\n\nAuth and identity:\n- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.\n- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains \u201copenai-codex\u201d, \u201cCodex OAuth\u201d, or a Codex-flavored product/path name. 
Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.\n- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.\n- `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` should include `reliability`, `auth_identity`, and `sessions`, not `queueing` or `codex`.\n\nReliability and queueing:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, or robustness failure.\n- \u201clane wedge\u201d, \u201clane wedged\u201d, \u201cself-heal lane wedges\u201d, \u201cworker stuck\u201d, or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.\n- Do not add `queueing` merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.\n- Use `queueing` only when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.\n- Do not add broad demo/showcase labels or `tool_calling` just because docs mention tool boundaries.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nOpen-weight, local provider catalogs, 
and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, or names a model unless serving/routing/protocol behavior is central.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nNotifications and chat integrations:\n- Use `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery 
plumbing.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nGateway and runtime:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage 
accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- Do not substitute broad infrastructure topics for the listed user-visible subjects.\n- In composite titles, each clause may contribute a different topic. For example, one clause can be `reliability` while another contributes `auth_identity` and `sessions`.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nImportant calibration rules:\n\nSecurity composites:\n- Use `security` for vulnerabilities, SSRF, access restrictions, token verification, credential boundaries, network security policy, or HIGH/CRITICAL vuln fixes.\n- If a security title lists specific affected subsystems, include each subsystem that is explicitly central.\n- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `hooks`, `skills_plugins`.\n- Use `hooks` when hook behavior, hook priority, hook execution, or hook policy is explicitly central.\n- Use `skills_plugins` when skill scanning, skill/plugin discovery, user-installed plugins, plugin inheritance, plugin installation, or plugin availability is explicitly central.\n- Do not collapse a composite security title to only `security` when the title explicitly names maintainer-owned subsystems like hooks or skill scanners.\n\nToken counts and usage freshness:\n- Use `telemetry_usage` when token counts, usage accounting, cost display, usage reporting, diagnostic counts, traces, or status reporting are themselves the bug or feature.\n- Use `sessions` when the freshness, identity, lifecycle, routing, or state of a session is central.\n- A title like `fix: respect totalTokensFresh flag to avoid showing stale token counts` is `sessions` and `telemetry_usage`.\n- Do not add `ui_tui` merely because the title says \u201cshowing\u201d or \u201cdisplaying\u201d token counts. 
UI is label spam unless the user-facing interface itself is central.\n\nModel serving, local models, and usage chunks:\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- A title like `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` is only `model_serving`.\n- Do not add `local_models` merely because a local runtime such as llama.cpp, Ollama, LM Studio, vLLM, TGI, or LocalAI is named. Add `local_models` only when the local model app/provider/runtime itself is the user-visible subject.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, or final usage chunks. If the issue is that the streaming protocol closes before the final usage chunk, classify it as `model_serving`.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include only `docs`.\n- Add the central documented area only when that area is explicitly central in the title, such as `docs(queue): ...` => `docs`, `queueing`.\n- Do not add broad demo/showcase labels.\n- Do not add `tool_calling` just because docs mention tool boundaries unless tool-call behavior itself is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.\n\nACP, ACPX, sessions, 
approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.\n- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.\n- `[Bug]: ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process` => `acp`, `sessions`.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, and only `acpx` if ACPX is explicitly named.\n\nNotifications and chat:\n- Use `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract 
topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`, not `notifications`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both 
`api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n- Do not add `ui_tui` merely because something is shown, displayed, or surfaced unless the UI itself is central.\n\nRuntime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, 
`chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, showing, display, llama.cpp, or hook. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nOutput only JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, coding_agents,\napi_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx,\napprovals, agent_runtime, model_serving, local_models,\nself_hosted_inference, telemetry_usage, exec_tools, sandboxing,\nbrowser_automation, cron_automation, config, security, mcp_tooling,\ntool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping explicitly central second or third concerns.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, broad impact, and incidental implementation details unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. 
Never output labels outside the allowed list.\n\nGeneral suppression:\n- Do not add a topic just because a related word appears.\n- Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n- Never output disallowed ids such as `packaging_deployment`.\n\nReliability:\n- Use `reliability` when the title describes a failure mode, hardening, deadlock, crash, race, takeover error, dropped work, broken delivery, or correctness issue that is central to the user-visible problem.\n- For cron/chat/session collision bugs such as `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`, include:\n `cron_automation`, `sessions`, `reliability`.\n- Do not add `reliability` for routine implementation cleanup unless the title is about making behavior more robust or fixing a failure.\n\nSecurity and CI:\n- Use `security` for supply-chain boundaries, dependency trust, GitHub Actions hardening, secret exposure, permissions, auth, or attack-surface reduction.\n- A title like `ci: harden GitHub Actions supply-chain boundaries` is only:\n `security`.\n- Do not add deployment, packaging, CI, or release topics unless an allowed topic is directly central. 
`packaging_deployment` is not allowed.\n\nExec, sandboxing, and approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, exec v2 behavior, and exec contract behavior.\n- If the title says `exec v2 contract`, `exec contract`, or `exec v2 contract follow-through`, classify all contract areas that are part of that surface:\n `exec_tools`, `sandboxing`, `approvals`.\n- Do this even if the PR title starts with `test(...)`; tests can still be the vehicle for landing a user-visible contract.\n- Do not collapse exec contract work to only `exec_tools`.\n\nSandboxing:\n- Use `sandboxing` when sandbox modes, sandbox policy, sandbox behavior, filesystem/network restrictions, approval boundaries, or `sandbox.mode` are central.\n- Do not use `sandboxing` merely because the title says `sandbox` as an environment name, container, ACPX sandbox, Codex sandbox, test sandbox, or runtime workspace unless the subject is the sandbox policy/permission behavior itself.\n- A title like `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` is not about sandbox policy. 
It is:\n `acpx`, `codex`, `skills_plugins`.\n\nSkills and plugins:\n- Use `skills_plugins` when user-installed plugins, plugin inheritance, skills, marketplace plugins, plugin loading, plugin availability, or plugin installation behavior is central.\n- If a title says a Codex or ACPX environment should inherit user-installed plugins, include `skills_plugins`.\n- Do not replace `skills_plugins` with `sandboxing` just because the plugins are being inherited into a sandboxed environment.\n\nACPX, ACP, and Codex:\n- Use `acpx` when ACPX itself, ACPX environments, ACPX protocol/workflow, or ACPX-specific behavior is central.\n- Use `acp` for ACP-specific protocol/workflow behavior.\n- Use `codex` when Codex is explicitly a central product/runtime/integration subject.\n- For an ACPX Codex feature involving user-installed plugins, include all central facets:\n `acpx`, `codex`, `skills_plugins`.\n\nSandboxing, subagents, and runtimes:\n- Use `agent_runtime` when the issue is about agent execution runtimes, subagent initialization, runtime startup, runtime ownership, or process/runtime lifecycle.\n- Use `coding_agents` when coding-agent behavior is central, including subagents, sessions_spawn subagent initialization, Codex/Claude-style coding-agent orchestration, or agent worker startup.\n- A title like `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization` is:\n `coding_agents`, `sandboxing`, `agent_runtime`.\n- Do not add `sessions` just because a path or internal component contains `sessions_spawn`. 
Treat `sessions_spawn` as an implementation name unless session identity, lifecycle, routing, persistence, or state is the user-visible subject.\n\nSessions:\n- Use `sessions` only when session identity, session lifecycle, session routing, session state, persistent process identity, takeover/ownership, active chat session conflicts, or session-specific behavior is central.\n- Suppress `sessions` for internal names such as `sessions_spawn` when the actual problem is subagent/runtime initialization.\n- Include `sessions` when the title describes an active chat/session takeover conflict.\n\nCron automation:\n- Use `cron_automation` when scheduled jobs, cron announcements, recurring automation, scheduled delivery, or cron-triggered behavior is central.\n- If cron delivery conflicts with active user chat/session ownership, include both `cron_automation` and `sessions`; add `reliability` when the title describes an error or broken behavior.\n\nLocal models and model serving:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible protocol behavior, Responses API behavior, streaming lifecycle, response chunks, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- If a title combines a local model provider with the Responses API, OpenAI-compatible API behavior, streaming, or request/response protocol behavior, include both `local_models` and `model_serving`.\n- A title like `fix: LM Studio thinking blocks invisible with Responses API` is:\n `model_serving`, `local_models`.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, 
plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, session, sandbox, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nOutput only JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, coding_agents,\napi_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx,\napprovals, agent_runtime, model_serving, local_models,\nself_hosted_inference, telemetry_usage, exec_tools, sandboxing,\nbrowser_automation, cron_automation, config, security, mcp_tooling,\ntool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping explicitly central second or third concerns.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, broad impact, and incidental implementation details unless they are the actual user-visible subject.\n\nCore process:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. 
Never output labels outside the allowed list.\n\nGeneral suppression:\n- Do not add a topic just because a related word appears.\n- Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n- Ignore words that appear only in internal function or event names, such as `sessions_yield` or `sessions_spawn`, unless the actual user-visible subject is session identity, lifecycle, routing, ownership, persistence, or active-session conflict.\n- Never output disallowed ids such as `packaging_deployment`.\n\nReliability:\n- Use `reliability` when the title describes a failure mode, hardening, deadlock, crash, race, takeover error, dropped work, broken delivery, or correctness issue that is central to the user-visible problem.\n- Do not add `reliability` merely because the title contains words like fail, failure, stale, broken, or error if the maintainer-owned subject is a protocol/workflow contract such as ACP/ACPX state visibility.\n- For `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield`, classify only:\n `acpx`, `acp`.\n Do not add `sessions` for `sessions_yield`, and do not add `reliability` for the failure/stale wording.\n\nSecurity:\n- Use `security` for supply-chain boundaries, dependency trust, GitHub Actions hardening, secret exposure, permissions, auth, or attack-surface reduction.\n- Do not add `security` merely because the title mentions SSRF defenses, proxy bypass, NO_PROXY, network policy, or a security-sounding mechanism when the user-visible problem is compatibility with local/self-hosted inference.\n- A title like `ci: harden GitHub Actions supply-chain boundaries` is only:\n `security`.\n\nExec, sandboxing, and approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, exec v2 behavior, and exec contract behavior.\n- If the title says `exec v2 contract`, `exec contract`, or `exec v2 
contract follow-through`, classify all contract areas that are part of that surface:\n `exec_tools`, `sandboxing`, `approvals`.\n- Do this even if the PR title starts with `test(...)`; tests can still be the vehicle for landing a user-visible contract.\n- Do not collapse exec contract work to only `exec_tools`.\n\nSandboxing:\n- Use `sandboxing` when sandbox modes, sandbox policy, sandbox behavior, filesystem/network restrictions, approval boundaries, or `sandbox.mode` are central.\n- Do not use `sandboxing` merely because the title says `sandbox` as an environment name, container, ACPX sandbox, Codex sandbox, test sandbox, or runtime workspace unless the subject is the sandbox policy/permission behavior itself.\n- A title like `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` is not about sandbox policy. It is:\n `acpx`, `codex`, `skills_plugins`.\n\nSkills and plugins:\n- Use `skills_plugins` when user-installed plugins, plugin inheritance, skills, marketplace plugins, plugin loading, plugin availability, or plugin installation behavior is central.\n- If a title says a Codex or ACPX environment should inherit user-installed plugins, include `skills_plugins`.\n- Do not replace `skills_plugins` with `sandboxing` just because the plugins are being inherited into a sandboxed environment.\n\nACPX, ACP, and Codex:\n- Use `acpx` when ACPX itself, ACPX environments, ACPX protocol/workflow, or ACPX-specific behavior is central.\n- Use `acp` for ACP-specific protocol/workflow behavior.\n- Use `codex` when Codex is explicitly a central product/runtime/integration subject.\n- For explicit ACP agent runs, embedded backend visibility, final JSON state, or ACP/ACPX agent-run contracts, prefer `acp` and/or `acpx` only unless another central subject is explicitly named.\n- For an ACPX Codex feature involving user-installed plugins, include all central facets:\n `acpx`, `codex`, `skills_plugins`.\n\nSandboxing, subagents, and runtimes:\n- Use 
`agent_runtime` when the issue is about agent execution runtimes, subagent initialization, runtime startup, runtime ownership, or process/runtime lifecycle.\n- Use `coding_agents` when coding-agent behavior is central, including subagents, sessions_spawn subagent initialization, Codex/Claude-style coding-agent orchestration, or agent worker startup.\n- A title like `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization` is:\n `coding_agents`, `sandboxing`, `agent_runtime`.\n- Do not add `sessions` just because a path or internal component contains `sessions_spawn`.\n\nSessions:\n- Use `sessions` only when session identity, session lifecycle, session routing, session state, persistent process identity, takeover/ownership, active chat session conflicts, or session-specific behavior is central.\n- Suppress `sessions` for internal names such as `sessions_spawn` and `sessions_yield` when the actual problem is subagent/runtime initialization or ACP/ACPX state handling.\n- Include `sessions` when the title describes an active chat/session takeover conflict.\n\nCron automation:\n- Use `cron_automation` when scheduled jobs, cron announcements, recurring automation, scheduled delivery, or cron-triggered behavior is central.\n- If cron delivery conflicts with active user chat/session ownership, include both `cron_automation` and `sessions`; add `reliability` when the title describes an error or broken behavior.\n\nLocal models, self-hosted inference, and model serving:\n- Use `local_models` when a local model app/provider/runtime is central, including Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted providers.\n- Ollama and LM Studio are strong signals for `local_models`.\n- Use `self_hosted_inference` when the central subject is running inference or embeddings through a local/self-hosted backend, especially with Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, localhost/LAN providers, proxy bypass, NO_PROXY, or 
self-hosted endpoint reachability.\n- For `Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)`, classify:\n `local_models`, `self_hosted_inference`.\n Do not add `model_serving` unless the title is about serving protocol behavior, endpoint compatibility, streaming lifecycle, response chunks, or request routing. Do not add `security` unless security hardening itself is the central user-visible change.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible protocol behavior, Responses API behavior, streaming lifecycle, response chunks, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- If a title combines a local model provider with the Responses API, OpenAI-compatible API behavior, streaming, or request/response protocol behavior, include both `local_models` and `model_serving`.\n- A title like `fix: LM Studio thinking blocks invisible with Responses API` is:\n `model_serving`, `local_models`.\n\nOpen-weight models and local model providers:\n- Use `open_weight_models` when the central subject is open-weight model families, model metadata, context windows, capabilities, catalogs, or known properties for open-weight models.\n- Use `local_model_providers` when the central subject is provider integration or provider catalog/metadata for open-weight/local-model providers.\n- In this taxonomy, Amazon Bedrock Mantle open-weight model metadata, including known context windows, routes to `local_model_providers` as well as `open_weight_models`.\n- For `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models`, classify:\n `open_weight_models`, `local_model_providers`.\n Do not add `model_serving`; context-window metadata for model catalog/provider support is not serving protocol behavior.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like 
usage, model, network, proxy, NO_PROXY, SSRF, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, session, sandbox, sessions_yield, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Conventional commit scopes in the title, such as `feat(memory/embeddings): ...`, are part of the title signal. Do not ignore them when they name a central domain.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, and incidental implementation details unless they are the actual user-visible subject.\n\nProcess:\n1. Read the full title, including any conventional commit type/scope prefix.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. 
Never invent topic ids. Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nImportant routing rules learned from prior errors:\n\nBrowser automation plus packaging/deployment:\n- Use `browser_automation` when the central subject is browser automation, browser diagnostics, Chrome session reuse, browser runtime behavior, profile attach behavior, browser tooling, or desktop browser integration.\n- Use `packaging_deployment` when the title centrally names packaging, deployment, install/runtime environment setup, missing bundled executables, missing npm/node binaries, PATH/binary resolution, launch packaging, or errors such as `spawn npx ENOENT`.\n- If a desktop/browser issue combines browser session/profile behavior with a missing executable or packaging/runtime availability failure, include both topics.\n- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` must be:\n `browser_automation`, `packaging_deployment`.\n The Chrome sessions, Google sign-in, and user profile attach are browser automation. 
The `spawn npx ENOENT` failure is a packaging/deployment/runtime environment concern and must not be dropped.\n\nMemory and self-hosted inference:\n- Use `memory` when memory, embeddings under memory, memory retrieval/storage, or a conventional commit scope such as `memory/embeddings` is central.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- If the title says `memory/embeddings` and adds an OpenAI-compatible provider for self-hosted servers, include both `memory` and `self_hosted_inference`.\n- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` must be:\n `memory`, `self_hosted_inference`.\n Do not drop `memory` just because the body of the title focuses on providers; the scope is a central maintainer-owned area.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named or when ACP binding semantics are central. 
In OpenClaw, phrases like \u201cper-binding\u201d, \u201cconfigured binding\u201d, \u201cbinding ID\u201d, or per-agent/per-binding ACP policy usually indicate the ACPX binding layer, even if the title says only ACP.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.\n- Do not add `sessions` just because a title says \u201cACP sessions\u201d if the actual change is permission policy, binding policy, or per-agent configuration.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.\n\nAuth and identity:\n- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.\n- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains \u201copenai-codex\u201d, \u201cCodex OAuth\u201d, or a Codex-flavored product/path name.\n- Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.\n- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.\n\nReliability and queueing:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, or robustness failure.\n- \u201clane wedge\u201d, \u201clane wedged\u201d, \u201cself-heal lane wedges\u201d, \u201cworker stuck\u201d, or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.\n- Do not add `queueing` merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.\n- Use `queueing` only when queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n\nMCP and tool calling:\n- Use 
`mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nOpen-weight, local provider catalogs, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, \u201copenai-compatible\u201d, or names llama.cpp/Ollama/vLLM/TGI/LocalAI unless serving protocol behavior is central.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, Chrome, npx, 
embeddings, provider, or openai-compatible. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment,\nhooks, coding_agents\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Conventional commit scopes in the title, such as `feat(memory/embeddings): ...`, are part of the title signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, and incidental implementation details unless they are the actual user-visible subject.\n\nProcess:\n1. Read the full title, including any conventional commit type/scope prefix.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. 
Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nImportant routing rules:\n\nACP, ACPX, sessions, and agent runs:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named or when ACP binding semantics, embedded ACP backends, explicit ACP agent runs, per-binding behavior, configured bindings, binding IDs, or ACP agent-run visibility/state contracts are central.\n- Do not add `sessions` merely because the title contains `sessions_yield`, `sessions_spawn`, \u201cACP sessions\u201d, or another session-shaped internal API name.\n- If the real subject is ACP/ACPX agent-run visibility, embedded backend behavior, stale final JSON state after a yield point, or binding-layer behavior, prefer `acp` + `acpx` and suppress `sessions` unless session lifecycle/restoration itself is the user-visible subject.\n- Example: `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield` routes to `acpx`, `acp`, not `sessions`.\n\nSecurity, hooks, and skills/plugins:\n- Use `security` when vulnerabilities, SSRF, token verification security, HIGH/CRITICAL vulns, exploitability, or security hardening are central.\n- Use `skills_plugins` when skills, plugins, skill scanner behavior, plugin loading, skill scanning, or skill/plugin security is a central named subject.\n- Use `hooks` when hooks, hook priority, hook execution order, hook policy, or hook behavior is a central named subject.\n- For a security PR whose title lists multiple vulnerable subsystems, include `security` plus the explicitly central subsystem topics, up to 3 total.\n- Do not add `auth_identity` just because token verification appears in a security 
vulnerability list, unless login, OAuth, identity propagation, account identity, or credential identity is the central user-visible subject.\n- Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` routes to `security`, `hooks`, `skills_plugins`.\n\nSandboxing, subagents, and agent runtime:\n- Use `sandboxing` when sandbox mode, sandbox policy, sandbox restrictions, or sandbox configuration are central.\n- Use `coding_agents` when subagents, coding agents, spawned agents, agent delegation, or multi-agent coding workflows are central.\n- Use `agent_runtime` when agent initialization, runtime lifecycle, backend startup, execution runtime, or agent process behavior is central.\n- Do not add `sessions` merely because the title contains `sessions_spawn`; treat it as an internal spawn mechanism unless session lifecycle is clearly the subject.\n- If a sandbox mode breaks subagent initialization, include `sandboxing`, `coding_agents`, and `agent_runtime`.\n- Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization` routes to `coding_agents`, `sandboxing`, `agent_runtime`.\n\nBrowser automation plus packaging/deployment:\n- Use `browser_automation` when the central subject is browser automation, browser diagnostics, Chrome session reuse, browser runtime behavior, profile attach behavior, browser tooling, or desktop browser integration.\n- Use `packaging_deployment` when the title centrally names packaging, deployment, install/runtime environment setup, missing bundled executables, missing npm/node binaries, PATH/binary resolution, launch packaging, or errors such as `spawn npx ENOENT`.\n- If a desktop/browser issue combines browser session/profile behavior with a missing executable or packaging/runtime availability failure, include both topics.\n\nMemory and self-hosted inference:\n- Use `memory` when memory, embeddings under memory, memory retrieval/storage, or a conventional 
commit scope such as `memory/embeddings` is central.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- If the title says `memory/embeddings` and adds an OpenAI-compatible provider for self-hosted servers, include both `memory` and `self_hosted_inference`.\n\nAuth and identity:\n- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.\n- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains \u201copenai-codex\u201d, \u201cCodex OAuth\u201d, or a Codex-flavored product/path name.\n- Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.\n- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.\n\nReliability and queueing:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, or robustness failure.\n- \u201clane wedge\u201d, \u201clane wedged\u201d, \u201cself-heal lane wedges\u201d, \u201cworker stuck\u201d, or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.\n- Do not add `queueing` merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.\n- Use `queueing` only when queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or 
tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nOpen-weight, local provider catalogs, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, \u201copenai-compatible\u201d, or names llama.cpp/Ollama/vLLM/TGI/LocalAI unless serving protocol behavior is central.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, Chrome, npx, embeddings, provider, openai-compatible, sessions_yield, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment,\nhooks, coding_agents\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Conventional commit scopes in the title, such as `feat(memory/embeddings): ...`, are part of the title signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, and incidental implementation details unless they are the actual user-visible subject.\n\nProcess:\n1. Read the full title, including any conventional commit type/scope prefix.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. 
Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nImportant routing rules:\n\nCron automation, sessions, and reliability:\n- Use `cron_automation` when cron jobs, scheduled work, cron announce, automated scheduled messages, or cron delivery behavior are central.\n- Use `sessions` when the issue is about active chat sessions, session ownership, session takeover, outbound session identity, embedded sessions, session restoration, or passing session identity across components.\n- Use `reliability` when the central failure is an exception, crash, stuck state, hang, deadlock, wedge, race, timeout, or robustness failure.\n- If cron delivery triggers a session takeover exception while a user is actively chatting, include all three: `cron_automation`, `sessions`, and `reliability`.\n- `EmbeddedAttemptSessionTakeoverError` is both a session concern and a reliability failure, not just cron automation.\n\nGateway and session identity:\n- Use `gateway` when gateway send behavior, gateway denial, gateway routing, gateway guardrails, or gateway message delivery is central.\n- Use `sessions` when the title names outbound session identity, session identity propagation, message sending from a session, embedded session paths, or session restoration.\n- If a PR passes outbound session identity into message sending and surfaces gateway send denial, route to `gateway` and `sessions`.\n- Do not add `hooks` merely because message sending, gateway denial, or guarded sends might involve hook-like policy internally. 
Add `hooks` only when hooks, hook priority, hook execution order, hook policy, or hook behavior are the named user-visible subject.\n\nLocal models, Ollama, embeddings, proxy, and SSRF wording:\n- Use `local_models` when the title is about local model behavior, local Ollama, local embeddings, local model execution, or local model runtime failures.\n- Use `self_hosted_inference` when the item is about Ollama, llama.cpp, vLLM, TGI, LocalAI, or OpenAI-compatible/self-hosted inference providers.\n- Local Ollama embeddings are a local model plus self-hosted inference concern.\n- Do not add `memory` just because the word `embeddings` appears. Use `memory` only when memory storage, memory retrieval, memory embeddings, or a title scope like `memory/embeddings` is central.\n- Do not add `security` merely because SSRF defenses, proxy bypass, or `NO_PROXY` appear in a title. Add `security` only when the item is centrally about a vulnerability, exploitability, security hardening, token verification security, or HIGH/CRITICAL security fixes.\n- If local Ollama embeddings fail when a proxy is enabled because SSRF defenses ignore `NO_PROXY`, route to `local_models` and `self_hosted_inference`, not `memory` or `security`.\n\nACP, ACPX, sessions, and agent runs:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named or when ACP binding semantics, embedded ACP backends, explicit ACP agent runs, per-binding behavior, configured bindings, binding IDs, or ACP agent-run visibility/state contracts are central.\n- Do not add `sessions` merely because the title contains `sessions_yield`, `sessions_spawn`, \u201cACP sessions\u201d, or another session-shaped internal API name.\n- If the real subject is ACP/ACPX agent-run visibility, embedded backend behavior, stale final JSON state after a yield point, or binding-layer behavior, prefer `acp` + `acpx` and suppress `sessions` unless session lifecycle/restoration itself is the user-visible subject.\n\nSecurity, 
hooks, and skills/plugins:\n- Use `security` when vulnerabilities, SSRF, token verification security, HIGH/CRITICAL vulns, exploitability, or security hardening are central.\n- Use `skills_plugins` when skills, plugins, skill scanner behavior, plugin loading, skill scanning, or skill/plugin security is a central named subject.\n- Use `hooks` when hooks, hook priority, hook execution order, hook policy, or hook behavior is a central named subject.\n- For a security PR whose title lists multiple vulnerable subsystems, include `security` plus the explicitly central subsystem topics, up to 3 total.\n- Do not add `auth_identity` just because token verification appears in a security vulnerability list, unless login, OAuth, identity propagation, account identity, or credential identity is the central user-visible subject.\n- Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` routes to `security`, `hooks`, `skills_plugins`.\n\nSandboxing, subagents, and agent runtime:\n- Use `sandboxing` when sandbox mode, sandbox policy, sandbox restrictions, or sandbox configuration are central.\n- Use `coding_agents` when subagents, coding agents, spawned agents, agent delegation, or multi-agent coding workflows are central.\n- Use `agent_runtime` when agent initialization, runtime lifecycle, backend startup, execution runtime, or agent process behavior is central.\n- Do not add `sessions` merely because the title contains `sessions_spawn`; treat it as an internal spawn mechanism unless session lifecycle is clearly the subject.\n- If a sandbox mode breaks subagent initialization, include `sandboxing`, `coding_agents`, and `agent_runtime`.\n\nBrowser automation plus packaging/deployment:\n- Use `browser_automation` when the central subject is browser automation, browser diagnostics, Chrome session reuse, browser runtime behavior, profile attach behavior, browser tooling, or desktop browser integration.\n- Use `packaging_deployment` 
when the title centrally names packaging, deployment, install/runtime environment setup, missing bundled executables, missing npm/node binaries, PATH/binary resolution, launch packaging, or errors such as `spawn npx ENOENT`.\n- If a desktop/browser issue combines browser session/profile behavior with a missing executable or packaging/runtime availability failure, include both topics.\n\nMemory and self-hosted inference:\n- Use `memory` when memory, embeddings under memory, memory retrieval/storage, or a conventional commit scope such as `memory/embeddings` is central.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- If the title says `memory/embeddings` and adds an OpenAI-compatible provider for self-hosted servers, include both `memory` and `self_hosted_inference`.\n\nAuth and identity:\n- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.\n- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains \u201copenai-codex\u201d, \u201cCodex OAuth\u201d, or a Codex-flavored product/path name.\n- Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.\n- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.\n\nReliability and queueing:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, exception, or robustness failure.\n- \u201clane wedge\u201d, \u201clane wedged\u201d, \u201cself-heal lane wedges\u201d, \u201cworker stuck\u201d, or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.\n- Do not add `queueing` 
merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.\n- Use `queueing` only when queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nOpen-weight, local provider catalogs, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, \u201copenai-compatible\u201d, or names llama.cpp/Ollama/vLLM/TGI/LocalAI unless serving protocol behavior is central.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.\n\nFinal suppression 
check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, Chrome, npx, embeddings, provider, openai-compatible, SSRF, NO_PROXY, sessions_yield, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nOutput only JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment,\nhooks, coding_agents\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the correct maintainer bucket. Prefer the title over all other signals. Use the body only if the title is ambiguous. Do not infer topics from labels, file paths, examples, tests, or incidental implementation details.\n\nCore rules:\n- Pick one primary topic.\n- Add secondary topics only when they are explicitly central user-visible concerns.\n- Use 3 topics only when the title clearly names three central facets.\n- Use 0 topics when no allowed topic is central.\n- Never invent topic ids.\n\nSuppression rule:\nDo not add a topic just because a related word appears. Confirm the word is the actual subject, not an implementation detail, label-spam keyword, symptom, internal API name, or broad ownership area.\n\nImportant calibrations from prior mistakes:\n\n1. 
Gateway is often label spam.\n- Do not add `gateway` merely because a bug involves sending, embedded paths, guarded delivery, or background mechanics.\n- Use `gateway` only when gateway send behavior, gateway routing, gateway denial, gateway guardrails, or gateway message delivery is the central user-visible subject.\n- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` should include `reliability`, `auth_identity`, and `sessions`; it should not include `gateway`.\n\n2. Embedded OAuth paths are auth plus sessions.\n- Use `auth_identity` for OAuth, login/auth restoration, credential identity, account identity, or token identity.\n- If OAuth/auth behavior is restored or broken on an embedded path, session path, session identity, or session restoration flow, also include `sessions`.\n- Do not replace `auth_identity` with `codex` just because the title says `openai-codex` or `Codex OAuth`.\n- Add `codex` only when Codex-specific runtime, startup, setup, or behavior is itself a central maintainer-owned subject.\n\n3. Lane wedges are reliability, not queueing by default.\n- Use `reliability` for wedges, stuck states, hangs, crashes, deadlocks, races, exceptions, timeouts, or self-healing stuck work.\n- Do not add `queueing` merely because the title mentions lanes, workers, or internal scheduling terms unless queue lifecycle or scheduling behavior is central.\n\n4. ACPX Codex plugin inheritance is not sandboxing just because \u201csandbox\u201d appears.\n- Use `acpx` when ACPX is explicitly named.\n- Use `codex` when Codex is explicitly a central environment/runtime/product concern.\n- Use `skills_plugins` when plugins, skills, plugin loading, or skill scanner behavior is central.\n- Do not add `sandboxing` merely because the title contains \u201csandbox\u201d as an environment name. 
Add `sandboxing` only when sandbox policy, sandbox restrictions, sandbox mode, or sandbox configuration are themselves the subject.\n- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` should be `acpx`, `codex`, `skills_plugins`.\n\n5. ACP binding bugs can be sessions without ACPX.\n- Use `acp` when ACP is named centrally.\n- Use `sessions` when the title is about session keys, session ownership, session restoration, session identity, active chat/session routing, or persistent process/session sharing.\n- Do not automatically add `acpx` for every ACP configured binding. Add `acpx` only when ACPX is explicitly named or the title is centrally about ACPX-specific binding semantics, ACPX backend behavior, or ACPX agent-run visibility/state contracts.\n- Example: `[Bug]: ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process` should be `acp` and `sessions`, not `acpx`.\n\nTopic reminders:\n- `docs`: documentation-only PRs, tutorials, README changes, guides, examples.\n- `cron_automation`: cron jobs, scheduled work, cron delivery, scheduled automated messages.\n- `sessions`: active chat sessions, session ownership, session takeover, embedded sessions, session keys, session restoration, session identity propagation.\n- `reliability`: crash, exception, stuck state, hang, deadlock, race, timeout, wedge, robustness failure.\n- `local_models`: local model behavior, local embeddings, local Ollama, local runtime failures.\n- `self_hosted_inference`: Ollama, llama.cpp, vLLM, TGI, LocalAI, OpenAI-compatible self-hosted inference providers.\n- `memory`: memory storage/retrieval/embeddings or title scope like `memory/embeddings`.\n- `security`: vulnerabilities, exploitability, SSRF as a vulnerability, security hardening, HIGH/CRITICAL fixes.\n- `hooks`: hook priority, hook execution order, hook policy, hook behavior.\n- `mcp_tooling`: MCP protocol, 
MCP conformance, MCP integrations, MCP tool behavior.\n- `tool_calling`: tool-call execution, schemas, parameter coercion, runtime behavior.\n- `browser_automation`: browser automation, Chrome/profile/session reuse, browser diagnostics/runtime.\n- `packaging_deployment`: packaging, install/runtime setup, bundled executable availability, PATH/binary resolution, `spawn npx ENOENT`.\n- `open_weight_models`: open-weight model metadata, catalogs, context windows, compatibility.\n- `local_model_providers`: provider-specific local/open-weight catalogs or known provider model metadata.\n- `model_serving`: serving endpoints, OpenAI-compatible protocol behavior, streaming lifecycle, request/response compatibility.\n- `coding_agents`: subagents, spawned coding agents, delegation, multi-agent coding workflows.\n- `agent_runtime`: agent initialization, lifecycle, backend startup, process/runtime behavior.\n- `sandboxing`: sandbox mode/policy/restrictions/configuration only when central.\n\nFinal check:\nBefore outputting, remove any topic that was added only because of a keyword such as gateway, sandbox, session, binding, lane, wedge, Codex, OAuth, plugin, hook, model, provider, SSRF, cron, tool, or security unless that concept is clearly a central maintainer-owned subject in the title."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore target URL path, examples, tests, files changed, labels, and incidental implementation details unless they are the user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. 
Output JSON only, or use the `final_json` tool if available.\n\nImportant calibration:\n- Use `telemetry_usage` when token counts, usage accounting, cost display, usage reporting, diagnostic counts, traces, or status reporting are themselves the bug or feature.\n- Use `sessions` when session freshness, session identity, session lifecycle, session routing, or session state is central.\n- A title like `fix: respect totalTokensFresh flag to avoid showing stale token counts` must be classified as:\n {\"topics_of_interest\":[\"sessions\",\"telemetry_usage\"]}\n- In that title, `totalTokensFresh` is a session freshness/state concern, and stale token counts are a telemetry/usage concern.\n- Do not classify that title as `ui_tui`. Words like \u201cshowing\u201d, \u201cdisplaying\u201d, or \u201csurfacing\u201d are label spam unless the UI, TUI, webchat, terminal interface, or reader view is itself the central subject.\n\nSuppression rules:\n- Do not add a topic just because a related word appears.\n- Confirm that the word is the subject, not a symptom, path, implementation detail, broad ownership area, or incidental UI phrasing.\n- Before outputting, remove any topic added only because of words like showing, display, usage, token, session, status, UI, model, tool, gateway, security, hook, cron, or plugin unless that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nOutput rule:\n- If a structured-output tool named `final_json` is available, you MUST call it exactly once.\n- Call it with exactly:\n {\"topics_of_interest\":[\"topic_id\"]}\n- If no such tool is available, return only this JSON object and no prose:\n {\"topics_of_interest\":[\"topic_id\"]}\n- Never omit the final output. Never explain your reasoning. Never output markdown.\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. 
Before final output, remove any topic added only from keyword matching.\n\nImportant calibration:\n\nModel serving, local models, and local providers:\n- Use `model_serving` for serving endpoints, OpenAI-compatible APIs, Responses API behavior, request/response protocol behavior, streaming lifecycle, final usage chunks, endpoint compatibility, base URL behavior, request routing, model-server compatibility, or automatic routing of model requests.\n- Use `local_models` when a local model app/runtime/provider is itself a user-visible subject, especially compatibility or behavior with named runtimes like LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar.\n- Do not add `local_models` merely because a local runtime is incidentally mentioned, but do add it when the title is about that runtime\u2019s visible behavior or compatibility.\n- `fix: LM Studio thinking blocks invisible with Responses API` => `model_serving`, `local_models`.\n- Use `local_model_providers` for provider catalogs, model catalogs, provider-specific model availability, provider metadata, provider setup, or provider catalog entries.\n- `Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` => `local_model_providers`, `model_serving`.\n\nPolicy, config, MCP, and security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, configuration-governed enforcement, or repo/project policy.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- `Policy: add model, network, and MCP conformance checks` => `config`, `security`, `mcp_tooling`.\n- Do not map \u201cmodel policy\u201d or \u201cmodel conformance\u201d to `model_serving` unless the item is 
actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nSecurity composites:\n- Use `security` for vulnerabilities, SSRF, access restrictions, token verification, credential boundaries, network security policy, or HIGH/CRITICAL vuln fixes.\n- If a security title lists specific affected subsystems, include each subsystem that is explicitly central.\n- Use `hooks` when hook behavior, hook priority, hook execution, or hook policy is explicitly central.\n- Use `skills_plugins` when skill scanning, skill/plugin discovery, user-installed plugins, plugin inheritance, plugin installation, or plugin availability is explicitly central.\n- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `hooks`, `skills_plugins`.\n\nToken counts and usage freshness:\n- Use `telemetry_usage` when token counts, usage accounting, cost display, usage reporting, diagnostic counts, traces, or status reporting are themselves the bug or feature.\n- Use `sessions` when freshness, identity, lifecycle, routing, or state of a session is central.\n- `fix: respect totalTokensFresh flag to avoid showing stale token counts` => `sessions`, `telemetry_usage`.\n- Do not add `ui_tui` merely because the title says \u201cshowing\u201d or \u201cdisplaying\u201d.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage chunks; streaming protocol lifecycle remains `model_serving`.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs usually include only `docs`.\n- Add the documented area only when that area is explicitly central in the title, such as `docs(queue): ...` => `docs`, `queueing`.\n\nMCP and tool calling:\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or 
tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` only when ACPX is explicitly named or clearly ACPX-specific.\n- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, or session-specific behavior is central.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.\n- `[Bug]: ACP configured binding uses parent channel ID for session key` => `acp`, `sessions`.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`.\n\nNotifications and chat:\n- Use `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, or notification delivery.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Do not add `notifications` merely because the title mentions message sending or delivery plumbing.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions lane, main lane, worker, subagent, or internal execution paths.\n- Active-memory recall deadlocks are usually `memory`, 
`reliability`.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n- `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nRuntime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add 
`cron_automation` merely because a notification path mentions `cron --announce`.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n\nFinal suppression check:\nRemove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, showing, display, llama.cpp, LM Studio, catalog, or hook. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nYou MUST produce structured output. If a `final_json` tool is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks, auth_identity,\npackaging_deployment\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nClassification process:\n1. Read the title first. The title is the primary signal.\n2. Use the first clear body summary only when the title is ambiguous.\n3. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n4. Pick one primary topic.\n5. Add secondary topics only when they are explicit central maintainer-owned subjects.\n6. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n7. Use 0 topics when no allowed topic is central.\n8. Never invent topic ids. Never output labels outside the allowed list.\n9. Ignore target URL path, labels, examples, tests, files changed, incidental implementation details, broad impact, and label-spam keywords.\n\nHard output rule:\nCalling `final_json` is part of the answer when the tool exists. Do not finish without calling it. 
Do not emit prose, markdown, explanations, or an empty unstructured response.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the user-visible subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nProvider catalogs:\n- Use `local_model_providers` for provider catalogs, model-provider integrations, missing provider models, provider routing, provider-specific model availability, and provider catalog errors.\n- A missing Google Vertex/Gemini provider catalog entry is `local_model_providers`.\n- Add `reliability` when the user-visible failure mode is silent failure, hang, crash, wedge, missing error, or other robustness failure.\n- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.\n\nLocal models and self-hosted inference:\n- Use `local_models` when the local model app/runtime/provider is the user-visible subject, including Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, local embeddings, or local runtime behavior.\n- Use `self_hosted_inference` when the issue concerns self-hosted inference endpoints, local inference services, local embeddings endpoints, proxying to local inference, or self-hosted model connectivity.\n- Ollama embeddings failures are usually both `local_models` and `self_hosted_inference`.\n- Do not add `security` merely because the title mentions SSRF, SSRF defenses, proxy bypasses, or NO_PROXY when the actual user-visible bug is that a local/self-hosted model service fails.\n- Do not add `config` merely because proxy environment variables or NO_PROXY are mentioned, unless configuration policy/enforcement is the central subject.\n- Example: `[Feature/Bug]: Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)` => `local_models`, `self_hosted_inference`.\n\nReliability, sessions, and 
auth identity:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedge, timeout, self-healing of wedged execution, silent failure, or robustness failure.\n- Use `sessions` when session identity, session lifecycle, session routing, embedded session paths, persistent process identity, session state, or session-specific behavior is central.\n- Use `auth_identity` when OAuth, sign-in, authentication identity, credential identity, token identity, account identity, or restored login/auth flow is central.\n- Do not add `queueing` merely because the title says lane, main lane, worker, subagent, wedge, or internal execution path. Queueing requires queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior to be the user-visible subject.\n- Do not add `codex` merely because the title says openai-codex or contains Codex-branded OAuth. If the central subject is OAuth/auth restoration, use `auth_identity`; if it happens on an embedded/session path, add `sessions`.\n- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.\n\nBrowser automation and packaging/deployment:\n- Use `browser_automation` for Chrome/Chromium sessions, browser automation, browser diagnostics, browser runtime behavior, browser profile attach, browser tooling, and browser sign-in flows.\n- Use `packaging_deployment` for desktop app packaging, bundled runtime issues, missing executable/path problems, spawn errors caused by missing installed tools, `npx`/Node availability in packaged apps, installer/distribution/runtime packaging, or deployment environment failures.\n- If a desktop browser automation issue also fails because a packaged or deployed binary cannot spawn `npx` or another required executable, include `packaging_deployment`.\n- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach 
fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.\n\nSecurity:\n- Use `security` for actual vulnerabilities, SSRF as the central vulnerability, access restrictions, token verification, credential boundaries, network security policy, or HIGH/CRITICAL vulnerability fixes.\n- If a security title lists specific affected subsystems, include each subsystem that is explicitly central.\n- Do not use `security` when security words describe an implementation mechanism but the user-visible subject is local model connectivity, embeddings, packaging, sessions, or provider behavior.\n\nConfig:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Do not use `config` merely for proxy variables, env vars, provider settings, or setup details unless the title is about configuration behavior itself.\n\nCodex:\n- Use `codex` when Codex itself is the central user-visible product/runtime/setup subject, including Codex startup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because `openai-codex` appears as an OAuth provider name or bracketed metadata.\n\nQueueing:\n- Use `queueing` only when queue, queueing, queued execution, queue lifecycle, queue scheduling, or steering in queues is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, before_prompt_build, internal path names, or wedge recovery.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include only `docs`.\n- Add the central documented area only when that area is explicitly central in the title.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, 
tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- Example: `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, showing, display, llama.cpp, Ollama, OAuth, proxy, NO_PROXY, npx, Chrome, or hook. Keep the topic only if that concept is actually a central maintainer-owned subject."
+ }
+ ],
+ "discovery_eval_counts": [
+ 0,
+ 38,
+ 76,
+ 122,
+ 160,
+ 198,
+ 244,
+ 282,
+ 320,
+ 366,
+ 408,
+ 446,
+ 488
+ ],
+ "num_full_val_evals": 13,
+ "objective_pareto_front": {
+ "weighted_score": 0.6734523809523808
+ },
+ "parents": [
+ [
+ null
+ ],
+ [
+ 0
+ ],
+ [
+ 0
+ ],
+ [
+ 0
+ ],
+ [
+ 1
+ ],
+ [
+ 4
+ ],
+ [
+ 2
+ ],
+ [
+ 6
+ ],
+ [
+ 7
+ ],
+ [
+ 8
+ ],
+ [
+ 3
+ ],
+ [
+ 3
+ ],
+ [
+ 3
+ ]
+ ],
+ "per_objective_best_candidates": {
+ "weighted_score": [
+ 0
+ ]
+ },
+ "per_val_instance_best_candidates": {
+ "0": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 11
+ ],
+ "1": [
+ 11,
+ 2,
+ 3
+ ],
+ "2": [
+ 0,
+ 11,
+ 3
+ ],
+ "3": [
+ 0,
+ 11,
+ 3,
+ 5
+ ],
+ "4": [
+ 3,
+ 6
+ ],
+ "5": [
+ 1,
+ 2,
+ 5,
+ 6,
+ 7,
+ 8
+ ],
+ "6": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 6,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12
+ ],
+ "7": [
+ 0,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 11,
+ 12
+ ],
+ "8": [
+ 0,
+ 1,
+ 3,
+ 11
+ ],
+ "9": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 6,
+ 8,
+ 10,
+ 11
+ ],
+ "10": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5
+ ],
+ "11": [
+ 0,
+ 4,
+ 5,
+ 7,
+ 8,
+ 9,
+ 11,
+ 12
+ ],
+ "12": [
+ 0,
+ 3,
+ 6,
+ 7,
+ 8,
+ 9
+ ],
+ "13": [
+ 2,
+ 5,
+ 7,
+ 8,
+ 10,
+ 12
+ ],
+ "14": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12
+ ],
+ "15": [
+ 0,
+ 5,
+ 6,
+ 7
+ ],
+ "16": [
+ 2,
+ 3
+ ],
+ "17": [
+ 0,
+ 1,
+ 2,
+ 4,
+ 5,
+ 11
+ ],
+ "18": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 9,
+ 11
+ ],
+ "19": [
+ 1,
+ 4,
+ 5,
+ 7,
+ 8
+ ],
+ "20": [
+ 10
+ ],
+ "21": [
+ 0,
+ 2,
+ 3,
+ 6,
+ 7,
+ 8,
+ 12
+ ],
+ "22": [
+ 8,
+ 9,
+ 11,
+ 7
+ ],
+ "23": [
+ 8,
+ 12,
+ 5
+ ],
+ "24": [
+ 6,
+ 7,
+ 8,
+ 9,
+ 12
+ ],
+ "25": [
+ 0,
+ 2,
+ 7,
+ 8,
+ 11
+ ],
+ "26": [
+ 0,
+ 1,
+ 3,
+ 9,
+ 10,
+ 11,
+ 12
+ ],
+ "27": [
+ 2
+ ],
+ "28": [
+ 8,
+ 4,
+ 5
+ ],
+ "29": [
+ 8,
+ 5,
+ 7
+ ]
+ },
+ "run_dir": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214937Z",
+ "seed": 0,
+ "total_metric_calls": 518,
+ "val_aggregate_scores": [
+ 0.673452380952381,
+ 0.5525091575091575,
+ 0.6586202686202686,
+ 0.6679761904761905,
+ 0.4812393162393162,
+ 0.608452380952381,
+ 0.5296214896214896,
+ 0.5934920634920634,
+ 0.6601587301587302,
+ 0.5403174603174603,
+ 0.4648107448107448,
+ 0.6468512627723154,
+ 0.5279059829059829
+ ],
+ "val_aggregate_subscores": [
+ {
+ "weighted_score": 0.6734523809523808
+ },
+ {
+ "weighted_score": 0.5525091575091575
+ },
+ {
+ "weighted_score": 0.6586202686202686
+ },
+ {
+ "weighted_score": 0.6679761904761905
+ },
+ {
+ "weighted_score": 0.48123931623931626
+ },
+ {
+ "weighted_score": 0.6084523809523809
+ },
+ {
+ "weighted_score": 0.5296214896214896
+ },
+ {
+ "weighted_score": 0.5934920634920634
+ },
+ {
+ "weighted_score": 0.6601587301587302
+ },
+ {
+ "weighted_score": 0.5403174603174603
+ },
+ {
+ "weighted_score": 0.4648107448107447
+ },
+ {
+ "weighted_score": 0.6468512627723154
+ },
+ {
+ "weighted_score": 0.5279059829059828
+ }
+ ],
+ "val_subscores": [
+ {
+ "0": 1.0,
+ "1": 0.0,
+ "2": 1.0,
+ "3": 1.0,
+ "4": 0.2857142857142857,
+ "5": 0.5,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 0.25,
+ "17": 1.0,
+ "18": 1.0,
+ "19": 0.2,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 0.0,
+ "23": 0.14285714285714285,
+ "24": 0.5,
+ "25": 1.0,
+ "26": 1.0,
+ "27": 0.125,
+ "28": 0.25,
+ "29": 0.25
+ },
+ {
+ "0": 1.0,
+ "1": 0.25,
+ "2": 0.0,
+ "3": 0.25,
+ "4": 0.2857142857142857,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 0.5,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 0.25,
+ "12": 0.14285714285714285,
+ "13": 0.2,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.2,
+ "17": 1.0,
+ "18": 1.0,
+ "19": 1.0,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.5,
+ "23": 0.15384615384615385,
+ "24": 0.5,
+ "25": 0.5,
+ "26": 1.0,
+ "27": 0.14285714285714285,
+ "28": 0.5,
+ "29": 0.0
+ },
+ {
+ "0": 1.0,
+ "1": 1.0,
+ "2": 0.2857142857142857,
+ "3": 0.0,
+ "4": 0.2857142857142857,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 0.5,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 0.5,
+ "12": 0.25,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 1.0,
+ "17": 1.0,
+ "18": 1.0,
+ "19": 0.3333333333333333,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 0.15384615384615385,
+ "23": 0.25,
+ "24": 0.5,
+ "25": 1.0,
+ "26": 0.25,
+ "27": 1.0,
+ "28": 0.5,
+ "29": 0.25
+ },
+ {
+ "0": 1.0,
+ "1": 1.0,
+ "2": 1.0,
+ "3": 1.0,
+ "4": 1.0,
+ "5": 0.25,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 0.2857142857142857,
+ "12": 1.0,
+ "13": 0.25,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 1.0,
+ "17": 0.5,
+ "18": 1.0,
+ "19": 0.2,
+ "20": 0.0,
+ "21": 1.0,
+ "22": 0.2857142857142857,
+ "23": 0.14285714285714285,
+ "24": 0.25,
+ "25": 0.5,
+ "26": 1.0,
+ "27": 0.125,
+ "28": 0.5,
+ "29": 0.25
+ },
+ {
+ "0": 0.5,
+ "1": 0.25,
+ "2": 0.25,
+ "3": 0.25,
+ "4": 0.2857142857142857,
+ "5": 0.25,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 0.25,
+ "9": 0.0,
+ "10": 1.0,
+ "11": 1.0,
+ "12": 0.14285714285714285,
+ "13": 0.25,
+ "14": 1.0,
+ "15": 0.25,
+ "16": 0.14285714285714285,
+ "17": 1.0,
+ "18": 1.0,
+ "19": 1.0,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.3333333333333333,
+ "23": 0.25,
+ "24": 0.25,
+ "25": 0.5,
+ "26": 0.2857142857142857,
+ "27": 0.14285714285714285,
+ "28": 1.0,
+ "29": 0.15384615384615385
+ },
+ {
+ "0": 0.5,
+ "1": 0.14285714285714285,
+ "2": 0.25,
+ "3": 1.0,
+ "4": 0.2857142857142857,
+ "5": 1.0,
+ "6": 0.5,
+ "7": 1.0,
+ "8": 0.25,
+ "9": 0.5,
+ "10": 1.0,
+ "11": 1.0,
+ "12": 0.0,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 0.25,
+ "17": 1.0,
+ "18": 1.0,
+ "19": 1.0,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.0,
+ "23": 1.0,
+ "24": 0.5,
+ "25": 0.0,
+ "26": 0.25,
+ "27": 0.125,
+ "28": 1.0,
+ "29": 1.0
+ },
+ {
+ "0": 0.5,
+ "1": 0.25,
+ "2": 0.25,
+ "3": 0.15384615384615385,
+ "4": 1.0,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 0.5,
+ "9": 1.0,
+ "10": 0.3333333333333333,
+ "11": 0.0,
+ "12": 1.0,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 0.15384615384615385,
+ "17": 0.25,
+ "18": 0.2857142857142857,
+ "19": 0.3333333333333333,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 0.5,
+ "23": 0.14285714285714285,
+ "24": 1.0,
+ "25": 0.0,
+ "26": 0.25,
+ "27": 0.5,
+ "28": 0.5,
+ "29": 0.2857142857142857
+ },
+ {
+ "0": 0.5,
+ "1": 0.25,
+ "2": 0.14285714285714285,
+ "3": 0.25,
+ "4": 0.2857142857142857,
+ "5": 1.0,
+ "6": 0.25,
+ "7": 1.0,
+ "8": 0.5,
+ "9": 0.5,
+ "10": 0.2,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 0.25,
+ "17": 0.25,
+ "18": 0.25,
+ "19": 1.0,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 1.0,
+ "23": 0.14285714285714285,
+ "24": 1.0,
+ "25": 1.0,
+ "26": 0.25,
+ "27": 0.25,
+ "28": 0.3333333333333333,
+ "29": 1.0
+ },
+ {
+ "0": 0.25,
+ "1": 0.2,
+ "2": 0.25,
+ "3": 0.5,
+ "4": 0.2857142857142857,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 0.5,
+ "9": 1.0,
+ "10": 0.3333333333333333,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.0,
+ "16": 0.0,
+ "17": 0.25,
+ "18": 0.2857142857142857,
+ "19": 1.0,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 1.0,
+ "23": 1.0,
+ "24": 1.0,
+ "25": 1.0,
+ "26": 0.25,
+ "27": 0.5,
+ "28": 1.0,
+ "29": 1.0
+ },
+ {
+ "0": 0.3333333333333333,
+ "1": 0.2,
+ "2": 0.25,
+ "3": 0.25,
+ "4": 0.25,
+ "5": 0.5,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 0.25,
+ "9": 0.5,
+ "10": 0.3333333333333333,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.25,
+ "17": 0.2,
+ "18": 1.0,
+ "19": 0.5,
+ "20": 0.25,
+ "21": 0.5,
+ "22": 1.0,
+ "23": 0.14285714285714285,
+ "24": 1.0,
+ "25": 0.5,
+ "26": 1.0,
+ "27": 0.25,
+ "28": 0.5,
+ "29": 0.25
+ },
+ {
+ "0": 0.5,
+ "1": 0.25,
+ "2": 0.25,
+ "3": 0.25,
+ "4": 0.15384615384615385,
+ "5": 0.5,
+ "6": 1.0,
+ "7": 0.5,
+ "8": 0.5,
+ "9": 1.0,
+ "10": 0.3333333333333333,
+ "11": 0.5,
+ "12": 0.2857142857142857,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.2,
+ "17": 0.2,
+ "18": 0.2857142857142857,
+ "19": 0.2,
+ "20": 1.0,
+ "21": 0.5,
+ "22": 0.25,
+ "23": 0.14285714285714285,
+ "24": 0.25,
+ "25": 0.5,
+ "26": 1.0,
+ "27": 0.14285714285714285,
+ "28": 0.5,
+ "29": 0.25
+ },
+ {
+ "0": 1.0,
+ "1": 1.0,
+ "2": 1.0,
+ "3": 1.0,
+ "4": 0.2857142857142857,
+ "5": 0.0,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 0.5,
+ "11": 1.0,
+ "12": 0.2857142857142857,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.25,
+ "17": 1.0,
+ "18": 1.0,
+ "19": 0.2,
+ "20": 0.25,
+ "21": 0.25,
+ "22": 1.0,
+ "23": 0.10526315789473684,
+ "24": 0.5,
+ "25": 1.0,
+ "26": 1.0,
+ "27": 0.125,
+ "28": 0.5,
+ "29": 0.15384615384615385
+ },
+ {
+ "0": 0.5,
+ "1": 0.25,
+ "2": 0.0,
+ "3": 0.5,
+ "4": 0.15384615384615385,
+ "5": 0.5,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 0.5,
+ "9": 0.5,
+ "10": 0.3333333333333333,
+ "11": 1.0,
+ "12": 0.25,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.2,
+ "17": 0.25,
+ "18": 0.25,
+ "19": 0.2,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 0.25,
+ "23": 1.0,
+ "24": 1.0,
+ "25": 0.5,
+ "26": 1.0,
+ "27": 0.25,
+ "28": 0.5,
+ "29": 0.25
+ }
+ ],
+ "validation_schema_version": 2
+}
\ No newline at end of file
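Reviewer note: the `val_subscores`, `val_aggregate_scores`, and `per_val_instance_best_candidates` fields in the JSON above appear to be related in a simple way. Each aggregate looks like the plain mean of that candidate's per-instance scores, and each per-instance list looks like the set of candidates tied for the top score on that instance. The sketch below illustrates that reading on toy data. The helper names and the toy values are hypothetical, and this is not taken from the GEPA source, so treat it as a way to sanity-check the artifact rather than a description of how the optimizer computes it.

```python
from statistics import mean


def aggregate_scores(val_subscores):
    """Mean per-instance score for each candidate (one dict per candidate)."""
    return [mean(per_instance.values()) for per_instance in val_subscores]


def best_candidates_per_instance(val_subscores):
    """For each validation instance, list the candidates tied for the top score."""
    best = {}
    for instance_id in val_subscores[0]:
        scores = [candidate[instance_id] for candidate in val_subscores]
        top = max(scores)
        best[instance_id] = [i for i, s in enumerate(scores) if s == top]
    return best


# Toy data only: three hypothetical candidates scored on three instances.
toy_subscores = [
    {"0": 1.0, "1": 0.0, "2": 0.5},
    {"0": 1.0, "1": 0.25, "2": 0.25},
    {"0": 0.5, "1": 1.0, "2": 0.5},
]

aggregates = aggregate_scores(toy_subscores)
fronts = best_candidates_per_instance(toy_subscores)
print(aggregates)
print(fronts)
```

When checking the real file, compare the per-instance lists as sets, since the stored order is not sorted (for example `[11, 2, 3]`). Compare aggregates with a small tolerance: `val_aggregate_scores` and `val_aggregate_subscores` already differ in the last digit for some candidates, which is consistent with floating-point summation order.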
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/gepa_state.bin b/gepa-12b-row30-prop16-continuation-20260613T214937Z/gepa_state.bin
new file mode 100644
index 0000000000000000000000000000000000000000..b9b0c431e349d9a78b495bfbec8ca525493e846e
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/gepa_state.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:589ec25fb9f1245eeb3d33aae04534f9326ba066ded838bc9ae8a75448275177
+size 125569
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/optimize.pid b/gepa-12b-row30-prop16-continuation-20260613T214937Z/optimize.pid
new file mode 100644
index 0000000000000000000000000000000000000000..cb6fc5c5ee2347dbf6d6129a83ed72f13e74a66a
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/optimize.pid
@@ -0,0 +1 @@
+110022
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/optimize.stderr.log b/gepa-12b-row30-prop16-continuation-20260613T214937Z/optimize.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/optimize.stdout.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/optimize.stdout.json
new file mode 100644
index 0000000000000000000000000000000000000000..e12dab8c1b3c26ebd7c6c50bf85f7cbe025df63c
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/optimize.stdout.json
@@ -0,0 +1,1586 @@
+Iteration 0: Base program full valset score: 0.673452380952381 over 30 / 30 examples
+Iteration 1: Selected program 0 score: 0.673452380952381
+Iteration 1: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Output only JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents,
+api_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx,
+approvals, agent_runtime, model_serving, local_models,
+self_hosted_inference, telemetry_usage, exec_tools, sandboxing,
+browser_automation, cron_automation, config, security, mcp_tooling,
+tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping explicitly central second or third concerns.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, broad impact, and incidental implementation details unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+
+Important domain rules:
+
+Exec, sandboxing, and approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, exec v2 behavior, and exec contract behavior.
+- If the title says `exec v2 contract`, `exec contract`, or `exec v2 contract follow-through`, classify all contract areas that are part of that surface:
+ `exec_tools`, `sandboxing`, `approvals`.
+- Do this even if the PR title starts with `test(...)`; tests can still be the vehicle for landing a user-visible contract.
+- Do not collapse exec contract work to only `exec_tools`.
+
+Sandboxing, subagents, and runtimes:
+- Use `sandboxing` when sandbox modes, sandbox policy, sandbox behavior, or `sandbox.mode` is central.
+- Use `agent_runtime` when the issue is about agent execution runtimes, subagent initialization, runtime startup, runtime ownership, or process/runtime lifecycle.
+- Use `coding_agents` when coding-agent behavior is central, including subagents, sessions_spawn subagent initialization, Codex/Claude-style coding-agent orchestration, or agent worker startup.
+- A title like `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` is:
+ `coding_agents`, `sandboxing`, `agent_runtime`.
+- Do not add `sessions` just because a path or internal component contains `sessions_spawn`. Treat `sessions_spawn` as an implementation name unless session identity, lifecycle, routing, persistence, or state is the user-visible subject.
+
+Sessions:
+- Use `sessions` only when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.
+- Suppress `sessions` for internal names such as `sessions_spawn` when the actual problem is subagent/runtime initialization.
+
+Local models and model serving:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible protocol behavior, Responses API behavior, streaming lifecycle, response chunks, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- If a title combines a local model provider with the Responses API, OpenAI-compatible API behavior, streaming, or request/response protocol behavior, include both `local_models` and `model_serving`.
+- A title like `fix: LM Studio thinking blocks invisible with Responses API` is:
+ `model_serving`, `local_models`.
+
+Suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, session, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 1: New subsample score 3.5 is better than old score 2.033333333333333. Continue to full eval and add to candidate pool.
+Iteration 1: Valset score for new program: 0.5525091575091575 (coverage 30 / 30)
+Iteration 1: Val aggregate for new program: 0.5525091575091575
+Iteration 1: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 0.0, 3: 0.25, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 0.5, 8: 1.0, 9: 1.0, 10: 1.0, 11: 0.25, 12: 0.14285714285714285, 13: 0.2, 14: 1.0, 15: 0.5, 16: 0.2, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 0.5, 22: 0.5, 23: 0.15384615384615385, 24: 0.5, 25: 0.5, 26: 1.0, 27: 0.14285714285714285, 28: 0.5, 29: 0.0}
+Iteration 1: Objective aggregate scores for new program: {'weighted_score': 0.5525091575091575}
+Iteration 1: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 0.25, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.15384615384615385, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 0.5, 29: 0.25}
+Iteration 1: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 1: Valset pareto front aggregate score: 0.7510805860805861
+Iteration 1: Updated valset pareto front programs: {0: {0, 1}, 1: {1}, 2: {0}, 3: {0}, 4: {0, 1}, 5: {1}, 6: {0, 1}, 7: {0}, 8: {0, 1}, 9: {0, 1}, 10: {0, 1}, 11: {0}, 12: {0}, 13: {0}, 14: {0, 1}, 15: {0}, 16: {0}, 17: {0, 1}, 18: {0, 1}, 19: {1}, 20: {0, 1}, 21: {0}, 22: {1}, 23: {1}, 24: {0, 1}, 25: {0}, 26: {0, 1}, 27: {1}, 28: {1}, 29: {0}}
+Iteration 1: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 1: Best valset aggregate score so far: 0.673452380952381
+Iteration 1: Best program as per aggregate score on valset: 0
+Iteration 1: Best score on valset: 0.673452380952381
+Iteration 1: Linear pareto front program index: 0
+Iteration 1: New program candidate index: 1
+Iteration 2: Selected program 0 score: 0.673452380952381
+Iteration 2: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named or when ACP binding semantics are central. In OpenClaw, phrases like "per-binding", "configured binding", "binding ID", or per-agent/per-binding ACP policy usually indicate the ACPX binding layer, even if the title says only ACP.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- Do not add `sessions` just because a title says "ACP sessions" if the actual change is permission policy, binding policy, or per-agent configuration. Treat "sessions" as label spam unless session identity, lifecycle, routing, state, persistent process identity, or session-specific behavior is the user-visible subject.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.
+
+Auth and identity:
+- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.
+- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains "openai-codex", "Codex OAuth", or a Codex-flavored product/path name. Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.
+- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.
+- `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` should include `reliability`, `auth_identity`, and `sessions`, not `queueing` or `codex`.
+
+Reliability and queueing:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, or robustness failure.
+- "lane wedge", "lane wedged", "self-heal lane wedges", "worker stuck", or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.
+- Do not add `queueing` merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.
+- Use `queueing` only when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.
+- Do not add broad demo/showcase labels or `tool_calling` just because docs mention tool boundaries.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- Do not add `model_serving` merely because a title says "model", "provider", "catalog", or names a model unless serving/routing/protocol behavior is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says "openai-compatible", "provider", llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Notifications and chat integrations:
+- Use `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Gateway and runtime:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map "model" in "model policy", "model conformance", or "model checks" to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Do not substitute broad infrastructure topics for the listed user-visible subjects.
+- In composite titles, each clause may contribute a different topic. For example, one clause can be `reliability` while another contributes `auth_identity` and `sessions`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 2: New subsample score 4.0 is better than old score 2.375. Continue to full eval and add to candidate pool.
+Iteration 2: Valset score for new program: 0.6586202686202686 (coverage 30 / 30)
+Iteration 2: Val aggregate for new program: 0.6586202686202686
+Iteration 2: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 0.2857142857142857, 3: 0.0, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 1.0, 11: 0.5, 12: 0.25, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.3333333333333333, 20: 0.2, 21: 1.0, 22: 0.15384615384615385, 23: 0.25, 24: 0.5, 25: 1.0, 26: 0.25, 27: 1.0, 28: 0.5, 29: 0.25}
+Iteration 2: Objective aggregate scores for new program: {'weighted_score': 0.6586202686202686}
+Iteration 2: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.25, 24: 0.5, 25: 1.0, 26: 1.0, 27: 1.0, 28: 0.5, 29: 0.25}
+Iteration 2: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 2: Valset pareto front aggregate score: 0.8495238095238095
+Iteration 2: Updated valset pareto front programs: {0: {0, 1, 2}, 1: {2}, 2: {0}, 3: {0}, 4: {0, 1, 2}, 5: {1, 2}, 6: {0, 1, 2}, 7: {0, 2}, 8: {0, 1}, 9: {0, 1, 2}, 10: {0, 1, 2}, 11: {0}, 12: {0}, 13: {2}, 14: {0, 1, 2}, 15: {0}, 16: {2}, 17: {0, 1, 2}, 18: {0, 1, 2}, 19: {1}, 20: {0, 1, 2}, 21: {0, 2}, 22: {1}, 23: {2}, 24: {0, 1, 2}, 25: {0, 2}, 26: {0, 1}, 27: {2}, 28: {1, 2}, 29: {0, 2}}
+Iteration 2: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 2: Best valset aggregate score so far: 0.673452380952381
+Iteration 2: Best program as per aggregate score on valset: 0
+Iteration 2: Best score on valset: 0.673452380952381
+Iteration 2: Linear pareto front program index: 0
+Iteration 2: New program candidate index: 2
+Iteration 3: Selected program 1 score: 0.5525091575091575
+Iteration 3: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Output only JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents,
+api_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx,
+approvals, agent_runtime, model_serving, local_models,
+self_hosted_inference, telemetry_usage, exec_tools, sandboxing,
+browser_automation, cron_automation, config, security, mcp_tooling,
+tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping explicitly central second or third concerns.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, broad impact, and incidental implementation details unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+
+Important domain rules:
+
+Provider catalogs, local providers, and routing:
+- Use `local_model_providers` when the central subject is adding, updating, cataloging, configuring, or routing through a named model provider or provider catalog.
+- Provider catalog names such as Xiaomi, DeepSeek, LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar provider/runtime catalogs are strong signals for `local_model_providers` when the title says a model is being added to that catalog.
+- Use `model_serving` when the central subject is serving behavior, endpoint compatibility, OpenAI-compatible protocol behavior, Responses API behavior, streaming lifecycle, response chunks, final usage chunks, base URL behavior, request routing, automatic routing, multimodal routing, or model-server compatibility.
+- If a title combines a provider catalog/model-provider change with automatic routing, request routing, multimodal routing, Responses API behavior, streaming, or OpenAI-compatible serving behavior, include both `local_model_providers` and `model_serving`.
+- Example: `[Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` must be classified as `local_model_providers` and `model_serving`.
+- Do not collapse provider catalog work to only `model_serving` when the catalog/provider addition is explicitly central.
+- Do not use `local_models` merely because a model name appears. Use `local_models` only when a local model app/runtime itself is central.
+
+Exec, sandboxing, and approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, exec v2 behavior, and exec contract behavior.
+- If the title says `exec v2 contract`, `exec contract`, or `exec v2 contract follow-through`, classify all contract areas that are part of that surface:
+ `exec_tools`, `sandboxing`, `approvals`.
+- Do this even if the PR title starts with `test(...)`; tests can still be the vehicle for landing a user-visible contract.
+- Do not collapse exec contract work to only `exec_tools`.
+
+Sandboxing, subagents, and runtimes:
+- Use `sandboxing` when sandbox modes, sandbox policy, sandbox behavior, or `sandbox.mode` is central.
+- Use `agent_runtime` when the issue is about agent execution runtimes, subagent initialization, runtime startup, runtime ownership, or process/runtime lifecycle.
+- Use `coding_agents` when coding-agent behavior is central, including subagents, sessions_spawn subagent initialization, Codex/Claude-style coding-agent orchestration, or agent worker startup.
+- A title like `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` is:
+ `coding_agents`, `sandboxing`, `agent_runtime`.
+- Do not add `sessions` just because a path or internal component contains `sessions_spawn`.
+
+Sessions:
+- Use `sessions` only when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.
+- Suppress `sessions` for internal names such as `sessions_spawn` when the actual problem is subagent/runtime initialization.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, session, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 3: New subsample score 3.5 is not better than old score 3.5, skipping
+Iteration 4: Selected program 0 score: 0.673452380952381
+Iteration 4: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Important calibration rules:
+
+Security composites:
+- Use `security` for vulnerabilities, SSRF, access restrictions, token verification, credential boundaries, network security policy, or HIGH/CRITICAL vuln fixes.
+- If a security title lists specific affected subsystems, include each subsystem that is explicitly central.
+- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `hooks`, `skills_plugins`.
+- Use `hooks` when hook behavior, hook priority, hook execution, or hook policy is explicitly central.
+- Use `skills_plugins` when skill scanning, skill/plugin discovery, user-installed plugins, plugin inheritance, plugin installation, or plugin availability is explicitly central.
+- Do not collapse a composite security title to only `security` when the title explicitly names maintainer-owned subsystems like hooks or skill scanners.
+
+Token counts and usage freshness:
+- Use `telemetry_usage` when token counts, usage accounting, cost display, usage reporting, diagnostic counts, traces, or status reporting are themselves the bug or feature.
+- Use `sessions` when the freshness, identity, lifecycle, routing, or state of a session is central.
+- A title like `fix: respect totalTokensFresh flag to avoid showing stale token counts` is `sessions` and `telemetry_usage`.
+- Do not add `ui_tui` merely because the title says "showing" or "displaying" token counts. UI is label spam unless the user-facing interface itself is central.
+
+Model serving, local models, and usage chunks:
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- A title like `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` is only `model_serving`.
+- Do not add `local_models` merely because a local runtime such as llama.cpp, Ollama, LM Studio, vLLM, TGI, or LocalAI is named. Add `local_models` only when the local model app/provider/runtime itself is the user-visible subject.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, or final usage chunks. If the issue is that the streaming protocol closes before the final usage chunk, classify it as `model_serving`.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include only `docs`.
+- Add the central documented area only when that area is explicitly central in the title, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add broad demo/showcase labels.
+- Do not add `tool_calling` just because docs mention tool boundaries unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.
+- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` => `acp`, `sessions`.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, and only `acpx` if ACPX is explicitly named.
+
+Notifications and chat:
+- Use `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`, not `notifications`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+- Do not add `ui_tui` merely because something is shown, displayed, or surfaced unless the UI itself is central.
+
+Runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, showing, display, llama.cpp, or hook. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 4: New subsample score 3.5 is better than old score 1.819047619047619. Continue to full eval and add to candidate pool.
+Iteration 4: Valset score for new program: 0.6679761904761905 (coverage 30 / 30)
+Iteration 4: Val aggregate for new program: 0.6679761904761905
+Iteration 4: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 0.25, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 0.2857142857142857, 12: 1.0, 13: 0.25, 14: 1.0, 15: 0.5, 16: 1.0, 17: 0.5, 18: 1.0, 19: 0.2, 20: 0.0, 21: 1.0, 22: 0.2857142857142857, 23: 0.14285714285714285, 24: 0.25, 25: 0.5, 26: 1.0, 27: 0.125, 28: 0.5, 29: 0.25}
+Iteration 4: Objective aggregate scores for new program: {'weighted_score': 0.6679761904761905}
+Iteration 4: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.25, 24: 0.5, 25: 1.0, 26: 1.0, 27: 1.0, 28: 0.5, 29: 0.25}
+Iteration 4: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 4: Valset pareto front aggregate score: 0.8733333333333333
+Iteration 4: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3}, 4: {3}, 5: {1, 2}, 6: {0, 1, 2, 3}, 7: {0, 2, 3}, 8: {0, 1, 3}, 9: {0, 1, 2, 3}, 10: {0, 1, 2, 3}, 11: {0}, 12: {0, 3}, 13: {2}, 14: {0, 1, 2, 3}, 15: {0}, 16: {2, 3}, 17: {0, 1, 2}, 18: {0, 1, 2, 3}, 19: {1}, 20: {0, 1, 2}, 21: {0, 2, 3}, 22: {1}, 23: {2}, 24: {0, 1, 2}, 25: {0, 2}, 26: {0, 1, 3}, 27: {2}, 28: {1, 2, 3}, 29: {0, 2, 3}}
+Iteration 4: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 4: Best valset aggregate score so far: 0.673452380952381
+Iteration 4: Best program as per aggregate score on valset: 0
+Iteration 4: Best score on valset: 0.673452380952381
+Iteration 4: Linear pareto front program index: 0
+Iteration 4: New program candidate index: 3
+Iteration 5: Selected program 1 score: 0.5525091575091575
+Iteration 5: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Output only JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents,
+api_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx,
+approvals, agent_runtime, model_serving, local_models,
+self_hosted_inference, telemetry_usage, exec_tools, sandboxing,
+browser_automation, cron_automation, config, security, mcp_tooling,
+tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping explicitly central second or third concerns.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, broad impact, and incidental implementation details unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+
+General suppression:
+- Do not add a topic just because a related word appears.
+- Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+- Never output disallowed ids such as `packaging_deployment`.
+
+Reliability:
+- Use `reliability` when the title describes a failure mode, hardening, deadlock, crash, race, takeover error, dropped work, broken delivery, or correctness issue that is central to the user-visible problem.
+- For cron/chat/session collision bugs such as `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`, include:
+ `cron_automation`, `sessions`, `reliability`.
+- Do not add `reliability` for routine implementation cleanup unless the title is about making behavior more robust or fixing a failure.
+
+Security and CI:
+- Use `security` for supply-chain boundaries, dependency trust, GitHub Actions hardening, secret exposure, permissions, auth, or attack-surface reduction.
+- A title like `ci: harden GitHub Actions supply-chain boundaries` is only:
+ `security`.
+- Do not add deployment, packaging, CI, or release topics unless an allowed topic is directly central. `packaging_deployment` is not allowed.
+
+Exec, sandboxing, and approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, exec v2 behavior, and exec contract behavior.
+- If the title says `exec v2 contract`, `exec contract`, or `exec v2 contract follow-through`, classify all contract areas that are part of that surface:
+ `exec_tools`, `sandboxing`, `approvals`.
+- Do this even if the PR title starts with `test(...)`; tests can still be the vehicle for landing a user-visible contract.
+- Do not collapse exec contract work to only `exec_tools`.
+
+Sandboxing:
+- Use `sandboxing` when sandbox modes, sandbox policy, sandbox behavior, filesystem/network restrictions, approval boundaries, or `sandbox.mode` are central.
+- Do not use `sandboxing` merely because the title says `sandbox` as an environment name, container, ACPX sandbox, Codex sandbox, test sandbox, or runtime workspace unless the subject is the sandbox policy/permission behavior itself.
+- A title like `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` is not about sandbox policy. It is:
+ `acpx`, `codex`, `skills_plugins`.
+
+Skills and plugins:
+- Use `skills_plugins` when user-installed plugins, plugin inheritance, skills, marketplace plugins, plugin loading, plugin availability, or plugin installation behavior is central.
+- If a title says a Codex or ACPX environment should inherit user-installed plugins, include `skills_plugins`.
+- Do not replace `skills_plugins` with `sandboxing` just because the plugins are being inherited into a sandboxed environment.
+
+ACPX, ACP, and Codex:
+- Use `acpx` when ACPX itself, ACPX environments, ACPX protocol/workflow, or ACPX-specific behavior is central.
+- Use `acp` for ACP-specific protocol/workflow behavior.
+- Use `codex` when Codex is explicitly a central product/runtime/integration subject.
+- For an ACPX Codex feature involving user-installed plugins, include all central facets:
+ `acpx`, `codex`, `skills_plugins`.
+
+Sandboxing, subagents, and runtimes:
+- Use `agent_runtime` when the issue is about agent execution runtimes, subagent initialization, runtime startup, runtime ownership, or process/runtime lifecycle.
+- Use `coding_agents` when coding-agent behavior is central, including subagents, sessions_spawn subagent initialization, Codex/Claude-style coding-agent orchestration, or agent worker startup.
+- A title like `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` is:
+ `coding_agents`, `sandboxing`, `agent_runtime`.
+- Do not add `sessions` just because a path or internal component contains `sessions_spawn`. Treat `sessions_spawn` as an implementation name unless session identity, lifecycle, routing, persistence, or state is the user-visible subject.
+
+Sessions:
+- Use `sessions` only when session identity, session lifecycle, session routing, session state, persistent process identity, takeover/ownership, active chat session conflicts, or session-specific behavior is central.
+- Suppress `sessions` for internal names such as `sessions_spawn` when the actual problem is subagent/runtime initialization.
+- Include `sessions` when the title describes an active chat/session takeover conflict.
+
+Cron automation:
+- Use `cron_automation` when scheduled jobs, cron announcements, recurring automation, scheduled delivery, or cron-triggered behavior is central.
+- If cron delivery conflicts with active user chat/session ownership, include both `cron_automation` and `sessions`; add `reliability` when the title describes an error or broken behavior.
+
+Local models and model serving:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible protocol behavior, Responses API behavior, streaming lifecycle, response chunks, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- If a title combines a local model provider with the Responses API, OpenAI-compatible API behavior, streaming, or request/response protocol behavior, include both `local_models` and `model_serving`.
+- A title like `fix: LM Studio thinking blocks invisible with Responses API` is:
+ `model_serving`, `local_models`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, session, sandbox, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 5: New subsample score 4.0 is better than old score 2.0357142857142856. Continue to full eval and add to candidate pool.
+Iteration 5: Valset score for new program: 0.4812393162393162 (coverage 30 / 30)
+Iteration 5: Val aggregate for new program: 0.4812393162393162
+Iteration 5: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.25, 4: 0.2857142857142857, 5: 0.25, 6: 1.0, 7: 1.0, 8: 0.25, 9: 0.0, 10: 1.0, 11: 1.0, 12: 0.14285714285714285, 13: 0.25, 14: 1.0, 15: 0.25, 16: 0.14285714285714285, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 0.5, 22: 0.3333333333333333, 23: 0.25, 24: 0.25, 25: 0.5, 26: 0.2857142857142857, 27: 0.14285714285714285, 28: 1.0, 29: 0.15384615384615385}
+Iteration 5: Objective aggregate scores for new program: {'weighted_score': 0.48123931623931626}
+Iteration 5: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.25, 24: 0.5, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 0.25}
+Iteration 5: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 5: Valset pareto front aggregate score: 0.89
+Iteration 5: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3}, 4: {3}, 5: {1, 2}, 6: {0, 1, 2, 3, 4}, 7: {0, 2, 3, 4}, 8: {0, 1, 3}, 9: {0, 1, 2, 3}, 10: {0, 1, 2, 3, 4}, 11: {0, 4}, 12: {0, 3}, 13: {2}, 14: {0, 1, 2, 3, 4}, 15: {0}, 16: {2, 3}, 17: {0, 1, 2, 4}, 18: {0, 1, 2, 3, 4}, 19: {1, 4}, 20: {0, 1, 2, 4}, 21: {0, 2, 3}, 22: {1}, 23: {2, 4}, 24: {0, 1, 2}, 25: {0, 2}, 26: {0, 1, 3}, 27: {2}, 28: {4}, 29: {0, 2, 3}}
+Iteration 5: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 5: Best valset aggregate score so far: 0.673452380952381
+Iteration 5: Best program as per aggregate score on valset: 0
+Iteration 5: Best score on valset: 0.673452380952381
+Iteration 5: Linear pareto front program index: 0
+Iteration 5: New program candidate index: 4
+Iteration 6: Selected program 4 score: 0.4812393162393162
+Iteration 6: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Output only JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents,
+api_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx,
+approvals, agent_runtime, model_serving, local_models,
+self_hosted_inference, telemetry_usage, exec_tools, sandboxing,
+browser_automation, cron_automation, config, security, mcp_tooling,
+tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping explicitly central second or third concerns.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, broad impact, and incidental implementation details unless they are the actual user-visible subject.
+
+Core process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+
+General suppression:
+- Do not add a topic just because a related word appears.
+- Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+- Ignore words that appear only in internal function or event names, such as `sessions_yield` or `sessions_spawn`, unless the actual user-visible subject is session identity, lifecycle, routing, ownership, persistence, or active-session conflict.
+- Never output disallowed ids such as `packaging_deployment`.
+
+Reliability:
+- Use `reliability` when the title describes a failure mode, hardening, deadlock, crash, race, takeover error, dropped work, broken delivery, or correctness issue that is central to the user-visible problem.
+- Do not add `reliability` merely because the title contains words like fail, failure, stale, broken, or error if the maintainer-owned subject is a protocol/workflow contract such as ACP/ACPX state visibility.
+- For `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield`, classify only:
+ `acpx`, `acp`.
+ Do not add `sessions` for `sessions_yield`, and do not add `reliability` for the failure/stale wording.
+
+Security:
+- Use `security` for supply-chain boundaries, dependency trust, GitHub Actions hardening, secret exposure, permissions, auth, or attack-surface reduction.
+- Do not add `security` merely because the title mentions SSRF defenses, proxy bypass, NO_PROXY, network policy, or a security-sounding mechanism when the user-visible problem is compatibility with local/self-hosted inference.
+- A title like `ci: harden GitHub Actions supply-chain boundaries` is only:
+ `security`.
+
+Exec, sandboxing, and approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, exec v2 behavior, and exec contract behavior.
+- If the title says `exec v2 contract`, `exec contract`, or `exec v2 contract follow-through`, classify all contract areas that are part of that surface:
+ `exec_tools`, `sandboxing`, `approvals`.
+- Do this even if the PR title starts with `test(...)`; tests can still be the vehicle for landing a user-visible contract.
+- Do not collapse exec contract work to only `exec_tools`.
+
+Sandboxing:
+- Use `sandboxing` when sandbox modes, sandbox policy, sandbox behavior, filesystem/network restrictions, approval boundaries, or `sandbox.mode` are central.
+- Do not use `sandboxing` merely because the title says `sandbox` as an environment name, container, ACPX sandbox, Codex sandbox, test sandbox, or runtime workspace unless the subject is the sandbox policy/permission behavior itself.
+- A title like `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` is not about sandbox policy. It is:
+ `acpx`, `codex`, `skills_plugins`.
+
+Skills and plugins:
+- Use `skills_plugins` when user-installed plugins, plugin inheritance, skills, marketplace plugins, plugin loading, plugin availability, or plugin installation behavior is central.
+- If a title says a Codex or ACPX environment should inherit user-installed plugins, include `skills_plugins`.
+- Do not replace `skills_plugins` with `sandboxing` just because the plugins are being inherited into a sandboxed environment.
+
+ACPX, ACP, and Codex:
+- Use `acpx` when ACPX itself, ACPX environments, ACPX protocol/workflow, or ACPX-specific behavior is central.
+- Use `acp` for ACP-specific protocol/workflow behavior.
+- Use `codex` when Codex is explicitly a central product/runtime/integration subject.
+- For explicit ACP agent runs, embedded backend visibility, final JSON state, or ACP/ACPX agent-run contracts, prefer `acp` and/or `acpx` only unless another central subject is explicitly named.
+- For an ACPX Codex feature involving user-installed plugins, include all central facets:
+ `acpx`, `codex`, `skills_plugins`.
+
+Sandboxing, subagents, and runtimes:
+- Use `agent_runtime` when the issue is about agent execution runtimes, subagent initialization, runtime startup, runtime ownership, or process/runtime lifecycle.
+- Use `coding_agents` when coding-agent behavior is central, including subagents, sessions_spawn subagent initialization, Codex/Claude-style coding-agent orchestration, or agent worker startup.
+- A title like `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` is:
+ `coding_agents`, `sandboxing`, `agent_runtime`.
+- Do not add `sessions` just because a path or internal component contains `sessions_spawn`.
+
+Sessions:
+- Use `sessions` only when session identity, session lifecycle, session routing, session state, persistent process identity, takeover/ownership, active chat session conflicts, or session-specific behavior is central.
+- Suppress `sessions` for internal names such as `sessions_spawn` and `sessions_yield` when the actual problem is subagent/runtime initialization or ACP/ACPX state handling.
+- Include `sessions` when the title describes an active chat/session takeover conflict.
+
+Cron automation:
+- Use `cron_automation` when scheduled jobs, cron announcements, recurring automation, scheduled delivery, or cron-triggered behavior is central.
+- If cron delivery conflicts with active user chat/session ownership, include both `cron_automation` and `sessions`; add `reliability` when the title describes an error or broken behavior.
+
+Local models, self-hosted inference, and model serving:
+- Use `local_models` when a local model app/provider/runtime is central, including Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted providers.
+- Ollama and LM Studio are strong signals for `local_models`.
+- Use `self_hosted_inference` when the central subject is running inference or embeddings through a local/self-hosted backend, especially with Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, localhost/LAN providers, proxy bypass, NO_PROXY, or self-hosted endpoint reachability.
+- For `Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)`, classify:
+ `local_models`, `self_hosted_inference`.
+ Do not add `model_serving` unless the title is about serving protocol behavior, endpoint compatibility, streaming lifecycle, response chunks, or request routing. Do not add `security` unless security hardening itself is the central user-visible change.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible protocol behavior, Responses API behavior, streaming lifecycle, response chunks, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- If a title combines a local model provider with the Responses API, OpenAI-compatible API behavior, streaming, or request/response protocol behavior, include both `local_models` and `model_serving`.
+- A title like `fix: LM Studio thinking blocks invisible with Responses API` is:
+ `model_serving`, `local_models`.
+
+Open-weight models and local model providers:
+- Use `open_weight_models` when the central subject is open-weight model families, model metadata, context windows, capabilities, catalogs, or known properties for open-weight models.
+- Use `local_model_providers` when the central subject is provider integration or provider catalog/metadata for open-weight/local-model providers.
+- In this taxonomy, Amazon Bedrock Mantle open-weight model metadata, including known context windows, routes to `local_model_providers` as well as `open_weight_models`.
+- For `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models`, classify:
+ `open_weight_models`, `local_model_providers`.
+ Do not add `model_serving`; context-window metadata for model catalog/provider support is not serving protocol behavior.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, proxy, NO_PROXY, SSRF, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, session, sandbox, sessions_yield, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 6: New subsample score 3.5 is better than old score 1.5705128205128205. Continue to full eval and add to candidate pool.
+Iteration 6: Valset score for new program: 0.608452380952381 (coverage 30 / 30)
+Iteration 6: Val aggregate for new program: 0.608452380952381
+Iteration 6: Individual valset scores for new program: {0: 0.5, 1: 0.14285714285714285, 2: 0.25, 3: 1.0, 4: 0.2857142857142857, 5: 1.0, 6: 0.5, 7: 1.0, 8: 0.25, 9: 0.5, 10: 1.0, 11: 1.0, 12: 0.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 0.25, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 0.5, 22: 0.0, 23: 1.0, 24: 0.5, 25: 0.0, 26: 0.25, 27: 0.125, 28: 1.0, 29: 1.0}
+Iteration 6: Objective aggregate scores for new program: {'weighted_score': 0.6084523809523809}
+Iteration 6: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 1.0, 24: 0.5, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 6: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 6: Valset pareto front aggregate score: 0.94
+Iteration 6: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3, 5}, 4: {3}, 5: {1, 2, 5}, 6: {0, 1, 2, 3, 4}, 7: {0, 2, 3, 4, 5}, 8: {0, 1, 3}, 9: {0, 1, 2, 3}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5}, 12: {0, 3}, 13: {2, 5}, 14: {0, 1, 2, 3, 4, 5}, 15: {0, 5}, 16: {2, 3}, 17: {0, 1, 2, 4, 5}, 18: {0, 1, 2, 3, 4, 5}, 19: {1, 4, 5}, 20: {0, 1, 2, 4, 5}, 21: {0, 2, 3}, 22: {1}, 23: {5}, 24: {0, 1, 2, 5}, 25: {0, 2}, 26: {0, 1, 3}, 27: {2}, 28: {4, 5}, 29: {5}}
+Iteration 6: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 6: Best valset aggregate score so far: 0.673452380952381
+Iteration 6: Best program as per aggregate score on valset: 0
+Iteration 6: Best score on valset: 0.673452380952381
+Iteration 6: Linear pareto front program index: 0
+Iteration 6: New program candidate index: 5
+Iteration 7: Selected program 2 score: 0.6586202686202686
+Iteration 7: Proposed new text for routing_policy: When classifying GitHub issue/PR titles, keep the topic set minimal but do not drop a central notification-delivery concern.
+
+For titles about Slack pushes, target-channel pushes, heartbeat pushes, announce messages, cron --announce delivery, or pushed-message identity overlays:
+- Add `notifications` when the title is about notification behavior itself, including announce-path pushes, heartbeat pushes, target-channel pushes, delivery of notification messages, or identity overlays on pushed notification messages.
+- Add `chat_integrations` when Slack or another chat app/channel is a central delivery surface.
+- Do not suppress `notifications` just because Slack is present; Slack may require `chat_integrations`, but notification push behavior remains `notifications`.
+- Do not add `cron_automation` merely because the notification path mentions `cron --announce`; use `cron_automation` only if cron scheduling, lifecycle, force-run, or cron execution is the central subject.
+- Do not add `auth_identity` merely because the title says “identity overlay” if the issue is about the displayed/sent identity on pushed notifications rather than login, OAuth, account identity, credentials, or token restoration.
+- Ignore parenthetical notes like “reply path was fixed in #12345” unless they describe the current central problem.
+
+Example:
+Title: `[Bug]: Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes (announce path; reply path was fixed in #38235)`
+Correct output:
+{"topics_of_interest":["notifications","chat_integrations"]}
+Iteration 7: New subsample score 3.25 is not better than old score 3.5, skipping
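The skip decision logged for Iteration 7 can be read as a simple gate on the subsample score. The sketch below is a minimal illustration, assuming a strict greater-than comparison (the log does not show how ties are handled); the function name `should_run_full_eval` is hypothetical and is not taken from GEPA's code.

```python
def should_run_full_eval(new_subsample_score: float, old_subsample_score: float) -> bool:
    """Return True when the proposed prompt beats the parent on the reflected subsample.

    Assumption: a strict improvement is required; a tie is treated as "not better".
    """
    return new_subsample_score > old_subsample_score


# Iteration 7 in the log: 3.25 vs 3.5, so the proposal is skipped.
iteration_7_accepted = should_run_full_eval(3.25, 3.5)

# Iteration 8 in the log: 4.0 vs 2.5, so the proposal goes on to the full valset eval.
iteration_8_accepted = should_run_full_eval(4.0, 2.5)

print(iteration_7_accepted, iteration_8_accepted)
```

Only proposals that pass this gate are scored on all 30 validation examples and added to the candidate pool, which is why Iteration 7 produces no new candidate index.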
+Iteration 8: Selected program 2 score: 0.6586202686202686
+Iteration 8: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Conventional commit scopes in the title, such as `feat(memory/embeddings): ...`, are part of the title signal. Do not ignore them when they name a central domain.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, and incidental implementation details unless they are the actual user-visible subject.
+
+Process:
+1. Read the full title, including any conventional commit type/scope prefix.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Important routing rules learned from prior errors:
+
+Browser automation plus packaging/deployment:
+- Use `browser_automation` when the central subject is browser automation, browser diagnostics, Chrome session reuse, browser runtime behavior, profile attach behavior, browser tooling, or desktop browser integration.
+- Use `packaging_deployment` when the title centrally names packaging, deployment, install/runtime environment setup, missing bundled executables, missing npm/node binaries, PATH/binary resolution, launch packaging, or errors such as `spawn npx ENOENT`.
+- If a desktop/browser issue combines browser session/profile behavior with a missing executable or packaging/runtime availability failure, include both topics.
+- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` must be:
+ `browser_automation`, `packaging_deployment`.
+ The Chrome sessions, Google sign-in, and user profile attach are browser automation. The `spawn npx ENOENT` failure is a packaging/deployment/runtime environment concern and must not be dropped.
+
+Memory and self-hosted inference:
+- Use `memory` when memory, embeddings under memory, memory retrieval/storage, or a conventional commit scope such as `memory/embeddings` is central.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- If the title says `memory/embeddings` and adds an OpenAI-compatible provider for self-hosted servers, include both `memory` and `self_hosted_inference`.
+- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` must be:
+ `memory`, `self_hosted_inference`.
+ Do not drop `memory` just because the body of the title focuses on providers; the scope is a central maintainer-owned area.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named or when ACP binding semantics are central. In OpenClaw, phrases like “per-binding”, “configured binding”, “binding ID”, or per-agent/per-binding ACP policy usually indicate the ACPX binding layer, even if the title says only ACP.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- Do not add `sessions` just because a title says “ACP sessions” if the actual change is permission policy, binding policy, or per-agent configuration.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.
+
+Auth and identity:
+- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.
+- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains “openai-codex”, “Codex OAuth”, or a Codex-flavored product/path name.
+- Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.
+- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.
+
+Reliability and queueing:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, or robustness failure.
+- “lane wedge”, “lane wedged”, “self-heal lane wedges”, “worker stuck”, or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.
+- Do not add `queueing` merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.
+- Use `queueing` only when queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, “openai-compatible”, or names llama.cpp/Ollama/vLLM/TGI/LocalAI unless serving protocol behavior is central.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, Chrome, npx, embeddings, provider, or openai-compatible. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 8: New subsample score 4.0 is better than old score 2.5. Continue to full eval and add to candidate pool.
+Iteration 8: Valset score for new program: 0.5296214896214896 (coverage 30 / 30)
+Iteration 8: Val aggregate for new program: 0.5296214896214896
+Iteration 8: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.15384615384615385, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 0.3333333333333333, 11: 0.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 0.15384615384615385, 17: 0.25, 18: 0.2857142857142857, 19: 0.3333333333333333, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.14285714285714285, 24: 1.0, 25: 0.0, 26: 0.25, 27: 0.5, 28: 0.5, 29: 0.2857142857142857}
+Iteration 8: Objective aggregate scores for new program: {'weighted_score': 0.5296214896214896}
+Iteration 8: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 8: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 8: Valset pareto front aggregate score: 0.9566666666666667
+Iteration 8: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3, 5}, 4: {3, 6}, 5: {1, 2, 5, 6}, 6: {0, 1, 2, 3, 4, 6}, 7: {0, 2, 3, 4, 5, 6}, 8: {0, 1, 3}, 9: {0, 1, 2, 3, 6}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5}, 12: {0, 3, 6}, 13: {2, 5}, 14: {0, 1, 2, 3, 4, 5, 6}, 15: {0, 5, 6}, 16: {2, 3}, 17: {0, 1, 2, 4, 5}, 18: {0, 1, 2, 3, 4, 5}, 19: {1, 4, 5}, 20: {0, 1, 2, 4, 5, 6}, 21: {0, 2, 3, 6}, 22: {1, 6}, 23: {5}, 24: {6}, 25: {0, 2}, 26: {0, 1, 3}, 27: {2}, 28: {4, 5}, 29: {5}}
+Iteration 8: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 8: Best valset aggregate score so far: 0.673452380952381
+Iteration 8: Best program as per aggregate score on valset: 0
+Iteration 8: Best score on valset: 0.673452380952381
+Iteration 8: Linear pareto front program index: 0
+Iteration 8: New program candidate index: 6
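The Iteration 8 front aggregate is consistent with averaging, over the 30 validation examples, the best score any candidate has reached on each example. A minimal sketch under that assumption follows; the function name `pareto_front_aggregate` is hypothetical and is not taken from GEPA's code.

```python
from statistics import fmean


def pareto_front_aggregate(front_scores: dict[int, float]) -> float:
    """Mean of the per-example best scores (assumed aggregation rule)."""
    return fmean(front_scores.values())


# Per-example front scores as logged for Iteration 8: every example is at 1.0
# except example 20 (0.2) and example 22 (0.5).
iteration_8_front = {example_id: 1.0 for example_id in range(30)}
iteration_8_front[20] = 0.2
iteration_8_front[22] = 0.5

front_aggregate = pareto_front_aggregate(iteration_8_front)
print(front_aggregate)
```

The result agrees with the logged front aggregate of 0.9566666666666667 up to floating-point rounding. The front aggregate is an upper bound over the whole pool, so it can rise while the best single program (still candidate 0 at 0.673) stays unchanged.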
+Iteration 9: Selected program 6 score: 0.5296214896214896
+Iteration 9: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment,
+hooks, coding_agents
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Conventional commit scopes in the title, such as `feat(memory/embeddings): ...`, are part of the title signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, and incidental implementation details unless they are the actual user-visible subject.
+
+Process:
+1. Read the full title, including any conventional commit type/scope prefix.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Important routing rules:
+
+ACP, ACPX, sessions, and agent runs:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named or when ACP binding semantics, embedded ACP backends, explicit ACP agent runs, per-binding behavior, configured bindings, binding IDs, or ACP agent-run visibility/state contracts are central.
+- Do not add `sessions` merely because the title contains `sessions_yield`, `sessions_spawn`, “ACP sessions”, or another session-shaped internal API name.
+- If the real subject is ACP/ACPX agent-run visibility, embedded backend behavior, stale final JSON state after a yield point, or binding-layer behavior, prefer `acp` + `acpx` and suppress `sessions` unless session lifecycle/restoration itself is the user-visible subject.
+- Example: `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield` routes to `acpx`, `acp`, not `sessions`.
+
+Security, hooks, and skills/plugins:
+- Use `security` when vulnerabilities, SSRF, token verification security, HIGH/CRITICAL vulns, exploitability, or security hardening are central.
+- Use `skills_plugins` when skills, plugins, skill scanner behavior, plugin loading, skill scanning, or skill/plugin security is a central named subject.
+- Use `hooks` when hooks, hook priority, hook execution order, hook policy, or hook behavior is a central named subject.
+- For a security PR whose title lists multiple vulnerable subsystems, include `security` plus the explicitly central subsystem topics, up to 3 total.
+- Do not add `auth_identity` just because token verification appears in a security vulnerability list, unless login, OAuth, identity propagation, account identity, or credential identity is the central user-visible subject.
+- Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` routes to `security`, `hooks`, `skills_plugins`.
+
+Sandboxing, subagents, and agent runtime:
+- Use `sandboxing` when sandbox mode, sandbox policy, sandbox restrictions, or sandbox configuration are central.
+- Use `coding_agents` when subagents, coding agents, spawned agents, agent delegation, or multi-agent coding workflows are central.
+- Use `agent_runtime` when agent initialization, runtime lifecycle, backend startup, execution runtime, or agent process behavior is central.
+- Do not add `sessions` merely because the title contains `sessions_spawn`; treat it as an internal spawn mechanism unless session lifecycle is clearly the subject.
+- If a sandbox mode breaks subagent initialization, include `sandboxing`, `coding_agents`, and `agent_runtime`.
+- Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` routes to `coding_agents`, `sandboxing`, `agent_runtime`.
+
+Browser automation plus packaging/deployment:
+- Use `browser_automation` when the central subject is browser automation, browser diagnostics, Chrome session reuse, browser runtime behavior, profile attach behavior, browser tooling, or desktop browser integration.
+- Use `packaging_deployment` when the title centrally names packaging, deployment, install/runtime environment setup, missing bundled executables, missing npm/node binaries, PATH/binary resolution, launch packaging, or errors such as `spawn npx ENOENT`.
+- If a desktop/browser issue combines browser session/profile behavior with a missing executable or packaging/runtime availability failure, include both topics.
+
+Memory and self-hosted inference:
+- Use `memory` when memory, embeddings under memory, memory retrieval/storage, or a conventional commit scope such as `memory/embeddings` is central.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- If the title says `memory/embeddings` and adds an OpenAI-compatible provider for self-hosted servers, include both `memory` and `self_hosted_inference`.
+
+Auth and identity:
+- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.
+- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains “openai-codex”, “Codex OAuth”, or a Codex-flavored product/path name.
+- Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.
+- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.
+
+Reliability and queueing:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, or robustness failure.
+- “lane wedge”, “lane wedged”, “self-heal lane wedges”, “worker stuck”, or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.
+- Do not add `queueing` merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.
+- Use `queueing` only when queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, “openai-compatible”, or names llama.cpp/Ollama/vLLM/TGI/LocalAI unless serving protocol behavior is central.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, Chrome, npx, embeddings, provider, openai-compatible, sessions_yield, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 9: New subsample score 4.0 is better than old score 2.119047619047619. Continue to full eval and add to candidate pool.
+Iteration 9: Valset score for new program: 0.5934920634920634 (coverage 30 / 30)
+Iteration 9: Val aggregate for new program: 0.5934920634920634
+Iteration 9: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.14285714285714285, 3: 0.25, 4: 0.2857142857142857, 5: 1.0, 6: 0.25, 7: 1.0, 8: 0.5, 9: 0.5, 10: 0.2, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 0.25, 17: 0.25, 18: 0.25, 19: 1.0, 20: 0.2, 21: 1.0, 22: 1.0, 23: 0.14285714285714285, 24: 1.0, 25: 1.0, 26: 0.25, 27: 0.25, 28: 0.3333333333333333, 29: 1.0}
+Iteration 9: Objective aggregate scores for new program: {'weighted_score': 0.5934920634920634}
+Iteration 9: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 9: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 9: Valset pareto front aggregate score: 0.9733333333333333
+Iteration 9: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3, 5}, 4: {3, 6}, 5: {1, 2, 5, 6, 7}, 6: {0, 1, 2, 3, 4, 6}, 7: {0, 2, 3, 4, 5, 6, 7}, 8: {0, 1, 3}, 9: {0, 1, 2, 3, 6}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5, 7}, 12: {0, 3, 6, 7}, 13: {2, 5, 7}, 14: {0, 1, 2, 3, 4, 5, 6, 7}, 15: {0, 5, 6, 7}, 16: {2, 3}, 17: {0, 1, 2, 4, 5}, 18: {0, 1, 2, 3, 4, 5}, 19: {1, 4, 5, 7}, 20: {0, 1, 2, 4, 5, 6, 7}, 21: {0, 2, 3, 6, 7}, 22: {7}, 23: {5}, 24: {6, 7}, 25: {0, 2, 7}, 26: {0, 1, 3}, 27: {2}, 28: {4, 5}, 29: {5, 7}}
+Iteration 9: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 9: Best valset aggregate score so far: 0.673452380952381
+Iteration 9: Best program as per aggregate score on valset: 0
+Iteration 9: Best score on valset: 0.673452380952381
+Iteration 9: Linear pareto front program index: 0
+Iteration 9: New program candidate index: 7
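The Iteration 9 valset aggregate can be reproduced as the plain mean of the 30 individual scores, which also shows why candidate 7 joins the pool without replacing candidate 0 as the best program. This is a sketch under two assumptions: the repeating decimals in the log (0.142857..., 0.285714..., 0.333...) are read as 1/7, 2/7, and 1/3, and "best" is decided by a strict greater-than comparison on the aggregate.

```python
from statistics import fmean

# Individual valset scores logged for the Iteration 9 candidate (index 7),
# in example order 0..29. Fractions are inferred from the logged decimals.
candidate_7_scores = [
    0.5, 0.25, 1 / 7, 0.25, 2 / 7, 1.0, 0.25, 1.0, 0.5, 0.5,
    0.2, 1.0, 1.0, 1.0, 1.0, 1.0, 0.25, 0.25, 0.25, 1.0,
    0.2, 1.0, 1.0, 1 / 7, 1.0, 1.0, 0.25, 0.25, 1 / 3, 1.0,
]

# Best valset aggregate so far, as logged (held by candidate 0).
BEST_SO_FAR = 0.673452380952381

candidate_7_aggregate = fmean(candidate_7_scores)

# Assumed rule: a new candidate becomes "best" only on a strict improvement.
replaces_best = candidate_7_aggregate > BEST_SO_FAR

print(candidate_7_aggregate, replaces_best)
```

The computed mean agrees with the logged 0.5934920634920634 up to floating-point rounding, and the comparison is false, matching the log line that keeps program 0 as the best program by aggregate score.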
+Iteration 10: Selected program 7 score: 0.5934920634920634
+Iteration 10: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment,
+hooks, coding_agents
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Conventional commit scopes in the title, such as `feat(memory/embeddings): ...`, are part of the title signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, and incidental implementation details unless they are the actual user-visible subject.
+
+Process:
+1. Read the full title, including any conventional commit type/scope prefix.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Important routing rules:
+
+Cron automation, sessions, and reliability:
+- Use `cron_automation` when cron jobs, scheduled work, cron announce, automated scheduled messages, or cron delivery behavior are central.
+- Use `sessions` when the issue is about active chat sessions, session ownership, session takeover, outbound session identity, embedded sessions, session restoration, or passing session identity across components.
+- Use `reliability` when the central failure is an exception, crash, stuck state, hang, deadlock, wedge, race, timeout, or robustness failure.
+- If cron delivery triggers a session takeover exception while a user is actively chatting, include all three: `cron_automation`, `sessions`, and `reliability`.
+- `EmbeddedAttemptSessionTakeoverError` is both a session concern and a reliability failure, not just cron automation.
+
+Gateway and session identity:
+- Use `gateway` when gateway send behavior, gateway denial, gateway routing, gateway guardrails, or gateway message delivery is central.
+- Use `sessions` when the title names outbound session identity, session identity propagation, message sending from a session, embedded session paths, or session restoration.
+- If a PR passes outbound session identity into message sending and surfaces gateway send denial, route to `gateway` and `sessions`.
+- Do not add `hooks` merely because message sending, gateway denial, or guarded sends might involve hook-like policy internally. Add `hooks` only when hooks, hook priority, hook execution order, hook policy, or hook behavior are the named user-visible subject.
+
+Local models, Ollama, embeddings, proxy, and SSRF wording:
+- Use `local_models` when the title is about local model behavior, local Ollama, local embeddings, local model execution, or local model runtime failures.
+- Use `self_hosted_inference` when the item is about Ollama, llama.cpp, vLLM, TGI, LocalAI, or OpenAI-compatible/self-hosted inference providers.
+- Local Ollama embeddings are a local model plus self-hosted inference concern.
+- Do not add `memory` just because the word `embeddings` appears. Use `memory` only when memory storage, memory retrieval, memory embeddings, or a title scope like `memory/embeddings` is central.
+- Do not add `security` merely because SSRF defenses, proxy bypass, or `NO_PROXY` appear in a title. Add `security` only when the item is centrally about a vulnerability, exploitability, security hardening, token verification security, or HIGH/CRITICAL security fixes.
+- If local Ollama embeddings fail when a proxy is enabled because SSRF defenses ignore `NO_PROXY`, route to `local_models` and `self_hosted_inference`, not `memory` or `security`.
+
+ACP, ACPX, sessions, and agent runs:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named or when ACP binding semantics, embedded ACP backends, explicit ACP agent runs, per-binding behavior, configured bindings, binding IDs, or ACP agent-run visibility/state contracts are central.
+- Do not add `sessions` merely because the title contains `sessions_yield`, `sessions_spawn`, “ACP sessions”, or another session-shaped internal API name.
+- If the real subject is ACP/ACPX agent-run visibility, embedded backend behavior, stale final JSON state after a yield point, or binding-layer behavior, prefer `acp` + `acpx` and suppress `sessions` unless session lifecycle/restoration itself is the user-visible subject.
+
+Security, hooks, and skills/plugins:
+- Use `security` when vulnerabilities, SSRF, token verification security, HIGH/CRITICAL vulns, exploitability, or security hardening are central.
+- Use `skills_plugins` when skills, plugins, skill scanner behavior, plugin loading, skill scanning, or skill/plugin security is a central named subject.
+- Use `hooks` when hooks, hook priority, hook execution order, hook policy, or hook behavior is a central named subject.
+- For a security PR whose title lists multiple vulnerable subsystems, include `security` plus the explicitly central subsystem topics, up to 3 total.
+- Do not add `auth_identity` just because token verification appears in a security vulnerability list, unless login, OAuth, identity propagation, account identity, or credential identity is the central user-visible subject.
+- Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` routes to `security`, `hooks`, `skills_plugins`.
+
+Sandboxing, subagents, and agent runtime:
+- Use `sandboxing` when sandbox mode, sandbox policy, sandbox restrictions, or sandbox configuration are central.
+- Use `coding_agents` when subagents, coding agents, spawned agents, agent delegation, or multi-agent coding workflows are central.
+- Use `agent_runtime` when agent initialization, runtime lifecycle, backend startup, execution runtime, or agent process behavior is central.
+- Do not add `sessions` merely because the title contains `sessions_spawn`; treat it as an internal spawn mechanism unless session lifecycle is clearly the subject.
+- If a sandbox mode breaks subagent initialization, include `sandboxing`, `coding_agents`, and `agent_runtime`.
+
+Browser automation plus packaging/deployment:
+- Use `browser_automation` when the central subject is browser automation, browser diagnostics, Chrome session reuse, browser runtime behavior, profile attach behavior, browser tooling, or desktop browser integration.
+- Use `packaging_deployment` when the title centrally names packaging, deployment, install/runtime environment setup, missing bundled executables, missing npm/node binaries, PATH/binary resolution, launch packaging, or errors such as `spawn npx ENOENT`.
+- If a desktop/browser issue combines browser session/profile behavior with a missing executable or packaging/runtime availability failure, include both topics.
+
+Memory and self-hosted inference:
+- Use `memory` when memory, embeddings under memory, memory retrieval/storage, or a conventional commit scope such as `memory/embeddings` is central.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- If the title says `memory/embeddings` and adds an OpenAI-compatible provider for self-hosted servers, include both `memory` and `self_hosted_inference`.
+
+Auth and identity:
+- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.
+- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains “openai-codex”, “Codex OAuth”, or a Codex-flavored product/path name.
+- Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.
+- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.
+
+Reliability and queueing:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, exception, or robustness failure.
+- “lane wedge”, “lane wedged”, “self-heal lane wedges”, “worker stuck”, or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.
+- Do not add `queueing` merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.
+- Use `queueing` only when queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, “openai-compatible”, or names llama.cpp/Ollama/vLLM/TGI/LocalAI unless serving protocol behavior is central.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, Chrome, npx, embeddings, provider, openai-compatible, SSRF, NO_PROXY, sessions_yield, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 10: New subsample score 3.2857142857142856 is better than old score 1.7261904761904763. Continue to full eval and add to candidate pool.
+Iteration 10: Valset score for new program: 0.6601587301587302 (coverage 30 / 30)
+Iteration 10: Val aggregate for new program: 0.6601587301587302
+Iteration 10: Individual valset scores for new program: {0: 0.25, 1: 0.2, 2: 0.25, 3: 0.5, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 0.3333333333333333, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.0, 16: 0.0, 17: 0.25, 18: 0.2857142857142857, 19: 1.0, 20: 0.2, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 0.25, 27: 0.5, 28: 1.0, 29: 1.0}
+Iteration 10: Objective aggregate scores for new program: {'weighted_score': 0.6601587301587302}
+Iteration 10: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 10: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 10: Valset pareto front aggregate score: 0.9733333333333333
+Iteration 10: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3, 5}, 4: {3, 6}, 5: {1, 2, 5, 6, 7, 8}, 6: {0, 1, 2, 3, 4, 6, 8}, 7: {0, 2, 3, 4, 5, 6, 7, 8}, 8: {0, 1, 3}, 9: {0, 1, 2, 3, 6, 8}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5, 7, 8}, 12: {0, 3, 6, 7, 8}, 13: {8, 2, 5, 7}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8}, 15: {0, 5, 6, 7}, 16: {2, 3}, 17: {0, 1, 2, 4, 5}, 18: {0, 1, 2, 3, 4, 5}, 19: {1, 4, 5, 7, 8}, 20: {0, 1, 2, 4, 5, 6, 7, 8}, 21: {0, 2, 3, 6, 7, 8}, 22: {8, 7}, 23: {8, 5}, 24: {8, 6, 7}, 25: {0, 8, 2, 7}, 26: {0, 1, 3}, 27: {2}, 28: {8, 4, 5}, 29: {8, 5, 7}}
+Iteration 10: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 10: Best valset aggregate score so far: 0.673452380952381
+Iteration 10: Best program as per aggregate score on valset: 0
+Iteration 10: Best score on valset: 0.673452380952381
+Iteration 10: Linear pareto front program index: 0
+Iteration 10: New program candidate index: 8
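The aggregate figures in the Iteration 10 lines above are consistent with plain unweighted means over the 30 validation examples. The sketch below recomputes them from the logged per-example values. The unweighted-mean formula is an assumption inferred from the numbers, not a statement about how the optimizer computes them internally, and the variable names are illustrative.

```python
from statistics import fmean

# Per-example valset scores for the new program, copied from the
# "Individual valset scores" line of Iteration 10 (example ids 0..29).
new_program_scores = [
    0.25, 0.2, 0.25, 0.5, 0.2857142857142857, 1.0, 1.0, 1.0, 0.5, 1.0,
    0.3333333333333333, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.25,
    0.2857142857142857, 1.0, 0.2, 1.0, 1.0, 1.0, 1.0, 1.0, 0.25, 0.5,
    1.0, 1.0,
]

# Per-example Pareto front scores from the same iteration: every example
# has a best score of 1.0 across candidates except example 20 (0.2).
pareto_front_scores = [1.0] * 20 + [0.2] + [1.0] * 9

# Assumption: both aggregates are unweighted means over the examples.
valset_aggregate = fmean(new_program_scores)
pareto_aggregate = fmean(pareto_front_scores)

print(f"valset aggregate:       {valset_aggregate:.10f}")
print(f"pareto front aggregate: {pareto_aggregate:.10f}")
```

The gap between the two values reflects that the Pareto front takes the best score per example across all candidates, while the valset aggregate belongs to a single candidate.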
+Iteration 11: Selected program 8 score: 0.6601587301587302
+Iteration 11: Proposed new text for routing_policy: Classify each GitHub issue or pull request into the smallest complete set of allowed topic ids.
+
+Output JSON only:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment,
+hooks, coding_agents
+
+Primary task:
+Read the GitHub item title as the main signal. Pick only topics that are central user-visible maintainer buckets. Add a second or third topic only when the title explicitly names another central concern. Ignore target URL paths, labels, incidental implementation details, and broad words that are not the actual subject.
+
+Important suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a symptom, environment, internal path, product flavor, example, test fixture, or label-spam keyword.
+
+Specific rule learned from feedback:
+For a title like:
+`fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth`
+the correct topics are:
+`codex`, `chat_integrations`
+
+Reasoning:
+- `Codex startup plugins` and `Docker Codex OAuth` are Codex-specific startup/OAuth behavior, so route to `codex`.
+- `WhatsApp history` is a chat integration concern, so route to `chat_integrations`.
+- Do not add `gateway` merely because WhatsApp, history sync, or message delivery might involve gateway internals.
+- Do not add `packaging_deployment` merely because `Docker` appears. Add `packaging_deployment` only when packaging, deployment, install/runtime setup, missing binaries, PATH resolution, launch packaging, or container/deployment mechanics are the central user-visible subject.
+- Do not add `auth_identity` merely because `OAuth` appears inside a Codex-flavored phrase unless the title is centrally about login, identity propagation, account identity, credential identity, or auth restoration as its own maintainer-owned concern.
+- Do not add `skills_plugins` merely because `plugins` appears in `Codex startup plugins` when the item is framed as Codex startup behavior rather than the general skills/plugin system.
+
+Topic guidance:
+- Use `codex` for Codex runtime, startup, setup, OAuth behavior in Codex contexts, Docker Codex behavior, or Codex-specific product/path behavior.
+- Use `chat_integrations` for WhatsApp, Slack, Discord, Telegram, chat history, chat bridge behavior, and other external chat app integrations.
+- Use `gateway` only when gateway routing, gateway send behavior, gateway denial, gateway guardrails, or gateway delivery is explicitly central.
+- Use `packaging_deployment` only when packaging/deployment/install/runtime availability is explicitly central.
+- Use `skills_plugins` only when the general skill/plugin system, plugin loading/scanning, skill scanner behavior, or skill/plugin security is explicitly central.
+- Use `auth_identity` only when auth, OAuth, login, account identity, credential identity, token identity, or identity propagation is explicitly central outside a product-flavored incidental phrase.
+
+Final check:
+Before returning, remove any topic added only because of words like Docker, OAuth, plugin, gateway, send, history, Codex, WhatsApp, deployment, packaging, or integration. Keep the topic only if it is the actual maintainer bucket needed to route the item.
+Iteration 11: New subsample score 2.2857142857142856 is not better than old score 3.1538461538461537, skipping
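The Iteration 10 and Iteration 11 lines above show a gate on the reflection subsample: a proposal moves on to full validation only when its subsample score beats the parent's score on that same subsample. A minimal sketch of that gate follows. The function name is hypothetical, and the strict `>` comparison is an assumption based on the "is better than" / "is not better than" wording in the log.

```python
def passes_subsample_gate(new_score: float, old_score: float) -> bool:
    """Return True when a proposal should proceed to full valset evaluation.

    Assumption: "better" means strictly greater, so a tie is skipped.
    """
    return new_score > old_score


# Values copied from the log lines above.
iteration_10 = passes_subsample_gate(3.2857142857142856, 1.7261904761904763)
iteration_11 = passes_subsample_gate(2.2857142857142856, 3.1538461538461537)

print("iteration 10 proceeds to full eval:", iteration_10)
print("iteration 11 proceeds to full eval:", iteration_11)
```

Passing the gate does not guarantee a better validation score: the Iteration 10 proposal passed it and still scored below the best program on the full valset.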
+Iteration 12: Selected program 8 score: 0.6601587301587302
+Iteration 12: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Output only JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment,
+hooks, coding_agents
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the correct maintainer bucket. Prefer the title over all other signals. Use the body only if the title is ambiguous. Do not infer topics from labels, file paths, examples, tests, or incidental implementation details.
+
+Core rules:
+- Pick one primary topic.
+- Add secondary topics only when they are explicitly central user-visible concerns.
+- Use 3 topics only when the title clearly names three central facets.
+- Use 0 topics when no allowed topic is central.
+- Never invent topic ids.
+
+Suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the actual subject, not an implementation detail, label-spam keyword, symptom, internal API name, or broad ownership area.
+
+Important calibrations from prior mistakes:
+
+1. Gateway is often label spam.
+- Do not add `gateway` merely because a bug involves sending, embedded paths, guarded delivery, or background mechanics.
+- Use `gateway` only when gateway send behavior, gateway routing, gateway denial, gateway guardrails, or gateway message delivery is the central user-visible subject.
+- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` should include `reliability`, `auth_identity`, and `sessions`; it should not include `gateway`.
+
+2. Embedded OAuth paths are auth plus sessions.
+- Use `auth_identity` for OAuth, login/auth restoration, credential identity, account identity, or token identity.
+- If OAuth/auth behavior is restored or broken on an embedded path, session path, session identity, or session restoration flow, also include `sessions`.
+- Do not replace `auth_identity` with `codex` just because the title says `openai-codex` or `Codex OAuth`.
+- Add `codex` only when Codex-specific runtime, startup, setup, or behavior is itself a central maintainer-owned subject.
+
+3. Lane wedges are reliability, not queueing by default.
+- Use `reliability` for wedges, stuck states, hangs, crashes, deadlocks, races, exceptions, timeouts, or self-healing stuck work.
+- Do not add `queueing` merely because the title mentions lanes, workers, or internal scheduling terms unless queue lifecycle or scheduling behavior is central.
+
+4. ACPX Codex plugin inheritance is not sandboxing just because “sandbox” appears.
+- Use `acpx` when ACPX is explicitly named.
+- Use `codex` when Codex is explicitly a central environment/runtime/product concern.
+- Use `skills_plugins` when plugins, skills, plugin loading, or skill scanner behavior is central.
+- Do not add `sandboxing` merely because the title contains “sandbox” as an environment name. Add `sandboxing` only when sandbox policy, sandbox restrictions, sandbox mode, or sandbox configuration are themselves the subject.
+- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` should be `acpx`, `codex`, `skills_plugins`.
+
+5. ACP binding bugs can be sessions without ACPX.
+- Use `acp` when ACP is named centrally.
+- Use `sessions` when the title is about session keys, session ownership, session restoration, session identity, active chat/session routing, or persistent process/session sharing.
+- Do not automatically add `acpx` for every ACP configured binding. Add `acpx` only when ACPX is explicitly named or the title is centrally about ACPX-specific binding semantics, ACPX backend behavior, or ACPX agent-run visibility/state contracts.
+- Example: `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` should be `acp` and `sessions`, not `acpx`.
+
+Topic reminders:
+- `docs`: documentation-only PRs, tutorials, README changes, guides, examples.
+- `cron_automation`: cron jobs, scheduled work, cron delivery, scheduled automated messages.
+- `sessions`: active chat sessions, session ownership, session takeover, embedded sessions, session keys, session restoration, session identity propagation.
+- `reliability`: crash, exception, stuck state, hang, deadlock, race, timeout, wedge, robustness failure.
+- `local_models`: local model behavior, local embeddings, local Ollama, local runtime failures.
+- `self_hosted_inference`: Ollama, llama.cpp, vLLM, TGI, LocalAI, OpenAI-compatible self-hosted inference providers.
+- `memory`: memory storage/retrieval/embeddings or title scope like `memory/embeddings`.
+- `security`: vulnerabilities, exploitability, SSRF as a vulnerability, security hardening, HIGH/CRITICAL fixes.
+- `hooks`: hook priority, hook execution order, hook policy, hook behavior.
+- `mcp_tooling`: MCP protocol, MCP conformance, MCP integrations, MCP tool behavior.
+- `tool_calling`: tool-call execution, schemas, parameter coercion, runtime behavior.
+- `browser_automation`: browser automation, Chrome/profile/session reuse, browser diagnostics/runtime.
+- `packaging_deployment`: packaging, install/runtime setup, bundled executable availability, PATH/binary resolution, `spawn npx ENOENT`.
+- `open_weight_models`: open-weight model metadata, catalogs, context windows, compatibility.
+- `local_model_providers`: provider-specific local/open-weight catalogs or known provider model metadata.
+- `model_serving`: serving endpoints, OpenAI-compatible protocol behavior, streaming lifecycle, request/response compatibility.
+- `coding_agents`: subagents, spawned coding agents, delegation, multi-agent coding workflows.
+- `agent_runtime`: agent initialization, lifecycle, backend startup, process/runtime behavior.
+- `sandboxing`: sandbox mode/policy/restrictions/configuration only when central.
+
+Final check:
+Before outputting, remove any topic that was added only because of a keyword such as gateway, sandbox, session, binding, lane, wedge, Codex, OAuth, plugin, hook, model, provider, SSRF, cron, tool, or security unless that concept is clearly a central maintainer-owned subject in the title.
+Iteration 12: New subsample score 3.5 is better than old score 1.75. Continue to full eval and add to candidate pool.
+Iteration 12: Valset score for new program: 0.5403174603174603 (coverage 30 / 30)
+Iteration 12: Val aggregate for new program: 0.5403174603174603
+Iteration 12: Individual valset scores for new program: {0: 0.3333333333333333, 1: 0.2, 2: 0.25, 3: 0.25, 4: 0.25, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.25, 9: 0.5, 10: 0.3333333333333333, 11: 1.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.2, 18: 1.0, 19: 0.5, 20: 0.25, 21: 0.5, 22: 1.0, 23: 0.14285714285714285, 24: 1.0, 25: 0.5, 26: 1.0, 27: 0.25, 28: 0.5, 29: 0.25}
+Iteration 12: Objective aggregate scores for new program: {'weighted_score': 0.5403174603174603}
+Iteration 12: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.25, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 12: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 12: Valset pareto front aggregate score: 0.975
+Iteration 12: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3, 5}, 4: {3, 6}, 5: {1, 2, 5, 6, 7, 8}, 6: {0, 1, 2, 3, 4, 6, 8, 9}, 7: {0, 2, 3, 4, 5, 6, 7, 8, 9}, 8: {0, 1, 3}, 9: {0, 1, 2, 3, 6, 8}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5, 7, 8, 9}, 12: {0, 3, 6, 7, 8, 9}, 13: {8, 2, 5, 7}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, 15: {0, 5, 6, 7}, 16: {2, 3}, 17: {0, 1, 2, 4, 5}, 18: {0, 1, 2, 3, 4, 5, 9}, 19: {1, 4, 5, 7, 8}, 20: {9}, 21: {0, 2, 3, 6, 7, 8}, 22: {8, 9, 7}, 23: {8, 5}, 24: {8, 9, 6, 7}, 25: {0, 8, 2, 7}, 26: {0, 1, 3, 9}, 27: {2}, 28: {8, 4, 5}, 29: {8, 5, 7}}
+Iteration 12: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 12: Best valset aggregate score so far: 0.673452380952381
+Iteration 12: Best program as per aggregate score on valset: 0
+Iteration 12: Best score on valset: 0.673452380952381
+Iteration 12: Linear pareto front program index: 0
+Iteration 12: New program candidate index: 9
+Iteration 13: Selected program 0 score: 0.673452380952381
+Iteration 13: All subsample scores perfect. Skipping.
+Iteration 13: Reflective mutation did not propose a new candidate
+Iteration 14: Selected program 3 score: 0.6679761904761905
+Iteration 14: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore target URL path, examples, tests, files changed, labels, and incidental implementation details unless they are the user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Important calibration:
+- Use `telemetry_usage` when token counts, usage accounting, cost display, usage reporting, diagnostic counts, traces, or status reporting are themselves the bug or feature.
+- Use `sessions` when session freshness, session identity, session lifecycle, session routing, or session state is central.
+- A title like `fix: respect totalTokensFresh flag to avoid showing stale token counts` must be classified as:
+ {"topics_of_interest":["sessions","telemetry_usage"]}
+- In that title, `totalTokensFresh` is a session freshness/state concern, and stale token counts are a telemetry/usage concern.
+- Do not classify that title as `ui_tui`. Words like “showing”, “displaying”, or “surfacing” are label spam unless the UI, TUI, webchat, terminal interface, or reader view is itself the central subject.
+
+Suppression rules:
+- Do not add a topic just because a related word appears.
+- Confirm that the word is the subject, not a symptom, path, implementation detail, broad ownership area, or incidental UI phrasing.
+- Before outputting, remove any topic added only because of words like showing, display, usage, token, session, status, UI, model, tool, gateway, security, hook, cron, or plugin unless that topic is actually a central maintainer-owned subject.
+Iteration 14: New subsample score 3.25 is better than old score 3.2. Continue to full eval and add to candidate pool.
+Iteration 14: Valset score for new program: 0.4648107448107448 (coverage 30 / 30)
+Iteration 14: Val aggregate for new program: 0.4648107448107448
+Iteration 14: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.25, 4: 0.15384615384615385, 5: 0.5, 6: 1.0, 7: 0.5, 8: 0.5, 9: 1.0, 10: 0.3333333333333333, 11: 0.5, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.2, 17: 0.2, 18: 0.2857142857142857, 19: 0.2, 20: 1.0, 21: 0.5, 22: 0.25, 23: 0.14285714285714285, 24: 0.25, 25: 0.5, 26: 1.0, 27: 0.14285714285714285, 28: 0.5, 29: 0.25}
+Iteration 14: Objective aggregate scores for new program: {'weighted_score': 0.4648107448107447}
+Iteration 14: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 14: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 14: Valset pareto front aggregate score: 1.0
+Iteration 14: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3, 5}, 4: {3, 6}, 5: {1, 2, 5, 6, 7, 8}, 6: {0, 1, 2, 3, 4, 6, 8, 9, 10}, 7: {0, 2, 3, 4, 5, 6, 7, 8, 9}, 8: {0, 1, 3}, 9: {0, 1, 2, 3, 6, 8, 10}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5, 7, 8, 9}, 12: {0, 3, 6, 7, 8, 9}, 13: {2, 5, 7, 8, 10}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 15: {0, 5, 6, 7}, 16: {2, 3}, 17: {0, 1, 2, 4, 5}, 18: {0, 1, 2, 3, 4, 5, 9}, 19: {1, 4, 5, 7, 8}, 20: {10}, 21: {0, 2, 3, 6, 7, 8}, 22: {8, 9, 7}, 23: {8, 5}, 24: {8, 9, 6, 7}, 25: {0, 8, 2, 7}, 26: {0, 1, 3, 9, 10}, 27: {2}, 28: {8, 4, 5}, 29: {8, 5, 7}}
+Iteration 14: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 14: Best valset aggregate score so far: 0.673452380952381
+Iteration 14: Best program as per aggregate score on valset: 0
+Iteration 14: Best score on valset: 0.673452380952381
+Iteration 14: Linear pareto front program index: 0
+Iteration 14: New program candidate index: 10
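The Iteration 14 prompt above requires a single JSON object and forbids invented topic ids. Its allowed list has 31 ids and omits `auth_identity`, `packaging_deployment`, and `coding_agents`, which appear in the Iteration 11 and 12 prompts. The sketch below shows one way a caller could check a model response against that list. The function name and error messages are hypothetical and not part of the logged tooling.

```python
import json

# Allowed topic ids copied from the Iteration 14 prompt.
ALLOWED_TOPICS = {
    "queueing", "docs", "notifications", "sessions", "gateway",
    "reliability", "memory", "open_weight_models", "local_model_providers",
    "codex", "api_surface", "ui_tui", "chat_integrations", "skills_plugins",
    "acp", "acpx", "approvals", "agent_runtime", "model_serving",
    "local_models", "self_hosted_inference", "telemetry_usage", "exec_tools",
    "sandboxing", "browser_automation", "cron_automation", "config",
    "security", "mcp_tooling", "tool_calling", "hooks",
}


def parse_topics(raw: str) -> list[str]:
    """Parse a response and reject anything outside the allowed id list."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or "topics_of_interest" not in data:
        raise ValueError("expected an object with a topics_of_interest key")
    topics = data["topics_of_interest"]
    if not isinstance(topics, list):
        raise ValueError("topics_of_interest must be a list")
    unknown = [t for t in topics if t not in ALLOWED_TOPICS]
    if unknown:
        raise ValueError(f"unknown topic ids: {unknown}")
    return topics


# The calibration example from the prompt parses cleanly.
calibration = parse_topics('{"topics_of_interest":["sessions","telemetry_usage"]}')
print(calibration)
```

Under this list, a response containing `auth_identity` would be rejected, even though that id is valid under the Iteration 11 and 12 prompts.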
+Iteration 15: Selected program 3 score: 0.6679761904761905
+Iteration 15: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Output rule:
+- If a structured-output tool named `final_json` is available, you MUST call it exactly once.
+- Call it with exactly:
+ {"topics_of_interest":["topic_id"]}
+- If no such tool is available, return only this JSON object and no prose:
+ {"topics_of_interest":["topic_id"]}
+- Never omit the final output. Never explain your reasoning. Never output markdown.
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Before final output, remove any topic added only from keyword matching.
+
+Important calibration:
+
+Model serving, local models, and local providers:
+- Use `model_serving` for serving endpoints, OpenAI-compatible APIs, Responses API behavior, request/response protocol behavior, streaming lifecycle, final usage chunks, endpoint compatibility, base URL behavior, request routing, model-server compatibility, or automatic routing of model requests.
+- Use `local_models` when a local model app/runtime/provider is itself a user-visible subject, especially compatibility or behavior with named runtimes like LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar.
+- Do not add `local_models` merely because a local runtime is incidentally mentioned, but do add it when the title is about that runtime’s visible behavior or compatibility.
+- `fix: LM Studio thinking blocks invisible with Responses API` => `model_serving`, `local_models`.
+- Use `local_model_providers` for provider catalogs, model catalogs, provider-specific model availability, provider metadata, provider setup, or provider catalog entries.
+- `Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` => `local_model_providers`, `model_serving`.
+
+Policy, config, MCP, and security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, configuration-governed enforcement, or repo/project policy.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- `Policy: add model, network, and MCP conformance checks` => `config`, `security`, `mcp_tooling`.
+- Do not map “model policy” or “model conformance” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Security composites:
+- Use `security` for vulnerabilities, SSRF, access restrictions, token verification, credential boundaries, network security policy, or HIGH/CRITICAL vuln fixes.
+- If a security title lists specific affected subsystems, include each subsystem that is explicitly central.
+- Use `hooks` when hook behavior, hook priority, hook execution, or hook policy is explicitly central.
+- Use `skills_plugins` when skill scanning, skill/plugin discovery, user-installed plugins, plugin inheritance, plugin installation, or plugin availability is explicitly central.
+- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `hooks`, `skills_plugins`.
+
+Token counts and usage freshness:
+- Use `telemetry_usage` when token counts, usage accounting, cost display, usage reporting, diagnostic counts, traces, or status reporting are themselves the bug or feature.
+- Use `sessions` when freshness, identity, lifecycle, routing, or state of a session is central.
+- `fix: respect totalTokensFresh flag to avoid showing stale token counts` => `sessions`, `telemetry_usage`.
+- Do not add `ui_tui` merely because the title says “showing” or “displaying”.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage chunks; streaming protocol lifecycle remains `model_serving`.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs usually include only `docs`.
+- Add the documented area only when that area is explicitly central in the title, such as `docs(queue): ...` => `docs`, `queueing`.
+
+MCP and tool calling:
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` only when ACPX is explicitly named or clearly ACPX-specific.
+- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, or session-specific behavior is central.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Bug]: ACP configured binding uses parent channel ID for session key` => `acp`, `sessions`.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`.
+
+Notifications and chat:
+- Use `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, or notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Do not add `notifications` merely because the title mentions message sending or delivery plumbing.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions lane, main lane, worker, subagent, or internal execution paths.
+- Active-memory recall deadlocks are usually `memory`, `reliability`.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+- `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+
+Final suppression check:
+Remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, showing, display, llama.cpp, LM Studio, catalog, or hook. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 15: New subsample score 4.0 is better than old score 2.0. Continue to full eval and add to candidate pool.
+Iteration 15: Valset score for new program: 0.6468512627723154 (coverage 30 / 30)
+Iteration 15: Val aggregate for new program: 0.6468512627723154
+Iteration 15: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 0.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 0.5, 11: 1.0, 12: 0.2857142857142857, 13: 0.5, 14: 1.0, 15: 0.5, 16: 0.25, 17: 1.0, 18: 1.0, 19: 0.2, 20: 0.25, 21: 0.25, 22: 1.0, 23: 0.10526315789473684, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.125, 28: 0.5, 29: 0.15384615384615385}
+Iteration 15: Objective aggregate scores for new program: {'weighted_score': 0.6468512627723154}
+Iteration 15: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 15: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 15: Valset pareto front aggregate score: 1.0
+Iteration 15: Updated valset pareto front programs: {0: {0, 1, 2, 3, 11}, 1: {11, 2, 3}, 2: {0, 11, 3}, 3: {0, 11, 3, 5}, 4: {3, 6}, 5: {1, 2, 5, 6, 7, 8}, 6: {0, 1, 2, 3, 4, 6, 8, 9, 10, 11}, 7: {0, 2, 3, 4, 5, 6, 7, 8, 9, 11}, 8: {0, 1, 3, 11}, 9: {0, 1, 2, 3, 6, 8, 10, 11}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5, 7, 8, 9, 11}, 12: {0, 3, 6, 7, 8, 9}, 13: {2, 5, 7, 8, 10}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, 15: {0, 5, 6, 7}, 16: {2, 3}, 17: {0, 1, 2, 4, 5, 11}, 18: {0, 1, 2, 3, 4, 5, 9, 11}, 19: {1, 4, 5, 7, 8}, 20: {10}, 21: {0, 2, 3, 6, 7, 8}, 22: {8, 9, 11, 7}, 23: {8, 5}, 24: {8, 9, 6, 7}, 25: {0, 2, 7, 8, 11}, 26: {0, 1, 3, 9, 10, 11}, 27: {2}, 28: {8, 4, 5}, 29: {8, 5, 7}}
+Iteration 15: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 15: Best valset aggregate score so far: 0.673452380952381
+Iteration 15: Best program as per aggregate score on valset: 0
+Iteration 15: Best score on valset: 0.673452380952381
+Iteration 15: Linear pareto front program index: 0
+Iteration 15: New program candidate index: 11
+Iteration 16: Selected program 0 score: 0.673452380952381
+Iteration 16: All subsample scores perfect. Skipping.
+Iteration 16: Reflective mutation did not propose a new candidate
+Iteration 17: Selected program 3 score: 0.6679761904761905
+Iteration 17: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+You MUST produce structured output. If a `final_json` tool is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks, auth_identity,
+packaging_deployment
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Classification process:
+1. Read the title first. The title is the primary signal.
+2. Use the first clear body summary only when the title is ambiguous.
+3. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+4. Pick one primary topic.
+5. Add secondary topics only when they are explicit central maintainer-owned subjects.
+6. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+7. Use 0 topics when no allowed topic is central.
+8. Never invent topic ids. Never output labels outside the allowed list.
+9. Ignore target URL path, labels, examples, tests, files changed, incidental implementation details, broad impact, and label-spam keywords.
+
+Hard output rule:
+Calling `final_json` is part of the answer when the tool exists. Do not finish without calling it. Do not emit prose, markdown, explanations, or an empty unstructured response.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the user-visible subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Provider catalogs:
+- Use `local_model_providers` for provider catalogs, model-provider integrations, missing provider models, provider routing, provider-specific model availability, and provider catalog errors.
+- A missing Google Vertex/Gemini provider catalog entry is `local_model_providers`.
+- Add `reliability` when the user-visible failure mode is silent failure, hang, crash, wedge, missing error, or other robustness failure.
+- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.
+
+Local models and self-hosted inference:
+- Use `local_models` when the local model app/runtime/provider is the user-visible subject, including Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, local embeddings, or local runtime behavior.
+- Use `self_hosted_inference` when the issue concerns self-hosted inference endpoints, local inference services, local embeddings endpoints, proxying to local inference, or self-hosted model connectivity.
+- Ollama embeddings failures are usually both `local_models` and `self_hosted_inference`.
+- Do not add `security` merely because the title mentions SSRF, SSRF defenses, proxy bypasses, or NO_PROXY when the actual user-visible bug is that a local/self-hosted model service fails.
+- Do not add `config` merely because proxy environment variables or NO_PROXY are mentioned, unless configuration policy/enforcement is the central subject.
+- Example: `[Feature/Bug]: Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)` => `local_models`, `self_hosted_inference`.
+
+Reliability, sessions, and auth identity:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedge, timeout, self-healing of wedged execution, silent failure, or robustness failure.
+- Use `sessions` when session identity, session lifecycle, session routing, embedded session paths, persistent process identity, session state, or session-specific behavior is central.
+- Use `auth_identity` when OAuth, sign-in, authentication identity, credential identity, token identity, account identity, or restored login/auth flow is central.
+- Do not add `queueing` merely because the title says lane, main lane, worker, subagent, wedge, or internal execution path. Queueing requires queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior to be the user-visible subject.
+- Do not add `codex` merely because the title says openai-codex or contains Codex-branded OAuth. If the central subject is OAuth/auth restoration, use `auth_identity`; if it happens on an embedded/session path, add `sessions`.
+- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.
+
+Browser automation and packaging/deployment:
+- Use `browser_automation` for Chrome/Chromium sessions, browser automation, browser diagnostics, browser runtime behavior, browser profile attach, browser tooling, and browser sign-in flows.
+- Use `packaging_deployment` for desktop app packaging, bundled runtime issues, missing executable/path problems, spawn errors caused by missing installed tools, `npx`/Node availability in packaged apps, installer/distribution/runtime packaging, or deployment environment failures.
+- If a desktop browser automation issue also fails because a packaged or deployed binary cannot spawn `npx` or another required executable, include `packaging_deployment`.
+- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.
+
+Security:
+- Use `security` for actual vulnerabilities, SSRF as the central vulnerability, access restrictions, token verification, credential boundaries, network security policy, or HIGH/CRITICAL vulnerability fixes.
+- If a security title lists specific affected subsystems, include each subsystem that is explicitly central.
+- Do not use `security` when security words describe an implementation mechanism but the user-visible subject is local model connectivity, embeddings, packaging, sessions, or provider behavior.
+
+Config:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Do not use `config` merely for proxy variables, env vars, provider settings, or setup details unless the title is about configuration behavior itself.
+
+Codex:
+- Use `codex` when Codex itself is the central user-visible product/runtime/setup subject, including Codex startup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because `openai-codex` appears as an OAuth provider name or bracketed metadata.
+
+Queueing:
+- Use `queueing` only when queue, queueing, queued execution, queue lifecycle, queue scheduling, or steering in queues is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, before_prompt_build, internal path names, or wedge recovery.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include only `docs`.
+- Add the central documented area only when that area is explicitly central in the title.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- Example: `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, showing, display, llama.cpp, Ollama, OAuth, proxy, NO_PROXY, npx, Chrome, or hook. Keep the topic only if that concept is actually a central maintainer-owned subject.
+Iteration 17: New subsample score 3.5 is better than old score 0.7678571428571428. Continue to full eval and add to candidate pool.
+Iteration 17: Valset score for new program: 0.5279059829059829 (coverage 30 / 30)
+Iteration 17: Val aggregate for new program: 0.5279059829059829
+Iteration 17: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.0, 3: 0.5, 4: 0.15384615384615385, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.5, 9: 0.5, 10: 0.3333333333333333, 11: 1.0, 12: 0.25, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.2, 17: 0.25, 18: 0.25, 19: 0.2, 20: 0.2, 21: 1.0, 22: 0.25, 23: 1.0, 24: 1.0, 25: 0.5, 26: 1.0, 27: 0.25, 28: 0.5, 29: 0.25}
+Iteration 17: Objective aggregate scores for new program: {'weighted_score': 0.5279059829059828}
+Iteration 17: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 17: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 17: Valset pareto front aggregate score: 1.0
+Iteration 17: Updated valset pareto front programs: {0: {0, 1, 2, 3, 11}, 1: {11, 2, 3}, 2: {0, 11, 3}, 3: {0, 11, 3, 5}, 4: {3, 6}, 5: {1, 2, 5, 6, 7, 8}, 6: {0, 1, 2, 3, 4, 6, 8, 9, 10, 11, 12}, 7: {0, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12}, 8: {0, 1, 3, 11}, 9: {0, 1, 2, 3, 6, 8, 10, 11}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5, 7, 8, 9, 11, 12}, 12: {0, 3, 6, 7, 8, 9}, 13: {2, 5, 7, 8, 10, 12}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, 15: {0, 5, 6, 7}, 16: {2, 3}, 17: {0, 1, 2, 4, 5, 11}, 18: {0, 1, 2, 3, 4, 5, 9, 11}, 19: {1, 4, 5, 7, 8}, 20: {10}, 21: {0, 2, 3, 6, 7, 8, 12}, 22: {8, 9, 11, 7}, 23: {8, 12, 5}, 24: {6, 7, 8, 9, 12}, 25: {0, 2, 7, 8, 11}, 26: {0, 1, 3, 9, 10, 11, 12}, 27: {2}, 28: {8, 4, 5}, 29: {8, 5, 7}}
+Iteration 17: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 17: Best valset aggregate score so far: 0.673452380952381
+Iteration 17: Best program as per aggregate score on valset: 0
+Iteration 17: Best score on valset: 0.673452380952381
+Iteration 17: Linear pareto front program index: 0
+Iteration 17: New program candidate index: 12
+{
+ "best_idx": 0,
+ "best_prompt_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214937Z/best.prompt.md",
+ "best_routing_policy_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214937Z/best.routing_policy.md",
+ "best_score": 0.673452380952381,
+ "config": {
+ "harness": {
+ "base_url": null,
+ "concurrency": 2,
+ "context_window": null,
+ "max_tokens": 1536,
+ "model": "gemma-12b-q4km-reason",
+ "state_dir": null,
+ "timeout_ms": 900000
+ },
+ "max_candidate_proposals": 17,
+ "max_metric_calls": 720,
+ "output_dir": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214937Z",
+ "reflection_minibatch_size": 4,
+ "row_limit": 30,
+ "seed": 0,
+ "seed_routing_policy_chars": 12379,
+ "seed_routing_policy_sha256": "6389ec2152a1c1c9c8e7fd5c7a3b522f95eba44351fc274647d3fdbca5490b14"
+ },
+ "created_at": "2026-06-14T02:12:36.124710+00:00",
+ "num_candidates": 13,
+ "num_full_val_evals": 13,
+ "result_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214937Z/gepa-result.json",
+ "total_metric_calls": 518
+}
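Note on the log above: the `Val aggregate` reported for each candidate appears to be the unweighted mean of its per-example validation scores. The sketch below checks that reading against the Iteration 17 line, copied verbatim from the log. It is a minimal illustration under that assumption, not GEPA's own code; `parse_scores` and `mean_score` are hypothetical helper names.

```python
import ast
import math

# Iteration 17 line, copied verbatim from the run log above.
LOG_LINE = (
    "Iteration 17: Individual valset scores for new program: "
    "{0: 0.5, 1: 0.25, 2: 0.0, 3: 0.5, 4: 0.15384615384615385, 5: 0.5, "
    "6: 1.0, 7: 1.0, 8: 0.5, 9: 0.5, 10: 0.3333333333333333, 11: 1.0, "
    "12: 0.25, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.2, 17: 0.25, 18: 0.25, "
    "19: 0.2, 20: 0.2, 21: 1.0, 22: 0.25, 23: 1.0, 24: 1.0, 25: 0.5, "
    "26: 1.0, 27: 0.25, 28: 0.5, 29: 0.25}"
)


def parse_scores(line):
    """Return the {example_id: score} dict printed after 'new program: '."""
    _, _, literal = line.partition("new program: ")
    return ast.literal_eval(literal)


def mean_score(scores):
    """Unweighted mean over all validation examples (assumed aggregation)."""
    return math.fsum(scores.values()) / len(scores)


scores = parse_scores(LOG_LINE)
aggregate = mean_score(scores)

# The log reports 0.5279059829059829 for this candidate.
print(len(scores), aggregate)
```

If the assumption holds, the printed mean agrees with the logged aggregate up to floating-point rounding, which is also why candidate 12 does not displace candidate 0 (0.6735) as `best_idx`.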
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/post_run_queue.status b/gepa-12b-row30-prop16-continuation-20260613T214937Z/post_run_queue.status
new file mode 100644
index 0000000000000000000000000000000000000000..4b8b7cbbf892399771bd09f6b50333da003cfa66
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/post_run_queue.status
@@ -0,0 +1,7 @@
+current_run_dir=/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214937Z
+stopped_without_validation=1
+reason=launched_continuation
+proposal_attempts=15
+accepted_full_eval_candidates=12
+num_candidates=13
+continuation_number=1
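Note on the status file above: its counters can be cross-checked against the result JSON for the same run. The sketch below parses the `key=value` lines (with the long `current_run_dir` line omitted) and compares them with two fields copied from the result JSON. The rule that `num_candidates` equals accepted candidates plus the seed prompt is an inference from the numbers, not something the artifacts state; `parse_status` and `check_consistency` are hypothetical names.

```python
# Status lines copied from post_run_queue.status (current_run_dir omitted).
STATUS_TEXT = """\
stopped_without_validation=1
reason=launched_continuation
proposal_attempts=15
accepted_full_eval_candidates=12
num_candidates=13
continuation_number=1
"""

# Two fields copied from the result JSON of the same run.
RESULT_SUMMARY = {"num_candidates": 13, "num_full_val_evals": 13}


def parse_status(text):
    """Parse key=value lines; purely numeric values become ints."""
    status = {}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition("=")
        if not sep:
            continue  # skip blank or malformed lines
        status[key] = int(value) if value.isdigit() else value
    return status


def check_consistency(status, result):
    """Return a list of human-readable mismatches (empty when consistent)."""
    problems = []
    # Assumption: the candidate pool is the seed prompt plus every accepted proposal.
    expected = status["accepted_full_eval_candidates"] + 1
    if status["num_candidates"] != expected:
        problems.append("num_candidates != accepted + seed")
    if status["num_candidates"] != result["num_candidates"]:
        problems.append("status and result disagree on num_candidates")
    if result["num_full_val_evals"] != result["num_candidates"]:
        problems.append("not every candidate received a full validation eval")
    return problems


status = parse_status(STATUS_TEXT)
problems = check_consistency(status, RESULT_SUMMARY)
print(problems)
```

For this run the three checks pass: 12 accepted proposals plus the seed give the 13 candidates reported in both files.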
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/run_log.json b/gepa-12b-row30-prop16-continuation-20260613T214937Z/run_log.json
new file mode 100644
index 0000000000000000000000000000000000000000..d2677297a165472c200b895ea3d4f2085b87e59f
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/run_log.json
@@ -0,0 +1,760 @@
+[
+ {
+ "i": 0,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 3,
+ 10,
+ 19,
+ 17
+ ],
+ "subsample_scores": [
+ 1.0,
+ 0.3333333333333333,
+ 0.2,
+ 0.5
+ ],
+ "new_subsample_scores": [
+ 0.5,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 1,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 1,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 5,
+ 0,
+ 7,
+ 27
+ ],
+ "subsample_scores": [
+ 0.25,
+ 1.0,
+ 1.0,
+ 0.125
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 2,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 2,
+ "selected_program_candidate": 1,
+ "subsample_ids": [
+ 25,
+ 14,
+ 21,
+ 26
+ ],
+ "subsample_scores": [
+ 0.5,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.5,
+ 1.0
+ ]
+ },
+ {
+ "i": 3,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 22,
+ 2,
+ 20,
+ 4
+ ],
+ "subsample_scores": [
+ 0.3333333333333333,
+ 1.0,
+ 0.2,
+ 0.2857142857142857
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.5,
+ 1.0
+ ],
+ "new_program_idx": 3,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 4,
+ "selected_program_candidate": 1,
+ "subsample_ids": [
+ 28,
+ 6,
+ 18,
+ 11
+ ],
+ "subsample_scores": [
+ 0.5,
+ 1.0,
+ 0.2857142857142857,
+ 0.25
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 4,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 5,
+ "selected_program_candidate": 4,
+ "subsample_ids": [
+ 23,
+ 9,
+ 29,
+ 15
+ ],
+ "subsample_scores": [
+ 0.15384615384615385,
+ 1.0,
+ 0.16666666666666666,
+ 0.25
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 0.5,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 5,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 6,
+ "selected_program_candidate": 2,
+ "subsample_ids": [
+ 16,
+ 8,
+ 1,
+ 13
+ ],
+ "subsample_scores": [
+ 0.5,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.25,
+ 1.0
+ ]
+ },
+ {
+ "i": 7,
+ "selected_program_candidate": 2,
+ "subsample_ids": [
+ 24,
+ 12,
+ 12,
+ 24
+ ],
+ "subsample_scores": [
+ 0.5,
+ 1.0,
+ 0.5,
+ 0.5
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 6,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 8,
+ "selected_program_candidate": 6,
+ "subsample_ids": [
+ 29,
+ 22,
+ 5,
+ 19
+ ],
+ "subsample_scores": [
+ 0.2857142857142857,
+ 0.5,
+ 1.0,
+ 0.3333333333333333
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 7,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 9,
+ "selected_program_candidate": 7,
+ "subsample_ids": [
+ 28,
+ 4,
+ 6,
+ 23
+ ],
+ "subsample_scores": [
+ 0.3333333333333333,
+ 1.0,
+ 0.25,
+ 0.14285714285714285
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 0.2857142857142857,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 8,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 10,
+ "selected_program_candidate": 8,
+ "subsample_ids": [
+ 18,
+ 14,
+ 9,
+ 3
+ ],
+ "subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 0.15384615384615385
+ ],
+ "new_subsample_scores": [
+ 0.2857142857142857,
+ 1.0,
+ 0.0,
+ 1.0
+ ]
+ },
+ {
+ "i": 11,
+ "selected_program_candidate": 8,
+ "subsample_ids": [
+ 27,
+ 13,
+ 11,
+ 26
+ ],
+ "subsample_scores": [
+ 0.25,
+ 1.0,
+ 0.25,
+ 0.25
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 0.5,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 9,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 12,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 7,
+ 10,
+ 15,
+ 24
+ ],
+ "subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ]
+ },
+ {
+ "i": 13,
+ "selected_program_candidate": 3,
+ "subsample_ids": [
+ 20,
+ 21,
+ 12,
+ 2
+ ],
+ "subsample_scores": [
+ 0.2,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 0.25
+ ],
+ "new_program_idx": 10,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 14,
+ "selected_program_candidate": 3,
+ "subsample_ids": [
+ 0,
+ 17,
+ 25,
+ 1
+ ],
+ "subsample_scores": [
+ 1.0,
+ 0.5,
+ 0.5,
+ 0.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 11,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 15,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 8,
+ 16,
+ 16,
+ 8
+ ],
+ "subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ]
+ },
+ {
+ "i": 16,
+ "selected_program_candidate": 3,
+ "subsample_ids": [
+ 13,
+ 23,
+ 27,
+ 24
+ ],
+ "subsample_scores": [
+ 0.0,
+ 0.14285714285714285,
+ 0.125,
+ 0.5
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.5,
+ 1.0
+ ],
+ "new_program_idx": 12,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ }
+]
\ No newline at end of file
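Note on `run_log.json` above: entries that carry `new_program_idx` seem to be exactly those where the summed `new_subsample_scores` strictly exceed the summed `subsample_scores`, and entries with no `new_subsample_scores` seem to be the ones skipped because every subsample score was already perfect. The sketch below tests that reading on six entries copied from the file (`i` = 0, 2, 6, 10, 12, 16). The rule is inferred from the data, not taken from GEPA's source, and `classify` is a hypothetical helper.

```python
# Six entries abridged from run_log.json (only the fields used here).
ENTRIES = [
    {"i": 0, "subsample_scores": [1.0, 0.3333333333333333, 0.2, 0.5],
     "new_subsample_scores": [0.5, 1.0, 1.0, 1.0], "new_program_idx": 1},
    {"i": 2, "subsample_scores": [0.5, 1.0, 1.0, 1.0],
     "new_subsample_scores": [1.0, 1.0, 0.5, 1.0]},
    {"i": 6, "subsample_scores": [0.5, 1.0, 1.0, 1.0],
     "new_subsample_scores": [1.0, 1.0, 0.25, 1.0]},
    {"i": 10, "subsample_scores": [1.0, 1.0, 1.0, 0.15384615384615385],
     "new_subsample_scores": [0.2857142857142857, 1.0, 0.0, 1.0]},
    {"i": 12, "subsample_scores": [1.0, 1.0, 1.0, 1.0]},
    {"i": 16, "subsample_scores": [0.0, 0.14285714285714285, 0.125, 0.5],
     "new_subsample_scores": [1.0, 1.0, 0.5, 1.0], "new_program_idx": 12},
]


def classify(entry):
    """Label an entry as 'skipped', 'accepted', or 'rejected' (inferred rule)."""
    new_scores = entry.get("new_subsample_scores")
    if new_scores is None:
        return "skipped"  # no proposal was evaluated on the subsample
    old_total = sum(entry["subsample_scores"])
    new_total = sum(new_scores)
    # Strict improvement is assumed: a tie (as in i=2, 3.5 vs 3.5) is rejected.
    return "accepted" if new_total > old_total else "rejected"


outcomes = [classify(e) for e in ENTRIES]
for entry, outcome in zip(ENTRIES, outcomes):
    print(entry["i"], outcome, "new_program_idx" in entry)
```

On these entries the inferred label matches the presence of `new_program_idx` in every case. Entry `i` = 16 sums to roughly 0.768 before and 3.5 after, which matches the "Iteration 17" line in the text log, so log iteration n appears to correspond to `i` = n - 1.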
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/run_log.txt b/gepa-12b-row30-prop16-continuation-20260613T214937Z/run_log.txt
new file mode 100644
index 0000000000000000000000000000000000000000..b0667c4578c94847dc04137a0e28c94618e7fa3d
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/run_log.txt
@@ -0,0 +1,1556 @@
+Iteration 0: Base program full valset score: 0.673452380952381 over 30 / 30 examples
+Iteration 1: Selected program 0 score: 0.673452380952381
+Iteration 1: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Output only JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents,
+api_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx,
+approvals, agent_runtime, model_serving, local_models,
+self_hosted_inference, telemetry_usage, exec_tools, sandboxing,
+browser_automation, cron_automation, config, security, mcp_tooling,
+tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping explicitly central second or third concerns.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, broad impact, and incidental implementation details unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+
+Important domain rules:
+
+Exec, sandboxing, and approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, exec v2 behavior, and exec contract behavior.
+- If the title says `exec v2 contract`, `exec contract`, or `exec v2 contract follow-through`, classify all contract areas that are part of that surface:
+ `exec_tools`, `sandboxing`, `approvals`.
+- Do this even if the PR title starts with `test(...)`; tests can still be the vehicle for landing a user-visible contract.
+- Do not collapse exec contract work to only `exec_tools`.
+
+Sandboxing, subagents, and runtimes:
+- Use `sandboxing` when sandbox modes, sandbox policy, sandbox behavior, or `sandbox.mode` is central.
+- Use `agent_runtime` when the issue is about agent execution runtimes, subagent initialization, runtime startup, runtime ownership, or process/runtime lifecycle.
+- Use `coding_agents` when coding-agent behavior is central, including subagents, sessions_spawn subagent initialization, Codex/Claude-style coding-agent orchestration, or agent worker startup.
+- A title like `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` is:
+ `coding_agents`, `sandboxing`, `agent_runtime`.
+- Do not add `sessions` just because a path or internal component contains `sessions_spawn`. Treat `sessions_spawn` as an implementation name unless session identity, lifecycle, routing, persistence, or state is the user-visible subject.
+
+Sessions:
+- Use `sessions` only when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.
+- Suppress `sessions` for internal names such as `sessions_spawn` when the actual problem is subagent/runtime initialization.
+
+Local models and model serving:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible protocol behavior, Responses API behavior, streaming lifecycle, response chunks, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- If a title combines a local model provider with the Responses API, OpenAI-compatible API behavior, streaming, or request/response protocol behavior, include both `local_models` and `model_serving`.
+- A title like `fix: LM Studio thinking blocks invisible with Responses API` is:
+ `model_serving`, `local_models`.
+
+Suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, session, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 1: New subsample score 3.5 is better than old score 2.033333333333333. Continue to full eval and add to candidate pool.
+Iteration 1: Valset score for new program: 0.5525091575091575 (coverage 30 / 30)
+Iteration 1: Val aggregate for new program: 0.5525091575091575
+Iteration 1: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 0.0, 3: 0.25, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 0.5, 8: 1.0, 9: 1.0, 10: 1.0, 11: 0.25, 12: 0.14285714285714285, 13: 0.2, 14: 1.0, 15: 0.5, 16: 0.2, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 0.5, 22: 0.5, 23: 0.15384615384615385, 24: 0.5, 25: 0.5, 26: 1.0, 27: 0.14285714285714285, 28: 0.5, 29: 0.0}
+Iteration 1: Objective aggregate scores for new program: {'weighted_score': 0.5525091575091575}
+Iteration 1: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 0.25, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.15384615384615385, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 0.5, 29: 0.25}
+Iteration 1: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 1: Valset pareto front aggregate score: 0.7510805860805861
+Iteration 1: Updated valset pareto front programs: {0: {0, 1}, 1: {1}, 2: {0}, 3: {0}, 4: {0, 1}, 5: {1}, 6: {0, 1}, 7: {0}, 8: {0, 1}, 9: {0, 1}, 10: {0, 1}, 11: {0}, 12: {0}, 13: {0}, 14: {0, 1}, 15: {0}, 16: {0}, 17: {0, 1}, 18: {0, 1}, 19: {1}, 20: {0, 1}, 21: {0}, 22: {1}, 23: {1}, 24: {0, 1}, 25: {0}, 26: {0, 1}, 27: {1}, 28: {1}, 29: {0}}
+Iteration 1: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 1: Best valset aggregate score so far: 0.673452380952381
+Iteration 1: Best program as per aggregate score on valset: 0
+Iteration 1: Best score on valset: 0.673452380952381
+Iteration 1: Linear pareto front program index: 0
+Iteration 1: New program candidate index: 1
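The aggregate logged for candidate 1 appears to be the unweighted mean of its 30 per-instance valset scores. The following is a minimal Python check of that reading, using the values copied from the Iteration 1 log line above; the variable names are illustrative and are not taken from GEPA.

```python
# Per-instance valset scores for candidate 1, copied from the Iteration 1 log line.
individual_scores = {
    0: 1.0, 1: 0.25, 2: 0.0, 3: 0.25, 4: 0.2857142857142857, 5: 1.0,
    6: 1.0, 7: 0.5, 8: 1.0, 9: 1.0, 10: 1.0, 11: 0.25,
    12: 0.14285714285714285, 13: 0.2, 14: 1.0, 15: 0.5, 16: 0.2, 17: 1.0,
    18: 1.0, 19: 1.0, 20: 0.2, 21: 0.5, 22: 0.5, 23: 0.15384615384615385,
    24: 0.5, 25: 0.5, 26: 1.0, 27: 0.14285714285714285, 28: 0.5, 29: 0.0,
}

# Assumption: the aggregate is the plain mean over all covered instances.
aggregate = sum(individual_scores.values()) / len(individual_scores)

# Value reported in the log as "Valset score for new program".
logged_aggregate = 0.5525091575091575
print(f"computed={aggregate:.10f} logged={logged_aggregate:.10f}")
```

Under this assumption the computed mean agrees with the logged value to within floating-point rounding.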
+Iteration 2: Selected program 0 score: 0.673452380952381
+Iteration 2: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named or when ACP binding semantics are central. In OpenClaw, phrases like “per-binding”, “configured binding”, “binding ID”, or per-agent/per-binding ACP policy usually indicate the ACPX binding layer, even if the title says only ACP.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- Do not add `sessions` just because a title says “ACP sessions” if the actual change is permission policy, binding policy, or per-agent configuration. Treat “sessions” as label spam unless session identity, lifecycle, routing, state, persistent process identity, or session-specific behavior is the user-visible subject.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.
+
+Auth and identity:
+- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.
+- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains “openai-codex”, “Codex OAuth”, or a Codex-flavored product/path name. Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.
+- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.
+- `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` should include `reliability`, `auth_identity`, and `sessions`, not `queueing` or `codex`.
+
+Reliability and queueing:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, or robustness failure.
+- “lane wedge”, “lane wedged”, “self-heal lane wedges”, “worker stuck”, or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.
+- Do not add `queueing` merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.
+- Use `queueing` only when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.
+- Do not add broad demo/showcase labels or `tool_calling` just because docs mention tool boundaries.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, or names a model unless serving/routing/protocol behavior is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Notifications and chat integrations:
+- Use `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Gateway and runtime:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Do not substitute broad infrastructure topics for the listed user-visible subjects.
+- In composite titles, each clause may contribute a different topic. For example, one clause can be `reliability` while another contributes `auth_identity` and `sessions`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
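The proposed prompt requires a single JSON object whose ids all come from the allowed list. Below is a hedged Python sketch of how a harness might check a reply against that contract. The function `parse_topics` and its return shape are hypothetical, not part of GEPA or the scoring code; only the allowed ids are taken from the prompt text above.

```python
import json

# Allowed ids copied from the Iteration 2 proposed prompt.
ALLOWED_TOPICS = {
    "queueing", "docs", "notifications", "sessions", "gateway", "reliability",
    "memory", "open_weight_models", "local_model_providers", "codex",
    "api_surface", "ui_tui", "chat_integrations", "skills_plugins", "acp",
    "acpx", "approvals", "agent_runtime", "model_serving", "local_models",
    "self_hosted_inference", "telemetry_usage", "exec_tools", "sandboxing",
    "browser_automation", "cron_automation", "config", "security",
    "mcp_tooling", "tool_calling", "auth_identity",
}


def parse_topics(raw: str) -> tuple[list[str], list[str]]:
    """Split a reply into (accepted ids, rejected ids); hypothetical helper."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("reply must be a JSON object")
    topics = payload.get("topics_of_interest", [])
    if not isinstance(topics, list):
        raise ValueError("topics_of_interest must be a list")
    accepted: list[str] = []
    rejected: list[str] = []
    for topic in topics:
        if topic in ALLOWED_TOPICS:
            if topic not in accepted:  # drop duplicates, keep first occurrence
                accepted.append(topic)
        else:
            rejected.append(topic)
    return accepted, rejected


accepted, rejected = parse_topics(
    '{"topics_of_interest": ["acp", "approvals", "acpx", "made_up"]}'
)
print(accepted, rejected)
```

Whether out-of-list ids should be dropped silently or should fail the whole reply is a design choice this sketch leaves to the caller by returning both lists.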
+Iteration 2: New subsample score 4.0 is better than old score 2.375. Continue to full eval and add to candidate pool.
+Iteration 2: Valset score for new program: 0.6586202686202686 (coverage 30 / 30)
+Iteration 2: Val aggregate for new program: 0.6586202686202686
+Iteration 2: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 0.2857142857142857, 3: 0.0, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 1.0, 11: 0.5, 12: 0.25, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.3333333333333333, 20: 0.2, 21: 1.0, 22: 0.15384615384615385, 23: 0.25, 24: 0.5, 25: 1.0, 26: 0.25, 27: 1.0, 28: 0.5, 29: 0.25}
+Iteration 2: Objective aggregate scores for new program: {'weighted_score': 0.6586202686202686}
+Iteration 2: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.25, 24: 0.5, 25: 1.0, 26: 1.0, 27: 1.0, 28: 0.5, 29: 0.25}
+Iteration 2: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 2: Valset pareto front aggregate score: 0.8495238095238095
+Iteration 2: Updated valset pareto front programs: {0: {0, 1, 2}, 1: {2}, 2: {0}, 3: {0}, 4: {0, 1, 2}, 5: {1, 2}, 6: {0, 1, 2}, 7: {0, 2}, 8: {0, 1}, 9: {0, 1, 2}, 10: {0, 1, 2}, 11: {0}, 12: {0}, 13: {2}, 14: {0, 1, 2}, 15: {0}, 16: {2}, 17: {0, 1, 2}, 18: {0, 1, 2}, 19: {1}, 20: {0, 1, 2}, 21: {0, 2}, 22: {1}, 23: {2}, 24: {0, 1, 2}, 25: {0, 2}, 26: {0, 1}, 27: {2}, 28: {1, 2}, 29: {0, 2}}
+Iteration 2: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 2: Best valset aggregate score so far: 0.673452380952381
+Iteration 2: Best program as per aggregate score on valset: 0
+Iteration 2: Best score on valset: 0.673452380952381
+Iteration 2: Linear pareto front program index: 0
+Iteration 2: New program candidate index: 2
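The "valset pareto front" lines read as a per-instance maximum across candidates, together with the set of candidates that reach that maximum. The Python sketch below assumes that reading and uses five instances (0, 2, 7, 13, 16) from the logs above. Candidate 1 and candidate 2 scores are copied from their "Individual valset scores" lines. Candidate 0 scores are inferred from the Iteration 1 front lines, where candidate 0 is listed as a holder of the front score on those instances. The function name is illustrative.

```python
# Scores per candidate on a subset of valset instances.
scores_by_program = {
    0: {0: 1.0, 2: 1.0, 7: 1.0, 13: 0.5, 16: 0.25},  # inferred from Iteration 1 front
    1: {0: 1.0, 2: 0.0, 7: 0.5, 13: 0.2, 16: 0.2},
    2: {0: 1.0, 2: 0.2857142857142857, 7: 1.0, 13: 1.0, 16: 1.0},
}


def per_instance_front(scores):
    """Return {instance: (best score, set of candidates achieving it)}."""
    instances = sorted({i for per_program in scores.values() for i in per_program})
    front = {}
    for instance in instances:
        best = max(p[instance] for p in scores.values() if instance in p)
        holders = {c for c, p in scores.items() if p.get(instance) == best}
        front[instance] = (best, holders)
    return front


front = per_instance_front(scores_by_program)
for instance, (best, holders) in front.items():
    print(instance, best, sorted(holders))
```

On these five instances the result matches the "Updated valset pareto front programs" line for Iteration 2, for example instance 13 moving from candidate 0 at 0.5 to candidate 2 at 1.0.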
+Iteration 3: Selected program 1 score: 0.5525091575091575
+Iteration 3: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Output only JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents,
+api_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx,
+approvals, agent_runtime, model_serving, local_models,
+self_hosted_inference, telemetry_usage, exec_tools, sandboxing,
+browser_automation, cron_automation, config, security, mcp_tooling,
+tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping explicitly central second or third concerns.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, broad impact, and incidental implementation details unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+
+Important domain rules:
+
+Provider catalogs, local providers, and routing:
+- Use `local_model_providers` when the central subject is adding, updating, cataloging, configuring, or routing through a named model provider or provider catalog.
+- Provider catalog names such as Xiaomi, DeepSeek, LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar provider/runtime catalogs are strong signals for `local_model_providers` when the title says a model is being added to that catalog.
+- Use `model_serving` when the central subject is serving behavior, endpoint compatibility, OpenAI-compatible protocol behavior, Responses API behavior, streaming lifecycle, response chunks, final usage chunks, base URL behavior, request routing, automatic routing, multimodal routing, or model-server compatibility.
+- If a title combines a provider catalog/model-provider change with automatic routing, request routing, multimodal routing, Responses API behavior, streaming, or OpenAI-compatible serving behavior, include both `local_model_providers` and `model_serving`.
+- Example: `[Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` must be classified as `local_model_providers` and `model_serving`.
+- Do not collapse provider catalog work to only `model_serving` when the catalog/provider addition is explicitly central.
+- Do not use `local_models` merely because a model name appears. Use `local_models` only when a local model app/runtime itself is central.
+
+Exec, sandboxing, and approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, exec v2 behavior, and exec contract behavior.
+- If the title says `exec v2 contract`, `exec contract`, or `exec v2 contract follow-through`, classify all contract areas that are part of that surface:
+ `exec_tools`, `sandboxing`, `approvals`.
+- Do this even if the PR title starts with `test(...)`; tests can still be the vehicle for landing a user-visible contract.
+- Do not collapse exec contract work to only `exec_tools`.
+
+Sandboxing, subagents, and runtimes:
+- Use `sandboxing` when sandbox modes, sandbox policy, sandbox behavior, or `sandbox.mode` is central.
+- Use `agent_runtime` when the issue is about agent execution runtimes, subagent initialization, runtime startup, runtime ownership, or process/runtime lifecycle.
+- Use `coding_agents` when coding-agent behavior is central, including subagents, sessions_spawn subagent initialization, Codex/Claude-style coding-agent orchestration, or agent worker startup.
+- A title like `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` is:
+ `coding_agents`, `sandboxing`, `agent_runtime`.
+- Do not add `sessions` just because a path or internal component contains `sessions_spawn`.
+
+Sessions:
+- Use `sessions` only when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.
+- Suppress `sessions` for internal names such as `sessions_spawn` when the actual problem is subagent/runtime initialization.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, session, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 3: New subsample score 3.5 is not better than old score 3.5, skipping
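The subsample lines suggest a strict-improvement gate: a proposal goes on to full validation only when its summed subsample score exceeds the parent's, and a tie is skipped. The Python sketch below assumes that rule and uses the four score pairs from the log; `accept_proposal` is a hypothetical name.

```python
def accept_proposal(old_subsample_score: float, new_subsample_score: float) -> bool:
    """Assumed gate: continue to full eval only on a strict improvement."""
    return new_subsample_score > old_subsample_score


# (iteration, old score, new score) as logged above.
logged_pairs = [
    (1, 2.033333333333333, 3.5),
    (2, 2.375, 4.0),
    (3, 3.5, 3.5),
    (4, 1.819047619047619, 3.5),
]

decisions = {it: accept_proposal(old, new) for it, old, new in logged_pairs}
print(decisions)
```

Under this assumption only Iteration 3 is rejected, which is consistent with the log adding candidates 1, 2, and 3 and skipping the Iteration 3 proposal.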
+Iteration 4: Selected program 0 score: 0.673452380952381
+Iteration 4: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Important calibration rules:
+
+Security composites:
+- Use `security` for vulnerabilities, SSRF, access restrictions, token verification, credential boundaries, network security policy, or HIGH/CRITICAL vuln fixes.
+- If a security title lists specific affected subsystems, include each subsystem that is explicitly central.
+- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `hooks`, `skills_plugins`.
+- Use `hooks` when hook behavior, hook priority, hook execution, or hook policy is explicitly central.
+- Use `skills_plugins` when skill scanning, skill/plugin discovery, user-installed plugins, plugin inheritance, plugin installation, or plugin availability is explicitly central.
+- Do not collapse a composite security title to only `security` when the title explicitly names maintainer-owned subsystems like hooks or skill scanners.
+
+Token counts and usage freshness:
+- Use `telemetry_usage` when token counts, usage accounting, cost display, usage reporting, diagnostic counts, traces, or status reporting are themselves the bug or feature.
+- Use `sessions` when the freshness, identity, lifecycle, routing, or state of a session is central.
+- A title like `fix: respect totalTokensFresh flag to avoid showing stale token counts` is `sessions` and `telemetry_usage`.
+- Do not add `ui_tui` merely because the title says “showing” or “displaying” token counts. UI is label spam unless the user-facing interface itself is central.
+
+Model serving, local models, and usage chunks:
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- A title like `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` is only `model_serving`.
+- Do not add `local_models` merely because a local runtime such as llama.cpp, Ollama, LM Studio, vLLM, TGI, or LocalAI is named. Add `local_models` only when the local model app/provider/runtime itself is the user-visible subject.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, or final usage chunks. If the issue is that the streaming protocol closes before the final usage chunk, classify it as `model_serving`.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include only `docs`.
+- Add the central documented area only when that area is explicitly central in the title, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add broad demo/showcase labels.
+- Do not add `tool_calling` just because docs mention tool boundaries unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.
+- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` => `acp`, `sessions`.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, and only `acpx` if ACPX is explicitly named.
+
+Notifications and chat:
+- Use `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`, not `notifications`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+- Do not add `ui_tui` merely because something is shown, displayed, or surfaced unless the UI itself is central.
+
+Runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, showing, display, llama.cpp, or hook. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 4: New subsample score 3.5 is better than old score 1.819047619047619. Continue to full eval and add to candidate pool.
+Iteration 4: Valset score for new program: 0.6679761904761905 (coverage 30 / 30)
+Iteration 4: Val aggregate for new program: 0.6679761904761905
+Iteration 4: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 0.25, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 0.2857142857142857, 12: 1.0, 13: 0.25, 14: 1.0, 15: 0.5, 16: 1.0, 17: 0.5, 18: 1.0, 19: 0.2, 20: 0.0, 21: 1.0, 22: 0.2857142857142857, 23: 0.14285714285714285, 24: 0.25, 25: 0.5, 26: 1.0, 27: 0.125, 28: 0.5, 29: 0.25}
+Iteration 4: Objective aggregate scores for new program: {'weighted_score': 0.6679761904761905}
+Iteration 4: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.25, 24: 0.5, 25: 1.0, 26: 1.0, 27: 1.0, 28: 0.5, 29: 0.25}
+Iteration 4: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 4: Valset pareto front aggregate score: 0.8733333333333333
+Iteration 4: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3}, 4: {3}, 5: {1, 2}, 6: {0, 1, 2, 3}, 7: {0, 2, 3}, 8: {0, 1, 3}, 9: {0, 1, 2, 3}, 10: {0, 1, 2, 3}, 11: {0}, 12: {0, 3}, 13: {2}, 14: {0, 1, 2, 3}, 15: {0}, 16: {2, 3}, 17: {0, 1, 2}, 18: {0, 1, 2, 3}, 19: {1}, 20: {0, 1, 2}, 21: {0, 2, 3}, 22: {1}, 23: {2}, 24: {0, 1, 2}, 25: {0, 2}, 26: {0, 1, 3}, 27: {2}, 28: {1, 2, 3}, 29: {0, 2, 3}}
+Iteration 4: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 4: Best valset aggregate score so far: 0.673452380952381
+Iteration 4: Best program as per aggregate score on valset: 0
+Iteration 4: Best score on valset: 0.673452380952381
+Iteration 4: Linear pareto front program index: 0
+Iteration 4: New program candidate index: 3
+Iteration 5: Selected program 1 score: 0.5525091575091575
+Iteration 5: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Output only JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents,
+api_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx,
+approvals, agent_runtime, model_serving, local_models,
+self_hosted_inference, telemetry_usage, exec_tools, sandboxing,
+browser_automation, cron_automation, config, security, mcp_tooling,
+tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping explicitly central second or third concerns.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, broad impact, and incidental implementation details unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+
+General suppression:
+- Do not add a topic just because a related word appears.
+- Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+- Never output disallowed ids such as `packaging_deployment`.
+
+Reliability:
+- Use `reliability` when the title describes a failure mode, hardening, deadlock, crash, race, takeover error, dropped work, broken delivery, or correctness issue that is central to the user-visible problem.
+- For cron/chat/session collision bugs such as `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`, include:
+ `cron_automation`, `sessions`, `reliability`.
+- Do not add `reliability` for routine implementation cleanup unless the title is about making behavior more robust or fixing a failure.
+
+Security and CI:
+- Use `security` for supply-chain boundaries, dependency trust, GitHub Actions hardening, secret exposure, permissions, auth, or attack-surface reduction.
+- A title like `ci: harden GitHub Actions supply-chain boundaries` is only:
+ `security`.
+- Do not add deployment, packaging, CI, or release topics unless an allowed topic is directly central. `packaging_deployment` is not allowed.
+
+Exec, sandboxing, and approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, exec v2 behavior, and exec contract behavior.
+- If the title says `exec v2 contract`, `exec contract`, or `exec v2 contract follow-through`, classify all contract areas that are part of that surface:
+ `exec_tools`, `sandboxing`, `approvals`.
+- Do this even if the PR title starts with `test(...)`; tests can still be the vehicle for landing a user-visible contract.
+- Do not collapse exec contract work to only `exec_tools`.
+
+Sandboxing:
+- Use `sandboxing` when sandbox modes, sandbox policy, sandbox behavior, filesystem/network restrictions, approval boundaries, or `sandbox.mode` are central.
+- Do not use `sandboxing` merely because the title says `sandbox` as an environment name, container, ACPX sandbox, Codex sandbox, test sandbox, or runtime workspace unless the subject is the sandbox policy/permission behavior itself.
+- A title like `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` is not about sandbox policy. It is:
+ `acpx`, `codex`, `skills_plugins`.
+
+Skills and plugins:
+- Use `skills_plugins` when user-installed plugins, plugin inheritance, skills, marketplace plugins, plugin loading, plugin availability, or plugin installation behavior is central.
+- If a title says a Codex or ACPX environment should inherit user-installed plugins, include `skills_plugins`.
+- Do not replace `skills_plugins` with `sandboxing` just because the plugins are being inherited into a sandboxed environment.
+
+ACPX, ACP, and Codex:
+- Use `acpx` when ACPX itself, ACPX environments, ACPX protocol/workflow, or ACPX-specific behavior is central.
+- Use `acp` for ACP-specific protocol/workflow behavior.
+- Use `codex` when Codex is explicitly a central product/runtime/integration subject.
+- For an ACPX Codex feature involving user-installed plugins, include all central facets:
+ `acpx`, `codex`, `skills_plugins`.
+
+Sandboxing, subagents, and runtimes:
+- Use `agent_runtime` when the issue is about agent execution runtimes, subagent initialization, runtime startup, runtime ownership, or process/runtime lifecycle.
+- Use `coding_agents` when coding-agent behavior is central, including subagents, sessions_spawn subagent initialization, Codex/Claude-style coding-agent orchestration, or agent worker startup.
+- A title like `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` is:
+ `coding_agents`, `sandboxing`, `agent_runtime`.
+- Do not add `sessions` just because a path or internal component contains `sessions_spawn`. Treat `sessions_spawn` as an implementation name unless session identity, lifecycle, routing, persistence, or state is the user-visible subject.
+
+Sessions:
+- Use `sessions` only when session identity, session lifecycle, session routing, session state, persistent process identity, takeover/ownership, active chat session conflicts, or session-specific behavior is central.
+- Suppress `sessions` for internal names such as `sessions_spawn` when the actual problem is subagent/runtime initialization.
+- Include `sessions` when the title describes an active chat/session takeover conflict.
+
+Cron automation:
+- Use `cron_automation` when scheduled jobs, cron announcements, recurring automation, scheduled delivery, or cron-triggered behavior is central.
+- If cron delivery conflicts with active user chat/session ownership, include both `cron_automation` and `sessions`; add `reliability` when the title describes an error or broken behavior.
+
+Local models and model serving:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible protocol behavior, Responses API behavior, streaming lifecycle, response chunks, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- If a title combines a local model provider with the Responses API, OpenAI-compatible API behavior, streaming, or request/response protocol behavior, include both `local_models` and `model_serving`.
+- A title like `fix: LM Studio thinking blocks invisible with Responses API` is:
+ `model_serving`, `local_models`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, session, sandbox, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 5: New subsample score 4.0 is better than old score 2.0357142857142856. Continue to full eval and add to candidate pool.
+Iteration 5: Valset score for new program: 0.4812393162393162 (coverage 30 / 30)
+Iteration 5: Val aggregate for new program: 0.4812393162393162
+Iteration 5: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.25, 4: 0.2857142857142857, 5: 0.25, 6: 1.0, 7: 1.0, 8: 0.25, 9: 0.0, 10: 1.0, 11: 1.0, 12: 0.14285714285714285, 13: 0.25, 14: 1.0, 15: 0.25, 16: 0.14285714285714285, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 0.5, 22: 0.3333333333333333, 23: 0.25, 24: 0.25, 25: 0.5, 26: 0.2857142857142857, 27: 0.14285714285714285, 28: 1.0, 29: 0.15384615384615385}
+Iteration 5: Objective aggregate scores for new program: {'weighted_score': 0.48123931623931626}
+Iteration 5: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.25, 24: 0.5, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 0.25}
+Iteration 5: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 5: Valset pareto front aggregate score: 0.89
+Iteration 5: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3}, 4: {3}, 5: {1, 2}, 6: {0, 1, 2, 3, 4}, 7: {0, 2, 3, 4}, 8: {0, 1, 3}, 9: {0, 1, 2, 3}, 10: {0, 1, 2, 3, 4}, 11: {0, 4}, 12: {0, 3}, 13: {2}, 14: {0, 1, 2, 3, 4}, 15: {0}, 16: {2, 3}, 17: {0, 1, 2, 4}, 18: {0, 1, 2, 3, 4}, 19: {1, 4}, 20: {0, 1, 2, 4}, 21: {0, 2, 3}, 22: {1}, 23: {2, 4}, 24: {0, 1, 2}, 25: {0, 2}, 26: {0, 1, 3}, 27: {2}, 28: {4}, 29: {0, 2, 3}}
+Iteration 5: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 5: Best valset aggregate score so far: 0.673452380952381
+Iteration 5: Best program as per aggregate score on valset: 0
+Iteration 5: Best score on valset: 0.673452380952381
+Iteration 5: Linear pareto front program index: 0
+Iteration 5: New program candidate index: 4
+Iteration 6: Selected program 4 score: 0.4812393162393162
+Iteration 6: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Output only JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents,
+api_surface, ui_tui, chat_integrations, skills_plugins, acp, acpx,
+approvals, agent_runtime, model_serving, local_models,
+self_hosted_inference, telemetry_usage, exec_tools, sandboxing,
+browser_automation, cron_automation, config, security, mcp_tooling,
+tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping explicitly central second or third concerns.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, broad impact, and incidental implementation details unless they are the actual user-visible subject.
+
+Core process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+
+General suppression:
+- Do not add a topic just because a related word appears.
+- Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+- Ignore words that appear only in internal function or event names, such as `sessions_yield` or `sessions_spawn`, unless the actual user-visible subject is session identity, lifecycle, routing, ownership, persistence, or active-session conflict.
+- Never output disallowed ids such as `packaging_deployment`.
+
+Reliability:
+- Use `reliability` when the title describes a failure mode, hardening, deadlock, crash, race, takeover error, dropped work, broken delivery, or correctness issue that is central to the user-visible problem.
+- Do not add `reliability` merely because the title contains words like fail, failure, stale, broken, or error if the maintainer-owned subject is a protocol/workflow contract such as ACP/ACPX state visibility.
+- For `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield`, classify only:
+ `acpx`, `acp`.
+ Do not add `sessions` for `sessions_yield`, and do not add `reliability` for the failure/stale wording.
+
+Security:
+- Use `security` for supply-chain boundaries, dependency trust, GitHub Actions hardening, secret exposure, permissions, auth, or attack-surface reduction.
+- Do not add `security` merely because the title mentions SSRF defenses, proxy bypass, NO_PROXY, network policy, or a security-sounding mechanism when the user-visible problem is compatibility with local/self-hosted inference.
+- A title like `ci: harden GitHub Actions supply-chain boundaries` is only:
+ `security`.
+
+Exec, sandboxing, and approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, exec v2 behavior, and exec contract behavior.
+- If the title says `exec v2 contract`, `exec contract`, or `exec v2 contract follow-through`, classify all contract areas that are part of that surface:
+ `exec_tools`, `sandboxing`, `approvals`.
+- Do this even if the PR title starts with `test(...)`; tests can still be the vehicle for landing a user-visible contract.
+- Do not collapse exec contract work to only `exec_tools`.
+
+Sandboxing:
+- Use `sandboxing` when sandbox modes, sandbox policy, sandbox behavior, filesystem/network restrictions, approval boundaries, or `sandbox.mode` are central.
+- Do not use `sandboxing` merely because the title says `sandbox` as an environment name, container, ACPX sandbox, Codex sandbox, test sandbox, or runtime workspace unless the subject is the sandbox policy/permission behavior itself.
+- A title like `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` is not about sandbox policy. It is:
+ `acpx`, `codex`, `skills_plugins`.
+
+Skills and plugins:
+- Use `skills_plugins` when user-installed plugins, plugin inheritance, skills, marketplace plugins, plugin loading, plugin availability, or plugin installation behavior is central.
+- If a title says a Codex or ACPX environment should inherit user-installed plugins, include `skills_plugins`.
+- Do not replace `skills_plugins` with `sandboxing` just because the plugins are being inherited into a sandboxed environment.
+
+ACPX, ACP, and Codex:
+- Use `acpx` when ACPX itself, ACPX environments, ACPX protocol/workflow, or ACPX-specific behavior is central.
+- Use `acp` for ACP-specific protocol/workflow behavior.
+- Use `codex` when Codex is explicitly a central product/runtime/integration subject.
+- For explicit ACP agent runs, embedded backend visibility, final JSON state, or ACP/ACPX agent-run contracts, prefer `acp` and/or `acpx` only unless another central subject is explicitly named.
+- For an ACPX Codex feature involving user-installed plugins, include all central facets:
+ `acpx`, `codex`, `skills_plugins`.
+
+Sandboxing, subagents, and runtimes:
+- Use `agent_runtime` when the issue is about agent execution runtimes, subagent initialization, runtime startup, runtime ownership, or process/runtime lifecycle.
+- Use `coding_agents` when coding-agent behavior is central, including subagents, sessions_spawn subagent initialization, Codex/Claude-style coding-agent orchestration, or agent worker startup.
+- A title like `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` is:
+ `coding_agents`, `sandboxing`, `agent_runtime`.
+- Do not add `sessions` just because a path or internal component contains `sessions_spawn`.
+
+Sessions:
+- Use `sessions` only when session identity, session lifecycle, session routing, session state, persistent process identity, takeover/ownership, active chat session conflicts, or session-specific behavior is central.
+- Suppress `sessions` for internal names such as `sessions_spawn` and `sessions_yield` when the actual problem is subagent/runtime initialization or ACP/ACPX state handling.
+- Include `sessions` when the title describes an active chat/session takeover conflict.
+
+Cron automation:
+- Use `cron_automation` when scheduled jobs, cron announcements, recurring automation, scheduled delivery, or cron-triggered behavior is central.
+- If cron delivery conflicts with active user chat/session ownership, include both `cron_automation` and `sessions`; add `reliability` when the title describes an error or broken behavior.
+
+Local models, self-hosted inference, and model serving:
+- Use `local_models` when a local model app/provider/runtime is central, including Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted providers.
+- Ollama and LM Studio are strong signals for `local_models`.
+- Use `self_hosted_inference` when the central subject is running inference or embeddings through a local/self-hosted backend, especially with Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, localhost/LAN providers, proxy bypass, NO_PROXY, or self-hosted endpoint reachability.
+- For `Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)`, classify:
+ `local_models`, `self_hosted_inference`.
+ Do not add `model_serving` unless the title is about serving protocol behavior, endpoint compatibility, streaming lifecycle, response chunks, or request routing. Do not add `security` unless security hardening itself is the central user-visible change.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible protocol behavior, Responses API behavior, streaming lifecycle, response chunks, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- If a title combines a local model provider with the Responses API, OpenAI-compatible API behavior, streaming, or request/response protocol behavior, include both `local_models` and `model_serving`.
+- A title like `fix: LM Studio thinking blocks invisible with Responses API` is:
+ `model_serving`, `local_models`.
+
+Open-weight models and local model providers:
+- Use `open_weight_models` when the central subject is open-weight model families, model metadata, context windows, capabilities, catalogs, or known properties for open-weight models.
+- Use `local_model_providers` when the central subject is provider integration or provider catalog/metadata for open-weight/local-model providers.
+- In this taxonomy, Amazon Bedrock Mantle open-weight model metadata, including known context windows, routes to `local_model_providers` as well as `open_weight_models`.
+- For `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models`, classify:
+ `open_weight_models`, `local_model_providers`.
+ Do not add `model_serving`; context-window metadata for model catalog/provider support is not serving protocol behavior.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, proxy, NO_PROXY, SSRF, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, session, sandbox, sessions_yield, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 6: New subsample score 3.5 is better than old score 1.5705128205128205. Continue to full eval and add to candidate pool.
+Iteration 6: Valset score for new program: 0.608452380952381 (coverage 30 / 30)
+Iteration 6: Val aggregate for new program: 0.608452380952381
+Iteration 6: Individual valset scores for new program: {0: 0.5, 1: 0.14285714285714285, 2: 0.25, 3: 1.0, 4: 0.2857142857142857, 5: 1.0, 6: 0.5, 7: 1.0, 8: 0.25, 9: 0.5, 10: 1.0, 11: 1.0, 12: 0.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 0.25, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 0.5, 22: 0.0, 23: 1.0, 24: 0.5, 25: 0.0, 26: 0.25, 27: 0.125, 28: 1.0, 29: 1.0}
+Iteration 6: Objective aggregate scores for new program: {'weighted_score': 0.6084523809523809}
+Iteration 6: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 1.0, 24: 0.5, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 6: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 6: Valset pareto front aggregate score: 0.94
+Iteration 6: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3, 5}, 4: {3}, 5: {1, 2, 5}, 6: {0, 1, 2, 3, 4}, 7: {0, 2, 3, 4, 5}, 8: {0, 1, 3}, 9: {0, 1, 2, 3}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5}, 12: {0, 3}, 13: {2, 5}, 14: {0, 1, 2, 3, 4, 5}, 15: {0, 5}, 16: {2, 3}, 17: {0, 1, 2, 4, 5}, 18: {0, 1, 2, 3, 4, 5}, 19: {1, 4, 5}, 20: {0, 1, 2, 4, 5}, 21: {0, 2, 3}, 22: {1}, 23: {5}, 24: {0, 1, 2, 5}, 25: {0, 2}, 26: {0, 1, 3}, 27: {2}, 28: {4, 5}, 29: {5}}
+Iteration 6: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 6: Best valset aggregate score so far: 0.673452380952381
+Iteration 6: Best program as per aggregate score on valset: 0
+Iteration 6: Best score on valset: 0.673452380952381
+Iteration 6: Linear pareto front program index: 0
+Iteration 6: New program candidate index: 5
+Iteration 7: Selected program 2 score: 0.6586202686202686
+Iteration 7: Proposed new text for routing_policy: When classifying GitHub issue/PR titles, keep the topic set minimal but do not drop a central notification-delivery concern.
+
+For titles about Slack pushes, target-channel pushes, heartbeat pushes, announce messages, cron --announce delivery, or pushed-message identity overlays:
+- Add `notifications` when the title is about notification behavior itself, including announce-path pushes, heartbeat pushes, target-channel pushes, delivery of notification messages, or identity overlays on pushed notification messages.
+- Add `chat_integrations` when Slack or another chat app/channel is a central delivery surface.
+- Do not suppress `notifications` just because Slack is present; Slack may require `chat_integrations`, but notification push behavior remains `notifications`.
+- Do not add `cron_automation` merely because the notification path mentions `cron --announce`; use `cron_automation` only if cron scheduling, lifecycle, force-run, or cron execution is the central subject.
+- Do not add `auth_identity` merely because the title says “identity overlay” if the issue is about the displayed/sent identity on pushed notifications rather than login, OAuth, account identity, credentials, or token restoration.
+- Ignore parenthetical notes like “reply path was fixed in #12345” unless they describe the current central problem.
+
+Example:
+Title: `[Bug]: Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes (announce path; reply path was fixed in #38235)`
+Correct output:
+{"topics_of_interest":["notifications","chat_integrations"]}
+Iteration 7: New subsample score 3.25 is not better than old score 3.5, skipping
+Iteration 8: Selected program 2 score: 0.6586202686202686
+Iteration 8: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Conventional commit scopes in the title, such as `feat(memory/embeddings): ...`, are part of the title signal. Do not ignore them when they name a central domain.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, and incidental implementation details unless they are the actual user-visible subject.
+
+Process:
+1. Read the full title, including any conventional commit type/scope prefix.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Important routing rules learned from prior errors:
+
+Browser automation plus packaging/deployment:
+- Use `browser_automation` when the central subject is browser automation, browser diagnostics, Chrome session reuse, browser runtime behavior, profile attach behavior, browser tooling, or desktop browser integration.
+- Use `packaging_deployment` when the title centrally names packaging, deployment, install/runtime environment setup, missing bundled executables, missing npm/node binaries, PATH/binary resolution, launch packaging, or errors such as `spawn npx ENOENT`.
+- If a desktop/browser issue combines browser session/profile behavior with a missing executable or packaging/runtime availability failure, include both topics.
+- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` must be:
+ `browser_automation`, `packaging_deployment`.
+ The Chrome sessions, Google sign-in, and user profile attach are browser automation. The `spawn npx ENOENT` failure is a packaging/deployment/runtime environment concern and must not be dropped.
+
+Memory and self-hosted inference:
+- Use `memory` when memory, embeddings under memory, memory retrieval/storage, or a conventional commit scope such as `memory/embeddings` is central.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- If the title says `memory/embeddings` and adds an OpenAI-compatible provider for self-hosted servers, include both `memory` and `self_hosted_inference`.
+- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` must be:
+ `memory`, `self_hosted_inference`.
+ Do not drop `memory` just because the body of the title focuses on providers; the scope is a central maintainer-owned area.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named or when ACP binding semantics are central. In OpenClaw, phrases like “per-binding”, “configured binding”, “binding ID”, or per-agent/per-binding ACP policy usually indicate the ACPX binding layer, even if the title says only ACP.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- Do not add `sessions` just because a title says “ACP sessions” if the actual change is permission policy, binding policy, or per-agent configuration.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.
+
+Auth and identity:
+- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.
+- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains “openai-codex”, “Codex OAuth”, or a Codex-flavored product/path name.
+- Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.
+- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.
+
+Reliability and queueing:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, or robustness failure.
+- “lane wedge”, “lane wedged”, “self-heal lane wedges”, “worker stuck”, or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.
+- Do not add `queueing` merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.
+- Use `queueing` only when queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, “openai-compatible”, or names llama.cpp/Ollama/vLLM/TGI/LocalAI unless serving protocol behavior is central.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, Chrome, npx, embeddings, provider, or openai-compatible. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 8: New subsample score 4.0 is better than old score 2.5. Continue to full eval and add to candidate pool.
+Iteration 8: Valset score for new program: 0.5296214896214896 (coverage 30 / 30)
+Iteration 8: Val aggregate for new program: 0.5296214896214896
+Iteration 8: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.15384615384615385, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 0.3333333333333333, 11: 0.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 0.15384615384615385, 17: 0.25, 18: 0.2857142857142857, 19: 0.3333333333333333, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.14285714285714285, 24: 1.0, 25: 0.0, 26: 0.25, 27: 0.5, 28: 0.5, 29: 0.2857142857142857}
+Iteration 8: Objective aggregate scores for new program: {'weighted_score': 0.5296214896214896}
+Iteration 8: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 8: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 8: Valset pareto front aggregate score: 0.9566666666666667
+Iteration 8: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3, 5}, 4: {3, 6}, 5: {1, 2, 5, 6}, 6: {0, 1, 2, 3, 4, 6}, 7: {0, 2, 3, 4, 5, 6}, 8: {0, 1, 3}, 9: {0, 1, 2, 3, 6}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5}, 12: {0, 3, 6}, 13: {2, 5}, 14: {0, 1, 2, 3, 4, 5, 6}, 15: {0, 5, 6}, 16: {2, 3}, 17: {0, 1, 2, 4, 5}, 18: {0, 1, 2, 3, 4, 5}, 19: {1, 4, 5}, 20: {0, 1, 2, 4, 5, 6}, 21: {0, 2, 3, 6}, 22: {1, 6}, 23: {5}, 24: {6}, 25: {0, 2}, 26: {0, 1, 3}, 27: {2}, 28: {4, 5}, 29: {5}}
+Iteration 8: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 8: Best valset aggregate score so far: 0.673452380952381
+Iteration 8: Best program as per aggregate score on valset: 0
+Iteration 8: Best score on valset: 0.673452380952381
+Iteration 8: Linear pareto front program index: 0
+Iteration 8: New program candidate index: 6
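
The aggregate lines above can be checked by hand. The sketch below assumes that the valset aggregate is the arithmetic mean of the per-example scores and that the pareto front keeps, for each example, the best score seen across all candidates so far. Both assumptions are consistent with the logged numbers, but the function names are hypothetical and are not GEPA's API.

```python
from statistics import fmean
from typing import Dict, Iterable


def valset_aggregate(scores: Dict[int, float]) -> float:
    """Arithmetic mean of per-example scores (assumed aggregation rule)."""
    return fmean(scores.values())


def pareto_front(per_candidate: Iterable[Dict[int, float]]) -> Dict[int, float]:
    """Per-example maximum over all candidates evaluated so far (assumed rule)."""
    front: Dict[int, float] = {}
    for scores in per_candidate:
        for example_id, score in scores.items():
            front[example_id] = max(front.get(example_id, 0.0), score)
    return front


# Pareto front scores as logged for iteration 8: every example at 1.0
# except example 20 (0.2) and example 22 (0.5).
front_iter8 = {i: 1.0 for i in range(30)}
front_iter8[20] = 0.2
front_iter8[22] = 0.5

# Mean is 28.7 / 30, which agrees with the logged 0.9566666666666667.
print(valset_aggregate(front_iter8))

# Toy illustration of how a new candidate can lift the front on one example.
toy_front = pareto_front([{22: 0.5, 23: 0.0}, {22: 0.25, 23: 1.0}])
print(toy_front)
```

Under these assumptions, a candidate can raise the pareto front aggregate without becoming the best program by aggregate score, which is what the iteration 9 lines show: the front rises to 0.9733 while program 0 remains best at 0.6735.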
+Iteration 9: Selected program 6 score: 0.5296214896214896
+Iteration 9: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment,
+hooks, coding_agents
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Conventional commit scopes in the title, such as `feat(memory/embeddings): ...`, are part of the title signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, and incidental implementation details unless they are the actual user-visible subject.
+
+Process:
+1. Read the full title, including any conventional commit type/scope prefix.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Important routing rules:
+
+ACP, ACPX, sessions, and agent runs:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named or when ACP binding semantics, embedded ACP backends, explicit ACP agent runs, per-binding behavior, configured bindings, binding IDs, or ACP agent-run visibility/state contracts are central.
+- Do not add `sessions` merely because the title contains `sessions_yield`, `sessions_spawn`, “ACP sessions”, or another session-shaped internal API name.
+- If the real subject is ACP/ACPX agent-run visibility, embedded backend behavior, stale final JSON state after a yield point, or binding-layer behavior, prefer `acp` + `acpx` and suppress `sessions` unless session lifecycle/restoration itself is the user-visible subject.
+- Example: `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield` routes to `acpx`, `acp`, not `sessions`.
+
+Security, hooks, and skills/plugins:
+- Use `security` when vulnerabilities, SSRF, token verification security, HIGH/CRITICAL vulns, exploitability, or security hardening are central.
+- Use `skills_plugins` when skills, plugins, skill scanner behavior, plugin loading, skill scanning, or skill/plugin security is a central named subject.
+- Use `hooks` when hooks, hook priority, hook execution order, hook policy, or hook behavior is a central named subject.
+- For a security PR whose title lists multiple vulnerable subsystems, include `security` plus the explicitly central subsystem topics, up to 3 total.
+- Do not add `auth_identity` just because token verification appears in a security vulnerability list, unless login, OAuth, identity propagation, account identity, or credential identity is the central user-visible subject.
+- Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` routes to `security`, `hooks`, `skills_plugins`.
+
+Sandboxing, subagents, and agent runtime:
+- Use `sandboxing` when sandbox mode, sandbox policy, sandbox restrictions, or sandbox configuration are central.
+- Use `coding_agents` when subagents, coding agents, spawned agents, agent delegation, or multi-agent coding workflows are central.
+- Use `agent_runtime` when agent initialization, runtime lifecycle, backend startup, execution runtime, or agent process behavior is central.
+- Do not add `sessions` merely because the title contains `sessions_spawn`; treat it as an internal spawn mechanism unless session lifecycle is clearly the subject.
+- If a sandbox mode breaks subagent initialization, include `sandboxing`, `coding_agents`, and `agent_runtime`.
+- Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` routes to `coding_agents`, `sandboxing`, `agent_runtime`.
+
+Browser automation plus packaging/deployment:
+- Use `browser_automation` when the central subject is browser automation, browser diagnostics, Chrome session reuse, browser runtime behavior, profile attach behavior, browser tooling, or desktop browser integration.
+- Use `packaging_deployment` when the title centrally names packaging, deployment, install/runtime environment setup, missing bundled executables, missing npm/node binaries, PATH/binary resolution, launch packaging, or errors such as `spawn npx ENOENT`.
+- If a desktop/browser issue combines browser session/profile behavior with a missing executable or packaging/runtime availability failure, include both topics.
+
+Memory and self-hosted inference:
+- Use `memory` when memory, embeddings under memory, memory retrieval/storage, or a conventional commit scope such as `memory/embeddings` is central.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- If the title says `memory/embeddings` and adds an OpenAI-compatible provider for self-hosted servers, include both `memory` and `self_hosted_inference`.
+
+Auth and identity:
+- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.
+- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains “openai-codex”, “Codex OAuth”, or a Codex-flavored product/path name.
+- Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.
+- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.
+
+Reliability and queueing:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, or robustness failure.
+- “lane wedge”, “lane wedged”, “self-heal lane wedges”, “worker stuck”, or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.
+- Do not add `queueing` merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.
+- Use `queueing` only when queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, “openai-compatible”, or names llama.cpp/Ollama/vLLM/TGI/LocalAI unless serving protocol behavior is central.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, Chrome, npx, embeddings, provider, openai-compatible, sessions_yield, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 9: New subsample score 4.0 is better than old score 2.119047619047619. Continue to full eval and add to candidate pool.
+Iteration 9: Valset score for new program: 0.5934920634920634 (coverage 30 / 30)
+Iteration 9: Val aggregate for new program: 0.5934920634920634
+Iteration 9: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.14285714285714285, 3: 0.25, 4: 0.2857142857142857, 5: 1.0, 6: 0.25, 7: 1.0, 8: 0.5, 9: 0.5, 10: 0.2, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 0.25, 17: 0.25, 18: 0.25, 19: 1.0, 20: 0.2, 21: 1.0, 22: 1.0, 23: 0.14285714285714285, 24: 1.0, 25: 1.0, 26: 0.25, 27: 0.25, 28: 0.3333333333333333, 29: 1.0}
+Iteration 9: Objective aggregate scores for new program: {'weighted_score': 0.5934920634920634}
+Iteration 9: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 9: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 9: Valset pareto front aggregate score: 0.9733333333333333
+Iteration 9: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3, 5}, 4: {3, 6}, 5: {1, 2, 5, 6, 7}, 6: {0, 1, 2, 3, 4, 6}, 7: {0, 2, 3, 4, 5, 6, 7}, 8: {0, 1, 3}, 9: {0, 1, 2, 3, 6}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5, 7}, 12: {0, 3, 6, 7}, 13: {2, 5, 7}, 14: {0, 1, 2, 3, 4, 5, 6, 7}, 15: {0, 5, 6, 7}, 16: {2, 3}, 17: {0, 1, 2, 4, 5}, 18: {0, 1, 2, 3, 4, 5}, 19: {1, 4, 5, 7}, 20: {0, 1, 2, 4, 5, 6, 7}, 21: {0, 2, 3, 6, 7}, 22: {7}, 23: {5}, 24: {6, 7}, 25: {0, 2, 7}, 26: {0, 1, 3}, 27: {2}, 28: {4, 5}, 29: {5, 7}}
+Iteration 9: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 9: Best valset aggregate score so far: 0.673452380952381
+Iteration 9: Best program as per aggregate score on valset: 0
+Iteration 9: Best score on valset: 0.673452380952381
+Iteration 9: Linear pareto front program index: 0
+Iteration 9: New program candidate index: 7
+Iteration 10: Selected program 7 score: 0.5934920634920634
+Iteration 10: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment,
+hooks, coding_agents
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Conventional commit scopes in the title, such as `feat(memory/embeddings): ...`, are part of the title signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, and incidental implementation details unless they are the actual user-visible subject.
+
+Process:
+1. Read the full title, including any conventional commit type/scope prefix.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Important routing rules:
+
+Cron automation, sessions, and reliability:
+- Use `cron_automation` when cron jobs, scheduled work, cron announce, automated scheduled messages, or cron delivery behavior are central.
+- Use `sessions` when the issue is about active chat sessions, session ownership, session takeover, outbound session identity, embedded sessions, session restoration, or passing session identity across components.
+- Use `reliability` when the central failure is an exception, crash, stuck state, hang, deadlock, wedge, race, timeout, or robustness failure.
+- If cron delivery triggers a session takeover exception while a user is actively chatting, include all three: `cron_automation`, `sessions`, and `reliability`.
+- `EmbeddedAttemptSessionTakeoverError` is both a session concern and a reliability failure, not just cron automation.
+
+Gateway and session identity:
+- Use `gateway` when gateway send behavior, gateway denial, gateway routing, gateway guardrails, or gateway message delivery is central.
+- Use `sessions` when the title names outbound session identity, session identity propagation, message sending from a session, embedded session paths, or session restoration.
+- If a PR passes outbound session identity into message sending and surfaces gateway send denial, route to `gateway` and `sessions`.
+- Do not add `hooks` merely because message sending, gateway denial, or guarded sends might involve hook-like policy internally. Add `hooks` only when hooks, hook priority, hook execution order, hook policy, or hook behavior are the named user-visible subject.
+
+Local models, Ollama, embeddings, proxy, and SSRF wording:
+- Use `local_models` when the title is about local model behavior, local Ollama, local embeddings, local model execution, or local model runtime failures.
+- Use `self_hosted_inference` when the item is about Ollama, llama.cpp, vLLM, TGI, LocalAI, or OpenAI-compatible/self-hosted inference providers.
+- Local Ollama embeddings are a local model plus self-hosted inference concern.
+- Do not add `memory` just because the word `embeddings` appears. Use `memory` only when memory storage, memory retrieval, memory embeddings, or a title scope like `memory/embeddings` is central.
+- Do not add `security` merely because SSRF defenses, proxy bypass, or `NO_PROXY` appear in a title. Add `security` only when the item is centrally about a vulnerability, exploitability, security hardening, token verification security, or HIGH/CRITICAL security fixes.
+- If local Ollama embeddings fail when a proxy is enabled because SSRF defenses ignore `NO_PROXY`, route to `local_models` and `self_hosted_inference`, not `memory` or `security`.
+
+ACP, ACPX, sessions, and agent runs:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named or when ACP binding semantics, embedded ACP backends, explicit ACP agent runs, per-binding behavior, configured bindings, binding IDs, or ACP agent-run visibility/state contracts are central.
+- Do not add `sessions` merely because the title contains `sessions_yield`, `sessions_spawn`, “ACP sessions”, or another session-shaped internal API name.
+- If the real subject is ACP/ACPX agent-run visibility, embedded backend behavior, stale final JSON state after a yield point, or binding-layer behavior, prefer `acp` + `acpx` and suppress `sessions` unless session lifecycle/restoration itself is the user-visible subject.
+
+Security, hooks, and skills/plugins:
+- Use `security` when vulnerabilities, SSRF, token verification security, HIGH/CRITICAL vulns, exploitability, or security hardening are central.
+- Use `skills_plugins` when skills, plugins, skill scanner behavior, plugin loading, skill scanning, or skill/plugin security is a central named subject.
+- Use `hooks` when hooks, hook priority, hook execution order, hook policy, or hook behavior is a central named subject.
+- For a security PR whose title lists multiple vulnerable subsystems, include `security` plus the explicitly central subsystem topics, up to 3 total.
+- Do not add `auth_identity` just because token verification appears in a security vulnerability list, unless login, OAuth, identity propagation, account identity, or credential identity is the central user-visible subject.
+- Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` routes to `security`, `hooks`, `skills_plugins`.
+
+Sandboxing, subagents, and agent runtime:
+- Use `sandboxing` when sandbox mode, sandbox policy, sandbox restrictions, or sandbox configuration are central.
+- Use `coding_agents` when subagents, coding agents, spawned agents, agent delegation, or multi-agent coding workflows are central.
+- Use `agent_runtime` when agent initialization, runtime lifecycle, backend startup, execution runtime, or agent process behavior is central.
+- Do not add `sessions` merely because the title contains `sessions_spawn`; treat it as an internal spawn mechanism unless session lifecycle is clearly the subject.
+- If a sandbox mode breaks subagent initialization, include `sandboxing`, `coding_agents`, and `agent_runtime`.
+
+Browser automation plus packaging/deployment:
+- Use `browser_automation` when the central subject is browser automation, browser diagnostics, Chrome session reuse, browser runtime behavior, profile attach behavior, browser tooling, or desktop browser integration.
+- Use `packaging_deployment` when the title centrally names packaging, deployment, install/runtime environment setup, missing bundled executables, missing npm/node binaries, PATH/binary resolution, launch packaging, or errors such as `spawn npx ENOENT`.
+- If a desktop/browser issue combines browser session/profile behavior with a missing executable or packaging/runtime availability failure, include both topics.
+
+Memory and self-hosted inference:
+- Use `memory` when memory, embeddings under memory, memory retrieval/storage, or a conventional commit scope such as `memory/embeddings` is central.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- If the title says `memory/embeddings` and adds an OpenAI-compatible provider for self-hosted servers, include both `memory` and `self_hosted_inference`.
+
+Auth and identity:
+- Use `auth_identity` when OAuth, login/auth flows, account identity, user identity, credential identity, token identity, auth restoration, or identity propagation is central.
+- Do not replace OAuth/auth identity bugs with `codex` merely because the title contains “openai-codex”, “Codex OAuth”, or a Codex-flavored product/path name.
+- Add `codex` only when Codex runtime, startup, setup, or Codex-specific behavior is itself the maintainer-owned subject.
+- If auth/OAuth behavior is tied to an embedded session path, session identity, or session restoration, include `sessions`.
+
+Reliability and queueing:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, wedge, stuck state, timeout, self-healing of stuck work, exception, or robustness failure.
+- “lane wedge”, “lane wedged”, “self-heal lane wedges”, “worker stuck”, or similar stuck-lane wording is a reliability issue unless queue scheduling/lifecycle is explicitly the user-visible subject.
+- Do not add `queueing` merely because a reliability bug mentions a lane, main lane, worker, subagent, internal path, or execution hook.
+- Use `queueing` only when queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, “openai-compatible”, or names llama.cpp/Ollama/vLLM/TGI/LocalAI unless serving protocol behavior is central.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include only `docs`, plus a central documented area only when that area is explicitly part of the title.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, wedge, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, Chrome, npx, embeddings, provider, openai-compatible, SSRF, NO_PROXY, sessions_yield, or sessions_spawn. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 10: New subsample score 3.2857142857142856 is better than old score 1.7261904761904763. Continue to full eval and add to candidate pool.
+Iteration 10: Valset score for new program: 0.6601587301587302 (coverage 30 / 30)
+Iteration 10: Val aggregate for new program: 0.6601587301587302
+Iteration 10: Individual valset scores for new program: {0: 0.25, 1: 0.2, 2: 0.25, 3: 0.5, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 1.0, 8: 0.5, 9: 1.0, 10: 0.3333333333333333, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.0, 16: 0.0, 17: 0.25, 18: 0.2857142857142857, 19: 1.0, 20: 0.2, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 0.25, 27: 0.5, 28: 1.0, 29: 1.0}
+Iteration 10: Objective aggregate scores for new program: {'weighted_score': 0.6601587301587302}
+Iteration 10: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 10: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 10: Valset pareto front aggregate score: 0.9733333333333333
+Iteration 10: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3, 5}, 4: {3, 6}, 5: {1, 2, 5, 6, 7, 8}, 6: {0, 1, 2, 3, 4, 6, 8}, 7: {0, 2, 3, 4, 5, 6, 7, 8}, 8: {0, 1, 3}, 9: {0, 1, 2, 3, 6, 8}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5, 7, 8}, 12: {0, 3, 6, 7, 8}, 13: {8, 2, 5, 7}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8}, 15: {0, 5, 6, 7}, 16: {2, 3}, 17: {0, 1, 2, 4, 5}, 18: {0, 1, 2, 3, 4, 5}, 19: {1, 4, 5, 7, 8}, 20: {0, 1, 2, 4, 5, 6, 7, 8}, 21: {0, 2, 3, 6, 7, 8}, 22: {8, 7}, 23: {8, 5}, 24: {8, 6, 7}, 25: {0, 8, 2, 7}, 26: {0, 1, 3}, 27: {2}, 28: {8, 4, 5}, 29: {8, 5, 7}}
+Iteration 10: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 10: Best valset aggregate score so far: 0.673452380952381
+Iteration 10: Best program as per aggregate score on valset: 0
+Iteration 10: Best score on valset: 0.673452380952381
+Iteration 10: Linear pareto front program index: 0
+Iteration 10: New program candidate index: 8
+Iteration 11: Selected program 8 score: 0.6601587301587302
+Iteration 11: Proposed new text for routing_policy: Classify each GitHub issue or pull request into the smallest complete set of allowed topic ids.
+
+Output JSON only:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment,
+hooks, coding_agents
+
+Primary task:
+Read the GitHub item title as the main signal. Pick only topics that are central user-visible maintainer buckets. Add a second or third topic only when the title explicitly names another central concern. Ignore target URL paths, labels, incidental implementation details, and broad words that are not the actual subject.
+
+Important suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a symptom, environment, internal path, product flavor, example, test fixture, or label-spam keyword.
+
+Specific rule learned from feedback:
+For a title like:
+`fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth`
+the correct topics are:
+`codex`, `chat_integrations`
+
+Reasoning:
+- `Codex startup plugins` and `Docker Codex OAuth` are Codex-specific startup/OAuth behavior, so route to `codex`.
+- `WhatsApp history` is a chat integration concern, so route to `chat_integrations`.
+- Do not add `gateway` merely because WhatsApp, history sync, or message delivery might involve gateway internals.
+- Do not add `packaging_deployment` merely because `Docker` appears. Add `packaging_deployment` only when packaging, deployment, install/runtime setup, missing binaries, PATH resolution, launch packaging, or container/deployment mechanics are the central user-visible subject.
+- Do not add `auth_identity` merely because `OAuth` appears inside a Codex-flavored phrase unless the title is centrally about login, identity propagation, account identity, credential identity, or auth restoration as its own maintainer-owned concern.
+- Do not add `skills_plugins` merely because `plugins` appears in `Codex startup plugins` when the item is framed as Codex startup behavior rather than the general skills/plugin system.
+
+Topic guidance:
+- Use `codex` for Codex runtime, startup, setup, OAuth behavior in Codex contexts, Docker Codex behavior, or Codex-specific product/path behavior.
+- Use `chat_integrations` for WhatsApp, Slack, Discord, Telegram, chat history, chat bridge behavior, and other external chat app integrations.
+- Use `gateway` only when gateway routing, gateway send behavior, gateway denial, gateway guardrails, or gateway delivery is explicitly central.
+- Use `packaging_deployment` only when packaging/deployment/install/runtime availability is explicitly central.
+- Use `skills_plugins` only when the general skill/plugin system, plugin loading/scanning, skill scanner behavior, or skill/plugin security is explicitly central.
+- Use `auth_identity` only when auth, OAuth, login, account identity, credential identity, token identity, or identity propagation is explicitly central outside a product-flavored incidental phrase.
+
+Final check:
+Before returning, remove any topic added only because of words like Docker, OAuth, plugin, gateway, send, history, Codex, WhatsApp, deployment, packaging, or integration. Keep the topic only if it is the actual maintainer bucket needed to route the item.
+Iteration 11: New subsample score 2.2857142857142856 is not better than old score 3.1538461538461537, skipping
+Iteration 12: Selected program 8 score: 0.6601587301587302
+Iteration 12: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Output only JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment,
+hooks, coding_agents
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the correct maintainer bucket. Prefer the title over all other signals. Use the body only if the title is ambiguous. Do not infer topics from labels, file paths, examples, tests, or incidental implementation details.
+
+Core rules:
+- Pick one primary topic.
+- Add secondary topics only when they are explicitly central user-visible concerns.
+- Use 3 topics only when the title clearly names three central facets.
+- Use 0 topics when no allowed topic is central.
+- Never invent topic ids.
+
+Suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the actual subject, not an implementation detail, label-spam keyword, symptom, internal API name, or broad ownership area.
+
+Important calibrations from prior mistakes:
+
+1. Gateway is often label spam.
+- Do not add `gateway` merely because a bug involves sending, embedded paths, guarded delivery, or background mechanics.
+- Use `gateway` only when gateway send behavior, gateway routing, gateway denial, gateway guardrails, or gateway message delivery is the central user-visible subject.
+- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` should include `reliability`, `auth_identity`, and `sessions`; it should not include `gateway`.
+
+2. Embedded OAuth paths are auth plus sessions.
+- Use `auth_identity` for OAuth, login/auth restoration, credential identity, account identity, or token identity.
+- If OAuth/auth behavior is restored or broken on an embedded path, session path, session identity, or session restoration flow, also include `sessions`.
+- Do not replace `auth_identity` with `codex` just because the title says `openai-codex` or `Codex OAuth`.
+- Add `codex` only when Codex-specific runtime, startup, setup, or behavior is itself a central maintainer-owned subject.
+
+3. Lane wedges are reliability, not queueing by default.
+- Use `reliability` for wedges, stuck states, hangs, crashes, deadlocks, races, exceptions, timeouts, or self-healing stuck work.
+- Do not add `queueing` merely because the title mentions lanes, workers, or internal scheduling terms unless queue lifecycle or scheduling behavior is central.
+
+4. ACPX Codex plugin inheritance is not sandboxing just because "sandbox" appears.
+- Use `acpx` when ACPX is explicitly named.
+- Use `codex` when Codex is explicitly a central environment/runtime/product concern.
+- Use `skills_plugins` when plugins, skills, plugin loading, or skill scanner behavior is central.
+- Do not add `sandboxing` merely because the title contains "sandbox" as an environment name. Add `sandboxing` only when sandbox policy, sandbox restrictions, sandbox mode, or sandbox configuration are themselves the subject.
+- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` should be `acpx`, `codex`, `skills_plugins`.
+
+5. ACP binding bugs can be sessions without ACPX.
+- Use `acp` when ACP is named centrally.
+- Use `sessions` when the title is about session keys, session ownership, session restoration, session identity, active chat/session routing, or persistent process/session sharing.
+- Do not automatically add `acpx` for every ACP configured binding. Add `acpx` only when ACPX is explicitly named or the title is centrally about ACPX-specific binding semantics, ACPX backend behavior, or ACPX agent-run visibility/state contracts.
+- Example: `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` should be `acp` and `sessions`, not `acpx`.
+
+Topic reminders:
+- `docs`: documentation-only PRs, tutorials, README changes, guides, examples.
+- `cron_automation`: cron jobs, scheduled work, cron delivery, scheduled automated messages.
+- `sessions`: active chat sessions, session ownership, session takeover, embedded sessions, session keys, session restoration, session identity propagation.
+- `reliability`: crash, exception, stuck state, hang, deadlock, race, timeout, wedge, robustness failure.
+- `local_models`: local model behavior, local embeddings, local Ollama, local runtime failures.
+- `self_hosted_inference`: Ollama, llama.cpp, vLLM, TGI, LocalAI, OpenAI-compatible self-hosted inference providers.
+- `memory`: memory storage/retrieval/embeddings or title scope like `memory/embeddings`.
+- `security`: vulnerabilities, exploitability, SSRF as a vulnerability, security hardening, HIGH/CRITICAL fixes.
+- `hooks`: hook priority, hook execution order, hook policy, hook behavior.
+- `mcp_tooling`: MCP protocol, MCP conformance, MCP integrations, MCP tool behavior.
+- `tool_calling`: tool-call execution, schemas, parameter coercion, runtime behavior.
+- `browser_automation`: browser automation, Chrome/profile/session reuse, browser diagnostics/runtime.
+- `packaging_deployment`: packaging, install/runtime setup, bundled executable availability, PATH/binary resolution, `spawn npx ENOENT`.
+- `open_weight_models`: open-weight model metadata, catalogs, context windows, compatibility.
+- `local_model_providers`: provider-specific local/open-weight catalogs or known provider model metadata.
+- `model_serving`: serving endpoints, OpenAI-compatible protocol behavior, streaming lifecycle, request/response compatibility.
+- `coding_agents`: subagents, spawned coding agents, delegation, multi-agent coding workflows.
+- `agent_runtime`: agent initialization, lifecycle, backend startup, process/runtime behavior.
+- `sandboxing`: sandbox mode/policy/restrictions/configuration only when central.
+
+Final check:
+Before outputting, remove any topic that was added only because of a keyword such as gateway, sandbox, session, binding, lane, wedge, Codex, OAuth, plugin, hook, model, provider, SSRF, cron, tool, or security unless that concept is clearly a central maintainer-owned subject in the title.
+Iteration 12: New subsample score 3.5 is better than old score 1.75. Continue to full eval and add to candidate pool.
+Iteration 12: Valset score for new program: 0.5403174603174603 (coverage 30 / 30)
+Iteration 12: Val aggregate for new program: 0.5403174603174603
+Iteration 12: Individual valset scores for new program: {0: 0.3333333333333333, 1: 0.2, 2: 0.25, 3: 0.25, 4: 0.25, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.25, 9: 0.5, 10: 0.3333333333333333, 11: 1.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.2, 18: 1.0, 19: 0.5, 20: 0.25, 21: 0.5, 22: 1.0, 23: 0.14285714285714285, 24: 1.0, 25: 0.5, 26: 1.0, 27: 0.25, 28: 0.5, 29: 0.25}
+Iteration 12: Objective aggregate scores for new program: {'weighted_score': 0.5403174603174603}
+Iteration 12: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.25, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 12: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 12: Valset pareto front aggregate score: 0.975
+Iteration 12: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3, 5}, 4: {3, 6}, 5: {1, 2, 5, 6, 7, 8}, 6: {0, 1, 2, 3, 4, 6, 8, 9}, 7: {0, 2, 3, 4, 5, 6, 7, 8, 9}, 8: {0, 1, 3}, 9: {0, 1, 2, 3, 6, 8}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5, 7, 8, 9}, 12: {0, 3, 6, 7, 8, 9}, 13: {8, 2, 5, 7}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, 15: {0, 5, 6, 7}, 16: {2, 3}, 17: {0, 1, 2, 4, 5}, 18: {0, 1, 2, 3, 4, 5, 9}, 19: {1, 4, 5, 7, 8}, 20: {9}, 21: {0, 2, 3, 6, 7, 8}, 22: {8, 9, 7}, 23: {8, 5}, 24: {8, 9, 6, 7}, 25: {0, 8, 2, 7}, 26: {0, 1, 3, 9}, 27: {2}, 28: {8, 4, 5}, 29: {8, 5, 7}}
+Iteration 12: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 12: Best valset aggregate score so far: 0.673452380952381
+Iteration 12: Best program as per aggregate score on valset: 0
+Iteration 12: Best score on valset: 0.673452380952381
+Iteration 12: Linear pareto front program index: 0
+Iteration 12: New program candidate index: 9
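The aggregate lines above can be checked by hand. The following is a minimal sketch, assuming (as the logged numbers suggest, not as confirmed by the optimizer source) that the valset aggregate is the plain mean of the per-example scores and that the Pareto-front aggregate is the mean of the best score any candidate reached on each example. The function names are hypothetical and are not part of GEPA.

```python
from statistics import fmean


def valset_aggregate(example_scores: dict[int, float]) -> float:
    """Mean of per-example validation scores (assumed aggregation rule)."""
    return fmean(example_scores.values())


def pareto_front_scores(all_program_scores: list[dict[int, float]]) -> dict[int, float]:
    """Best score achieved on each example by any candidate program."""
    example_ids = all_program_scores[0].keys()
    return {i: max(p[i] for p in all_program_scores) for i in example_ids}


# Toy data: two hypothetical candidates scored on three validation examples.
programs = [
    {0: 1.0, 1: 0.25, 2: 0.5},
    {0: 0.5, 1: 0.25, 2: 1.0},
]
front = pareto_front_scores(programs)
front_aggregate = valset_aggregate(front)

# Shape of the Iteration 12 front: 29 examples at 1.0 and one at 0.25.
logged_front = {i: 1.0 for i in range(30)}
logged_front[20] = 0.25
logged_front_aggregate = valset_aggregate(logged_front)
```

Under these assumptions the last value reproduces the logged front aggregate of 0.975, which is why a candidate with a lower mean score can still raise the front: it only needs to beat the current best on one example.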
+Iteration 13: Selected program 0 score: 0.673452380952381
+Iteration 13: All subsample scores perfect. Skipping.
+Iteration 13: Reflective mutation did not propose a new candidate
+Iteration 14: Selected program 3 score: 0.6679761904761905
+Iteration 14: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore target URL path, examples, tests, files changed, labels, and incidental implementation details unless they are the user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Important calibration:
+- Use `telemetry_usage` when token counts, usage accounting, cost display, usage reporting, diagnostic counts, traces, or status reporting are themselves the bug or feature.
+- Use `sessions` when session freshness, session identity, session lifecycle, session routing, or session state is central.
+- A title like `fix: respect totalTokensFresh flag to avoid showing stale token counts` must be classified as:
+ {"topics_of_interest":["sessions","telemetry_usage"]}
+- In that title, `totalTokensFresh` is a session freshness/state concern, and stale token counts are a telemetry/usage concern.
+- Do not classify that title as `ui_tui`. Words like "showing", "displaying", or "surfacing" are label spam unless the UI, TUI, webchat, terminal interface, or reader view is itself the central subject.
+
+Suppression rules:
+- Do not add a topic just because a related word appears.
+- Confirm that the word is the subject, not a symptom, path, implementation detail, broad ownership area, or incidental UI phrasing.
+- Before outputting, remove any topic added only because of words like showing, display, usage, token, session, status, UI, model, tool, gateway, security, hook, cron, or plugin unless that topic is actually a central maintainer-owned subject.
+Iteration 14: New subsample score 3.25 is better than old score 3.2. Continue to full eval and add to candidate pool.
+Iteration 14: Valset score for new program: 0.4648107448107448 (coverage 30 / 30)
+Iteration 14: Val aggregate for new program: 0.4648107448107448
+Iteration 14: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.25, 4: 0.15384615384615385, 5: 0.5, 6: 1.0, 7: 0.5, 8: 0.5, 9: 1.0, 10: 0.3333333333333333, 11: 0.5, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.2, 17: 0.2, 18: 0.2857142857142857, 19: 0.2, 20: 1.0, 21: 0.5, 22: 0.25, 23: 0.14285714285714285, 24: 0.25, 25: 0.5, 26: 1.0, 27: 0.14285714285714285, 28: 0.5, 29: 0.25}
+Iteration 14: Objective aggregate scores for new program: {'weighted_score': 0.4648107448107447}
+Iteration 14: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 14: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 14: Valset pareto front aggregate score: 1.0
+Iteration 14: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {2, 3}, 2: {0, 3}, 3: {0, 3, 5}, 4: {3, 6}, 5: {1, 2, 5, 6, 7, 8}, 6: {0, 1, 2, 3, 4, 6, 8, 9, 10}, 7: {0, 2, 3, 4, 5, 6, 7, 8, 9}, 8: {0, 1, 3}, 9: {0, 1, 2, 3, 6, 8, 10}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5, 7, 8, 9}, 12: {0, 3, 6, 7, 8, 9}, 13: {2, 5, 7, 8, 10}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 15: {0, 5, 6, 7}, 16: {2, 3}, 17: {0, 1, 2, 4, 5}, 18: {0, 1, 2, 3, 4, 5, 9}, 19: {1, 4, 5, 7, 8}, 20: {10}, 21: {0, 2, 3, 6, 7, 8}, 22: {8, 9, 7}, 23: {8, 5}, 24: {8, 9, 6, 7}, 25: {0, 8, 2, 7}, 26: {0, 1, 3, 9, 10}, 27: {2}, 28: {8, 4, 5}, 29: {8, 5, 7}}
+Iteration 14: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 14: Best valset aggregate score so far: 0.673452380952381
+Iteration 14: Best program as per aggregate score on valset: 0
+Iteration 14: Best score on valset: 0.673452380952381
+Iteration 14: Linear pareto front program index: 0
+Iteration 14: New program candidate index: 10
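The per-example front update in the log lines above follows a simple pattern: a new candidate replaces the set of front programs for an example when it strictly beats the best score, and joins the set when it ties. The sketch below illustrates that rule under the assumption that this is how the bookkeeping works. It is inferred from the logged sets, not taken from the optimizer's code, and `update_front` is a hypothetical helper.

```python
def update_front(
    front_scores: dict[int, float],
    front_programs: dict[int, set[int]],
    new_scores: dict[int, float],
    new_index: int,
) -> None:
    """Update per-example best scores and the programs achieving them, in place."""
    for example_id, score in new_scores.items():
        best = front_scores.get(example_id, float("-inf"))
        if score > best:
            # Strict improvement: the new program becomes the sole front member.
            front_scores[example_id] = score
            front_programs[example_id] = {new_index}
        elif score == best:
            # Tie: the new program joins the existing front members.
            front_programs[example_id].add(new_index)


# Three examples taken from the Iteration 12 front, updated with candidate 10.
front_scores = {0: 1.0, 6: 1.0, 20: 0.25}
front_programs = {0: {0, 1, 2, 3}, 6: {0, 1, 2, 3, 4, 6, 8, 9}, 20: {9}}
candidate_10 = {0: 0.5, 6: 1.0, 20: 1.0}

update_front(front_scores, front_programs, candidate_10, new_index=10)
```

After the call, example 20 is held by candidate 10 alone, example 6 gains candidate 10, and example 0 is unchanged, matching the Iteration 14 lines for those three examples.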
+Iteration 15: Selected program 3 score: 0.6679761904761905
+Iteration 15: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Output rule:
+- If a structured-output tool named `final_json` is available, you MUST call it exactly once.
+- Call it with exactly:
+ {"topics_of_interest":["topic_id"]}
+- If no such tool is available, return only this JSON object and no prose:
+ {"topics_of_interest":["topic_id"]}
+- Never omit the final output. Never explain your reasoning. Never output markdown.
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Before final output, remove any topic added only from keyword matching.
+
+Important calibration:
+
+Model serving, local models, and local providers:
+- Use `model_serving` for serving endpoints, OpenAI-compatible APIs, Responses API behavior, request/response protocol behavior, streaming lifecycle, final usage chunks, endpoint compatibility, base URL behavior, request routing, model-server compatibility, or automatic routing of model requests.
+- Use `local_models` when a local model app/runtime/provider is itself a user-visible subject, especially compatibility or behavior with named runtimes like LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar.
+- Do not add `local_models` merely because a local runtime is incidentally mentioned, but do add it when the title is about that runtime's visible behavior or compatibility.
+- `fix: LM Studio thinking blocks invisible with Responses API` => `model_serving`, `local_models`.
+- Use `local_model_providers` for provider catalogs, model catalogs, provider-specific model availability, provider metadata, provider setup, or provider catalog entries.
+- `Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` => `local_model_providers`, `model_serving`.
+
+Policy, config, MCP, and security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, configuration-governed enforcement, or repo/project policy.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- `Policy: add model, network, and MCP conformance checks` => `config`, `security`, `mcp_tooling`.
+- Do not map "model policy" or "model conformance" to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Security composites:
+- Use `security` for vulnerabilities, SSRF, access restrictions, token verification, credential boundaries, network security policy, or HIGH/CRITICAL vuln fixes.
+- If a security title lists specific affected subsystems, include each subsystem that is explicitly central.
+- Use `hooks` when hook behavior, hook priority, hook execution, or hook policy is explicitly central.
+- Use `skills_plugins` when skill scanning, skill/plugin discovery, user-installed plugins, plugin inheritance, plugin installation, or plugin availability is explicitly central.
+- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `hooks`, `skills_plugins`.
+
+Token counts and usage freshness:
+- Use `telemetry_usage` when token counts, usage accounting, cost display, usage reporting, diagnostic counts, traces, or status reporting are themselves the bug or feature.
+- Use `sessions` when freshness, identity, lifecycle, routing, or state of a session is central.
+- `fix: respect totalTokensFresh flag to avoid showing stale token counts` => `sessions`, `telemetry_usage`.
+- Do not add `ui_tui` merely because the title says "showing" or "displaying".
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage chunks; streaming protocol lifecycle remains `model_serving`.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs usually include only `docs`.
+- Add the documented area only when that area is explicitly central in the title, such as `docs(queue): ...` => `docs`, `queueing`.
+
+MCP and tool calling:
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` only when ACPX is explicitly named or clearly ACPX-specific.
+- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, or session-specific behavior is central.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Bug]: ACP configured binding uses parent channel ID for session key` => `acp`, `sessions`.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`.
+
+Notifications and chat:
+- Use `notifications` when notification behavior itself is central: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, or notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Do not add `notifications` merely because the title mentions message sending or delivery plumbing.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions lane, main lane, worker, subagent, or internal execution paths.
+- Active-memory recall deadlocks are usually `memory`, `reliability`.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+- `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+
+Final suppression check:
+Remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, showing, display, llama.cpp, LM Studio, catalog, or hook. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 15: New subsample score 4.0 is better than old score 2.0. Continue to full eval and add to candidate pool.
+Iteration 15: Valset score for new program: 0.6468512627723154 (coverage 30 / 30)
+Iteration 15: Val aggregate for new program: 0.6468512627723154
+Iteration 15: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 0.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 0.5, 11: 1.0, 12: 0.2857142857142857, 13: 0.5, 14: 1.0, 15: 0.5, 16: 0.25, 17: 1.0, 18: 1.0, 19: 0.2, 20: 0.25, 21: 0.25, 22: 1.0, 23: 0.10526315789473684, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.125, 28: 0.5, 29: 0.15384615384615385}
+Iteration 15: Objective aggregate scores for new program: {'weighted_score': 0.6468512627723154}
+Iteration 15: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 15: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 15: Valset pareto front aggregate score: 1.0
+Iteration 15: Updated valset pareto front programs: {0: {0, 1, 2, 3, 11}, 1: {11, 2, 3}, 2: {0, 11, 3}, 3: {0, 11, 3, 5}, 4: {3, 6}, 5: {1, 2, 5, 6, 7, 8}, 6: {0, 1, 2, 3, 4, 6, 8, 9, 10, 11}, 7: {0, 2, 3, 4, 5, 6, 7, 8, 9, 11}, 8: {0, 1, 3, 11}, 9: {0, 1, 2, 3, 6, 8, 10, 11}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5, 7, 8, 9, 11}, 12: {0, 3, 6, 7, 8, 9}, 13: {2, 5, 7, 8, 10}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, 15: {0, 5, 6, 7}, 16: {2, 3}, 17: {0, 1, 2, 4, 5, 11}, 18: {0, 1, 2, 3, 4, 5, 9, 11}, 19: {1, 4, 5, 7, 8}, 20: {10}, 21: {0, 2, 3, 6, 7, 8}, 22: {8, 9, 11, 7}, 23: {8, 5}, 24: {8, 9, 6, 7}, 25: {0, 2, 7, 8, 11}, 26: {0, 1, 3, 9, 10, 11}, 27: {2}, 28: {8, 4, 5}, 29: {8, 5, 7}}
+Iteration 15: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 15: Best valset aggregate score so far: 0.673452380952381
+Iteration 15: Best program as per aggregate score on valset: 0
+Iteration 15: Best score on valset: 0.673452380952381
+Iteration 15: Linear pareto front program index: 0
+Iteration 15: New program candidate index: 11
+Iteration 16: Selected program 0 score: 0.673452380952381
+Iteration 16: All subsample scores perfect. Skipping.
+Iteration 16: Reflective mutation did not propose a new candidate
+Iteration 17: Selected program 3 score: 0.6679761904761905
+Iteration 17: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+You MUST produce structured output. If a `final_json` tool is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks, auth_identity,
+packaging_deployment
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Classification process:
+1. Read the title first. The title is the primary signal.
+2. Use the first clear body summary only when the title is ambiguous.
+3. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+4. Pick one primary topic.
+5. Add secondary topics only when they are explicit central maintainer-owned subjects.
+6. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+7. Use 0 topics when no allowed topic is central.
+8. Never invent topic ids. Never output labels outside the allowed list.
+9. Ignore target URL path, labels, examples, tests, files changed, incidental implementation details, broad impact, and label-spam keywords.
+
+Hard output rule:
+Calling `final_json` is part of the answer when the tool exists. Do not finish without calling it. Do not emit prose, markdown, explanations, or an empty unstructured response.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the user-visible subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Provider catalogs:
+- Use `local_model_providers` for provider catalogs, model-provider integrations, missing provider models, provider routing, provider-specific model availability, and provider catalog errors.
+- A missing Google Vertex/Gemini provider catalog entry is `local_model_providers`.
+- Add `reliability` when the user-visible failure mode is silent failure, hang, crash, wedge, missing error, or other robustness failure.
+- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.
+
+Local models and self-hosted inference:
+- Use `local_models` when the local model app/runtime/provider is the user-visible subject, including Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, local embeddings, or local runtime behavior.
+- Use `self_hosted_inference` when the issue concerns self-hosted inference endpoints, local inference services, local embeddings endpoints, proxying to local inference, or self-hosted model connectivity.
+- Ollama embeddings failures are usually both `local_models` and `self_hosted_inference`.
+- Do not add `security` merely because the title mentions SSRF, SSRF defenses, proxy bypasses, or NO_PROXY when the actual user-visible bug is that a local/self-hosted model service fails.
+- Do not add `config` merely because proxy environment variables or NO_PROXY are mentioned, unless configuration policy/enforcement is the central subject.
+- Example: `[Feature/Bug]: Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)` => `local_models`, `self_hosted_inference`.
+
+Reliability, sessions, and auth identity:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedge, timeout, self-healing of wedged execution, silent failure, or robustness failure.
+- Use `sessions` when session identity, session lifecycle, session routing, embedded session paths, persistent process identity, session state, or session-specific behavior is central.
+- Use `auth_identity` when OAuth, sign-in, authentication identity, credential identity, token identity, account identity, or restored login/auth flow is central.
+- Do not add `queueing` merely because the title says lane, main lane, worker, subagent, wedge, or internal execution path. Queueing requires queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior to be the user-visible subject.
+- Do not add `codex` merely because the title says openai-codex or contains Codex-branded OAuth. If the central subject is OAuth/auth restoration, use `auth_identity`; if it happens on an embedded/session path, add `sessions`.
+- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.
+
+Browser automation and packaging/deployment:
+- Use `browser_automation` for Chrome/Chromium sessions, browser automation, browser diagnostics, browser runtime behavior, browser profile attach, browser tooling, and browser sign-in flows.
+- Use `packaging_deployment` for desktop app packaging, bundled runtime issues, missing executable/path problems, spawn errors caused by missing installed tools, `npx`/Node availability in packaged apps, installer/distribution/runtime packaging, or deployment environment failures.
+- If a desktop browser automation issue also fails because a packaged or deployed binary cannot spawn `npx` or another required executable, include `packaging_deployment`.
+- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.
+
+Security:
+- Use `security` for actual vulnerabilities, SSRF as the central vulnerability, access restrictions, token verification, credential boundaries, network security policy, or HIGH/CRITICAL vulnerability fixes.
+- If a security title lists specific affected subsystems, include each subsystem that is explicitly central.
+- Do not use `security` when security words describe an implementation mechanism but the user-visible subject is local model connectivity, embeddings, packaging, sessions, or provider behavior.
+
+Config:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Do not use `config` merely for proxy variables, env vars, provider settings, or setup details unless the title is about configuration behavior itself.
+
+Codex:
+- Use `codex` when Codex itself is the central user-visible product/runtime/setup subject, including Codex startup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because `openai-codex` appears as an OAuth provider name or bracketed metadata.
+
+Queueing:
+- Use `queueing` only when queue, queueing, queued execution, queue lifecycle, queue scheduling, or steering in queues is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, before_prompt_build, internal path names, or wedge recovery.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include only `docs`.
+- Add the central documented area only when that area is explicitly central in the title.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- Example: `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, demo, showing, display, llama.cpp, Ollama, OAuth, proxy, NO_PROXY, npx, Chrome, or hook. Keep the topic only if that concept is actually a central maintainer-owned subject.
+Iteration 17: New subsample score 3.5 is better than old score 0.7678571428571428. Continue to full eval and add to candidate pool.
+Iteration 17: Valset score for new program: 0.5279059829059829 (coverage 30 / 30)
+Iteration 17: Val aggregate for new program: 0.5279059829059829
+Iteration 17: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.0, 3: 0.5, 4: 0.15384615384615385, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.5, 9: 0.5, 10: 0.3333333333333333, 11: 1.0, 12: 0.25, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.2, 17: 0.25, 18: 0.25, 19: 0.2, 20: 0.2, 21: 1.0, 22: 0.25, 23: 1.0, 24: 1.0, 25: 0.5, 26: 1.0, 27: 0.25, 28: 0.5, 29: 0.25}
+Iteration 17: Objective aggregate scores for new program: {'weighted_score': 0.5279059829059828}
+Iteration 17: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 17: Objective pareto front scores: {'weighted_score': 0.6734523809523808}
+Iteration 17: Valset pareto front aggregate score: 1.0
+Iteration 17: Updated valset pareto front programs: {0: {0, 1, 2, 3, 11}, 1: {11, 2, 3}, 2: {0, 11, 3}, 3: {0, 11, 3, 5}, 4: {3, 6}, 5: {1, 2, 5, 6, 7, 8}, 6: {0, 1, 2, 3, 4, 6, 8, 9, 10, 11, 12}, 7: {0, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12}, 8: {0, 1, 3, 11}, 9: {0, 1, 2, 3, 6, 8, 10, 11}, 10: {0, 1, 2, 3, 4, 5}, 11: {0, 4, 5, 7, 8, 9, 11, 12}, 12: {0, 3, 6, 7, 8, 9}, 13: {2, 5, 7, 8, 10, 12}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, 15: {0, 5, 6, 7}, 16: {2, 3}, 17: {0, 1, 2, 4, 5, 11}, 18: {0, 1, 2, 3, 4, 5, 9, 11}, 19: {1, 4, 5, 7, 8}, 20: {10}, 21: {0, 2, 3, 6, 7, 8, 12}, 22: {8, 9, 11, 7}, 23: {8, 12, 5}, 24: {6, 7, 8, 9, 12}, 25: {0, 2, 7, 8, 11}, 26: {0, 1, 3, 9, 10, 11, 12}, 27: {2}, 28: {8, 4, 5}, 29: {8, 5, 7}}
+Iteration 17: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 17: Best valset aggregate score so far: 0.673452380952381
+Iteration 17: Best program as per aggregate score on valset: 0
+Iteration 17: Best score on valset: 0.673452380952381
+Iteration 17: Linear pareto front program index: 0
+Iteration 17: New program candidate index: 12
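Note on the iteration 17 log lines above: the reported `Val aggregate` looks like the unweighted mean of the 30 per-example validation scores. The sketch below checks that reading with the scores copied from the log. The function name `mean_aggregate` is hypothetical and not part of GEPA or the localpager optimizer. The comparison against the best score so far is an assumption about how the "best program" line is decided.

```python
from statistics import fmean

# Per-example validation scores for candidate 12, copied from the iteration 17 log line.
individual_scores = {
    0: 0.5, 1: 0.25, 2: 0.0, 3: 0.5, 4: 0.15384615384615385, 5: 0.5,
    6: 1.0, 7: 1.0, 8: 0.5, 9: 0.5, 10: 0.3333333333333333, 11: 1.0,
    12: 0.25, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.2, 17: 0.25, 18: 0.25,
    19: 0.2, 20: 0.2, 21: 1.0, 22: 0.25, 23: 1.0, 24: 1.0, 25: 0.5,
    26: 1.0, 27: 0.25, 28: 0.5, 29: 0.25,
}

# Best aggregate so far, copied from the log (held by candidate 0).
BEST_SO_FAR = 0.673452380952381


def mean_aggregate(scores: dict[int, float]) -> float:
    """Hypothetical helper: unweighted mean over validation examples."""
    return fmean(scores.values())


aggregate = mean_aggregate(individual_scores)
# Assumption: a candidate becomes "best" only if its aggregate beats the current best.
is_new_best = aggregate > BEST_SO_FAR

print(f"aggregate={aggregate:.6f} new_best={is_new_best}")
```

Under this reading, candidate 12 joins the candidate pool but does not displace candidate 0 as the best program, which agrees with the `Best program as per aggregate score on valset: 0` line.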
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/run_log_stderr.txt b/gepa-12b-row30-prop16-continuation-20260613T214937Z/run_log_stderr.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-row30-prop16-continuation-20260613T214937Z/score_report.html b/gepa-12b-row30-prop16-continuation-20260613T214937Z/score_report.html
new file mode 100644
index 0000000000000000000000000000000000000000..8e0be5939a0f44f0c3102259051912370a596afb
--- /dev/null
+++ b/gepa-12b-row30-prop16-continuation-20260613T214937Z/score_report.html
@@ -0,0 +1,259 @@
+
+
+
+
+
+gepa-12b-row30-prop16-continuation-20260613T214937Z GEPA score report
+
+
+
+
+
Run directory: /home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260614T021442Z
+
Base score
n/a
+
Proposal attempts
0
+
Accepted full evals
0
+
Rejected proposals
0
+
Candidates
n/a
+
Best candidate
n/a
+
Metric calls
n/a
+
Validation Score Over Iterations
No iteration scores found yet.
+
Proposal Subsample Delta
No proposal deltas found yet.
+
Final Candidate Scores
Final GEPA result is not available yet.
+
Proposal Events
No proposal events found yet.
+
+
+
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172801Z/command.sh b/gepa-12b-row30-prop16-from-proper-20260613T172801Z/command.sh
new file mode 100644
index 0000000000000000000000000000000000000000..11434d92fcce9f2e888b68d6e98e03a67c669390
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172801Z/command.sh
@@ -0,0 +1 @@
+env PYTHONPATH=prompt-optimizer/src prompt-optimizer/.venv/bin/python -m prompt_optimizer.cli optimize --harness localpager-agent --model gemma-12b-q4km-reason --max-tokens 1536 --concurrency 2 --seed-routing-policy /home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-proper-from-best-20260613T055906Z/best.routing_policy.md --row-limit 30 --reflection-minibatch-size 4 --max-candidate-proposals 16 --max-metric-calls 720 --output-dir /home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172801Z
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172801Z/optimize.pid b/gepa-12b-row30-prop16-from-proper-20260613T172801Z/optimize.pid
new file mode 100644
index 0000000000000000000000000000000000000000..e01da0de016b9390e24207af484b55f349ebc8ce
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172801Z/optimize.pid
@@ -0,0 +1 @@
+4145866
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172801Z/optimize.stderr.log b/gepa-12b-row30-prop16-from-proper-20260613T172801Z/optimize.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172801Z/optimize.stdout.json b/gepa-12b-row30-prop16-from-proper-20260613T172801Z/optimize.stdout.json
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172846Z/command.sh b/gepa-12b-row30-prop16-from-proper-20260613T172846Z/command.sh
new file mode 100644
index 0000000000000000000000000000000000000000..219941a9f2c702ea9f0d8f5891de7e1bc9273941
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172846Z/command.sh
@@ -0,0 +1,11 @@
+PYTHONPATH=prompt-optimizer/src prompt-optimizer/.venv/bin/python -m prompt_optimizer.cli optimize \
+ --harness localpager-agent \
+ --model gemma-12b-q4km-reason \
+ --max-tokens 1536 \
+ --concurrency 2 \
+ --seed-routing-policy /home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-proper-from-best-20260613T055906Z/best.routing_policy.md \
+ --row-limit 30 \
+ --reflection-minibatch-size 4 \
+ --max-candidate-proposals 16 \
+ --max-metric-calls 720 \
+ --output-dir /home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172846Z
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172846Z/optimize.pid b/gepa-12b-row30-prop16-from-proper-20260613T172846Z/optimize.pid
new file mode 100644
index 0000000000000000000000000000000000000000..7ca56aeba1e02db24fc535849a9e86c5a01bae26
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172846Z/optimize.pid
@@ -0,0 +1 @@
+4147082
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172846Z/optimize.stderr.log b/gepa-12b-row30-prop16-from-proper-20260613T172846Z/optimize.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..d75048c6472074b0eaacbb3e7778dc14e829c2c7
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172846Z/optimize.stderr.log
@@ -0,0 +1,4 @@
+usage: localpager-prompt-optimizer [-h]
+ {summary,evaluate-seed,evaluate-candidate,optimize}
+ ...
+localpager-prompt-optimizer: error: unrecognized arguments: --harness localpager-agent
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172846Z/optimize.stdout.json b/gepa-12b-row30-prop16-from-proper-20260613T172846Z/optimize.stdout.json
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/best.prompt.md b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/best.prompt.md
new file mode 100644
index 0000000000000000000000000000000000000000..664cd1c7aec48213fa1be4152e2b87c7334f7709
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/best.prompt.md
@@ -0,0 +1,206 @@
+# OpenClaw Routing Classifier
+
+Classify one OpenClaw GitHub issue or pull request for maintainer notification
+routing, not code search. Return only the final structured JSON required by the
+schema. No prose, markdown, analysis, or extra fields.
+
+Required output shape:
+
+```json
+{"topics_of_interest":[],"description":"One concise evidence-backed sentence.","caveats":[]}
+```
+
+## Inner Monologue
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+## Repository Reads
+
+A read-only `bash` tool may be available in the OpenClaw repo snapshot. Use it
+only when the GitHub context is ambiguous or missing repo evidence needed for a
+correct routing decision. Prefer short commands such as `pwd`, `ls`, `find`,
+`rg`, `grep`, `sed -n`, `cat`, `head`, `tail`, `wc -l`,
+`git show --name-only`, `git ls-files`, or `git grep`.
+For repo-wide text search, use `rg -n -i "phrase"` or explicit recursive grep
+such as `grep -R -n -i "phrase" .`. For file discovery, use
+`rg --files -g "*.ts"` or `git ls-files src`.
+Do not call `bash` when the provided GitHub context is enough.
+
+## Allowed Topics
+
+```json
+__ALLOWED_TOPICS_JSON__
+```
+
+Topic definitions and cue words:
+
+__TOPIC_DESCRIPTIONS__
+
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- A title like `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` is only `docs`.
+- Do not add non-allowed or broad demo/showcase labels such as `agent_demos`.
+- Documentation-only PRs should usually include `docs` plus the central documented area only when that area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- A title like `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.
+- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.
+- A title like `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` is `acp` and `sessions`, not `acpx`.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and only `acpx` if ACPX is explicitly named or clearly the binding system in question.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- A title like `[Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` is `local_model_providers` and `model_serving`.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, or names a model unless serving/routing/protocol behavior is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about “cron --announce and heartbeat target-channel Slack pushes” is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.## Target
+
+`__TARGET__`
+
+## GitHub Context
+
+__GITHUB_CONTEXT__
+
+Use this context as source of truth. If important sections are missing,
+unavailable, selected, or truncated, classify from what is available and mention
+material limits in `caveats`.
+
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
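Note on the prompt above: it requires that the classifier never emit topic ids outside the allowed list. A harness could enforce that after the fact. The sketch below is one minimal way to do so in Python; `validate_topics` is a hypothetical helper, not the localpager harness, and the allowed list is copied from this prompt file (30 ids, without the later `hooks`, `auth_identity`, and `packaging_deployment` additions).

```python
import json

# Allowed topic ids copied from the "Allowed topic ids" block of this prompt.
ALLOWED_TOPICS = frozenset(
    """
    queueing, docs, notifications, sessions, gateway, reliability, memory,
    open_weight_models, local_model_providers, codex, api_surface, ui_tui,
    chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
    model_serving, local_models, self_hosted_inference, telemetry_usage,
    exec_tools, sandboxing, browser_automation, cron_automation, config,
    security, mcp_tooling, tool_calling
    """.replace(",", " ").split()
)


def validate_topics(raw: str) -> list[str]:
    """Hypothetical check: parse the final JSON and reject invented topic ids."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("final JSON must be an object")
    topics = payload.get("topics_of_interest")
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("topics_of_interest must be a list of strings")
    unknown = [t for t in topics if t not in ALLOWED_TOPICS]
    if unknown:
        raise ValueError(f"topic ids outside the allowed list: {unknown}")
    # Drop duplicates while keeping the model's ordering.
    return list(dict.fromkeys(topics))


print(validate_topics('{"topics_of_interest": ["acp", "sessions"]}'))
```

An empty list passes, since the prompt allows 0 topics when no allowed topic is central. A label such as `agent_demos` raises `ValueError`.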
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/best.routing_policy.md b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/best.routing_policy.md
new file mode 100644
index 0000000000000000000000000000000000000000..b4c54767c58043c9158f5c8442211645cc79c114
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/best.routing_policy.md
@@ -0,0 +1,151 @@
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- A title like `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` is only `docs`.
+- Do not add non-allowed or broad demo/showcase labels such as `agent_demos`.
+- Documentation-only PRs should usually include `docs` plus the central documented area only when that area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- A title like `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.
+- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.
+- A title like `[Bug]: ACP configured binding uses parent channel ID for session key — all threads under same channel share one persistent Claude Code process` is `acp` and `sessions`, not `acpx`.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and only `acpx` if ACPX is explicitly named or clearly the binding system in question.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- A title like `[Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` is `local_model_providers` and `model_serving`.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, or names a model unless serving/routing/protocol behavior is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about “cron --announce and heartbeat target-channel Slack pushes” is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
\ No newline at end of file
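
Reviewer note: the prompt above asks for JSON only, forbids topic ids outside the allowed list, and treats three topics as the upper bound for explicitly named facets. A minimal sketch of how a harness might check a model response against those rules follows. The function name `parse_routing_output` and the hard cap `MAX_TOPICS = 3` are assumptions made for illustration; they do not come from the localpager scorer or from GEPA.

```python
import json

# Allowed topic ids, copied from the routing_policy prompt in this diff.
ALLOWED_TOPICS = frozenset({
    "queueing", "docs", "notifications", "sessions", "gateway", "reliability",
    "memory", "open_weight_models", "local_model_providers", "codex",
    "api_surface", "ui_tui", "chat_integrations", "skills_plugins", "acp",
    "acpx", "approvals", "agent_runtime", "model_serving", "local_models",
    "self_hosted_inference", "telemetry_usage", "exec_tools", "sandboxing",
    "browser_automation", "cron_automation", "config", "security",
    "mcp_tooling", "tool_calling",
})

# Assumption: the prompt's "use 3 topics only when ..." rule is read as a hard cap.
MAX_TOPICS = 3


def parse_routing_output(raw: str) -> list[str]:
    """Parse a model response and enforce the prompt's output contract.

    Hypothetical helper: raises ValueError when the response is not the
    expected JSON object, names a topic outside the allowed list, or
    returns more than MAX_TOPICS distinct topics.
    """
    payload = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(payload, dict) or "topics_of_interest" not in payload:
        raise ValueError("expected an object with a 'topics_of_interest' key")

    topics = payload["topics_of_interest"]
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("'topics_of_interest' must be a list of strings")

    unknown = [t for t in topics if t not in ALLOWED_TOPICS]
    if unknown:
        raise ValueError(f"topic ids outside the allowed list: {unknown}")

    # Drop repeats while keeping the model's ordering.
    deduped = list(dict.fromkeys(topics))
    if len(deduped) > MAX_TOPICS:
        raise ValueError(f"more than {MAX_TOPICS} topics returned: {deduped}")
    return deduped
```

For example, `parse_routing_output('{"topics_of_interest":["acp","sessions"]}')` returns the two ids unchanged, while a response containing `agent_demos` is rejected before it could reach scoring. Rejecting rather than silently filtering is a design choice in this sketch; a harness that prefers to score the remaining valid labels could filter instead.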
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/candidate_tree.html b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/candidate_tree.html
new file mode 100644
index 0000000000000000000000000000000000000000..c8c2c99249c74554ac513e1d57f493332c786ad8
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/candidate_tree.html
@@ -0,0 +1,209 @@
+
+
+
+
+
+GEPA Candidate Tree
+
+
+
+
+
GEPA Candidate Tree
+
+
Best
+
Pareto Front
+
Other
+
+
+
Loading graph…
+
+
+
+
+
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/candidates.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/candidates.json
new file mode 100644
index 0000000000000000000000000000000000000000..1382c3c26520c73c2a489ab12f6216d0ff937ece
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/candidates.json
@@ -0,0 +1,44 @@
+[
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn only final JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about \u201ccron --announce and heartbeat target-channel Slack pushes\u201d is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself the central subject.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:\n - `exec_tools`\n - `sandboxing`\n - `approvals`\n- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a 
deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nOpen-weight and local model providers:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like 
`[codex]`.\n- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nACP, ACPX, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nModel serving:\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server 
compatibility.\n- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, 
streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- A `docs(queue): ...` title usually includes both `docs` and `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nTool calling:\n- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, or contract. Keep it only if that topic is actually a central maintainer-owned subject.\n\nOutput only JSON."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- A title like `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` is only `docs`.\n- Do not add non-allowed or broad demo/showcase labels such as `agent_demos`.\n- Documentation-only PRs should usually include `docs` plus the central documented area only when that area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- A title like `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.\n- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.\n- A title like `[Bug]: ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process` is `acp` and `sessions`, not `acpx`.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or 
permissionMode policy is central.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and only `acpx` if ACPX is explicitly named or clearly the binding system in question.\n\nOpen-weight, local provider catalogs, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- A title like `[Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` is `local_model_providers` and `model_serving`.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, or names a model unless serving/routing/protocol behavior is central.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is 
central.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about \u201ccron --announce and heartbeat target-channel Slack pushes\u201d is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n- Queueing requires queue, queued execution, steering in 
queues, queue lifecycle, or scheduling behavior to be user-visible and central.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add 
`skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible 
item up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only, or call the structured JSON output tool if one exists.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nImportant routing corrections:\n- `layered browser diagnostics` is `browser_automation`, not `ui_tui`.\n- `cron force-run deadlock` is `cron_automation`. Do not add `reliability` just because the word deadlock appears when the named feature owner is the better maintainer bucket.\n- `totalTokensFresh`, stale token counts, fresh/stale token-count flags, and per-session usage freshness are `sessions` plus `telemetry_usage`.\n- Do not use `ui_tui` merely because a title says \u201cshowing\u201d, \u201cdisplay\u201d, \u201creader\u201d, or \u201cdiagnostics\u201d. Use `ui_tui` only when webchat, TUI, UI views, terminal UI, or user-facing chat interface behavior is explicitly central.\n- Security PRs that list vulnerable areas should include `security` plus each central named affected subsystem when that subsystem is an allowed topic. For example, skill scanner vulnerability work includes `skills_plugins`; hook priority vulnerability work includes `hooks`.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history 
is one of the listed fixes.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n- For `cron force-run deadlock`, classify as `cron_automation` without `reliability` unless the title separately makes reliability/liveness a central cross-cutting subject.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:\n - `exec_tools`\n - `sandboxing`\n - `approvals`\n- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Suppress `reliability` when deadlock/hang/crash is just the failure mode of a more specific named central feature, such as cron force-run.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use 
`sessions` when session identity, session lifecycle, session routing, session state, session freshness, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- `totalTokensFresh` and stale per-session token counts are session-state/freshness concerns and should include `sessions`.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, token counts, stale usage values, or status reporting surfaces are themselves the feature or bug.\n- Stale token counts are `telemetry_usage`.\n- Do not add `model_serving` merely because usage chunks or token counts are mentioned in a model-serving protocol bug unless serving protocol behavior is central.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- `layered browser diagnostics` is `browser_automation`.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nHooks:\n- Use `hooks` when hook behavior, hook priority, hook ordering, hook execution, or hook security is central.\n- In composite security titles, include `hooks` when hook priority or hook execution is one of the named vulnerable areas.\n\nOpen-weight and local model providers:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight 
Mantle models` should include both `open_weight_models` and `local_model_providers`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, plugin availability, or skill scanner behavior is central.\n- Security fixes in a skill scanner should include `skills_plugins`.\n- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nACP, ACPX, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution 
runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nModel serving:\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks unless usage reporting itself is central.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about 
serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- A `docs(queue): ...` title usually includes both `docs` and `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nTool calling:\n- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `skills_plugins`, `hooks`.\n- Do not substitute broad infrastructure topics like `gateway`, `ui_tui`, or `reliability` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, or contract. 
Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nUse only allowed topic ids. Never invent a topic id, even if a title suggests a missing bucket such as packaging, deployment, CI, infrastructure, or supply chain. If no allowed topic is central, return an empty array.\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nDecision process:\n1. Read the title first.\n2. Identify the user-visible subject of the issue or PR.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned concerns in the title or first clear summary.\n5. Use 3 topics only when the title explicitly contains three central concerns.\n6. 
Remove any topic added only because of incidental words, labels, implementation details, tests, file paths, examples, or broad impact.\n\nCron/session/reliability correction:\n- `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting` is `cron_automation`, `sessions`, and `reliability`.\n- Use `cron_automation` when cron announce, cron scheduling, cron force-run, cron lifecycle, or cron execution is the named feature.\n- Use `sessions` when the bug involves active chat sessions, session takeover, embedded attempt session takeover, session ownership, session identity, or session lifecycle.\n- Use `reliability` when the central failure is an exception, crash, deadlock, hang, stuck state, race, timeout, or robustness failure.\n- Do not suppress `reliability` when the title names a concrete runtime exception such as `EmbeddedAttemptSessionTakeoverError`.\n\nSecurity/CI correction:\n- `ci: harden GitHub Actions supply-chain boundaries` is `security` only.\n- GitHub Actions supply-chain hardening, supply-chain boundaries, workflow trust boundaries, dependency provenance, and CI hardening are security concerns when no more specific allowed topic is central.\n- Do not add nonexistent or label-like topics such as `packaging_deployment`.\n- Do not add deployment, packaging, CI, or release topics unless they are in the allowed list; they are not allowed here.\n\nGeneral suppression rules:\n- Do not add `notifications` merely because delivery, announce, pushed message, or message sending appears. Use it only when notification behavior itself is the central user-visible subject.\n- Do not add `chat_integrations` merely because a user is actively chatting. Use it only for Slack, WhatsApp, chat app delivery, chat history, target channels, or chat push behavior.\n- Do not add `reliability` for every bug. 
Add it when the title\u2019s central failure mode is an exception, crash, hang, deadlock, stuck state, race, timeout, liveness, or robustness issue.\n- Do not add `security` for every policy, boundary, or CI title. Add it when the title is about hardening, supply-chain boundaries, vulnerabilities, access restrictions, credentials, SSRF, token verification, network policy, or trust boundaries.\n- Never output labels outside the allowed topic ids.\n\nOutput raw JSON only."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\nYou may receive a GitHub target URL, title, and sometimes a body or summary.\nThe title is the primary signal. Use the body only when the title is ambiguous.\nIgnore examples, tests, labels, file paths, implementation details, symptoms, broad impact, and incidental words unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, docs change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicitly central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only, or call the structured JSON output tool if one exists.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, label-spam keyword, or security-hardening mechanism.\n\nHigh-priority corrections:\n- Local Ollama embeddings failing when a proxy is enabled, including wording like \u201cSSRF defenses ignore NO_PROXY\u201d, should be classified as `local_models` and `self_hosted_inference`.\n- For that Ollama/proxy/NO_PROXY case, do NOT add `security`, `local_model_providers`, or `memory`. \u201cSSRF defenses\u201d is the mechanism or label-like context, not the central maintainer bucket, unless the title is actually about a vulnerability or security policy.\n- \u201cExplicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield\u201d should be classified as `acp` and `acpx`.\n- For that ACP/sessions_yield case, do NOT add `sessions`. `sessions_yield` is an ACP/ACPX workflow term or implementation detail, not necessarily a sessions maintainer concern.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model systems.\n- Use `self_hosted_inference` when the issue is about using self-hosted inference servers such as Ollama, llama.cpp, vLLM, TGI, or LocalAI as inference providers.\n- Ollama is a strong signal for both `local_models` and `self_hosted_inference` when the issue is about using Ollama locally or through local inference.\n- Do not use `local_model_providers` for Ollama/local runtime failures unless the title is specifically about provider catalog metadata, provider-specific open-weight integration, known context windows, or named provider/model-family support.\n- Do not add `memory` merely because embeddings are mentioned. 
Add `memory` only when memory behavior, active-memory recall, vector stores, memory providers, or embedding-provider memory behavior is the central subject.\n\nSecurity:\n- Use `security` for network policy, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior when that is the main user-visible subject.\n- Do not add `security` merely because a bug mentions SSRF defenses, proxy bypass behavior, NO_PROXY, blocked requests, or a security-related implementation detail.\n- Security PRs that list vulnerable areas should include `security` plus each central named affected subsystem. For example, skill scanner vulnerability work includes `skills_plugins`; hook priority vulnerability work includes `hooks`.\n\nACP and ACPX:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings, explicit ACP agent runs, embedded backend visibility, final JSON state in ACP workflow, or ACPX session workflow concepts.\n- Do not add `sessions` merely because an ACP/ACPX title contains `sessions_yield`, \u201csession\u201d, or \u201cstate\u201d if the central issue is ACP/ACPX workflow behavior.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, session freshness, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- `totalTokensFresh`, stale token counts, fresh/stale token-count flags, and per-session usage freshness are `sessions` plus `telemetry_usage`.\n- A title like `Pass outbound session identity into message_sending and surface 
guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Suppress `reliability` when deadlock/hang/crash is just the failure mode of a more specific named central feature, such as cron force-run.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings as a memory subsystem, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n- Suppress `memory` when embeddings are only part of a local model provider failure and the user-visible subject is local/self-hosted inference.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- `layered browser diagnostics` is `browser_automation`.\n- Do not add `gateway` or `ui_tui` for browser diagnostics unless gateway or UI behavior is explicitly the subject.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- `cron force-run deadlock` is `cron_automation`; do not add `reliability` unless the title separately makes reliability/liveness a central cross-cutting subject.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nNotifications and chat integrations:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification 
delivery.\n- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Do not add `notifications` merely because a title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, token counts, stale usage values, or status reporting surfaces are themselves the feature or bug.\n- Stale token counts are `telemetry_usage`.\n- Do not add `model_serving` merely because usage chunks or token counts are mentioned in a model-serving protocol bug unless serving protocol behavior is central.\n\nOpen-weight and local model providers:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both 
`open_weight_models` and `local_model_providers`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, plugin availability, or skill scanner behavior is central.\n- Security fixes in a skill scanner should include `skills_plugins`.\n- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nModel serving:\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- Thinking blocks, Responses API visibility, 
OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks unless usage reporting itself is central.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nPolicy, config, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- A `docs(queue): ...` title usually includes both `docs` and `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nTool calling:\n- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nHooks:\n- Use `hooks` when hook behavior, hook priority, hook ordering, hook execution, or hook security is central.\n- In composite security titles, include `hooks` when hook 
priority or hook execution is one of the named vulnerable areas.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `skills_plugins`, `hooks`.\n- Do not substitute broad infrastructure topics like `gateway`, `ui_tui`, `reliability`, or `security` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, contract, embeddings, SSRF, NO_PROXY, sessions_yield, or state. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Always produce the required structured output. 
If `final_json` exists, calling it is mandatory.\n\nImportant corrections learned from prior failures:\n- Slack target-channel pushes are both notification behavior and chat integration behavior when the title is about pushed Slack messages.\n - Example: \u201cPer-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes\u201d => `notifications`, `chat_integrations`.\n - Do not stop at `notifications` when Slack, WhatsApp, chat history, chat app delivery, or chat target channels are central.\n- Policy or conformance checks are `config` when they define or enforce governed behavior.\n - \u201cnetwork conformance checks\u201d should include `security`.\n - \u201cMCP conformance checks\u201d should include `mcp_tooling`.\n - Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d => `config`, `security`, `mcp_tooling`.\n - Do not map \u201cmodel conformance checks\u201d to `model_serving` unless serving endpoints, streaming, routing, or model-server compatibility are central.\n- Provider catalog metadata is `local_model_providers`.\n - If a missing provider catalog entry causes a silent failure instead of a clear error, also include `reliability`.\n - Example: \u201cgoogle-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error\u201d => `local_model_providers`, `reliability`.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications` and `chat_integrations`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.\n- Suppress `reliability` when the failure mode is merely incidental to a more specific named feature, such as cron force-run deadlock.\n- Do include `reliability` when a provider catalog or integration issue causes silent failure instead of a proper error.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.\n- Use 
`mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nLocal model providers:\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- Missing models in a provider catalog are `local_model_providers`.\n- If the missing provider metadata causes a silent failure or poor error behavior, add `reliability`.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, or contract. Keep it only if that topic is actually a central maintainer-owned subject.\n\nFinal output requirement:\n- If a structured output tool exists, call it with exactly:\n {\"topics_of_interest\":[...]}\n- If no structured output tool exists, print raw JSON only.\n- Do not include explanations, Markdown, comments, or extra fields."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nUse only allowed topic ids. Never invent topic ids. If no allowed topic is central, return an empty array.\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks, packaging_deployment\n\nDecision process:\n1. Read the title first.\n2. Identify the user-visible subject of the issue or PR.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned concerns in the title or first clear summary.\n5. Use 3 topics only when the title explicitly contains three central concerns.\n6. Remove any topic added only because of incidental words, labels, implementation details, tests, file paths, examples, or broad impact.\n\nGeneral rules:\n- Prefer the smallest complete topic set.\n- Do not add topics because a word appears incidentally.\n- Do not add broad umbrella topics when a narrower central topic already captures the issue.\n- Do not add implementation-detail topics from examples, stack traces, package names, file paths, labels, or test names.\n- If a topic is not allowed, do not output it.\n\nKnown mappings and corrections:\n- Desktop Chrome automation, browser profiles, browser attach, Google sign-in through Chrome, and use of existing Chrome browser sessions are `browser_automation`.\n- In browser automation titles, the word \u201csessions\u201d often means Chrome/user-profile browser sessions. 
Do not classify that as `sessions` unless OpenClaw chat/session ownership or lifecycle is central.\n- `Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` is `browser_automation` and `packaging_deployment`.\n- `spawn npx ENOENT`, missing executables, broken runtime packaging, missing installed command, desktop packaging failures, dependency launch failures, installation/runtime environment failures, and app deployment breakage are `packaging_deployment`.\n- Do not add `reliability` merely because something \u201cfails\u201d. Add `reliability` only for central exception/crash/hang/deadlock/stuck/race/timeout/robustness failures.\n- A title naming `spawn npx ENOENT` in a desktop/browser attach flow is primarily packaging/deployment, not reliability.\n\nMemory and self-hosted inference:\n- `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` is `memory` and `self_hosted_inference`.\n- Use `memory` when memory, embeddings, recall, vector storage, or memory retrieval is the central feature.\n- Use `self_hosted_inference` when the title is about OpenAI-compatible providers for self-hosted servers or self-hosted inference endpoints.\n- Do not add `local_models` just because examples include llama.cpp, Ollama, vLLM, TGI, or LocalAI. 
Treat those names as examples unless local model management itself is the central user-visible subject.\n- Do not add `local_model_providers` or `model_serving` when the title\u2019s central concern is an OpenAI-compatible provider for self-hosted inference and another explicit feature like memory already accounts for the context.\n\nCron/session/reliability correction:\n- `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting` is `cron_automation`, `sessions`, and `reliability`.\n- Use `cron_automation` when cron announce, cron scheduling, cron force-run, cron lifecycle, or cron execution is the named feature.\n- Use `sessions` when the bug involves active chat sessions, session takeover, embedded attempt session takeover, session ownership, session identity, or session lifecycle.\n- Use `reliability` when the central failure is an exception, crash, deadlock, hang, stuck state, race, timeout, or robustness failure.\n- Do not suppress `reliability` when the title names a concrete runtime exception such as `EmbeddedAttemptSessionTakeoverError`.\n\nSecurity/CI correction:\n- `ci: harden GitHub Actions supply-chain boundaries` is `security` only.\n- GitHub Actions supply-chain hardening, supply-chain boundaries, workflow trust boundaries, dependency provenance, and CI hardening are security concerns when no more specific allowed topic is central.\n- Do not add deployment, packaging, CI, or release topics unless an allowed topic is directly central. `packaging_deployment` is allowed only for packaging/deployment/install/runtime environment concerns, not generic CI hardening.\n\nSuppression rules:\n- Do not add `notifications` merely because delivery, announce, pushed message, or message sending appears. Use it only when notification behavior itself is the central user-visible subject.\n- Do not add `chat_integrations` merely because a user is actively chatting. 
Use it only for Slack, WhatsApp, chat app delivery, chat history, target channels, or chat push behavior.\n- Do not add `sessions` for browser sessions, Chrome sessions, login sessions, profile attachment, or user profiles unless OpenClaw session ownership/lifecycle is the central subject.\n- Do not add `reliability` for every bug. Add it only when the title\u2019s central failure mode is an exception, crash, hang, deadlock, stuck state, race, timeout, liveness, or robustness issue.\n- Do not add `security` for every policy, boundary, or CI title. Add it when the title is about hardening, supply-chain boundaries, vulnerabilities, access restrictions, credentials, SSRF, token verification, network policy, or trust boundaries.\n\nOutput raw JSON only."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, coding_agents, api_surface,\nui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,\nagent_runtime, model_serving, local_models, self_hosted_inference,\ntelemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,\nconfig, security, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Always produce the required structured output. If `final_json` exists, calling it is mandatory.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.\n- Use `reliability` for explicit exception/error failures when the title centers on the system breaking during a user-visible workflow.\n- Example: \u201ccron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting\u201d => `cron_automation`, `sessions`, `reliability`.\n- Suppress `reliability` when the failure mode is only a symptom of a more specific maintainer area and the title\u2019s real subject is that area\u2019s contract or initialization behavior.\n- Example: `sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization` => `coding_agents`, `sandboxing`, `agent_runtime`, not `reliability`.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Do not add `notifications` merely because the title mentions announce, delivery, message sending, send denial, pushed messages, or delivery plumbing.\n- If an \u201cannounce delivery\u201d title is really about cron execution/session takeover/error behavior, prefer the central areas and suppress `notifications`.\n- Example: \u201ccron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting\u201d => no `notifications`.\n\nCron automation:\n- Use `cron_automation` when cron scheduling, cron lifecycle, cron force-runs, cron announce execution, cron heartbeat execution, or cron-triggered workflows are central.\n- Do not suppress `cron_automation` merely because the title contains 
\u201cannounce\u201d or \u201cdelivery\u201d.\n- Example: \u201ccron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting\u201d => include `cron_automation`.\n\nSessions:\n- Use `sessions` when active sessions, session takeover, embedded session takeover, session spawning, session identity, session lifecycle, or user actively chatting in a session is central.\n- Example: `EmbeddedAttemptSessionTakeoverError when user is actively chatting` => include `sessions`.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.\n- Use `tool_calling` when tool call arguments, parameter coercion, schemas, serialization, invocation contracts, or actual tool calls are central.\n- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.\n- Example: \u201cfix(bundle-mcp): coerce stringified object/array params before MCP tool calls\u201d => `mcp_tooling`, `tool_calling`.\n\nACP, ACPX, and approvals:\n- Use `acp` for ACP protocol/session behavior.\n- Use `acpx` for ACPX extension/adapter behavior, especially per-binding or per-agent ACP session behavior.\n- Use `approvals` for permissionMode, approval modes, permission prompts, permission policy, or approval-gated execution.\n- ACP session permissionMode changes with per-binding or per-agent scope are a three-way concern: `acp`, `approvals`, and `acpx`.\n- Example: \u201c[Feature]: Per-binding and per-agent permissionMode for ACP sessions\u201d => `acp`, `approvals`, `acpx`.\n\nCoding agents and agent runtime:\n- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.\n- Use `agent_runtime` for agent initialization, runtime lifecycle, spawning, embedded runtime behavior, or internal runtime setup.\n- Use `sandboxing` when 
sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.\n- Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization` => `coding_agents`, `sandboxing`, `agent_runtime`.\n- In that example, do not add `reliability`; \u201csilently breaks\u201d is secondary wording and the maintainer buckets are sandboxing plus agent/subagent runtime.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Slack target-channel pushes are both notification behavior and chat integration behavior when the title is about pushed Slack messages.\n- Example: \u201cPer-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes\u201d => `notifications`, `chat_integrations`.\n- Do not classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.\n- Policy or conformance checks are `config` when they define or enforce governed behavior.\n- \u201cnetwork conformance checks\u201d should include `security`.\n- \u201cMCP conformance checks\u201d should include `mcp_tooling`.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d => `config`, `security`, `mcp_tooling`.\n- Do not map \u201cmodel conformance checks\u201d to `model_serving` unless serving endpoints, streaming, routing, or model-server compatibility are central.\n\nLocal model providers:\n- Use 
`local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- Missing models in a provider catalog are `local_model_providers`.\n- If missing provider metadata causes a silent failure or poor error behavior, add `reliability`.\n- Example: \u201cgoogle-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error\u201d => `local_model_providers`, `reliability`.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, or contract. Keep it only if that topic is actually a central maintainer-owned subject.\n\nFinal output requirement:\n- If a structured output tool exists, call it with exactly:\n {\"topics_of_interest\":[...]}\n- If no structured output tool exists, print raw JSON only.\n- Do not include explanations, Markdown, comments, or extra fields."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, coding_agents, api_surface,\nui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,\nagent_runtime, model_serving, local_models, self_hosted_inference,\ntelemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,\nconfig, security, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, labels, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Always produce the required structured output. 
If `final_json` exists, calling it is mandatory.\n\nImportant routing rules:\n- Use `acp` for ACP protocol/session behavior, ACP configured bindings, ACP session keys, ACP channel mapping, or ACP process/session reuse.\n- Use `sessions` when session keys, persistent sessions, shared sessions, thread/session identity, session takeover, or session lifecycle are central.\n- Do NOT add `acpx` merely because the item involves ACP bindings. Treat `acpx` as label spam unless the title explicitly centers ACPX extension/adapter behavior, per-binding/per-agent ACPX behavior, or names ACPX as the contract being changed.\n- Example: \u201cACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process\u201d => `acp`, `sessions`, not `acpx`.\n\nModel serving:\n- Use `model_serving` for OpenAI-compatible serving, streaming responses, model-server compatibility, final chunks, llama.cpp server behavior, endpoint streaming contracts, routing, or serving protocol behavior.\n- A title like \u201cOpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)\u201d is `model_serving`.\n- Do not output zero topics for serving/streaming issues just because \u201cusage\u201d appears.\n- Suppress `telemetry_usage` when \u201cusage\u201d is only the affected value caused by a serving stream/chunk bug. 
Add `telemetry_usage` only when usage tracking, usage accounting, reporting, or metrics collection is itself the central maintainer-owned subject.\n\nSecurity, hooks, and skills/plugins:\n- Use `security` for vulnerabilities, SSRF, token verification, credential boundaries, access restrictions, network policy, HIGH/CRITICAL vuln fixes, or allowed/blocked security behavior.\n- Use `hooks` when hook priority, hook execution, hook ordering, hook policy, or hook behavior is central.\n- Use `skills_plugins` when skills, plugins, skill scanners, plugin scanners, skill/plugin installation, skill/plugin metadata, or skill/plugin runtime behavior are central.\n- If a security title explicitly names multiple vulnerable areas, include each central area named.\n- Example: \u201csecurity: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification\u201d => `security`, `hooks`, `skills_plugins`.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.\n- Use `reliability` for explicit exception/error failures when the title centers on the system breaking during a user-visible workflow.\n- Suppress `reliability` when the failure mode is only a symptom of a more specific maintainer area and the title\u2019s real subject is that area\u2019s contract or initialization behavior.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Do not add `notifications` merely because the title mentions announce, delivery, message sending, send denial, pushed messages, or delivery plumbing.\n- If an \u201cannounce delivery\u201d title is really about cron execution/session takeover/error behavior, prefer the central areas and suppress 
`notifications`.\n\nCron automation:\n- Use `cron_automation` when cron scheduling, cron lifecycle, cron force-runs, cron announce execution, cron heartbeat execution, or cron-triggered workflows are central.\n- Do not suppress `cron_automation` merely because the title contains \u201cannounce\u201d or \u201cdelivery\u201d.\n- Do not classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.\n- Use `tool_calling` when tool call arguments, parameter coercion, schemas, serialization, invocation contracts, or actual tool calls are central.\n- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.\n\nCoding agents and agent runtime:\n- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.\n- Use `agent_runtime` for agent initialization, runtime lifecycle, spawning, embedded runtime behavior, or internal runtime setup.\n- Use `sandboxing` when sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, skill, scanner, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, binding, ACPX, or contract. 
Keep it only if that topic is actually a central maintainer-owned subject.\n\nFinal output requirement:\n- If a structured output tool exists, call it with exactly:\n {\"topics_of_interest\":[...]}\n- If no structured output tool exists, print raw JSON only.\n- Do not include explanations, Markdown, comments, or extra fields."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Always produce the required structured output. If `final_json` exists, calling it is mandatory.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nImportant correction for ACP / ACPX / sessions:\n- Titles about \u201cExplicit ACP agent runs\u201d are centrally about `acpx`.\n- When the same title also names ACP behavior, embedded ACP backend visibility, ACP agent backend state, ACP run results, or ACP final JSON state, include `acp`.\n- Do not add `sessions` just because the title mentions `sessions_yield`, yielding, stale state after a yield, or a session-related event boundary.\n- Treat `sessions_yield` as an implementation/event-boundary clue unless the actual subject is session lifecycle, persistence, resume, listing, storage, or session management.\n- Example: \u201cExplicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield\u201d => `acpx`, `acp`; not `sessions`.\n- \u201cfinal JSON state\u201d in an ACP/ACPX title is usually part of the ACP/ACPX result contract, not a general sessions issue.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications` and `chat_integrations`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed 
fixes.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.\n- Suppress `reliability` when the failure mode is merely incidental to a more specific named feature, such as cron force-run deadlock.\n- Do include `reliability` when a provider catalog or integration issue causes silent failure instead of a proper error.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d => `config`, `security`, `mcp_tooling`.\n\nLocal model providers:\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- Missing models in a provider catalog are `local_model_providers`.\n- If the missing provider metadata causes a silent failure or poor error behavior, add `reliability`.\n- Example: \u201cgoogle-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error\u201d => `local_model_providers`, `reliability`.\n\nFinal suppression 
check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, contract, session, sessions_yield, yield, or final JSON. Keep it only if that topic is actually a central maintainer-owned subject.\n\nFinal output requirement:\n- If a structured output tool exists, call it with exactly:\n {\"topics_of_interest\":[...]}\n- If no structured output tool exists, print raw JSON only.\n- Do not include explanations, Markdown, comments, or extra fields."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nOutput JSON only:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, packaging_deployment\n\nClassification rules:\n- Read the title first. Use body or summary only if the title is ambiguous.\n- Choose one primary topic.\n- Add secondary topics only when they are explicit central maintainer-owned subjects.\n- Use the minimum complete topic set. Do not add topics for labels, examples, paths, tests, implementation details, or broad ownership words.\n- Never invent topic ids.\n\nImportant edge cases:\n- If a provider catalog/model catalog issue says a missing model causes \u201csilent failure\u201d, \u201cfails silently\u201d, \u201chang\u201d, \u201ccrash\u201d, \u201cno error\u201d, or similar robustness behavior, include `reliability` in addition to the provider/catalog topic.\n Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.\n\n- Use `acpx` when ACPX is explicitly named as the subject. Do not normalize ACPX to `acp`.\n Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. 
Superpowers)` => `acpx`, `codex`, `skills_plugins`.\n Do not include `acp` there unless ACP itself is separately central.\n\n- Use `skills_plugins` when user-installed plugins, plugin inheritance, Superpowers, plugin discovery, plugin installation, or plugin availability is central.\n\n- Use `codex` when Codex is named as a central product/runtime subject, not merely as bracket metadata.\n\n- Use `browser_automation` for Chrome/Chromium session reuse, browser profile attach, browser diagnostics, browser runtime behavior, Google sign-in through browser automation, or browser tooling issues.\n\n- Use `packaging_deployment` for install/runtime environment problems such as missing executables, PATH/package manager availability, `npx`/Node spawn failures, desktop app packaging, bundled runtime problems, or deployment/startup environment failures.\n Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.\n\nTopic reminders:\n- `local_model_providers`: provider-specific model catalogs, known model metadata, provider-backed model support.\n- `model_serving`: serving endpoints, OpenAI-compatible protocol behavior, request/response streaming, usage chunks, endpoint compatibility, model request routing.\n- `local_models`: LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local model apps/providers.\n- `reliability`: deadlock, hang, crash, race, liveness, timeout, stuck state, silent failure, robustness failure, or missing error behavior.\n- `acp`: ACP is centrally named.\n- `acpx`: ACPX is explicitly named or clearly ACPX-specific.\n- `approvals`: approval mode, permission mode, user approval behavior, permission policy.\n- `sessions`: session identity, lifecycle, routing, state, persistent process identity.\n- `gateway`: gateway routing, guarded gateway behavior, gateway-owned behavior, gateway send denial.\n- `exec_tools`: 
exec command/tool behavior, PATH fallback, exec contract behavior.\n- `sandboxing`: sandbox policy or sandbox runtime behavior.\n- `security`: vulnerabilities, credentials, network policy, access restrictions, outbound rules.\n- `notifications`: notification delivery, announce messages, heartbeat pushes.\n- `chat_integrations`: Slack, WhatsApp, chat app delivery, chat history, target channels.\n- `cron_automation`: cron scheduling, force-run, lifecycle, execution, or cron deadlock.\n- `api_surface`: exposed API/contract, reader contract, request/response shape, public integration behavior.\n- `ui_tui`: webchat, TUI, terminal UI, user-facing chat interface.\n- `docs`: documentation-only changes, tutorials, README, guides, examples, showcase additions.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because a related word appeared. Keep it only if it is actually a central user-visible issue, feature, policy, or contract."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, coding_agents, api_surface,\nui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,\nagent_runtime, model_serving, local_models, self_hosted_inference,\ntelemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,\nconfig, security, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nProcess:\n1. Read the title first. The title is the primary signal.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Always produce the required structured output.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a label, symptom, path, implementation detail, example, internal hook, broad ownership area, or incidental UI wording.\n\nImportant anti-label-spam rules:\n- Do NOT use `ui_tui` merely because the title says showing, display, stale counts, diagnostics, messages, or visible output.\n- Use `ui_tui` only when the terminal UI, UI rendering behavior, TUI layout, or interactive UI surface is itself the central subject.\n- \u201cshowing stale token counts\u201d is not automatically `ui_tui`; classify the underlying token/session usage contract instead.\n- \u201cdiagnostics\u201d is not automatically `ui_tui`; classify the subsystem whose diagnostics are central.\n\nSpecific mapping rules learned from examples:\n- `totalTokensFresh`, stale token counts, fresh token counts, session token accounting, or token-count freshness should usually include `sessions` and `telemetry_usage`.\n Example: \u201cfix: respect totalTokensFresh flag to avoid showing stale token counts\u201d => `sessions`, `telemetry_usage`, not `ui_tui`.\n\n- Exec PATH fallback or command execution PATH resolution is `exec_tools`.\n Example facet: \u201cresolve exec PATH fallback\u201d => `exec_tools`.\n\n- Browser diagnostics, browser automation diagnostics, or layered browser diagnostic behavior is `browser_automation`, not `ui_tui`.\n Example facet: \u201clayered browser diagnostics\u201d => `browser_automation`.\n\n- Cron force-runs, cron scheduling, cron lifecycle, cron-triggered workflows, or cron execution are `cron_automation`.\n Example facet: \u201ccron force-run deadlock\u201d => `cron_automation`.\n Do not add `reliability` for deadlock wording when the title is already a multi-facet component fix and the maintainer bucket is the named component.\n\n- ACP protocol/session behavior is `acp`.\n- Gateway-owned behavior is `gateway`.\n- Node-backed runtime, runtime ownership, runtime lifecycle, embedded runtime behavior, or runtime setup is `agent_runtime`.\n 
Example: \u201cACP: add gateway-owned node-backed runtime\u201d => `acp`, `gateway`, `agent_runtime`, not `acpx`.\n\n- Do not use `acpx` just because ACP is present. Use `acpx` only when ACPX, ACPX adapters/extensions, or per-binding/per-agent ACPX behavior is explicitly central.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.\n- Suppress `reliability` when the failure wording is only a symptom of a more specific maintainer-owned area and the title\u2019s real subject is that area\u2019s contract, initialization, or execution behavior.\n- If the title names three concrete subsystem facets, prefer those three concrete buckets over replacing one with `reliability`.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Do not add `notifications` merely because the title mentions announce, delivery, message sending, send denial, pushed messages, or delivery plumbing.\n\nSessions:\n- Use `sessions` when active sessions, session takeover, embedded session takeover, session spawning, session identity, session lifecycle, user actively chatting in a session, or session token accounting is central.\n\nTelemetry and usage:\n- Use `telemetry_usage` for token counts, usage accounting, usage display backed by counters, metrics, usage statistics, and token freshness/staleness contracts.\n- If token counts are tied to a session state flag such as `totalTokensFresh`, include both `sessions` and `telemetry_usage`.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.\n- Use `tool_calling` when tool call arguments, parameter coercion, 
schemas, serialization, invocation contracts, or actual tool calls are central.\n- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.\n\nACP, ACPX, and approvals:\n- Use `acp` for ACP protocol/session behavior.\n- Use `acpx` for ACPX extension/adapter behavior, especially per-binding or per-agent ACP session behavior.\n- Use `approvals` for permissionMode, approval modes, permission prompts, permission policy, or approval-gated execution.\n- ACP session permissionMode changes with per-binding or per-agent scope are a three-way concern: `acp`, `approvals`, and `acpx`.\n\nCoding agents and agent runtime:\n- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.\n- Use `agent_runtime` for agent initialization, runtime lifecycle, spawning, embedded runtime behavior, node-backed runtime behavior, or internal runtime setup.\n- Use `sandboxing` when sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes are both notification behavior and chat integration behavior when the title is about pushed Slack messages.\n- Do not classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling, lifecycle, or execution is itself central.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.\n- Policy or conformance checks are `config` when they define or enforce governed 
behavior.\n- Network conformance checks should include `security`.\n- MCP conformance checks should include `mcp_tooling`.\n\nLocal model providers:\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- Missing models in a provider catalog are `local_model_providers`.\n- If missing provider metadata causes a silent failure or poor error behavior, add `reliability`.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, UI, or contract. Keep it only if that topic is actually a central maintainer-owned subject.\n\nFinal output requirement:\n- If a structured output tool exists, call it with exactly:\n {\"topics_of_interest\":[...]}\n- If no structured output tool exists, print raw JSON only.\n- Do not include explanations, Markdown, comments, or extra fields."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, coding_agents, api_surface,\nui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,\nagent_runtime, model_serving, local_models, self_hosted_inference,\ntelemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,\nconfig, security, mcp_tooling, tool_calling, hooks\n\nProcess:\n1. Read the title first. The title is the primary signal.\n2. Identify the central user-visible bug, feature, policy, contract, or documentation change.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicitly central maintainer-owned subjects.\n5. Use 3 topics only when the title explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output raw JSON only. No Markdown, explanations, comments, or extra fields.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the subject, not a label, symptom, path, implementation detail, example, internal hook, transport mode, broad ownership area, or incidental wording.\n\nIgnore label prefixes:\n- Ignore title prefixes like `[codex]`, `fix:`, `Policy:`, `[Bug]:`, etc. 
Do not classify as `codex` merely because `[codex]` appears.\n\nModel and serving rules:\n- Use `model_serving` when the serving/runtime response path, model output handling, response streaming, thinking block handling, Responses API serving behavior, or served model behavior is central.\n- Use `local_models` when local models or local model behavior are central.\n- Use `local_model_providers` only when provider catalog metadata, provider-specific integration, missing provider models, provider configuration, or known provider/model-family metadata is central.\n- Do NOT use `local_model_providers` merely because a provider name appears as context.\n- \u201cLM Studio thinking blocks invisible with Responses API\u201d => `model_serving`, `local_models`, not `local_model_providers`.\n- Do not use `api_surface` merely because \u201cResponses API\u201d appears as the transport or compatibility mode. Use `api_surface` only when the API contract, endpoint behavior, reader API, schema, or externally consumed API surface is central.\n\nPolicy, config, and security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.\n- \u201cPolicy: add model, network, and MCP conformance checks\u201d => `config`, `security`, `mcp_tooling`.\n- Do NOT use `local_model_providers` merely because a policy title says \u201cmodel\u201d.\n\nGateway, API, and UI:\n- Use `gateway` when gateway-owned behavior, gateway routing, webchat gateway behavior, or gateway-backed readers are central.\n- Use `api_surface` when an externally consumed API, reader contract, endpoint, request/response 
contract, or API-visible behavior is central.\n- Use `ui_tui` when a terminal UI, webchat UI, UI rendering behavior, full-message reader UI, TUI layout, interactive UI surface, or message reader surface is itself central.\n- \u201cwebchat full-message reader for truncated history\u201d => `gateway`, `api_surface`, `ui_tui`.\n- Do NOT suppress `ui_tui` when the title\u2019s central subject is a reader/viewer/message-reading UI surface.\n- Do NOT use `ui_tui` merely because the title says showing, display, stale counts, diagnostics, messages, or visible output.\n\nNotifications and chat integrations:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes are both `notifications` and `chat_integrations` when the title is about pushed Slack messages.\n- Do NOT classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling, lifecycle, force-run behavior, deadlock, or execution is itself central.\n- \u201cPer-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes\u201d => `notifications`, `chat_integrations`, not `cron_automation`.\n\nSessions and telemetry:\n- Use `sessions` when active sessions, session takeover, embedded session takeover, session spawning, session identity, session lifecycle, user actively chatting in a session, or session token accounting is central.\n- Use `telemetry_usage` for token counts, usage accounting, usage display backed by counters, metrics, usage statistics, and token freshness/staleness contracts.\n- `totalTokensFresh`, stale token counts, fresh token counts, session token accounting, or token-count freshness usually include both `sessions` and 
`telemetry_usage`.\n- \u201crespect totalTokensFresh flag to avoid showing stale token counts\u201d => `sessions`, `telemetry_usage`, not `ui_tui`.\n\nExecution, browser, and cron:\n- Exec PATH fallback or command execution PATH resolution => `exec_tools`.\n- Browser diagnostics, browser automation diagnostics, or layered browser diagnostic behavior => `browser_automation`, not `ui_tui`.\n- Cron force-runs, cron scheduling, cron lifecycle, cron-triggered workflows, or cron execution => `cron_automation`.\n- Do not add `reliability` for deadlock wording when the title is already a multi-facet component fix and the maintainer bucket is the named component.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.\n- Suppress `reliability` when the failure wording is only a symptom of a more specific maintainer-owned area and the title\u2019s real subject is that area\u2019s contract, initialization, rendering, or execution behavior.\n- If missing provider metadata causes a silent failure or poor error behavior, add `reliability`.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.\n- Use `tool_calling` when tool call arguments, parameter coercion, schemas, serialization, invocation contracts, or actual tool calls are central.\n- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.\n\nACP, ACPX, approvals, and runtime:\n- Use `acp` for ACP protocol/session behavior.\n- Use `acpx` only when ACPX, ACPX adapters/extensions, or per-binding/per-agent ACPX behavior is explicitly central.\n- Do not use `acpx` just because ACP is present.\n- Use `approvals` for permissionMode, approval modes, permission prompts, permission policy, or 
approval-gated execution.\n- ACP session permissionMode changes with per-binding or per-agent scope => `acp`, `approvals`, `acpx`.\n- Use `agent_runtime` for node-backed runtime, runtime ownership, runtime lifecycle, embedded runtime behavior, runtime setup, agent initialization, or spawning.\n- \u201cACP: add gateway-owned node-backed runtime\u201d => `acp`, `gateway`, `agent_runtime`, not `acpx`.\n\nCoding agents and sandboxing:\n- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.\n- Use `sandboxing` when sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, provider, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, UI, API, Responses API, LM Studio, or contract. Keep the topic only if it is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks, auth_identity\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only, or call the structured JSON output tool if one exists.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nImportant correction:\nFor `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path`, classify as:\n`reliability`, `auth_identity`, and `sessions`.\n\nWhy:\n- `self-heal lane wedges` is a reliability/liveness/robustness issue.\n- Do NOT classify `lane wedges` as `queueing`; \u201clane\u201d is an implementation detail unless queueing, queued execution, queue lifecycle, or scheduling behavior is explicitly the user-visible subject.\n- `OAuth` is an auth and identity concern, so use `auth_identity`.\n- `openai-codex OAuth` does NOT imply `codex`; here \u201ccodex\u201d is part of the OAuth/provider naming, not the central maintainer-owned subject.\n- `embedded path` makes the OAuth restoration session/path-specific, so include `sessions`.\n\nAuth and identity:\n- Use `auth_identity` for OAuth, login identity, account identity, authentication flows, token identity, credential identity, or restoring auth behavior.\n- OAuth fixes should usually include `auth_identity`.\n- Add `sessions` when the auth problem is tied to a session path, embedded path, outbound session identity, session lifecycle, session routing, or session-specific behavior.\n- Do not add `codex` merely because an OAuth provider or path name contains \u201ccodex\u201d.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, wedge, stuck state, race, timeout, liveness issue, self-healing behavior, or robustness failure.\n- Words like `self-heal`, `wedge`, `wedged`, `stuck`, and `recover` are strong reliability signals when the title is about restoring forward progress.\n- Suppress `reliability` only when the failure mode clearly belongs to a more specific named feature owner, such as `cron force-run deadlock` being `cron_automation`.\n\nQueueing:\n- Use `queueing` only when queue, queueing, queued 
execution, queue lifecycle, queue scheduling, or steering in queues is central.\n- Do not add `queueing` for incidental words like lane, main lane, worker, subagent, execution path, or internal routing path.\n- A lane wedge is not automatically queueing.\n\nCodex:\n- Use `codex` when Codex itself is the central user-visible subject, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title contains \u201ccodex\u201d inside a provider name, OAuth label, bracketed metadata, or path name.\n- If the real subject is OAuth, classify as `auth_identity`, not `codex`.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like lane, queue, codex, OAuth, embedded, path, wedge, session, usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, deadlock, security, diagnostics, display, showing, or contract. Keep it only if that topic is actually a central maintainer-owned subject."
+ }
+]
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/command.sh b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/command.sh
new file mode 100644
index 0000000000000000000000000000000000000000..e3dcd9d6927705354233d7490641a14b07e5a3d6
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/command.sh
@@ -0,0 +1,10 @@
+PYTHONPATH=prompt-optimizer/src prompt-optimizer/.venv/bin/python -m prompt_optimizer.cli optimize \
+ --model gemma-12b-q4km-reason \
+ --max-tokens 1536 \
+ --concurrency 2 \
+ --seed-routing-policy /home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-proper-from-best-20260613T055906Z/best.routing_policy.md \
+ --row-limit 30 \
+ --reflection-minibatch-size 4 \
+ --max-candidate-proposals 16 \
+ --max-metric-calls 720 \
+ --output-dir /home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/final_run_report.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/final_run_report.json
new file mode 100644
index 0000000000000000000000000000000000000000..ce80d3abeb8216b928c3e1a4ff9ed50696fb6356
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/final_run_report.json
@@ -0,0 +1,275 @@
+{
+ "result": {
+ "best_idx": 1,
+ "num_candidates": 14,
+ "num_full_val_evals": 14,
+ "total_metric_calls": 544,
+ "val_aggregate_scores": [
+ 0.6395802005012532,
+ 0.6971520146520146,
+ 0.6569444444444444,
+ 0.46397435897435896,
+ 0.6616269841269842,
+ 0.5029318488529015,
+ 0.4785409035409035,
+ 0.5265246449456976,
+ 0.4836904761904762,
+ 0.4776587301587301,
+ 0.56900327742433,
+ 0.5957763532763533,
+ 0.6629365079365079,
+ 0.49142857142857144
+ ]
+ },
+ "run_dir": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z",
+ "run_log": {
+ "accepted_full_eval_candidates": 13,
+ "base_score": 0.6395802005012532,
+ "better_valset_events": [
+ {
+ "iteration": 3,
+ "score": 0.6971520146520146
+ }
+ ],
+ "byte_count": 141839,
+ "line_count": 1624,
+ "proposal_attempts": 15,
+ "proposal_events": [
+ {
+ "accepted_for_full_eval": false,
+ "delta": -1.5500000000000003,
+ "has_proposed_text": true,
+ "iteration": 1,
+ "new_subsample_sum": 1.65,
+ "old_subsample_sum": 3.2
+ },
+ {
+ "accepted_for_full_eval": false,
+ "delta": -0.8928571428571428,
+ "has_proposed_text": true,
+ "iteration": 2,
+ "new_subsample_sum": 2.25,
+ "old_subsample_sum": 3.142857142857143
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 2.928571428571429,
+ "has_proposed_text": true,
+ "iteration": 3,
+ "new_subsample_sum": 4.0,
+ "old_subsample_sum": 1.0714285714285714
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 1.907142857142857,
+ "has_proposed_text": true,
+ "iteration": 4,
+ "new_subsample_sum": 3.25,
+ "old_subsample_sum": 1.342857142857143
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 1.380952380952381,
+ "has_proposed_text": true,
+ "iteration": 5,
+ "new_subsample_sum": 4.0,
+ "old_subsample_sum": 2.619047619047619
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 1.6090225563909772,
+ "has_proposed_text": true,
+ "iteration": 6,
+ "new_subsample_sum": 4.0,
+ "old_subsample_sum": 2.3909774436090228
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 0.5357142857142856,
+ "has_proposed_text": true,
+ "iteration": 7,
+ "new_subsample_sum": 2.5357142857142856,
+ "old_subsample_sum": 2.0
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 2.928571428571429,
+ "has_proposed_text": true,
+ "iteration": 8,
+ "new_subsample_sum": 4.0,
+ "old_subsample_sum": 1.0714285714285714
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 2.05,
+ "has_proposed_text": true,
+ "iteration": 9,
+ "new_subsample_sum": 3.5,
+ "old_subsample_sum": 1.45
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 2.2142857142857144,
+ "has_proposed_text": true,
+ "iteration": 10,
+ "new_subsample_sum": 4.0,
+ "old_subsample_sum": 1.7857142857142856
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 0.25,
+ "has_proposed_text": true,
+ "iteration": 11,
+ "new_subsample_sum": 3.5,
+ "old_subsample_sum": 3.25
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 1.75,
+ "has_proposed_text": true,
+ "iteration": 12,
+ "new_subsample_sum": 4.0,
+ "old_subsample_sum": 2.25
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 1.55,
+ "has_proposed_text": true,
+ "iteration": 14,
+ "new_subsample_sum": 3.25,
+ "old_subsample_sum": 1.7
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 2.7642857142857142,
+ "has_proposed_text": true,
+ "iteration": 15,
+ "new_subsample_sum": 4.0,
+ "old_subsample_sum": 1.2357142857142858
+ },
+ {
+ "accepted_for_full_eval": true,
+ "delta": 1.75,
+ "has_proposed_text": true,
+ "iteration": 16,
+ "new_subsample_sum": 4.0,
+ "old_subsample_sum": 2.25
+ }
+ ],
+ "proposal_texts_started": 15,
+ "rejected_candidates": 2,
+ "selected_events": [
+ {
+ "candidate_idx": 0,
+ "iteration": 1,
+ "score": 0.6395802005012532
+ },
+ {
+ "candidate_idx": 0,
+ "iteration": 2,
+ "score": 0.6395802005012532
+ },
+ {
+ "candidate_idx": 0,
+ "iteration": 3,
+ "score": 0.6395802005012532
+ },
+ {
+ "candidate_idx": 0,
+ "iteration": 4,
+ "score": 0.6395802005012532
+ },
+ {
+ "candidate_idx": 2,
+ "iteration": 5,
+ "score": 0.6569444444444444
+ },
+ {
+ "candidate_idx": 2,
+ "iteration": 6,
+ "score": 0.6569444444444444
+ },
+ {
+ "candidate_idx": 2,
+ "iteration": 7,
+ "score": 0.6569444444444444
+ },
+ {
+ "candidate_idx": 3,
+ "iteration": 8,
+ "score": 0.46397435897435896
+ },
+ {
+ "candidate_idx": 5,
+ "iteration": 9,
+ "score": 0.5029318488529015
+ },
+ {
+ "candidate_idx": 7,
+ "iteration": 10,
+ "score": 0.5265246449456976
+ },
+ {
+ "candidate_idx": 5,
+ "iteration": 11,
+ "score": 0.5029318488529015
+ },
+ {
+ "candidate_idx": 1,
+ "iteration": 12,
+ "score": 0.6971520146520146
+ },
+ {
+ "candidate_idx": 4,
+ "iteration": 13,
+ "score": 0.6616269841269842
+ },
+ {
+ "candidate_idx": 7,
+ "iteration": 14,
+ "score": 0.5265246449456976
+ },
+ {
+ "candidate_idx": 11,
+ "iteration": 15,
+ "score": 0.5957763532763533
+ },
+ {
+ "candidate_idx": 2,
+ "iteration": 16,
+ "score": 0.6569444444444444
+ }
+ ],
+ "selected_iterations": 16
+ },
+ "summary": {
+ "best_idx": 1,
+ "best_prompt_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z/best.prompt.md",
+ "best_routing_policy_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z/best.routing_policy.md",
+ "best_score": 0.6971520146520146,
+ "config": {
+ "harness": {
+ "base_url": null,
+ "concurrency": 2,
+ "context_window": null,
+ "max_tokens": 1536,
+ "model": "gemma-12b-q4km-reason",
+ "state_dir": null,
+ "timeout_ms": 900000
+ },
+ "max_candidate_proposals": 16,
+ "max_metric_calls": 720,
+ "output_dir": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z",
+ "reflection_minibatch_size": 4,
+ "row_limit": 30,
+ "seed": 0,
+ "seed_routing_policy_chars": 11180,
+ "seed_routing_policy_sha256": "288c716272a98543b9c5d7c191ac9389173adddefd17a7fba409a2e9d6b9ff65"
+ },
+ "created_at": "2026-06-13T21:47:03.393070+00:00",
+ "num_candidates": 14,
+ "num_full_val_evals": 14,
+ "result_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z/gepa-result.json",
+ "total_metric_calls": 544
+ }
+}
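The figures in `final_run_report.json` above can be cross-checked against each other. The following is a minimal sketch, not part of `prompt_optimizer`: the values are copied from the report, while the helper names and the metric-call accounting (one 30-row evaluation per full validation pass, plus a before and an after evaluation on a 4-row minibatch per proposal) are assumptions made for illustration.

```python
# Sketch only: values are copied from final_run_report.json; the helper names
# and the metric-call accounting are assumptions, not prompt_optimizer APIs.

VAL_AGGREGATE_SCORES = [
    0.6395802005012532, 0.6971520146520146, 0.6569444444444444,
    0.46397435897435896, 0.6616269841269842, 0.5029318488529015,
    0.4785409035409035, 0.5265246449456976, 0.4836904761904762,
    0.4776587301587301, 0.56900327742433, 0.5957763532763533,
    0.6629365079365079, 0.49142857142857144,
]

# (iteration, old_subsample_sum, new_subsample_sum, accepted_for_full_eval)
PROPOSAL_EVENTS = [
    (1, 3.2, 1.65, False),
    (2, 3.142857142857143, 2.25, False),
    (3, 1.0714285714285714, 4.0, True),
    (4, 1.342857142857143, 3.25, True),
    (5, 2.619047619047619, 4.0, True),
    (6, 2.3909774436090228, 4.0, True),
    (7, 2.0, 2.5357142857142856, True),
    (8, 1.0714285714285714, 4.0, True),
    (9, 1.45, 3.5, True),
    (10, 1.7857142857142856, 4.0, True),
    (11, 3.25, 3.5, True),
    (12, 2.25, 4.0, True),
    (14, 1.7, 3.25, True),
    (15, 1.2357142857142858, 4.0, True),
    (16, 2.25, 4.0, True),
]

ROW_LIMIT = 30             # --row-limit in command.sh
MINIBATCH = 4              # --reflection-minibatch-size in command.sh
TOTAL_METRIC_CALLS = 544   # result.total_metric_calls
SELECTED_ITERATIONS = 16   # run_log.selected_iterations


def best_candidate(scores):
    """Return (index, score) of the highest validation score."""
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    return best_idx, scores[best_idx]


def acceptance_matches_subsample(events):
    """True if a proposal was accepted exactly when its subsample sum rose."""
    return all(accepted == (new > old) for _, old, new, accepted in events)


def iterations_without_proposal(events, selected_iterations):
    """Iterations that selected a parent but logged no proposal event."""
    seen = {iteration for iteration, *_ in events}
    return sorted(set(range(1, selected_iterations + 1)) - seen)


def unexplained_metric_calls(num_full_evals, num_proposals):
    """Calls left over under the assumed accounting described above."""
    full = num_full_evals * ROW_LIMIT
    minibatch = num_proposals * 2 * MINIBATCH
    return TOTAL_METRIC_CALLS - full - minibatch


if __name__ == "__main__":
    print("best candidate:", best_candidate(VAL_AGGREGATE_SCORES))
    print("acceptance rule holds:", acceptance_matches_subsample(PROPOSAL_EVENTS))
    print("no proposal in:", iterations_without_proposal(PROPOSAL_EVENTS, SELECTED_ITERATIONS))
    print("unexplained calls:", unexplained_metric_calls(14, len(PROPOSAL_EVENTS)))
```

Under these assumptions the leftover call count equals one minibatch. That would be consistent with iteration 13 scoring its parent on a subsample and then producing no proposal, though the report itself does not state this.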
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..8f9eeeb7ec57643d055d92fc33eb310431779649
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'gateway', 'agent_runtime'), description='The PR implements a durable, gateway-owned ACP node-backed runtime involving core changes to ACP, gateway infrastructure, and agent execution logic.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..b4ea20572f996478e070408129e2254780fffcd4
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('local_model_providers', 'security', 'mcp_tooling'), description='This PR implements policy conformance checks for model providers (local_model_providers), network SSRF settings (security), and managed MCP servers (mcp_tooling).', caveats=('The PR specifically adds conformance checks for three distinct areas: model providers, network security, and MCP servers.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..7f6c14ddc58fdf5942fd21be76dc6bf3120c2ba8
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('exec_tools', 'sandboxing', 'approvals'), description='The PR implements Exec V2 contract updates covering tool behavior, approval logic, and sandbox boundaries.', caveats=('The title matches a specific instruction in the system prompt regarding \"contract follow-through\" for execution tools.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..81303299bb26514c1d3b4d72f9df55d04389f885
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acpx', 'codex', 'skills_plugins'), description='The feature request focuses on inheriting user-installed plugins (skills) in the Codex runtime for ACPX sessions.', caveats=('The issue involves a complex intersection of ACPX, Codex runtime, and user-installed skills within a sandbox environment.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_12/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_12/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..d895c7e28686ccb840f16d1268d16c1b03305772
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_12/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('memory', 'self_hosted_inference'), description=\"The PR adds an 'openai-compatible' provider for embeddings specifically targeting self-hosted inference servers like llama.cpp and Ollama.\", caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_13/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_13/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..e3fe6d9aec16b7f7f309c5aab6b9b196e0c4cc80
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_13/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('local_model_providers', 'reliability'), description='The issue reports a missing model in the Google Vertex provider catalog and a resulting silent failover error, involving both provider catalog metadata and robust error handling.', caveats=('The issue involves a missing entry in the provider catalog and a failure in the fallback logic (reliability).',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_14/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_14/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..94f62783afce4911656dc1ec1406220d60e14953
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_14/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('docs', 'agent_demos'), description='This pull request adds a new tutorial video link and embed to the showcase section of the project documentation.', caveats=('The PR is explicitly marked as a documentation change adding a tutorial/showcase video for agent use.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_15/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_15/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..07e09782e47fc68d2d215195655120f80bc6461a
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_15/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('open_weight_models', 'local_model_providers'), description='This pull request updates known context windows for several open-weight model families to improve accurate usage and limit handling.', caveats=(\"The issue specifically targets context window metadata for open-weight models within a specific provider's infrastructure.\",), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_16/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_16/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..3377dcf84cd43e45083455d930922afca9974bb6
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_16/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('notifications', 'chat_integrations'), description='The bug concerns identity overlays being dropped for heartbeat and announcement messages sent via Slack integration.', caveats=('The issue involves heartbeats and announcements, which are specific notification types.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_17/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_17/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..259cd2fac4543f344af2badf8f9a8868d7f1a50c
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_17/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('local_models', 'model_serving'), description=\"The PR fixes the handling of reasoning blocks in the response stream specifically for LM Studio's implementation of the OpenAI-compatible Responses API.\", caveats=('LM Studio is explicitly mentioned as the primary target for the fix.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_18/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_18/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..cb37da33f39336509378346902162cfb6a13e31a
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_18/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('security',), description='This PR addresses supply-chain security by hardening GitHub Actions workflows with OIDC and restricted permissions.', caveats=('The focus of the PR is exclusively on hardening CI/CD pipelines and supply chain security, specifically through credential management (OIDC) and permission scoping.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_19/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_19/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..94a635d5be585f9c5f1d2b5f1de7555dc47b26c5
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_19/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('sandboxing', 'sessions'), description=\"The issue reports a silent failure in sub-agent initiation specifically when using 'non-main' sandbox modes, affecting both sandboxing logic and session management.\", caveats=('The issue involves both sandbox mode configuration and the orchestration of sub-agents during session initialization.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..ec7f9724106b45424b285766206778a79ef9b0da
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('exec_tools', 'ui_tui', 'cron_automation'), description='This PR resolves three distinct issues: exec tool path fallback, browser status diagnostic visibility, and a deadlock in the cron execution lifecycle.', caveats=('The PR addresses a multi-faceted fix involving execution environment, UI diagnostics for browser tools, and cron scheduling logic.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_20/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_20/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..fc5c9488e841e766e1df0e66dbe722f8086f0eec
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_20/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('ui_tui',), description='The PR fixes a display issue where stale token counts were incorrectly shown in both the TUI and Web UI components.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_21/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_21/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..258cdebd45465dabc18851cbc12a87ac7966ca81
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_21/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('mcp_tooling',), description='The PR addresses a validation failure in MCP tool calls by coercing stringified JSON objects/arrays into actual data types before they reach the MCP server.', caveats=('Security concerns identified in internal scanning (prototype pollution, resource exhaustion) were not included as primary routing targets since they are implementation-level issues rather than core feature requests.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_22/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_22/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..9851df1057d7dc67ff6ab6a17d10e096a2de2be3
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_22/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=(), description='', caveats=(), error='classifier exit 2: prompt: /home/bob/.local/state/localpager/classifier/prompts/20260613T173704Z-4153175.md\\nschema: /home/bob/.local/state/localpager/classifier/schemas/20260613T173704Z-4153175.json\\nsession: /home/bob/.local/state/localpager/classifier/sessions/20260613T173704Z-4153175\\nlocalpager-agent: final_json was not called; no structured output was captured')"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_23/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_23/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..dc7785ddc55e5880431d0b65a234f34a8a362c03
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_23/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('memory', 'gateway', 'security'), description='A bug where local Ollama embeddings fail due to proxy routing rules that strip NO_PROXY variables for security.', caveats=('The issue involves a complex interaction between embedding infrastructure, gateway proxy logic, and security-sensitive SSRF protections.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_24/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_24/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..97e8bac9d1dd3b1cd1fc1d50677ff8af3740f953
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_24/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('browser_automation',), description='The issue reports failures in connecting to existing Chrome sessions via both an extension and a local profile, primarily impacting the browser automation workflow.', caveats=(\"Issue specifically addresses issues with Chrome session integration and the 'EasyClaw' extension for browser relay.\",), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_25/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_25/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..45b2dc48c0ce42167b7a4699bacbf3c9966a9507
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_25/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=(), description='', caveats=(), error='classifier exit 2: prompt: /home/bob/.local/state/localpager/classifier/prompts/20260613T173942Z-4154262.md\\nschema: /home/bob/.local/state/localpager/classifier/schemas/20260613T173942Z-4154262.json\\nsession: /home/bob/.local/state/localpager/classifier/sessions/20260613T173942Z-4154262\\nlocalpager-agent: final_json was not called; no structured output was captured')"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_26/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_26/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..cccf8e681447355f35e070565354a7d30100fd9b
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_26/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'sessions'), description='The issue reports a session key collision in the ACP runtime where multiple threads share one process due to incorrect hashing of parent channel IDs instead of individual thread identifiers.', caveats=('Issue involves ACP session keys being improperly derived from parent channel IDs, causing shared processes across threads.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_27/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_27/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..12bb82b9f608638bb0cd6ee437348d2c0ec6ed8a
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_27/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('queueing', 'codex'), description='The pull request fixes stalled messages in the per-lane queue (queueing) and restores OAuth for the openai-codex provider (codex).', caveats=('The PR addresses two distinct issues: lane queue stuck states (queueing) and OpenAI Codex authentication (codex).',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_28/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_28/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..1e8236039be0acae8a7e14eb3c3cfcc66e0db099
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_28/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('notifications', 'reliability'), description=\"The issue describes a race condition (takeover error) occurring during the 'announce' delivery path of a cron job.\", caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_29/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_29/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..1f2198d82e459c5d5961f25943b5bed5c33700d6
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_29/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'acpx', 'sessions'), description='The issue involves ACP/ACPX backend visibility and session lifecycle state (sessions_yield) issues during agent runs.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..436c8c832e61de44d6bc1efbde3afe0c744977bd
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('codex', 'chat_integrations'), description='This pull request includes fixes for Codex startup logic/OAuth configuration and adds history features for the WhatsApp chat integration.', caveats=('The issue addresses multiple distinct areas including Codex lifecycle and WhatsApp integration, justifying both codex and chat_integrations labels.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..aae6004b5b667e1cb8fcd7f88bba102b18f728f7
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('model_serving', 'local_models'), description='The issue reports a failure in the OpenAI-compatible streaming protocol where final usage chunks are ignored, specifically when using llama.cpp.', caveats=('Included local_models because llama.cpp is explicitly mentioned as a primary signal for this issue.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..87becffb9fcfb6c4d3c778c443a4d1b84dfeda4c
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'acpx', 'approvals'), description='The issue proposes scoped permissionMode for acp and acpx sessions at both binding and agent levels.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..72fa27df7be49a08a7f58a1a00868599b16cb520
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('gateway', 'sessions'), description=\"The PR implements session identity passing and guarded result surfacing for the gateway's message sending path.\", caveats=('The issue involves both outbound session identity and gateway-specific behavior regarding guarded message delivery.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..3bab4a4878d890d7d5ef2823b5d923fea75bc5ed
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('memory', 'reliability'), description='This pull request fixes a deadlock occurring during active-memory recall by isolating the subagent into its own lane, falling under both memory features and reliability improvements for deadlock resolution.', caveats=(\"The term 'lane' and 'subagent' are present but the primary issue is a deadlock in the memory feature.\",), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..bb8e6641c0713b3ec2f55c37fd2982b497d3b241
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('gateway', 'api_surface', 'ui_tui'), description=\"The PR implements a new 'chat.message.get' RPC to resolve content truncation in the WebChat sidebar, affecting both the Gateway backend and UI frontend.\", caveats=('The inclusion of gateway is based on the specific rule regarding gateway-backed full-message readers in webchat interfaces.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..dc239d6538b85b6ea7ae74698c05de166d6ddda3
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('docs', 'queueing'), description=\"This PR updates documentation specifically regarding the queue system's 'steer' behavior and its interaction with streaming.\", caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/gepa-result.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/gepa-result.json
new file mode 100644
index 0000000000000000000000000000000000000000..5ce8bfd9b4727f4a527490ebbd404724294461f7
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/gepa-result.json
@@ -0,0 +1,859 @@
+{
+ "_str_candidate_key": null,
+ "best_idx": 1,
+ "best_outputs_valset": null,
+ "candidates": [
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn only final JSON:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about \u201ccron --announce and heartbeat target-channel Slack pushes\u201d is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself the central subject.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:\n - `exec_tools`\n - `sandboxing`\n - `approvals`\n- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a 
deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nOpen-weight and local model providers:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like 
`[codex]`.\n- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nACP, ACPX, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nModel serving:\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server 
compatibility.\n- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, 
streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- A `docs(queue): ...` title usually includes both `docs` and `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nTool calling:\n- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, or contract. Keep it only if that topic is actually a central maintainer-owned subject.\n\nOutput only JSON."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- A title like `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` is only `docs`.\n- Do not add non-allowed or broad demo/showcase labels such as `agent_demos`.\n- Documentation-only PRs should usually include `docs` plus the central documented area only when that area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- A title like `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.\n- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.\n- A title like `[Bug]: ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process` is `acp` and `sessions`, not `acpx`.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or 
permissionMode policy is central.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and only `acpx` if ACPX is explicitly named or clearly the binding system in question.\n\nOpen-weight, local provider catalogs, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- A title like `[Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` is `local_model_providers` and `model_serving`.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, or names a model unless serving/routing/protocol behavior is central.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is 
central.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about \u201ccron --announce and heartbeat target-channel Slack pushes\u201d is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n- Queueing requires queue, queued execution, steering in 
queues, queue lifecycle, or scheduling behavior to be user-visible and central.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add 
`skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible 
item up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only, or call the structured JSON output tool if one exists.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nImportant routing corrections:\n- `layered browser diagnostics` is `browser_automation`, not `ui_tui`.\n- `cron force-run deadlock` is `cron_automation`. Do not add `reliability` just because the word deadlock appears when the named feature owner is the better maintainer bucket.\n- `totalTokensFresh`, stale token counts, fresh/stale token-count flags, and per-session usage freshness are `sessions` plus `telemetry_usage`.\n- Do not use `ui_tui` merely because a title says \u201cshowing\u201d, \u201cdisplay\u201d, \u201creader\u201d, or \u201cdiagnostics\u201d. Use `ui_tui` only when webchat, TUI, UI views, terminal UI, or user-facing chat interface behavior is explicitly central.\n- Security PRs that list vulnerable areas should include `security` plus each central named affected subsystem when that subsystem is an allowed topic. For example, skill scanner vulnerability work includes `skills_plugins`; hook priority vulnerability work includes `hooks`.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history 
is one of the listed fixes.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n- For `cron force-run deadlock`, classify as `cron_automation` without `reliability` unless the title separately makes reliability/liveness a central cross-cutting subject.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:\n - `exec_tools`\n - `sandboxing`\n - `approvals`\n- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Suppress `reliability` when deadlock/hang/crash is just the failure mode of a more specific named central feature, such as cron force-run.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use 
`sessions` when session identity, session lifecycle, session routing, session state, session freshness, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- `totalTokensFresh` and stale per-session token counts are session-state/freshness concerns and should include `sessions`.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, token counts, stale usage values, or status reporting surfaces are themselves the feature or bug.\n- Stale token counts are `telemetry_usage`.\n- Do not add `model_serving` merely because usage chunks or token counts are mentioned in a model-serving protocol bug unless serving protocol behavior is central.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- `layered browser diagnostics` is `browser_automation`.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nHooks:\n- Use `hooks` when hook behavior, hook priority, hook ordering, hook execution, or hook security is central.\n- In composite security titles, include `hooks` when hook priority or hook execution is one of the named vulnerable areas.\n\nOpen-weight and local model providers:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight 
Mantle models` should include both `open_weight_models` and `local_model_providers`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, plugin availability, or skill scanner behavior is central.\n- Security fixes in a skill scanner should include `skills_plugins`.\n- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nACP, ACPX, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution 
runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nModel serving:\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks unless usage reporting itself is central.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about 
serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- A `docs(queue): ...` title usually includes both `docs` and `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nTool calling:\n- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `skills_plugins`, `hooks`.\n- Do not substitute broad infrastructure topics like `gateway`, `ui_tui`, or `reliability` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, or contract. 
Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nUse only allowed topic ids. Never invent a topic id, even if a title suggests a missing bucket such as packaging, deployment, CI, infrastructure, or supply chain. If no allowed topic is central, return an empty array.\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nDecision process:\n1. Read the title first.\n2. Identify the user-visible subject of the issue or PR.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned concerns in the title or first clear summary.\n5. Use 3 topics only when the title explicitly contains three central concerns.\n6. 
Remove any topic added only because of incidental words, labels, implementation details, tests, file paths, examples, or broad impact.\n\nCron/session/reliability correction:\n- `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting` is `cron_automation`, `sessions`, and `reliability`.\n- Use `cron_automation` when cron announce, cron scheduling, cron force-run, cron lifecycle, or cron execution is the named feature.\n- Use `sessions` when the bug involves active chat sessions, session takeover, embedded attempt session takeover, session ownership, session identity, or session lifecycle.\n- Use `reliability` when the central failure is an exception, crash, deadlock, hang, stuck state, race, timeout, or robustness failure.\n- Do not suppress `reliability` when the title names a concrete runtime exception such as `EmbeddedAttemptSessionTakeoverError`.\n\nSecurity/CI correction:\n- `ci: harden GitHub Actions supply-chain boundaries` is `security` only.\n- GitHub Actions supply-chain hardening, supply-chain boundaries, workflow trust boundaries, dependency provenance, and CI hardening are security concerns when no more specific allowed topic is central.\n- Do not add nonexistent or label-like topics such as `packaging_deployment`.\n- Do not add deployment, packaging, CI, or release topics unless they are in the allowed list; they are not allowed here.\n\nGeneral suppression rules:\n- Do not add `notifications` merely because delivery, announce, pushed message, or message sending appears. Use it only when notification behavior itself is the central user-visible subject.\n- Do not add `chat_integrations` merely because a user is actively chatting. Use it only for Slack, WhatsApp, chat app delivery, chat history, target channels, or chat push behavior.\n- Do not add `reliability` for every bug. 
Add it when the title\u2019s central failure mode is an exception, crash, hang, deadlock, stuck state, race, timeout, liveness, or robustness issue.\n- Do not add `security` for every policy, boundary, or CI title. Add it when the title is about hardening, supply-chain boundaries, vulnerabilities, access restrictions, credentials, SSRF, token verification, network policy, or trust boundaries.\n- Never output labels outside the allowed topic ids.\n\nOutput raw JSON only."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\nYou may receive a GitHub target URL, title, and sometimes a body or summary.\nThe title is the primary signal. Use the body only when the title is ambiguous.\nIgnore examples, tests, labels, file paths, implementation details, symptoms, broad impact, and incidental words unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, docs change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicitly central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only, or call the structured JSON output tool if one exists.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, label-spam keyword, or security-hardening mechanism.\n\nHigh-priority corrections:\n- Local Ollama embeddings failing when a proxy is enabled, including wording like \u201cSSRF defenses ignore NO_PROXY\u201d, should be classified as `local_models` and `self_hosted_inference`.\n- For that Ollama/proxy/NO_PROXY case, do NOT add `security`, `local_model_providers`, or `memory`. \u201cSSRF defenses\u201d is the mechanism or label-like context, not the central maintainer bucket, unless the title is actually about a vulnerability or security policy.\n- \u201cExplicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield\u201d should be classified as `acp` and `acpx`.\n- For that ACP/sessions_yield case, do NOT add `sessions`. `sessions_yield` is an ACP/ACPX workflow term or implementation detail, not necessarily a sessions maintainer concern.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model systems.\n- Use `self_hosted_inference` when the issue is about using self-hosted inference servers such as Ollama, llama.cpp, vLLM, TGI, or LocalAI as inference providers.\n- Ollama is a strong signal for both `local_models` and `self_hosted_inference` when the issue is about using Ollama locally or through local inference.\n- Do not use `local_model_providers` for Ollama/local runtime failures unless the title is specifically about provider catalog metadata, provider-specific open-weight integration, known context windows, or named provider/model-family support.\n- Do not add `memory` merely because embeddings are mentioned. 
Add `memory` only when memory behavior, active-memory recall, vector stores, memory providers, or embedding-provider memory behavior is the central subject.\n\nSecurity:\n- Use `security` for network policy, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior when that is the main user-visible subject.\n- Do not add `security` merely because a bug mentions SSRF defenses, proxy bypass behavior, NO_PROXY, blocked requests, or a security-related implementation detail.\n- Security PRs that list vulnerable areas should include `security` plus each central named affected subsystem. For example, skill scanner vulnerability work includes `skills_plugins`; hook priority vulnerability work includes `hooks`.\n\nACP and ACPX:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings, explicit ACP agent runs, embedded backend visibility, final JSON state in ACP workflow, or ACPX session workflow concepts.\n- Do not add `sessions` merely because an ACP/ACPX title contains `sessions_yield`, \u201csession\u201d, or \u201cstate\u201d if the central issue is ACP/ACPX workflow behavior.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, session freshness, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- `totalTokensFresh`, stale token counts, fresh/stale token-count flags, and per-session usage freshness are `sessions` plus `telemetry_usage`.\n- A title like `Pass outbound session identity into message_sending and surface 
guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Suppress `reliability` when deadlock/hang/crash is just the failure mode of a more specific named central feature, such as cron force-run.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings as a memory subsystem, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n- Suppress `memory` when embeddings are only part of a local model provider failure and the user-visible subject is local/self-hosted inference.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- `layered browser diagnostics` is `browser_automation`.\n- Do not add `gateway` or `ui_tui` for browser diagnostics unless gateway or UI behavior is explicitly the subject.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- `cron force-run deadlock` is `cron_automation`; do not add `reliability` unless the title separately makes reliability/liveness a central cross-cutting subject.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nNotifications and chat integrations:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification 
delivery.\n- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Do not add `notifications` merely because a title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, token counts, stale usage values, or status reporting surfaces are themselves the feature or bug.\n- Stale token counts are `telemetry_usage`.\n- Do not add `model_serving` merely because usage chunks or token counts are mentioned in a model-serving protocol bug unless serving protocol behavior is central.\n\nOpen-weight and local model providers:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both 
`open_weight_models` and `local_model_providers`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, plugin availability, or skill scanner behavior is central.\n- Security fixes in a skill scanner should include `skills_plugins`.\n- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nModel serving:\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.\n- Thinking blocks, Responses API visibility, 
OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks unless usage reporting itself is central.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nPolicy, config, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nDocumentation:\n- Documentation-only PRs should usually include `docs` plus the central documented area.\n- A `docs(queue): ...` title usually includes both `docs` and `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nTool calling:\n- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.\n- Mentions of \u201ctool boundaries\u201d in docs about another system are usually context, not `tool_calling`.\n\nHooks:\n- Use `hooks` when hook behavior, hook priority, hook ordering, hook execution, or hook security is central.\n- In composite security titles, include `hooks` when hook 
priority or hook execution is one of the named vulnerable areas.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `skills_plugins`, `hooks`.\n- Do not substitute broad infrastructure topics like `gateway`, `ui_tui`, `reliability`, or `security` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, contract, embeddings, SSRF, NO_PROXY, sessions_yield, or state. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Always produce the required structured output. 
If `final_json` exists, calling it is mandatory.\n\nImportant corrections learned from prior failures:\n- Slack target-channel pushes are both notification behavior and chat integration behavior when the title is about pushed Slack messages.\n - Example: \u201cPer-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes\u201d => `notifications`, `chat_integrations`.\n - Do not stop at `notifications` when Slack, WhatsApp, chat history, chat app delivery, or chat target channels are central.\n- Policy or conformance checks are `config` when they define or enforce governed behavior.\n - \u201cnetwork conformance checks\u201d should include `security`.\n - \u201cMCP conformance checks\u201d should include `mcp_tooling`.\n - Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d => `config`, `security`, `mcp_tooling`.\n - Do not map \u201cmodel conformance checks\u201d to `model_serving` unless serving endpoints, streaming, routing, or model-server compatibility are central.\n- Provider catalog metadata is `local_model_providers`.\n - If a missing provider catalog entry causes a silent failure instead of a clear error, also include `reliability`.\n - Example: \u201cgoogle-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error\u201d => `local_model_providers`, `reliability`.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications` and `chat_integrations`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.\n- Suppress `reliability` when the failure mode is merely incidental to a more specific named feature, such as cron force-run deadlock.\n- Do include `reliability` when a provider catalog or integration issue causes silent failure instead of a proper error.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.\n- Use 
`mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nLocal model providers:\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- Missing models in a provider catalog are `local_model_providers`.\n- If the missing provider metadata causes a silent failure or poor error behavior, add `reliability`.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, or contract. Keep it only if that topic is actually a central maintainer-owned subject.\n\nFinal output requirement:\n- If a structured output tool exists, call it with exactly:\n {\"topics_of_interest\":[...]}\n- If no structured output tool exists, print raw JSON only.\n- Do not include explanations, Markdown, comments, or extra fields."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nUse only allowed topic ids. Never invent topic ids. If no allowed topic is central, return an empty array.\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks, packaging_deployment\n\nDecision process:\n1. Read the title first.\n2. Identify the user-visible subject of the issue or PR.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned concerns in the title or first clear summary.\n5. Use 3 topics only when the title explicitly contains three central concerns.\n6. Remove any topic added only because of incidental words, labels, implementation details, tests, file paths, examples, or broad impact.\n\nGeneral rules:\n- Prefer the smallest complete topic set.\n- Do not add topics because a word appears incidentally.\n- Do not add broad umbrella topics when a narrower central topic already captures the issue.\n- Do not add implementation-detail topics from examples, stack traces, package names, file paths, labels, or test names.\n- If a topic is not allowed, do not output it.\n\nKnown mappings and corrections:\n- Desktop Chrome automation, browser profiles, browser attach, Google sign-in through Chrome, and use of existing Chrome browser sessions are `browser_automation`.\n- In browser automation titles, the word \u201csessions\u201d often means Chrome/user-profile browser sessions. 
Do not classify that as `sessions` unless OpenClaw chat/session ownership or lifecycle is central.\n- `Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` is `browser_automation` and `packaging_deployment`.\n- `spawn npx ENOENT`, missing executables, broken runtime packaging, missing installed command, desktop packaging failures, dependency launch failures, installation/runtime environment failures, and app deployment breakage are `packaging_deployment`.\n- Do not add `reliability` merely because something \u201cfails\u201d. Add `reliability` only for central exception/crash/hang/deadlock/stuck/race/timeout/robustness failures.\n- A title naming `spawn npx ENOENT` in a desktop/browser attach flow is primarily packaging/deployment, not reliability.\n\nMemory and self-hosted inference:\n- `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` is `memory` and `self_hosted_inference`.\n- Use `memory` when memory, embeddings, recall, vector storage, or memory retrieval is the central feature.\n- Use `self_hosted_inference` when the title is about OpenAI-compatible providers for self-hosted servers or self-hosted inference endpoints.\n- Do not add `local_models` just because examples include llama.cpp, Ollama, vLLM, TGI, or LocalAI. 
Treat those names as examples unless local model management itself is the central user-visible subject.\n- Do not add `local_model_providers` or `model_serving` when the title\u2019s central concern is an OpenAI-compatible provider for self-hosted inference and another explicit feature like memory already accounts for the context.\n\nCron/session/reliability correction:\n- `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting` is `cron_automation`, `sessions`, and `reliability`.\n- Use `cron_automation` when cron announce, cron scheduling, cron force-run, cron lifecycle, or cron execution is the named feature.\n- Use `sessions` when the bug involves active chat sessions, session takeover, embedded attempt session takeover, session ownership, session identity, or session lifecycle.\n- Use `reliability` when the central failure is an exception, crash, deadlock, hang, stuck state, race, timeout, or robustness failure.\n- Do not suppress `reliability` when the title names a concrete runtime exception such as `EmbeddedAttemptSessionTakeoverError`.\n\nSecurity/CI correction:\n- `ci: harden GitHub Actions supply-chain boundaries` is `security` only.\n- GitHub Actions supply-chain hardening, supply-chain boundaries, workflow trust boundaries, dependency provenance, and CI hardening are security concerns when no more specific allowed topic is central.\n- Do not add deployment, packaging, CI, or release topics unless an allowed topic is directly central. `packaging_deployment` is allowed only for packaging/deployment/install/runtime environment concerns, not generic CI hardening.\n\nSuppression rules:\n- Do not add `notifications` merely because delivery, announce, pushed message, or message sending appears. Use it only when notification behavior itself is the central user-visible subject.\n- Do not add `chat_integrations` merely because a user is actively chatting. 
Use it only for Slack, WhatsApp, chat app delivery, chat history, target channels, or chat push behavior.\n- Do not add `sessions` for browser sessions, Chrome sessions, login sessions, profile attachment, or user profiles unless OpenClaw session ownership/lifecycle is the central subject.\n- Do not add `reliability` for every bug. Add it only when the title\u2019s central failure mode is an exception, crash, hang, deadlock, stuck state, race, timeout, liveness, or robustness issue.\n- Do not add `security` for every policy, boundary, or CI title. Add it when the title is about hardening, supply-chain boundaries, vulnerabilities, access restrictions, credentials, SSRF, token verification, network policy, or trust boundaries.\n\nOutput raw JSON only."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, coding_agents, api_surface,\nui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,\nagent_runtime, model_serving, local_models, self_hosted_inference,\ntelemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,\nconfig, security, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Always produce the required structured output. If `final_json` exists, calling it is mandatory.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.\n- Use `reliability` for explicit exception/error failures when the title centers on the system breaking during a user-visible workflow.\n- Example: \u201ccron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting\u201d => `cron_automation`, `sessions`, `reliability`.\n- Suppress `reliability` when the failure mode is only a symptom of a more specific maintainer area and the title\u2019s real subject is that area\u2019s contract or initialization behavior.\n- Example: `sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization` => `coding_agents`, `sandboxing`, `agent_runtime`, not `reliability`.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Do not add `notifications` merely because the title mentions announce, delivery, message sending, send denial, pushed messages, or delivery plumbing.\n- If an \u201cannounce delivery\u201d title is really about cron execution/session takeover/error behavior, prefer the central areas and suppress `notifications`.\n- Example: \u201ccron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting\u201d => no `notifications`.\n\nCron automation:\n- Use `cron_automation` when cron scheduling, cron lifecycle, cron force-runs, cron announce execution, cron heartbeat execution, or cron-triggered workflows are central.\n- Do not suppress `cron_automation` merely because the title contains 
\u201cannounce\u201d or \u201cdelivery\u201d.\n- Example: \u201ccron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting\u201d => include `cron_automation`.\n\nSessions:\n- Use `sessions` when active sessions, session takeover, embedded session takeover, session spawning, session identity, session lifecycle, or user actively chatting in a session is central.\n- Example: `EmbeddedAttemptSessionTakeoverError when user is actively chatting` => include `sessions`.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.\n- Use `tool_calling` when tool call arguments, parameter coercion, schemas, serialization, invocation contracts, or actual tool calls are central.\n- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.\n- Example: \u201cfix(bundle-mcp): coerce stringified object/array params before MCP tool calls\u201d => `mcp_tooling`, `tool_calling`.\n\nACP, ACPX, and approvals:\n- Use `acp` for ACP protocol/session behavior.\n- Use `acpx` for ACPX extension/adapter behavior, especially per-binding or per-agent ACP session behavior.\n- Use `approvals` for permissionMode, approval modes, permission prompts, permission policy, or approval-gated execution.\n- ACP session permissionMode changes with per-binding or per-agent scope are a three-way concern: `acp`, `approvals`, and `acpx`.\n- Example: \u201c[Feature]: Per-binding and per-agent permissionMode for ACP sessions\u201d => `acp`, `approvals`, `acpx`.\n\nCoding agents and agent runtime:\n- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.\n- Use `agent_runtime` for agent initialization, runtime lifecycle, spawning, embedded runtime behavior, or internal runtime setup.\n- Use `sandboxing` when 
sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.\n- Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization` => `coding_agents`, `sandboxing`, `agent_runtime`.\n- In that example, do not add `reliability`; \u201csilently breaks\u201d is secondary wording and the maintainer buckets are sandboxing plus agent/subagent runtime.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Slack target-channel pushes are both notification behavior and chat integration behavior when the title is about pushed Slack messages.\n- Example: \u201cPer-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes\u201d => `notifications`, `chat_integrations`.\n- Do not classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.\n- Policy or conformance checks are `config` when they define or enforce governed behavior.\n- \u201cnetwork conformance checks\u201d should include `security`.\n- \u201cMCP conformance checks\u201d should include `mcp_tooling`.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d => `config`, `security`, `mcp_tooling`.\n- Do not map \u201cmodel conformance checks\u201d to `model_serving` unless serving endpoints, streaming, routing, or model-server compatibility are central.\n\nLocal model providers:\n- Use 
`local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- Missing models in a provider catalog are `local_model_providers`.\n- If missing provider metadata causes a silent failure or poor error behavior, add `reliability`.\n- Example: \u201cgoogle-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error\u201d => `local_model_providers`, `reliability`.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, or contract. Keep it only if that topic is actually a central maintainer-owned subject.\n\nFinal output requirement:\n- If a structured output tool exists, call it with exactly:\n {\"topics_of_interest\":[...]}\n- If no structured output tool exists, print raw JSON only.\n- Do not include explanations, Markdown, comments, or extra fields."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, coding_agents, api_surface,\nui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,\nagent_runtime, model_serving, local_models, self_hosted_inference,\ntelemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,\nconfig, security, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, labels, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Always produce the required structured output. 
If `final_json` exists, calling it is mandatory.\n\nImportant routing rules:\n- Use `acp` for ACP protocol/session behavior, ACP configured bindings, ACP session keys, ACP channel mapping, or ACP process/session reuse.\n- Use `sessions` when session keys, persistent sessions, shared sessions, thread/session identity, session takeover, or session lifecycle are central.\n- Do NOT add `acpx` merely because the item involves ACP bindings. Treat `acpx` as label spam unless the title explicitly centers ACPX extension/adapter behavior, per-binding/per-agent ACPX behavior, or names ACPX as the contract being changed.\n- Example: \u201cACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process\u201d => `acp`, `sessions`, not `acpx`.\n\nModel serving:\n- Use `model_serving` for OpenAI-compatible serving, streaming responses, model-server compatibility, final chunks, llama.cpp server behavior, endpoint streaming contracts, routing, or serving protocol behavior.\n- A title like \u201cOpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)\u201d is `model_serving`.\n- Do not output zero topics for serving/streaming issues just because \u201cusage\u201d appears.\n- Suppress `telemetry_usage` when \u201cusage\u201d is only the affected value caused by a serving stream/chunk bug. 
Add `telemetry_usage` only when usage tracking, usage accounting, reporting, or metrics collection is itself the central maintainer-owned subject.\n\nSecurity, hooks, and skills/plugins:\n- Use `security` for vulnerabilities, SSRF, token verification, credential boundaries, access restrictions, network policy, HIGH/CRITICAL vuln fixes, or allowed/blocked security behavior.\n- Use `hooks` when hook priority, hook execution, hook ordering, hook policy, or hook behavior is central.\n- Use `skills_plugins` when skills, plugins, skill scanners, plugin scanners, skill/plugin installation, skill/plugin metadata, or skill/plugin runtime behavior are central.\n- If a security title explicitly names multiple vulnerable areas, include each central area named.\n- Example: \u201csecurity: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification\u201d => `security`, `hooks`, `skills_plugins`.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.\n- Use `reliability` for explicit exception/error failures when the title centers on the system breaking during a user-visible workflow.\n- Suppress `reliability` when the failure mode is only a symptom of a more specific maintainer area and the title\u2019s real subject is that area\u2019s contract or initialization behavior.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Do not add `notifications` merely because the title mentions announce, delivery, message sending, send denial, pushed messages, or delivery plumbing.\n- If an \u201cannounce delivery\u201d title is really about cron execution/session takeover/error behavior, prefer the central areas and suppress 
`notifications`.\n\nCron automation:\n- Use `cron_automation` when cron scheduling, cron lifecycle, cron force-runs, cron announce execution, cron heartbeat execution, or cron-triggered workflows are central.\n- Do not suppress `cron_automation` merely because the title contains \u201cannounce\u201d or \u201cdelivery\u201d.\n- Do not classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.\n- Use `tool_calling` when tool call arguments, parameter coercion, schemas, serialization, invocation contracts, or actual tool calls are central.\n- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.\n\nCoding agents and agent runtime:\n- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.\n- Use `agent_runtime` for agent initialization, runtime lifecycle, spawning, embedded runtime behavior, or internal runtime setup.\n- Use `sandboxing` when sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, skill, scanner, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, binding, ACPX, or contract. 
Keep it only if that topic is actually a central maintainer-owned subject.\n\nFinal output requirement:\n- If a structured output tool exists, call it with exactly:\n {\"topics_of_interest\":[...]}\n- If no structured output tool exists, print raw JSON only.\n- Do not include explanations, Markdown, comments, or extra fields."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Always produce the required structured output. If `final_json` exists, calling it is mandatory.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nImportant correction for ACP / ACPX / sessions:\n- Titles about \u201cExplicit ACP agent runs\u201d are centrally about `acpx`.\n- When the same title also names ACP behavior, embedded ACP backend visibility, ACP agent backend state, ACP run results, or ACP final JSON state, include `acp`.\n- Do not add `sessions` just because the title mentions `sessions_yield`, yielding, stale state after a yield, or a session-related event boundary.\n- Treat `sessions_yield` as an implementation/event-boundary clue unless the actual subject is session lifecycle, persistence, resume, listing, storage, or session management.\n- Example: \u201cExplicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield\u201d => `acpx`, `acp`; not `sessions`.\n- \u201cfinal JSON state\u201d in an ACP/ACPX title is usually part of the ACP/ACPX result contract, not a general sessions issue.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications` and `chat_integrations`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed 
fixes.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.\n- Suppress `reliability` when the failure mode is merely incidental to a more specific named feature, such as cron force-run deadlock.\n- Do include `reliability` when a provider catalog or integration issue causes silent failure instead of a proper error.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d => `config`, `security`, `mcp_tooling`.\n\nLocal model providers:\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- Missing models in a provider catalog are `local_model_providers`.\n- If the missing provider metadata causes a silent failure or poor error behavior, add `reliability`.\n- Example: \u201cgoogle-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error\u201d => `local_model_providers`, `reliability`.\n\nFinal suppression 
check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, contract, session, sessions_yield, yield, or final JSON. Keep it only if that topic is actually a central maintainer-owned subject.\n\nFinal output requirement:\n- If a structured output tool exists, call it with exactly:\n {\"topics_of_interest\":[...]}\n- If no structured output tool exists, print raw JSON only.\n- Do not include explanations, Markdown, comments, or extra fields."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nOutput JSON only:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, packaging_deployment\n\nClassification rules:\n- Read the title first. Use body or summary only if the title is ambiguous.\n- Choose one primary topic.\n- Add secondary topics only when they are explicit central maintainer-owned subjects.\n- Use the minimum complete topic set. Do not add topics for labels, examples, paths, tests, implementation details, or broad ownership words.\n- Never invent topic ids.\n\nImportant edge cases:\n- If a provider catalog/model catalog issue says a missing model causes \u201csilent failure\u201d, \u201cfails silently\u201d, \u201chang\u201d, \u201ccrash\u201d, \u201cno error\u201d, or similar robustness behavior, include `reliability` in addition to the provider/catalog topic.\n Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.\n\n- Use `acpx` when ACPX is explicitly named as the subject. Do not normalize ACPX to `acp`.\n Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. 
Superpowers)` => `acpx`, `codex`, `skills_plugins`.\n Do not include `acp` there unless ACP itself is separately central.\n\n- Use `skills_plugins` when user-installed plugins, plugin inheritance, Superpowers, plugin discovery, plugin installation, or plugin availability is central.\n\n- Use `codex` when Codex is named as a central product/runtime subject, not merely as bracket metadata.\n\n- Use `browser_automation` for Chrome/Chromium session reuse, browser profile attach, browser diagnostics, browser runtime behavior, Google sign-in through browser automation, or browser tooling issues.\n\n- Use `packaging_deployment` for install/runtime environment problems such as missing executables, PATH/package manager availability, `npx`/Node spawn failures, desktop app packaging, bundled runtime problems, or deployment/startup environment failures.\n Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.\n\nTopic reminders:\n- `local_model_providers`: provider-specific model catalogs, known model metadata, provider-backed model support.\n- `model_serving`: serving endpoints, OpenAI-compatible protocol behavior, request/response streaming, usage chunks, endpoint compatibility, model request routing.\n- `local_models`: LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local model apps/providers.\n- `reliability`: deadlock, hang, crash, race, liveness, timeout, stuck state, silent failure, robustness failure, or missing error behavior.\n- `acp`: ACP is centrally named.\n- `acpx`: ACPX is explicitly named or clearly ACPX-specific.\n- `approvals`: approval mode, permission mode, user approval behavior, permission policy.\n- `sessions`: session identity, lifecycle, routing, state, persistent process identity.\n- `gateway`: gateway routing, guarded gateway behavior, gateway-owned behavior, gateway send denial.\n- `exec_tools`: 
exec command/tool behavior, PATH fallback, exec contract behavior.\n- `sandboxing`: sandbox policy or sandbox runtime behavior.\n- `security`: vulnerabilities, credentials, network policy, access restrictions, outbound rules.\n- `notifications`: notification delivery, announce messages, heartbeat pushes.\n- `chat_integrations`: Slack, WhatsApp, chat app delivery, chat history, target channels.\n- `cron_automation`: cron scheduling, force-run, lifecycle, execution, or cron deadlock.\n- `api_surface`: exposed API/contract, reader contract, request/response shape, public integration behavior.\n- `ui_tui`: webchat, TUI, terminal UI, user-facing chat interface.\n- `docs`: documentation-only changes, tutorials, README, guides, examples, showcase additions.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because a related word appeared. Keep it only if it is actually a central user-visible issue, feature, policy, or contract."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, coding_agents, api_surface,\nui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,\nagent_runtime, model_serving, local_models, self_hosted_inference,\ntelemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,\nconfig, security, mcp_tooling, tool_calling, hooks\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nProcess:\n1. Read the title first. The title is the primary signal.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Always produce the required structured output.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a label, symptom, path, implementation detail, example, internal hook, broad ownership area, or incidental UI wording.\n\nImportant anti-label-spam rules:\n- Do NOT use `ui_tui` merely because the title says showing, display, stale counts, diagnostics, messages, or visible output.\n- Use `ui_tui` only when the terminal UI, UI rendering behavior, TUI layout, or interactive UI surface is itself the central subject.\n- \u201cshowing stale token counts\u201d is not automatically `ui_tui`; classify the underlying token/session usage contract instead.\n- \u201cdiagnostics\u201d is not automatically `ui_tui`; classify the subsystem whose diagnostics are central.\n\nSpecific mapping rules learned from examples:\n- `totalTokensFresh`, stale token counts, fresh token counts, session token accounting, or token-count freshness should usually include `sessions` and `telemetry_usage`.\n Example: \u201cfix: respect totalTokensFresh flag to avoid showing stale token counts\u201d => `sessions`, `telemetry_usage`, not `ui_tui`.\n\n- Exec PATH fallback or command execution PATH resolution is `exec_tools`.\n Example facet: \u201cresolve exec PATH fallback\u201d => `exec_tools`.\n\n- Browser diagnostics, browser automation diagnostics, or layered browser diagnostic behavior is `browser_automation`, not `ui_tui`.\n Example facet: \u201clayered browser diagnostics\u201d => `browser_automation`.\n\n- Cron force-runs, cron scheduling, cron lifecycle, cron-triggered workflows, or cron execution are `cron_automation`.\n Example facet: \u201ccron force-run deadlock\u201d => `cron_automation`.\n Do not add `reliability` for deadlock wording when the title is already a multi-facet component fix and the maintainer bucket is the named component.\n\n- ACP protocol/session behavior is `acp`.\n- Gateway-owned behavior is `gateway`.\n- Node-backed runtime, runtime ownership, runtime lifecycle, embedded runtime behavior, or runtime setup is `agent_runtime`.\n 
Example: \u201cACP: add gateway-owned node-backed runtime\u201d => `acp`, `gateway`, `agent_runtime`, not `acpx`.\n\n- Do not use `acpx` just because ACP is present. Use `acpx` only when ACPX, ACPX adapters/extensions, or per-binding/per-agent ACPX behavior is explicitly central.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.\n- Suppress `reliability` when the failure wording is only a symptom of a more specific maintainer-owned area and the title\u2019s real subject is that area\u2019s contract, initialization, or execution behavior.\n- If the title names three concrete subsystem facets, prefer those three concrete buckets over replacing one with `reliability`.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Do not add `notifications` merely because the title mentions announce, delivery, message sending, send denial, pushed messages, or delivery plumbing.\n\nSessions:\n- Use `sessions` when active sessions, session takeover, embedded session takeover, session spawning, session identity, session lifecycle, user actively chatting in a session, or session token accounting is central.\n\nTelemetry and usage:\n- Use `telemetry_usage` for token counts, usage accounting, usage display backed by counters, metrics, usage statistics, and token freshness/staleness contracts.\n- If token counts are tied to a session state flag such as `totalTokensFresh`, include both `sessions` and `telemetry_usage`.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.\n- Use `tool_calling` when tool call arguments, parameter coercion, 
schemas, serialization, invocation contracts, or actual tool calls are central.\n- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.\n\nACP, ACPX, and approvals:\n- Use `acp` for ACP protocol/session behavior.\n- Use `acpx` for ACPX extension/adapter behavior, especially per-binding or per-agent ACP session behavior.\n- Use `approvals` for permissionMode, approval modes, permission prompts, permission policy, or approval-gated execution.\n- ACP session permissionMode changes with per-binding or per-agent scope are a three-way concern: `acp`, `approvals`, and `acpx`.\n\nCoding agents and agent runtime:\n- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.\n- Use `agent_runtime` for agent initialization, runtime lifecycle, spawning, embedded runtime behavior, node-backed runtime behavior, or internal runtime setup.\n- Use `sandboxing` when sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes are both notification behavior and chat integration behavior when the title is about pushed Slack messages.\n- Do not classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling, lifecycle, or execution is itself central.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.\n- Policy or conformance checks are `config` when they define or enforce governed 
behavior.\n- Network conformance checks should include `security`.\n- MCP conformance checks should include `mcp_tooling`.\n\nLocal model providers:\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.\n- Missing models in a provider catalog are `local_model_providers`.\n- If missing provider metadata causes a silent failure or poor error behavior, add `reliability`.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, UI, or contract. Keep it only if that topic is actually a central maintainer-owned subject.\n\nFinal output requirement:\n- If a structured output tool exists, call it with exactly:\n {\"topics_of_interest\":[...]}\n- If no structured output tool exists, print raw JSON only.\n- Do not include explanations, Markdown, comments, or extra fields."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, coding_agents, api_surface,\nui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,\nagent_runtime, model_serving, local_models, self_hosted_inference,\ntelemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,\nconfig, security, mcp_tooling, tool_calling, hooks\n\nProcess:\n1. Read the title first. The title is the primary signal.\n2. Identify the central user-visible bug, feature, policy, contract, or documentation change.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicitly central maintainer-owned subjects.\n5. Use 3 topics only when the title explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output raw JSON only. No Markdown, explanations, comments, or extra fields.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the subject, not a label, symptom, path, implementation detail, example, internal hook, transport mode, broad ownership area, or incidental wording.\n\nIgnore label prefixes:\n- Ignore title prefixes like `[codex]`, `fix:`, `Policy:`, `[Bug]:`, etc. 
Do not classify as `codex` merely because `[codex]` appears.\n\nModel and serving rules:\n- Use `model_serving` when the serving/runtime response path, model output handling, response streaming, thinking block handling, Responses API serving behavior, or served model behavior is central.\n- Use `local_models` when local models or local model behavior are central.\n- Use `local_model_providers` only when provider catalog metadata, provider-specific integration, missing provider models, provider configuration, or known provider/model-family metadata is central.\n- Do NOT use `local_model_providers` merely because a provider name appears as context.\n- \u201cLM Studio thinking blocks invisible with Responses API\u201d => `model_serving`, `local_models`, not `local_model_providers`.\n- Do not use `api_surface` merely because \u201cResponses API\u201d appears as the transport or compatibility mode. Use `api_surface` only when the API contract, endpoint behavior, reader API, schema, or externally consumed API surface is central.\n\nPolicy, config, and security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.\n- \u201cPolicy: add model, network, and MCP conformance checks\u201d => `config`, `security`, `mcp_tooling`.\n- Do NOT use `local_model_providers` merely because a policy title says \u201cmodel\u201d.\n\nGateway, API, and UI:\n- Use `gateway` when gateway-owned behavior, gateway routing, webchat gateway behavior, or gateway-backed readers are central.\n- Use `api_surface` when an externally consumed API, reader contract, endpoint, request/response 
contract, or API-visible behavior is central.\n- Use `ui_tui` when a terminal UI, webchat UI, UI rendering behavior, full-message reader UI, TUI layout, interactive UI surface, or message reader surface is itself central.\n- \u201cwebchat full-message reader for truncated history\u201d => `gateway`, `api_surface`, `ui_tui`.\n- Do NOT suppress `ui_tui` when the title\u2019s central subject is a reader/viewer/message-reading UI surface.\n- Do NOT use `ui_tui` merely because the title says showing, display, stale counts, diagnostics, messages, or visible output.\n\nNotifications and chat integrations:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes are both `notifications` and `chat_integrations` when the title is about pushed Slack messages.\n- Do NOT classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling, lifecycle, force-run behavior, deadlock, or execution is itself central.\n- \u201cPer-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes\u201d => `notifications`, `chat_integrations`, not `cron_automation`.\n\nSessions and telemetry:\n- Use `sessions` when active sessions, session takeover, embedded session takeover, session spawning, session identity, session lifecycle, user actively chatting in a session, or session token accounting is central.\n- Use `telemetry_usage` for token counts, usage accounting, usage display backed by counters, metrics, usage statistics, and token freshness/staleness contracts.\n- `totalTokensFresh`, stale token counts, fresh token counts, session token accounting, or token-count freshness usually include both `sessions` and 
`telemetry_usage`.\n- \u201crespect totalTokensFresh flag to avoid showing stale token counts\u201d => `sessions`, `telemetry_usage`, not `ui_tui`.\n\nExecution, browser, and cron:\n- Exec PATH fallback or command execution PATH resolution => `exec_tools`.\n- Browser diagnostics, browser automation diagnostics, or layered browser diagnostic behavior => `browser_automation`, not `ui_tui`.\n- Cron force-runs, cron scheduling, cron lifecycle, cron-triggered workflows, or cron execution => `cron_automation`.\n- Do not add `reliability` for deadlock wording when the title is already a multi-facet component fix and the maintainer bucket is the named component.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.\n- Suppress `reliability` when the failure wording is only a symptom of a more specific maintainer-owned area and the title\u2019s real subject is that area\u2019s contract, initialization, rendering, or execution behavior.\n- If missing provider metadata causes a silent failure or poor error behavior, add `reliability`.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.\n- Use `tool_calling` when tool call arguments, parameter coercion, schemas, serialization, invocation contracts, or actual tool calls are central.\n- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.\n\nACP, ACPX, approvals, and runtime:\n- Use `acp` for ACP protocol/session behavior.\n- Use `acpx` only when ACPX, ACPX adapters/extensions, or per-binding/per-agent ACPX behavior is explicitly central.\n- Do not use `acpx` just because ACP is present.\n- Use `approvals` for permissionMode, approval modes, permission prompts, permission policy, or 
approval-gated execution.\n- ACP session permissionMode changes with per-binding or per-agent scope => `acp`, `approvals`, `acpx`.\n- Use `agent_runtime` for node-backed runtime, runtime ownership, runtime lifecycle, embedded runtime behavior, runtime setup, agent initialization, or spawning.\n- \u201cACP: add gateway-owned node-backed runtime\u201d => `acp`, `gateway`, `agent_runtime`, not `acpx`.\n\nCoding agents and sandboxing:\n- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.\n- Use `sandboxing` when sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, provider, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, UI, API, Responses API, LM Studio, or contract. Keep the topic only if it is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.\n\nReturn exactly:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, hooks, auth_identity\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only, or call the structured JSON output tool if one exists.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nImportant correction:\nFor `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path`, classify as:\n`reliability`, `auth_identity`, and `sessions`.\n\nWhy:\n- `self-heal lane wedges` is a reliability/liveness/robustness issue.\n- Do NOT classify `lane wedges` as `queueing`; \u201clane\u201d is an implementation detail unless queueing, queued execution, queue lifecycle, or scheduling behavior is explicitly the user-visible subject.\n- `OAuth` is an auth and identity concern, so use `auth_identity`.\n- `openai-codex OAuth` does NOT imply `codex`; here \u201ccodex\u201d is part of the OAuth/provider naming, not the central maintainer-owned subject.\n- `embedded path` makes the OAuth restoration session/path-specific, so include `sessions`.\n\nAuth and identity:\n- Use `auth_identity` for OAuth, login identity, account identity, authentication flows, token identity, credential identity, or restoring auth behavior.\n- OAuth fixes should usually include `auth_identity`.\n- Add `sessions` when the auth problem is tied to a session path, embedded path, outbound session identity, session lifecycle, session routing, or session-specific behavior.\n- Do not add `codex` merely because an OAuth provider or path name contains \u201ccodex\u201d.\n\nReliability:\n- Use `reliability` when the central bug is a general deadlock, hang, crash, wedge, stuck state, race, timeout, liveness issue, self-healing behavior, or robustness failure.\n- Words like `self-heal`, `wedge`, `wedged`, `stuck`, and `recover` are strong reliability signals when the title is about restoring forward progress.\n- Suppress `reliability` only when the failure mode clearly belongs to a more specific named feature owner, such as `cron force-run deadlock` being `cron_automation`.\n\nQueueing:\n- Use `queueing` only when queue, queueing, queued 
execution, queue lifecycle, queue scheduling, or steering in queues is central.\n- Do not add `queueing` for incidental words like lane, main lane, worker, subagent, execution path, or internal routing path.\n- A lane wedge is not automatically queueing.\n\nCodex:\n- Use `codex` when Codex itself is the central user-visible subject, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title contains \u201ccodex\u201d inside a provider name, OAuth label, bracketed metadata, or path name.\n- If the real subject is OAuth, classify as `auth_identity`, not `codex`.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like lane, queue, codex, OAuth, embedded, path, wedge, session, usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, deadlock, security, diagnostics, display, showing, or contract. Keep it only if that topic is actually a central maintainer-owned subject."
+ }
+ ],
+ "discovery_eval_counts": [
+ 0,
+ 54,
+ 92,
+ 130,
+ 168,
+ 206,
+ 244,
+ 282,
+ 320,
+ 358,
+ 396,
+ 438,
+ 476,
+ 514
+ ],
+ "num_full_val_evals": 14,
+ "objective_pareto_front": {
+ "weighted_score": 0.6971520146520145
+ },
+ "parents": [
+ [
+ null
+ ],
+ [
+ 0
+ ],
+ [
+ 0
+ ],
+ [
+ 2
+ ],
+ [
+ 2
+ ],
+ [
+ 2
+ ],
+ [
+ 3
+ ],
+ [
+ 5
+ ],
+ [
+ 7
+ ],
+ [
+ 5
+ ],
+ [
+ 1
+ ],
+ [
+ 7
+ ],
+ [
+ 11
+ ],
+ [
+ 2
+ ]
+ ],
+ "per_objective_best_candidates": {
+ "weighted_score": [
+ 1
+ ]
+ },
+ "per_val_instance_best_candidates": {
+ "0": [
+ 0,
+ 1,
+ 2,
+ 4,
+ 11,
+ 12
+ ],
+ "1": [
+ 12
+ ],
+ "2": [
+ 1,
+ 2,
+ 4,
+ 10,
+ 11,
+ 12
+ ],
+ "3": [
+ 0,
+ 1,
+ 2,
+ 4,
+ 5,
+ 7,
+ 9,
+ 10
+ ],
+ "4": [
+ 8,
+ 2,
+ 4
+ ],
+ "5": [
+ 0,
+ 2,
+ 4,
+ 5,
+ 7,
+ 11,
+ 12
+ ],
+ "6": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 6,
+ 7,
+ 10,
+ 11,
+ 12,
+ 13
+ ],
+ "7": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 8,
+ 9,
+ 10,
+ 13
+ ],
+ "8": [
+ 0,
+ 1,
+ 2,
+ 4,
+ 10,
+ 12
+ ],
+ "9": [
+ 0,
+ 1,
+ 2,
+ 4,
+ 5,
+ 7,
+ 9,
+ 11
+ ],
+ "10": [
+ 0,
+ 2
+ ],
+ "11": [
+ 0,
+ 1,
+ 4,
+ 9,
+ 10,
+ 13
+ ],
+ "12": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 6,
+ 8,
+ 11,
+ 12,
+ 13
+ ],
+ "13": [
+ 0,
+ 1,
+ 5,
+ 7,
+ 9,
+ 10,
+ 11,
+ 12
+ ],
+ "14": [
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13
+ ],
+ "15": [
+ 0,
+ 2,
+ 4,
+ 5,
+ 7
+ ],
+ "16": [
+ 0,
+ 1,
+ 2,
+ 4,
+ 5,
+ 7,
+ 9,
+ 11,
+ 12
+ ],
+ "17": [
+ 0,
+ 1,
+ 12
+ ],
+ "18": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 5,
+ 6,
+ 8,
+ 9,
+ 11,
+ 13
+ ],
+ "19": [
+ 7
+ ],
+ "20": [
+ 11,
+ 12
+ ],
+ "21": [
+ 1,
+ 3,
+ 4,
+ 7,
+ 8,
+ 11,
+ 12
+ ],
+ "22": [
+ 8
+ ],
+ "23": [
+ 4
+ ],
+ "24": [
+ 10,
+ 6
+ ],
+ "25": [
+ 1,
+ 10,
+ 11,
+ 12
+ ],
+ "26": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 5,
+ 6,
+ 8,
+ 10,
+ 12,
+ 13
+ ],
+ "27": [
+ 13
+ ],
+ "28": [
+ 3,
+ 6
+ ],
+ "29": [
+ 9,
+ 4
+ ]
+ },
+ "run_dir": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z",
+ "seed": 0,
+ "total_metric_calls": 544,
+ "val_aggregate_scores": [
+ 0.6395802005012532,
+ 0.6971520146520146,
+ 0.6569444444444444,
+ 0.46397435897435896,
+ 0.6616269841269842,
+ 0.5029318488529015,
+ 0.4785409035409035,
+ 0.5265246449456976,
+ 0.4836904761904762,
+ 0.4776587301587301,
+ 0.56900327742433,
+ 0.5957763532763533,
+ 0.6629365079365079,
+ 0.49142857142857144
+ ],
+ "val_aggregate_subscores": [
+ {
+ "weighted_score": 0.639580200501253
+ },
+ {
+ "weighted_score": 0.6971520146520145
+ },
+ {
+ "weighted_score": 0.6569444444444443
+ },
+ {
+ "weighted_score": 0.46397435897435885
+ },
+ {
+ "weighted_score": 0.6616269841269842
+ },
+ {
+ "weighted_score": 0.5029318488529014
+ },
+ {
+ "weighted_score": 0.47854090354090345
+ },
+ {
+ "weighted_score": 0.5265246449456975
+ },
+ {
+ "weighted_score": 0.48369047619047617
+ },
+ {
+ "weighted_score": 0.47765873015873
+ },
+ {
+ "weighted_score": 0.56900327742433
+ },
+ {
+ "weighted_score": 0.5957763532763531
+ },
+ {
+ "weighted_score": 0.6629365079365079
+ },
+ {
+ "weighted_score": 0.4914285714285713
+ }
+ ],
+ "val_subscores": [
+ {
+ "0": 1.0,
+ "1": 0.25,
+ "2": 0.25,
+ "3": 1.0,
+ "4": 0.2857142857142857,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 1.0,
+ "14": 0.2857142857142857,
+ "15": 1.0,
+ "16": 1.0,
+ "17": 1.0,
+ "18": 1.0,
+ "19": 0.2,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.0,
+ "23": 0.10526315789473684,
+ "24": 0.5,
+ "25": 0.0,
+ "26": 1.0,
+ "27": 0.125,
+ "28": 0.2,
+ "29": 0.2857142857142857
+ },
+ {
+ "0": 1.0,
+ "1": 0.25,
+ "2": 1.0,
+ "3": 1.0,
+ "4": 0.2857142857142857,
+ "5": 0.25,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 0.5,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 1.0,
+ "17": 1.0,
+ "18": 1.0,
+ "19": 0.2,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 0.5,
+ "23": 0.15384615384615385,
+ "24": 0.5,
+ "25": 1.0,
+ "26": 1.0,
+ "27": 0.125,
+ "28": 0.2,
+ "29": 0.25
+ },
+ {
+ "0": 1.0,
+ "1": 0.25,
+ "2": 1.0,
+ "3": 1.0,
+ "4": 1.0,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 0.0,
+ "12": 1.0,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 1.0,
+ "17": 0.2857142857142857,
+ "18": 1.0,
+ "19": 0.3333333333333333,
+ "20": 0.25,
+ "21": 0.5,
+ "22": 0.2857142857142857,
+ "23": 0.14285714285714285,
+ "24": 0.5,
+ "25": 0.0,
+ "26": 1.0,
+ "27": 0.125,
+ "28": 0.25,
+ "29": 0.2857142857142857
+ },
+ {
+ "0": 0.5,
+ "1": 0.25,
+ "2": 0.15384615384615385,
+ "3": 0.2857142857142857,
+ "4": 0.15384615384615385,
+ "5": 0.5,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 0.25,
+ "9": 0.0,
+ "10": 0.2,
+ "11": 0.2857142857142857,
+ "12": 1.0,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.14285714285714285,
+ "17": 0.5,
+ "18": 1.0,
+ "19": 0.14285714285714285,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 0.5,
+ "23": 0.15384615384615385,
+ "24": 0.15384615384615385,
+ "25": 0.25,
+ "26": 1.0,
+ "27": 0.14285714285714285,
+ "28": 1.0,
+ "29": 0.15384615384615385
+ },
+ {
+ "0": 1.0,
+ "1": 0.25,
+ "2": 1.0,
+ "3": 1.0,
+ "4": 1.0,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 0.3333333333333333,
+ "11": 1.0,
+ "12": 0.25,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 1.0,
+ "17": 0.2857142857142857,
+ "18": 0.2857142857142857,
+ "19": 0.3333333333333333,
+ "20": 0.25,
+ "21": 1.0,
+ "22": 0.2857142857142857,
+ "23": 1.0,
+ "24": 0.5,
+ "25": 0.0,
+ "26": 0.25,
+ "27": 0.125,
+ "28": 0.2,
+ "29": 1.0
+ },
+ {
+ "0": 0.5,
+ "1": 0.2857142857142857,
+ "2": 0.15384615384615385,
+ "3": 1.0,
+ "4": 0.15384615384615385,
+ "5": 1.0,
+ "6": 0.25,
+ "7": 1.0,
+ "8": 0.0,
+ "9": 1.0,
+ "10": 0.2,
+ "11": 0.25,
+ "12": 0.2857142857142857,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 1.0,
+ "17": 0.2,
+ "18": 1.0,
+ "19": 0.2,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.25,
+ "23": 0.10526315789473684,
+ "24": 0.5,
+ "25": 0.5,
+ "26": 1.0,
+ "27": 0.14285714285714285,
+ "28": 0.125,
+ "29": 0.2857142857142857
+ },
+ {
+ "0": 0.5,
+ "1": 0.2,
+ "2": 0.25,
+ "3": 0.0,
+ "4": 0.15384615384615385,
+ "5": 0.5,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 0.2,
+ "9": 0.5,
+ "10": 0.3333333333333333,
+ "11": 0.5,
+ "12": 1.0,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.25,
+ "17": 0.2,
+ "18": 1.0,
+ "19": 0.2,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.3333333333333333,
+ "23": 0.14285714285714285,
+ "24": 1.0,
+ "25": 0.0,
+ "26": 1.0,
+ "27": 0.14285714285714285,
+ "28": 1.0,
+ "29": 0.25
+ },
+ {
+ "0": 0.5,
+ "1": 0.0,
+ "2": 0.25,
+ "3": 1.0,
+ "4": 0.25,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 0.25,
+ "8": 0.5,
+ "9": 1.0,
+ "10": 0.3333333333333333,
+ "11": 0.25,
+ "12": 0.2857142857142857,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 1.0,
+ "17": 0.2,
+ "18": 0.2857142857142857,
+ "19": 1.0,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 0.25,
+ "23": 0.10526315789473684,
+ "24": 0.5,
+ "25": 0.5,
+ "26": 0.2857142857142857,
+ "27": 0.1,
+ "28": 0.5,
+ "29": 0.25
+ },
+ {
+ "0": 0.25,
+ "1": 0.25,
+ "2": 0.25,
+ "3": 0.25,
+ "4": 1.0,
+ "5": 0.25,
+ "6": 0.2857142857142857,
+ "7": 1.0,
+ "8": 0.5,
+ "9": 0.5,
+ "10": 0.2,
+ "11": 0.25,
+ "12": 1.0,
+ "13": 0.25,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.25,
+ "17": 0.5,
+ "18": 1.0,
+ "19": 0.2,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 1.0,
+ "23": 0.25,
+ "24": 0.5,
+ "25": 0.0,
+ "26": 1.0,
+ "27": 0.125,
+ "28": 0.5,
+ "29": 0.25
+ },
+ {
+ "0": 0.25,
+ "1": 0.0,
+ "2": 0.14285714285714285,
+ "3": 1.0,
+ "4": 0.25,
+ "5": 0.5,
+ "6": 0.25,
+ "7": 1.0,
+ "8": 0.25,
+ "9": 1.0,
+ "10": 0.3333333333333333,
+ "11": 1.0,
+ "12": 0.2857142857142857,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 1.0,
+ "17": 0.2,
+ "18": 1.0,
+ "19": 0.2,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.25,
+ "23": 0.14285714285714285,
+ "24": 0.5,
+ "25": 0.0,
+ "26": 0.25,
+ "27": 0.125,
+ "28": 0.2,
+ "29": 1.0
+ },
+ {
+ "0": 0.25,
+ "1": 0.25,
+ "2": 1.0,
+ "3": 1.0,
+ "4": 0.2857142857142857,
+ "5": 0.5,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 0.5,
+ "10": 0.2,
+ "11": 1.0,
+ "12": 0.2857142857142857,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.2,
+ "17": 0.5,
+ "18": 0.2857142857142857,
+ "19": 0.2,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.5,
+ "23": 0.15384615384615385,
+ "24": 1.0,
+ "25": 1.0,
+ "26": 1.0,
+ "27": 0.10526315789473684,
+ "28": 0.5,
+ "29": 0.15384615384615385
+ },
+ {
+ "0": 1.0,
+ "1": 0.25,
+ "2": 1.0,
+ "3": 0.2857142857142857,
+ "4": 0.1111111111111111,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 0.2857142857142857,
+ "8": 0.5,
+ "9": 1.0,
+ "10": 0.2,
+ "11": 0.25,
+ "12": 1.0,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 1.0,
+ "17": 0.2,
+ "18": 1.0,
+ "19": 0.5,
+ "20": 1.0,
+ "21": 1.0,
+ "22": 0.3333333333333333,
+ "23": 0.14285714285714285,
+ "24": 0.5,
+ "25": 1.0,
+ "26": 0.2857142857142857,
+ "27": 0.125,
+ "28": 0.25,
+ "29": 0.15384615384615385
+ },
+ {
+ "0": 1.0,
+ "1": 1.0,
+ "2": 1.0,
+ "3": 0.25,
+ "4": 0.0,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 0.5,
+ "8": 1.0,
+ "9": 0.5,
+ "10": 0.2,
+ "11": 0.2,
+ "12": 1.0,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 1.0,
+ "17": 1.0,
+ "18": 0.2857142857142857,
+ "19": 0.5,
+ "20": 1.0,
+ "21": 1.0,
+ "22": 0.5,
+ "23": 0.25,
+ "24": 0.5,
+ "25": 1.0,
+ "26": 1.0,
+ "27": 0.16666666666666666,
+ "28": 0.25,
+ "29": 0.2857142857142857
+ },
+ {
+ "0": 0.5,
+ "1": 0.25,
+ "2": 0.25,
+ "3": 0.25,
+ "4": 0.25,
+ "5": 0.5,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 0.5,
+ "9": 0.5,
+ "10": 0.2,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 0.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.0,
+ "17": 0.25,
+ "18": 1.0,
+ "19": 0.2,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.25,
+ "23": 0.14285714285714285,
+ "24": 0.25,
+ "25": 0.5,
+ "26": 1.0,
+ "27": 1.0,
+ "28": 0.5,
+ "29": 0.25
+ }
+ ],
+ "validation_schema_version": 2
+}
\ No newline at end of file
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/gepa_state.bin b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/gepa_state.bin
new file mode 100644
index 0000000000000000000000000000000000000000..005548f7f620415ba83338d8367457949c2628ad
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/gepa_state.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2fa7fd2fb7e68c74c5d0a81d4ef7eb197f372b119eb3bf7b773d3e98158fcd0b
+size 125702
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/optimize.pid b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/optimize.pid
new file mode 100644
index 0000000000000000000000000000000000000000..e08d584a6ad395ef272accb59b5d6289d70a2782
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/optimize.pid
@@ -0,0 +1 @@
+4147849
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/optimize.stderr.log b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/optimize.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/optimize.stdout.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/optimize.stdout.json
new file mode 100644
index 0000000000000000000000000000000000000000..479345c2509ea83ccc9daa6cfd6390d712ad77d8
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/optimize.stdout.json
@@ -0,0 +1,1654 @@
+Iteration 0: Base program full valset score: 0.6395802005012532 over 30 / 30 examples
+Iteration 1: Selected program 0 score: 0.6395802005012532
+Iteration 1: Proposed new text for routing_policy: Classify GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents
+
+Use the title as the primary signal. Use the body or summary only when the title is ambiguous.
+
+Pick the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Important domain rules:
+
+- Use `coding_agents` when the central subject is coding agents, subagents, delegated agents, agent spawning, worker agents, or coding-agent initialization behavior.
+- Use `agent_runtime` when the central subject is agent runtime behavior, runtime ownership, runtime startup, runtime initialization, node-backed runtimes, or execution-runtime lifecycle.
+- Use `sandboxing` when sandbox modes, sandbox policy, sandbox execution, or sandbox-related breakage is central.
+- A title like `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` should be classified as `coding_agents`, `sandboxing`, and `agent_runtime`.
+
+Suppression rules:
+
+- Do not classify `sessions` merely because a term contains the word `sessions`, such as `sessions_spawn`.
+- Treat `sessions_spawn` as a subagent/coding-agent spawning mechanism unless the title is explicitly about user-visible session identity, session lifecycle, session routing, or session state.
+- Do not add `sessions` for subagent initialization bugs unless session behavior itself is the central user-visible issue.
+- Do not add a topic just because a related word appears in an internal function name, path, label, implementation detail, or compound identifier.
+
+General process:
+
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+Iteration 1: New subsample score 1.65 is not better than old score 3.2, skipping
+Iteration 2: Selected program 0 score: 0.6395802005012532
+Iteration 2: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids include:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Core task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Process:
+1. Read the title first. It is the primary signal.
+2. Identify the user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Suppress topics triggered only by labels, component names, paths, examples, internal implementation details, or broad ownership words.
+6. Output JSON only.
+
+Important domain rules:
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, stuck state, timeout, wedge, lane wedge, self-healing failure, or robustness/liveness issue.
+- A title like `self-heal lane wedges` is `reliability`.
+- Do not add `queueing` merely because the title mentions lane, main lane, worker, subagent, queue internals, or execution paths.
+- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be central and user-visible.
+
+Auth identity:
+- Use `auth_identity` when OAuth, login identity, account identity, credentials identity, authentication state, authorization identity, or identity restoration is central.
+- A title like `restore openai-codex OAuth on embedded path` includes `auth_identity`.
+- OAuth is not merely a security topic unless the issue is about vulnerabilities, access restrictions, network boundaries, or credential exposure policy.
+
+Sessions:
+- Use `sessions` when session identity, embedded session paths, session lifecycle, session routing, session state, or session-specific behavior is central.
+- If an OAuth/login issue is specifically about an embedded path or restoring identity in an embedded/session flow, include `sessions`.
+- `restore openai-codex OAuth on embedded path` should include `sessions`.
+
+Codex:
+- Use `codex` only when Codex behavior itself is the central subject: Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` merely because a title contains `openai-codex`, `[codex]`, or a Codex-related auth/provider label.
+- In `restore openai-codex OAuth on embedded path`, `openai-codex` is an OAuth/identity label, not enough by itself to classify as `codex`.
+
+Composite titles:
+- If a title lists independent central fixes joined by `+`, `and`, commas, or semicolons, classify each central user-visible item.
+- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path`
+ should be:
+ {"topics_of_interest":["reliability","auth_identity","sessions"]}
+- Do not add `queueing` for `lane wedges`.
+- Do not add `codex` for `openai-codex OAuth` unless the title clearly says the Codex product/runtime/setup behavior itself is the subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like lane, queue, Codex, path, OAuth, security, status, send, gateway, model, usage, tool, cron, plugin, or contract. Keep a topic only if it is actually a central maintainer-owned subject.
+
+Output only JSON.
+Iteration 2: New subsample score 2.25 is not better than old score 3.142857142857143, skipping
+Iteration 3: Selected program 0 score: 0.6395802005012532
+Iteration 3: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- A title like `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` is only `docs`.
+- Do not add non-allowed or broad demo/showcase labels such as `agent_demos`.
+- Documentation-only PRs should usually include `docs` plus the central documented area only when that area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add `tool_calling` just because docs mention "tool boundaries" unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- A title like `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.
+- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.
+- A title like `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` is `acp` and `sessions`, not `acpx`.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and only `acpx` if ACPX is explicitly named or clearly the binding system in question.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- A title like `[Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` is `local_model_providers` and `model_serving`.
+- Do not add `model_serving` merely because a title says "model", "provider", "catalog", or names a model unless serving/routing/protocol behavior is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says "openai-compatible", "provider", llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about "cron --announce and heartbeat target-channel Slack pushes" is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+- "Outbound session identity" is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map "model" in "model policy", "model conformance", or "model checks" to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 3: New subsample score 4.0 is better than old score 1.0714285714285714. Continue to full eval and add to candidate pool.
+Iteration 3: Found a better program on the valset with score 0.6971520146520146.
+Iteration 3: Valset score for new program: 0.6971520146520146 (coverage 30 / 30)
+Iteration 3: Val aggregate for new program: 0.6971520146520146
+Iteration 3: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 0.25, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 0.5, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.2, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.15384615384615385, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.125, 28: 0.2, 29: 0.25}
+Iteration 3: Objective aggregate scores for new program: {'weighted_score': 0.6971520146520145}
+Iteration 3: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.2, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.15384615384615385, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.125, 28: 0.2, 29: 0.2857142857142857}
+Iteration 3: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 3: Valset pareto front aggregate score: 0.7566758241758241
+Iteration 3: Updated valset pareto front programs: {0: {0, 1}, 1: {0, 1}, 2: {1}, 3: {0, 1}, 4: {0, 1}, 5: {0}, 6: {0, 1}, 7: {0, 1}, 8: {0, 1}, 9: {0, 1}, 10: {0}, 11: {0, 1}, 12: {0, 1}, 13: {0, 1}, 14: {1}, 15: {0}, 16: {0, 1}, 17: {0, 1}, 18: {0, 1}, 19: {0, 1}, 20: {0, 1}, 21: {1}, 22: {1}, 23: {1}, 24: {0, 1}, 25: {1}, 26: {0, 1}, 27: {0, 1}, 28: {0, 1}, 29: {0}}
+Iteration 3: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 3: Best valset aggregate score so far: 0.6971520146520146
+Iteration 3: Best program as per aggregate score on valset: 1
+Iteration 3: Best score on valset: 0.6971520146520146
+Iteration 3: Linear pareto front program index: 1
+Iteration 3: New program candidate index: 1
+Iteration 4: Selected program 0 score: 0.6395802005012532
+Iteration 4: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only, or call the structured JSON output tool if one exists.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Important routing corrections:
+- `layered browser diagnostics` is `browser_automation`, not `ui_tui`.
+- `cron force-run deadlock` is `cron_automation`. Do not add `reliability` just because the word deadlock appears when the named feature owner is the better maintainer bucket.
+- `totalTokensFresh`, stale token counts, fresh/stale token-count flags, and per-session usage freshness are `sessions` plus `telemetry_usage`.
+- Do not use `ui_tui` merely because a title says "showing", "display", "reader", or "diagnostics". Use `ui_tui` only when webchat, TUI, UI views, terminal UI, or user-facing chat interface behavior is explicitly central.
+- Security PRs that list vulnerable areas should include `security` plus each central named affected subsystem when that subsystem is an allowed topic. For example, skill scanner vulnerability work includes `skills_plugins`; hook priority vulnerability work includes `hooks`.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+- For `cron force-run deadlock`, classify as `cron_automation` without `reliability` unless the title separately makes reliability/liveness a central cross-cutting subject.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:
+ - `exec_tools`
+ - `sandboxing`
+ - `approvals`
+- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Suppress `reliability` when deadlock/hang/crash is just the failure mode of a more specific named central feature, such as cron force-run.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, session freshness, or session-specific behavior is central.
+- "Outbound session identity" is `sessions`.
+- `totalTokensFresh` and stale per-session token counts are session-state/freshness concerns and should include `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, token counts, stale usage values, or status reporting surfaces are themselves the feature or bug.
+- Stale token counts are `telemetry_usage`.
+- Do not add `model_serving` merely because usage chunks or token counts are mentioned in a model-serving protocol bug unless serving protocol behavior is central.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- `layered browser diagnostics` is `browser_automation`.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Hooks:
+- Use `hooks` when hook behavior, hook priority, hook ordering, hook execution, or hook security is central.
+- In composite security titles, include `hooks` when hook priority or hook execution is one of the named vulnerable areas.
+
+Open-weight and local model providers:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, plugin availability, or skill scanner behavior is central.
+- Security fixes in a skill scanner should include `skills_plugins`.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+ACP, ACPX, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Model serving:
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks unless usage reporting itself is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `skills_plugins`, `hooks`.
+- Do not substitute broad infrastructure topics like `gateway`, `ui_tui`, or `reliability` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 4: New subsample score 3.25 is better than old score 1.342857142857143. Continue to full eval and add to candidate pool.
+Iteration 4: Valset score for new program: 0.6569444444444444 (coverage 30 / 30)
+Iteration 4: Val aggregate for new program: 0.6569444444444444
+Iteration 4: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 0.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 1.0, 17: 0.2857142857142857, 18: 1.0, 19: 0.3333333333333333, 20: 0.25, 21: 0.5, 22: 0.2857142857142857, 23: 0.14285714285714285, 24: 0.5, 25: 0.0, 26: 1.0, 27: 0.125, 28: 0.25, 29: 0.2857142857142857}
+Iteration 4: Objective aggregate scores for new program: {'weighted_score': 0.6569444444444443}
+Iteration 4: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.3333333333333333, 20: 0.25, 21: 1.0, 22: 0.5, 23: 0.15384615384615385, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.125, 28: 0.25, 29: 0.2857142857142857}
+Iteration 4: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 4: Valset pareto front aggregate score: 0.7882631257631257
+Iteration 4: Updated valset pareto front programs: {0: {0, 1, 2}, 1: {0, 1, 2}, 2: {1, 2}, 3: {0, 1, 2}, 4: {2}, 5: {0, 2}, 6: {0, 1, 2}, 7: {0, 1, 2}, 8: {0, 1, 2}, 9: {0, 1, 2}, 10: {0, 2}, 11: {0, 1}, 12: {0, 1, 2}, 13: {0, 1}, 14: {1, 2}, 15: {0, 2}, 16: {0, 1, 2}, 17: {0, 1}, 18: {0, 1, 2}, 19: {2}, 20: {2}, 21: {1}, 22: {1}, 23: {1}, 24: {0, 1, 2}, 25: {1}, 26: {0, 1, 2}, 27: {0, 1, 2}, 28: {2}, 29: {0, 2}}
+Iteration 4: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 4: Best valset aggregate score so far: 0.6971520146520146
+Iteration 4: Best program as per aggregate score on valset: 1
+Iteration 4: Best score on valset: 0.6971520146520146
+Iteration 4: Linear pareto front program index: 1
+Iteration 4: New program candidate index: 2
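The Iteration 4 figures above are consistent with the valset aggregate being the plain mean of the 30 per-example scores. A minimal check under that assumption follows; the variable names are illustrative and are not taken from GEPA's code.

```python
# Per-example valset scores for candidate 2, copied from the
# "Iteration 4: Individual valset scores" log line (ids 0..29).
scores = [
    1.0, 0.25, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
    1.0, 0.0, 1.0, 0.5, 1.0, 1.0, 1.0, 0.2857142857142857, 1.0, 0.3333333333333333,
    0.25, 0.5, 0.2857142857142857, 0.14285714285714285, 0.5, 0.0, 1.0, 0.125, 0.25,
    0.2857142857142857,
]

# Assumption: the logged "Val aggregate" is an unweighted mean over the valset.
aggregate = sum(scores) / len(scores)
print(f"{aggregate:.4f}")  # 0.6569, matching the logged 0.6569444444444444
```

Because the mean is unweighted, the two zero-scored examples (ids 11 and 25) each cost the candidate about 0.033 of aggregate score.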
+Iteration 5: Selected program 2 score: 0.6569444444444444
+Iteration 5: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Use only allowed topic ids. Never invent a topic id, even if a title suggests a missing bucket such as packaging, deployment, CI, infrastructure, or supply chain. If no allowed topic is central, return an empty array.
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Decision process:
+1. Read the title first.
+2. Identify the user-visible subject of the issue or PR.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned concerns in the title or first clear summary.
+5. Use 3 topics only when the title explicitly contains three central concerns.
+6. Remove any topic added only because of incidental words, labels, implementation details, tests, file paths, examples, or broad impact.
+
+Cron/session/reliability correction:
+- `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting` is `cron_automation`, `sessions`, and `reliability`.
+- Use `cron_automation` when cron announce, cron scheduling, cron force-run, cron lifecycle, or cron execution is the named feature.
+- Use `sessions` when the bug involves active chat sessions, session takeover, embedded attempt session takeover, session ownership, session identity, or session lifecycle.
+- Use `reliability` when the central failure is an exception, crash, deadlock, hang, stuck state, race, timeout, or robustness failure.
+- Do not suppress `reliability` when the title names a concrete runtime exception such as `EmbeddedAttemptSessionTakeoverError`.
+
+Security/CI correction:
+- `ci: harden GitHub Actions supply-chain boundaries` is `security` only.
+- GitHub Actions supply-chain hardening, supply-chain boundaries, workflow trust boundaries, dependency provenance, and CI hardening are security concerns when no more specific allowed topic is central.
+- Do not add nonexistent or label-like topics such as `packaging_deployment`.
+- Do not add deployment, packaging, CI, or release topics unless they are in the allowed list; they are not allowed here.
+
+General suppression rules:
+- Do not add `notifications` merely because delivery, announce, pushed message, or message sending appears. Use it only when notification behavior itself is the central user-visible subject.
+- Do not add `chat_integrations` merely because a user is actively chatting. Use it only for Slack, WhatsApp, chat app delivery, chat history, target channels, or chat push behavior.
+- Do not add `reliability` for every bug. Add it when the title’s central failure mode is an exception, crash, hang, deadlock, stuck state, race, timeout, liveness, or robustness issue.
+- Do not add `security` for every policy, boundary, or CI title. Add it when the title is about hardening, supply-chain boundaries, vulnerabilities, access restrictions, credentials, SSRF, token verification, network policy, or trust boundaries.
+- Never output labels outside the allowed topic ids.
+
+Output raw JSON only.
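The "never output labels outside the allowed topic ids" rule in the prompt above can also be enforced after the model responds. A minimal sketch, assuming the model returns the raw JSON object the prompt asks for; `check_output` is a hypothetical helper, not part of the localpager or GEPA code.

```python
import json

# Allowed topic ids, copied from the proposed routing_policy text.
ALLOWED_TOPICS = {
    "queueing", "docs", "notifications", "sessions", "gateway", "reliability",
    "memory", "open_weight_models", "local_model_providers", "codex",
    "api_surface", "ui_tui", "chat_integrations", "skills_plugins", "acp",
    "acpx", "approvals", "agent_runtime", "model_serving", "local_models",
    "self_hosted_inference", "telemetry_usage", "exec_tools", "sandboxing",
    "browser_automation", "cron_automation", "config", "security",
    "mcp_tooling", "tool_calling", "hooks",
}


def check_output(raw: str) -> tuple[list[str], list[str]]:
    """Split the model's topics into allowed ids and invented ids."""
    payload = json.loads(raw)
    topics = payload["topics_of_interest"]
    kept = [t for t in topics if t in ALLOWED_TOPICS]
    rejected = [t for t in topics if t not in ALLOWED_TOPICS]
    return kept, rejected


# `packaging_deployment` is the invented label the prompt warns about.
kept, rejected = check_output(
    '{"topics_of_interest": ["security", "packaging_deployment"]}'
)
print(kept, rejected)
```

Returning the rejected ids instead of silently dropping them keeps invented labels visible, which is useful when they need to be counted as false positives.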
+Iteration 5: New subsample score 4.0 is better than old score 2.619047619047619. Continue to full eval and add to candidate pool.
+Iteration 5: Valset score for new program: 0.46397435897435896 (coverage 30 / 30)
+Iteration 5: Val aggregate for new program: 0.46397435897435896
+Iteration 5: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.15384615384615385, 3: 0.2857142857142857, 4: 0.15384615384615385, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.25, 9: 0.0, 10: 0.2, 11: 0.2857142857142857, 12: 1.0, 13: 0.5, 14: 1.0, 15: 0.5, 16: 0.14285714285714285, 17: 0.5, 18: 1.0, 19: 0.14285714285714285, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.15384615384615385, 24: 0.15384615384615385, 25: 0.25, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 0.15384615384615385}
+Iteration 5: Objective aggregate scores for new program: {'weighted_score': 0.46397435897435885}
+Iteration 5: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.3333333333333333, 20: 0.25, 21: 1.0, 22: 0.5, 23: 0.15384615384615385, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 0.2857142857142857}
+Iteration 5: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 5: Valset pareto front aggregate score: 0.8138583638583639
+Iteration 5: Updated valset pareto front programs: {0: {0, 1, 2}, 1: {0, 1, 2, 3}, 2: {1, 2}, 3: {0, 1, 2}, 4: {2}, 5: {0, 2}, 6: {0, 1, 2, 3}, 7: {0, 1, 2, 3}, 8: {0, 1, 2}, 9: {0, 1, 2}, 10: {0, 2}, 11: {0, 1}, 12: {0, 1, 2, 3}, 13: {0, 1}, 14: {1, 2, 3}, 15: {0, 2}, 16: {0, 1, 2}, 17: {0, 1}, 18: {0, 1, 2, 3}, 19: {2}, 20: {2}, 21: {1, 3}, 22: {1, 3}, 23: {1, 3}, 24: {0, 1, 2}, 25: {1}, 26: {0, 1, 2, 3}, 27: {3}, 28: {3}, 29: {0, 2}}
+Iteration 5: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 5: Best valset aggregate score so far: 0.6971520146520146
+Iteration 5: Best program as per aggregate score on valset: 1
+Iteration 5: Best score on valset: 0.6971520146520146
+Iteration 5: Linear pareto front program index: 1
+Iteration 5: New program candidate index: 3
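Candidate 3 scored well below the best aggregate (0.464 against 0.697), yet the Pareto front aggregate rose from 0.788 to 0.814. The logged numbers are consistent with the front being a per-example maximum over all candidates. The sketch below reproduces the Iteration 5 update under that assumption; the variable names are illustrative, not GEPA's.

```python
# Per-example Pareto front after Iteration 4 (ids 0..29), from the log.
prev_front = [
    1.0, 0.25, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
    1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.3333333333333333,
    0.25, 1.0, 0.5, 0.15384615384615385, 0.5, 1.0, 1.0, 0.125, 0.25,
    0.2857142857142857,
]

# Per-example valset scores for candidate 3 (Iteration 5), from the log.
candidate_3 = [
    0.5, 0.25, 0.15384615384615385, 0.2857142857142857, 0.15384615384615385,
    0.5, 1.0, 1.0, 0.25, 0.0,
    0.2, 0.2857142857142857, 1.0, 0.5, 1.0,
    0.5, 0.14285714285714285, 0.5, 1.0, 0.14285714285714285,
    0.2, 1.0, 0.5, 0.15384615384615385, 0.15384615384615385,
    0.25, 1.0, 0.14285714285714285, 1.0, 0.15384615384615385,
]

# Assumption: the front keeps the best score seen so far on each example.
new_front = [max(p, c) for p, c in zip(prev_front, candidate_3)]
front_aggregate = sum(new_front) / len(new_front)

# Examples where candidate 3 strictly beats every earlier candidate.
improved = [i for i, (p, c) in enumerate(zip(prev_front, candidate_3)) if c > p]

print(improved)                  # [27, 28]
print(f"{front_aggregate:.4f}")  # 0.8139, matching the logged 0.8138583638583639
```

Candidate 3 strictly improves only examples 27 and 28, which matches the updated front listing it as the sole best program for those two ids.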
+Iteration 6: Selected program 2 score: 0.6569444444444444
+Iteration 6: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+You may receive a GitHub target URL, title, and sometimes a body or summary.
+The title is the primary signal. Use the body only when the title is ambiguous.
+Ignore examples, tests, labels, file paths, implementation details, symptoms, broad impact, and incidental words unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, docs change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicitly central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only, or call the structured JSON output tool if one exists.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, label-spam keyword, or security-hardening mechanism.
+
+High-priority corrections:
+- Local Ollama embeddings failing when a proxy is enabled, including wording like “SSRF defenses ignore NO_PROXY”, should be classified as `local_models` and `self_hosted_inference`.
+- For that Ollama/proxy/NO_PROXY case, do NOT add `security`, `local_model_providers`, or `memory`. “SSRF defenses” is the mechanism or label-like context, not the central maintainer bucket, unless the title is actually about a vulnerability or security policy.
+- “Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield” should be classified as `acp` and `acpx`.
+- For that ACP/sessions_yield case, do NOT add `sessions`. `sessions_yield` is an ACP/ACPX workflow term or implementation detail, not necessarily a sessions maintainer concern.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model systems.
+- Use `self_hosted_inference` when the issue is about using self-hosted inference servers such as Ollama, llama.cpp, vLLM, TGI, or LocalAI as inference providers.
+- Ollama is a strong signal for both `local_models` and `self_hosted_inference` when the issue is about using Ollama locally or through local inference.
+- Do not use `local_model_providers` for Ollama/local runtime failures unless the title is specifically about provider catalog metadata, provider-specific open-weight integration, known context windows, or named provider/model-family support.
+- Do not add `memory` merely because embeddings are mentioned. Add `memory` only when memory behavior, active-memory recall, vector stores, memory providers, or embedding-provider memory behavior is the central subject.
+
+Security:
+- Use `security` for network policy, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior when that is the main user-visible subject.
+- Do not add `security` merely because a bug mentions SSRF defenses, proxy bypass behavior, NO_PROXY, blocked requests, or a security-related implementation detail.
+- Security PRs that list vulnerable areas should include `security` plus each central named affected subsystem. For example, skill scanner vulnerability work includes `skills_plugins`; hook priority vulnerability work includes `hooks`.
+
+ACP and ACPX:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings, explicit ACP agent runs, embedded backend visibility, final JSON state in ACP workflow, or ACPX session workflow concepts.
+- Do not add `sessions` merely because an ACP/ACPX title contains `sessions_yield`, āsessionā, or āstateā if the central issue is ACP/ACPX workflow behavior.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, session freshness, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- `totalTokensFresh`, stale token counts, fresh/stale token-count flags, and per-session usage freshness are `sessions` plus `telemetry_usage`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Suppress `reliability` when deadlock/hang/crash is just the failure mode of a more specific named central feature, such as cron force-run.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings as a memory subsystem, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+- Suppress `memory` when embeddings are only part of a local model provider failure and the user-visible subject is local/self-hosted inference.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- `layered browser diagnostics` is `browser_automation`.
+- Do not add `gateway` or `ui_tui` for browser diagnostics unless gateway or UI behavior is explicitly the subject.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- `cron force-run deadlock` is `cron_automation`; do not add `reliability` unless the title separately makes reliability/liveness a central cross-cutting subject.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Notifications and chat integrations:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Do not add `notifications` merely because a title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, token counts, stale usage values, or status reporting surfaces are themselves the feature or bug.
+- Stale token counts are `telemetry_usage`.
+- Do not add `model_serving` merely because usage chunks or token counts are mentioned in a model-serving protocol bug unless serving protocol behavior is central.
+
+Open-weight and local model providers:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, plugin availability, or skill scanner behavior is central.
+- Security fixes in a skill scanner should include `skills_plugins`.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Model serving:
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks unless usage reporting itself is central.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Policy, config, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+Hooks:
+- Use `hooks` when hook behavior, hook priority, hook ordering, hook execution, or hook security is central.
+- In composite security titles, include `hooks` when hook priority or hook execution is one of the named vulnerable areas.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `skills_plugins`, `hooks`.
+- Do not substitute broad infrastructure topics like `gateway`, `ui_tui`, `reliability`, or `security` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, contract, embeddings, SSRF, NO_PROXY, sessions_yield, or state. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 6: New subsample score 4.0 is better than old score 2.3909774436090228. Continue to full eval and add to candidate pool.
+Iteration 6: Valset score for new program: 0.6616269841269842 (coverage 30 / 30)
+Iteration 6: Val aggregate for new program: 0.6616269841269842
+Iteration 6: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 0.3333333333333333, 11: 1.0, 12: 0.25, 13: 0.5, 14: 1.0, 15: 1.0, 16: 1.0, 17: 0.2857142857142857, 18: 0.2857142857142857, 19: 0.3333333333333333, 20: 0.25, 21: 1.0, 22: 0.2857142857142857, 23: 1.0, 24: 0.5, 25: 0.0, 26: 0.25, 27: 0.125, 28: 0.2, 29: 1.0}
+Iteration 6: Objective aggregate scores for new program: {'weighted_score': 0.6616269841269842}
+Iteration 6: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.3333333333333333, 20: 0.25, 21: 1.0, 22: 0.5, 23: 1.0, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 6: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 6: Valset pareto front aggregate score: 0.8658730158730158
+Iteration 6: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {0, 1, 2, 3, 4}, 2: {1, 2, 4}, 3: {0, 1, 2, 4}, 4: {2, 4}, 5: {0, 2, 4}, 6: {0, 1, 2, 3, 4}, 7: {0, 1, 2, 3, 4}, 8: {0, 1, 2, 4}, 9: {0, 1, 2, 4}, 10: {0, 2}, 11: {0, 1, 4}, 12: {0, 1, 2, 3}, 13: {0, 1}, 14: {1, 2, 3, 4}, 15: {0, 2, 4}, 16: {0, 1, 2, 4}, 17: {0, 1}, 18: {0, 1, 2, 3}, 19: {2, 4}, 20: {2, 4}, 21: {1, 3, 4}, 22: {1, 3}, 23: {4}, 24: {0, 1, 2, 4}, 25: {1}, 26: {0, 1, 2, 3}, 27: {3}, 28: {3}, 29: {4}}
+Iteration 6: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 6: Best valset aggregate score so far: 0.6971520146520146
+Iteration 6: Best program as per aggregate score on valset: 1
+Iteration 6: Best score on valset: 0.6971520146520146
+Iteration 6: Linear pareto front program index: 1
+Iteration 6: New program candidate index: 4
+Iteration 7: Selected program 2 score: 0.6569444444444444
+Iteration 7: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Always produce the required structured output. If `final_json` exists, calling it is mandatory.
+
+Important corrections learned from prior failures:
+- Slack target-channel pushes are both notification behavior and chat integration behavior when the title is about pushed Slack messages.
+ - Example: “Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes” => `notifications`, `chat_integrations`.
+ - Do not stop at `notifications` when Slack, WhatsApp, chat history, chat app delivery, or chat target channels are central.
+- Policy or conformance checks are `config` when they define or enforce governed behavior.
+ - “network conformance checks” should include `security`.
+ - “MCP conformance checks” should include `mcp_tooling`.
+ - Example: “Policy: add model, network, and MCP conformance checks” => `config`, `security`, `mcp_tooling`.
+ - Do not map “model conformance checks” to `model_serving` unless serving endpoints, streaming, routing, or model-server compatibility are central.
+- Provider catalog metadata is `local_model_providers`.
+ - If a missing provider catalog entry causes a silent failure instead of a clear error, also include `reliability`.
+ - Example: “google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error” => `local_model_providers`, `reliability`.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications` and `chat_integrations`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.
+- Suppress `reliability` when the failure mode is merely incidental to a more specific named feature, such as cron force-run deadlock.
+- Do include `reliability` when a provider catalog or integration issue causes silent failure instead of a proper error.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Local model providers:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- Missing models in a provider catalog are `local_model_providers`.
+- If the missing provider metadata causes a silent failure or poor error behavior, add `reliability`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Final output requirement:
+- If a structured output tool exists, call it with exactly:
+ {"topics_of_interest":[...]}
+- If no structured output tool exists, print raw JSON only.
+- Do not include explanations, Markdown, comments, or extra fields.
+Iteration 7: New subsample score 2.5357142857142856 is better than old score 2.0. Continue to full eval and add to candidate pool.
+Iteration 7: Valset score for new program: 0.5029318488529015 (coverage 30 / 30)
+Iteration 7: Val aggregate for new program: 0.5029318488529015
+Iteration 7: Individual valset scores for new program: {0: 0.5, 1: 0.2857142857142857, 2: 0.15384615384615385, 3: 1.0, 4: 0.15384615384615385, 5: 1.0, 6: 0.25, 7: 1.0, 8: 0.0, 9: 1.0, 10: 0.2, 11: 0.25, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 0.2, 18: 1.0, 19: 0.2, 20: 0.2, 21: 0.5, 22: 0.25, 23: 0.10526315789473684, 24: 0.5, 25: 0.5, 26: 1.0, 27: 0.14285714285714285, 28: 0.125, 29: 0.2857142857142857}
+Iteration 7: Objective aggregate scores for new program: {'weighted_score': 0.5029318488529014}
+Iteration 7: New valset pareto front scores: {0: 1.0, 1: 0.2857142857142857, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.3333333333333333, 20: 0.25, 21: 1.0, 22: 0.5, 23: 1.0, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 7: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 7: Valset pareto front aggregate score: 0.8670634920634921
+Iteration 7: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {5}, 2: {1, 2, 4}, 3: {0, 1, 2, 4, 5}, 4: {2, 4}, 5: {0, 2, 4, 5}, 6: {0, 1, 2, 3, 4}, 7: {0, 1, 2, 3, 4, 5}, 8: {0, 1, 2, 4}, 9: {0, 1, 2, 4, 5}, 10: {0, 2}, 11: {0, 1, 4}, 12: {0, 1, 2, 3}, 13: {0, 1, 5}, 14: {1, 2, 3, 4, 5}, 15: {0, 2, 4, 5}, 16: {0, 1, 2, 4, 5}, 17: {0, 1}, 18: {0, 1, 2, 3, 5}, 19: {2, 4}, 20: {2, 4}, 21: {1, 3, 4}, 22: {1, 3}, 23: {4}, 24: {0, 1, 2, 4, 5}, 25: {1}, 26: {0, 1, 2, 3, 5}, 27: {3, 5}, 28: {3}, 29: {4}}
+Iteration 7: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 7: Best valset aggregate score so far: 0.6971520146520146
+Iteration 7: Best program as per aggregate score on valset: 1
+Iteration 7: Best score on valset: 0.6971520146520146
+Iteration 7: Linear pareto front program index: 1
+Iteration 7: New program candidate index: 5
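Note on reading the score lines above: the "Valset pareto front aggregate score" appears to be the mean, over validation examples, of the best score any candidate has reached on that example, while "Valset score for new program" is the plain mean of one candidate's per-example scores. The 30 front scores logged for Iteration 7 do average to 0.8670634920634921, which is consistent with this reading. The sketch below illustrates that bookkeeping on toy data; the function names and the toy scores are hypothetical and are not taken from the GEPA source.

```python
from statistics import mean


def candidate_aggregate(scores):
    """Plain mean of one candidate's per-example validation scores."""
    return mean(scores.values())


def pareto_front(per_candidate_scores):
    """Per-example best score and the set of candidates that achieve it.

    per_candidate_scores: list of {example_id: score}, indexed by candidate.
    All candidates are assumed to cover the same example ids.
    """
    front_scores = {}
    front_programs = {}
    for example_id in per_candidate_scores[0]:
        best = max(scores[example_id] for scores in per_candidate_scores)
        front_scores[example_id] = best
        front_programs[example_id] = {
            idx
            for idx, scores in enumerate(per_candidate_scores)
            if scores[example_id] == best
        }
    return front_scores, front_programs


# Toy data (hypothetical): two candidates scored on three validation examples.
toy = [
    {0: 0.5, 1: 1.0, 2: 0.25},  # candidate 0
    {0: 1.0, 1: 0.2, 2: 0.25},  # candidate 1
]
front_scores, front_programs = pareto_front(toy)
front_aggregate = mean(front_scores.values())
print(front_scores, front_programs, front_aggregate)
```

Under this reading the front aggregate can keep rising (0.867, 0.884, 0.906, 0.923 across Iterations 7 to 10) even when no single new candidate beats the best aggregate of 0.697, because each new candidate only needs to win on a few individual examples.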
+Iteration 8: Selected program 3 score: 0.46397435897435896
+Iteration 8: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Use only allowed topic ids. Never invent topic ids. If no allowed topic is central, return an empty array.
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks, packaging_deployment
+
+Decision process:
+1. Read the title first.
+2. Identify the user-visible subject of the issue or PR.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned concerns in the title or first clear summary.
+5. Use 3 topics only when the title explicitly contains three central concerns.
+6. Remove any topic added only because of incidental words, labels, implementation details, tests, file paths, examples, or broad impact.
+
+General rules:
+- Prefer the smallest complete topic set.
+- Do not add topics because a word appears incidentally.
+- Do not add broad umbrella topics when a narrower central topic already captures the issue.
+- Do not add implementation-detail topics from examples, stack traces, package names, file paths, labels, or test names.
+- If a topic is not allowed, do not output it.
+
+Known mappings and corrections:
+- Desktop Chrome automation, browser profiles, browser attach, Google sign-in through Chrome, and use of existing Chrome browser sessions are `browser_automation`.
+- In browser automation titles, the word “sessions” often means Chrome/user-profile browser sessions. Do not classify that as `sessions` unless OpenClaw chat/session ownership or lifecycle is central.
+- `Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` is `browser_automation` and `packaging_deployment`.
+- `spawn npx ENOENT`, missing executables, broken runtime packaging, missing installed command, desktop packaging failures, dependency launch failures, installation/runtime environment failures, and app deployment breakage are `packaging_deployment`.
+- Do not add `reliability` merely because something “fails”. Add `reliability` only for central exception/crash/hang/deadlock/stuck/race/timeout/robustness failures.
+- A title naming `spawn npx ENOENT` in a desktop/browser attach flow is primarily packaging/deployment, not reliability.
+
+Memory and self-hosted inference:
+- `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` is `memory` and `self_hosted_inference`.
+- Use `memory` when memory, embeddings, recall, vector storage, or memory retrieval is the central feature.
+- Use `self_hosted_inference` when the title is about OpenAI-compatible providers for self-hosted servers or self-hosted inference endpoints.
+- Do not add `local_models` just because examples include llama.cpp, Ollama, vLLM, TGI, or LocalAI. Treat those names as examples unless local model management itself is the central user-visible subject.
+- Do not add `local_model_providers` or `model_serving` when the title’s central concern is an OpenAI-compatible provider for self-hosted inference and another explicit feature like memory already accounts for the context.
+
+Cron/session/reliability correction:
+- `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting` is `cron_automation`, `sessions`, and `reliability`.
+- Use `cron_automation` when cron announce, cron scheduling, cron force-run, cron lifecycle, or cron execution is the named feature.
+- Use `sessions` when the bug involves active chat sessions, session takeover, embedded attempt session takeover, session ownership, session identity, or session lifecycle.
+- Use `reliability` when the central failure is an exception, crash, deadlock, hang, stuck state, race, timeout, or robustness failure.
+- Do not suppress `reliability` when the title names a concrete runtime exception such as `EmbeddedAttemptSessionTakeoverError`.
+
+Security/CI correction:
+- `ci: harden GitHub Actions supply-chain boundaries` is `security` only.
+- GitHub Actions supply-chain hardening, supply-chain boundaries, workflow trust boundaries, dependency provenance, and CI hardening are security concerns when no more specific allowed topic is central.
+- Do not add deployment, packaging, CI, or release topics unless an allowed topic is directly central. `packaging_deployment` is allowed only for packaging/deployment/install/runtime environment concerns, not generic CI hardening.
+
+Suppression rules:
+- Do not add `notifications` merely because delivery, announce, pushed message, or message sending appears. Use it only when notification behavior itself is the central user-visible subject.
+- Do not add `chat_integrations` merely because a user is actively chatting. Use it only for Slack, WhatsApp, chat app delivery, chat history, target channels, or chat push behavior.
+- Do not add `sessions` for browser sessions, Chrome sessions, login sessions, profile attachment, or user profiles unless OpenClaw session ownership/lifecycle is the central subject.
+- Do not add `reliability` for every bug. Add it only when the title’s central failure mode is an exception, crash, hang, deadlock, stuck state, race, timeout, liveness, or robustness issue.
+- Do not add `security` for every policy, boundary, or CI title. Add it when the title is about hardening, supply-chain boundaries, vulnerabilities, access restrictions, credentials, SSRF, token verification, network policy, or trust boundaries.
+
+Output raw JSON only.
+Iteration 8: New subsample score 4.0 is better than old score 1.0714285714285714. Continue to full eval and add to candidate pool.
+Iteration 8: Valset score for new program: 0.4785409035409035 (coverage 30 / 30)
+Iteration 8: Val aggregate for new program: 0.4785409035409035
+Iteration 8: Individual valset scores for new program: {0: 0.5, 1: 0.2, 2: 0.25, 3: 0.0, 4: 0.15384615384615385, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.2, 9: 0.5, 10: 0.3333333333333333, 11: 0.5, 12: 1.0, 13: 0.5, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.2, 18: 1.0, 19: 0.2, 20: 0.2, 21: 0.5, 22: 0.3333333333333333, 23: 0.14285714285714285, 24: 1.0, 25: 0.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 0.25}
+Iteration 8: Objective aggregate scores for new program: {'weighted_score': 0.47854090354090345}
+Iteration 8: New valset pareto front scores: {0: 1.0, 1: 0.2857142857142857, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.3333333333333333, 20: 0.25, 21: 1.0, 22: 0.5, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 8: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 8: Valset pareto front aggregate score: 0.8837301587301588
+Iteration 8: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {5}, 2: {1, 2, 4}, 3: {0, 1, 2, 4, 5}, 4: {2, 4}, 5: {0, 2, 4, 5}, 6: {0, 1, 2, 3, 4, 6}, 7: {0, 1, 2, 3, 4, 5, 6}, 8: {0, 1, 2, 4}, 9: {0, 1, 2, 4, 5}, 10: {0, 2}, 11: {0, 1, 4}, 12: {0, 1, 2, 3, 6}, 13: {0, 1, 5}, 14: {1, 2, 3, 4, 5, 6}, 15: {0, 2, 4, 5}, 16: {0, 1, 2, 4, 5}, 17: {0, 1}, 18: {0, 1, 2, 3, 5, 6}, 19: {2, 4}, 20: {2, 4}, 21: {1, 3, 4}, 22: {1, 3}, 23: {4}, 24: {6}, 25: {1}, 26: {0, 1, 2, 3, 5, 6}, 27: {3, 5, 6}, 28: {3, 6}, 29: {4}}
+Iteration 8: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 8: Best valset aggregate score so far: 0.6971520146520146
+Iteration 8: Best program as per aggregate score on valset: 1
+Iteration 8: Best score on valset: 0.6971520146520146
+Iteration 8: Linear pareto front program index: 1
+Iteration 8: New program candidate index: 6
+Iteration 9: Selected program 5 score: 0.5029318488529015
+Iteration 9: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents, api_surface,
+ui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,
+agent_runtime, model_serving, local_models, self_hosted_inference,
+telemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,
+config, security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Always produce the required structured output. If `final_json` exists, calling it is mandatory.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.
+- Use `reliability` for explicit exception/error failures when the title centers on the system breaking during a user-visible workflow.
+- Example: “cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting” => `cron_automation`, `sessions`, `reliability`.
+- Suppress `reliability` when the failure mode is only a symptom of a more specific maintainer area and the title’s real subject is that area’s contract or initialization behavior.
+- Example: `sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` => `coding_agents`, `sandboxing`, `agent_runtime`, not `reliability`.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Do not add `notifications` merely because the title mentions announce, delivery, message sending, send denial, pushed messages, or delivery plumbing.
+- If an “announce delivery” title is really about cron execution/session takeover/error behavior, prefer the central areas and suppress `notifications`.
+- Example: “cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting” => no `notifications`.
+
+Cron automation:
+- Use `cron_automation` when cron scheduling, cron lifecycle, cron force-runs, cron announce execution, cron heartbeat execution, or cron-triggered workflows are central.
+- Do not suppress `cron_automation` merely because the title contains “announce” or “delivery”.
+- Example: “cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting” => include `cron_automation`.
+
+Sessions:
+- Use `sessions` when active sessions, session takeover, embedded session takeover, session spawning, session identity, session lifecycle, or user actively chatting in a session is central.
+- Example: `EmbeddedAttemptSessionTakeoverError when user is actively chatting` => include `sessions`.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.
+- Use `tool_calling` when tool call arguments, parameter coercion, schemas, serialization, invocation contracts, or actual tool calls are central.
+- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.
+- Example: “fix(bundle-mcp): coerce stringified object/array params before MCP tool calls” => `mcp_tooling`, `tool_calling`.
+
+ACP, ACPX, and approvals:
+- Use `acp` for ACP protocol/session behavior.
+- Use `acpx` for ACPX extension/adapter behavior, especially per-binding or per-agent ACP session behavior.
+- Use `approvals` for permissionMode, approval modes, permission prompts, permission policy, or approval-gated execution.
+- ACP session permissionMode changes with per-binding or per-agent scope are a three-way concern: `acp`, `approvals`, and `acpx`.
+- Example: “[Feature]: Per-binding and per-agent permissionMode for ACP sessions” => `acp`, `approvals`, `acpx`.
+
+Coding agents and agent runtime:
+- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.
+- Use `agent_runtime` for agent initialization, runtime lifecycle, spawning, embedded runtime behavior, or internal runtime setup.
+- Use `sandboxing` when sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.
+- Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` => `coding_agents`, `sandboxing`, `agent_runtime`.
+- In that example, do not add `reliability`; “silently breaks” is secondary wording and the maintainer buckets are sandboxing plus agent/subagent runtime.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Slack target-channel pushes are both notification behavior and chat integration behavior when the title is about pushed Slack messages.
+- Example: “Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes” => `notifications`, `chat_integrations`.
+- Do not classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.
+- Policy or conformance checks are `config` when they define or enforce governed behavior.
+- “network conformance checks” should include `security`.
+- “MCP conformance checks” should include `mcp_tooling`.
+- Example: “Policy: add model, network, and MCP conformance checks” => `config`, `security`, `mcp_tooling`.
+- Do not map “model conformance checks” to `model_serving` unless serving endpoints, streaming, routing, or model-server compatibility are central.
+
+Local model providers:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- Missing models in a provider catalog are `local_model_providers`.
+- If missing provider metadata causes a silent failure or poor error behavior, add `reliability`.
+- Example: “google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error” => `local_model_providers`, `reliability`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Final output requirement:
+- If a structured output tool exists, call it with exactly:
+ {"topics_of_interest":[...]}
+- If no structured output tool exists, print raw JSON only.
+- Do not include explanations, Markdown, comments, or extra fields.
+Iteration 9: New subsample score 3.5 is better than old score 1.45. Continue to full eval and add to candidate pool.
+Iteration 9: Valset score for new program: 0.5265246449456976 (coverage 30 / 30)
+Iteration 9: Val aggregate for new program: 0.5265246449456976
+Iteration 9: Individual valset scores for new program: {0: 0.5, 1: 0.0, 2: 0.25, 3: 1.0, 4: 0.25, 5: 1.0, 6: 1.0, 7: 0.25, 8: 0.5, 9: 1.0, 10: 0.3333333333333333, 11: 0.25, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 0.2, 18: 0.2857142857142857, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.25, 23: 0.10526315789473684, 24: 0.5, 25: 0.5, 26: 0.2857142857142857, 27: 0.1, 28: 0.5, 29: 0.25}
+Iteration 9: Objective aggregate scores for new program: {'weighted_score': 0.5265246449456975}
+Iteration 9: New valset pareto front scores: {0: 1.0, 1: 0.2857142857142857, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.25, 21: 1.0, 22: 0.5, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 9: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 9: Valset pareto front aggregate score: 0.905952380952381
+Iteration 9: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {5}, 2: {1, 2, 4}, 3: {0, 1, 2, 4, 5, 7}, 4: {2, 4}, 5: {0, 2, 4, 5, 7}, 6: {0, 1, 2, 3, 4, 6, 7}, 7: {0, 1, 2, 3, 4, 5, 6}, 8: {0, 1, 2, 4}, 9: {0, 1, 2, 4, 5, 7}, 10: {0, 2}, 11: {0, 1, 4}, 12: {0, 1, 2, 3, 6}, 13: {0, 1, 5, 7}, 14: {1, 2, 3, 4, 5, 6, 7}, 15: {0, 2, 4, 5, 7}, 16: {0, 1, 2, 4, 5, 7}, 17: {0, 1}, 18: {0, 1, 2, 3, 5, 6}, 19: {7}, 20: {2, 4}, 21: {1, 3, 4, 7}, 22: {1, 3}, 23: {4}, 24: {6}, 25: {1}, 26: {0, 1, 2, 3, 5, 6}, 27: {3, 5, 6}, 28: {3, 6}, 29: {4}}
+Iteration 9: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 9: Best valset aggregate score so far: 0.6971520146520146
+Iteration 9: Best program as per aggregate score on valset: 1
+Iteration 9: Best score on valset: 0.6971520146520146
+Iteration 9: Linear pareto front program index: 1
+Iteration 9: New program candidate index: 7
+Iteration 10: Selected program 7 score: 0.5265246449456976
+Iteration 10: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents, api_surface,
+ui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,
+agent_runtime, model_serving, local_models, self_hosted_inference,
+telemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,
+config, security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, labels, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Always produce the required structured output. If `final_json` exists, calling it is mandatory.
+
+Important routing rules:
+- Use `acp` for ACP protocol/session behavior, ACP configured bindings, ACP session keys, ACP channel mapping, or ACP process/session reuse.
+- Use `sessions` when session keys, persistent sessions, shared sessions, thread/session identity, session takeover, or session lifecycle are central.
+- Do NOT add `acpx` merely because the item involves ACP bindings. Treat `acpx` as label spam unless the title explicitly centers ACPX extension/adapter behavior, per-binding/per-agent ACPX behavior, or names ACPX as the contract being changed.
+- Example: “ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process” => `acp`, `sessions`, not `acpx`.
+
+Model serving:
+- Use `model_serving` for OpenAI-compatible serving, streaming responses, model-server compatibility, final chunks, llama.cpp server behavior, endpoint streaming contracts, routing, or serving protocol behavior.
+- A title like “OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)” is `model_serving`.
+- Do not output zero topics for serving/streaming issues just because “usage” appears.
+- Suppress `telemetry_usage` when “usage” is only the affected value caused by a serving stream/chunk bug. Add `telemetry_usage` only when usage tracking, usage accounting, reporting, or metrics collection is itself the central maintainer-owned subject.
+
+Security, hooks, and skills/plugins:
+- Use `security` for vulnerabilities, SSRF, token verification, credential boundaries, access restrictions, network policy, HIGH/CRITICAL vuln fixes, or allowed/blocked security behavior.
+- Use `hooks` when hook priority, hook execution, hook ordering, hook policy, or hook behavior is central.
+- Use `skills_plugins` when skills, plugins, skill scanners, plugin scanners, skill/plugin installation, skill/plugin metadata, or skill/plugin runtime behavior are central.
+- If a security title explicitly names multiple vulnerable areas, include each central area named.
+- Example: “security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification” => `security`, `hooks`, `skills_plugins`.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.
+- Use `reliability` for explicit exception/error failures when the title centers on the system breaking during a user-visible workflow.
+- Suppress `reliability` when the failure mode is only a symptom of a more specific maintainer area and the title’s real subject is that area’s contract or initialization behavior.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Do not add `notifications` merely because the title mentions announce, delivery, message sending, send denial, pushed messages, or delivery plumbing.
+- If an “announce delivery” title is really about cron execution/session takeover/error behavior, prefer the central areas and suppress `notifications`.
+
+Cron automation:
+- Use `cron_automation` when cron scheduling, cron lifecycle, cron force-runs, cron announce execution, cron heartbeat execution, or cron-triggered workflows are central.
+- Do not suppress `cron_automation` merely because the title contains “announce” or “delivery”.
+- Do not classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.
+- Use `tool_calling` when tool call arguments, parameter coercion, schemas, serialization, invocation contracts, or actual tool calls are central.
+- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.
+
+Coding agents and agent runtime:
+- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.
+- Use `agent_runtime` for agent initialization, runtime lifecycle, spawning, embedded runtime behavior, or internal runtime setup.
+- Use `sandboxing` when sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, skill, scanner, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, binding, ACPX, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Final output requirement:
+- If a structured output tool exists, call it with exactly:
+ {"topics_of_interest":[...]}
+- If no structured output tool exists, print raw JSON only.
+- Do not include explanations, Markdown, comments, or extra fields.
+Iteration 10: New subsample score 4.0 is better than old score 1.7857142857142856. Continue to full eval and add to candidate pool.
+Iteration 10: Valset score for new program: 0.4836904761904762 (coverage 30 / 30)
+Iteration 10: Val aggregate for new program: 0.4836904761904762
+Iteration 10: Individual valset scores for new program: {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25, 4: 1.0, 5: 0.25, 6: 0.2857142857142857, 7: 1.0, 8: 0.5, 9: 0.5, 10: 0.2, 11: 0.25, 12: 1.0, 13: 0.25, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.5, 18: 1.0, 19: 0.2, 20: 0.2, 21: 1.0, 22: 1.0, 23: 0.25, 24: 0.5, 25: 0.0, 26: 1.0, 27: 0.125, 28: 0.5, 29: 0.25}
+Iteration 10: Objective aggregate scores for new program: {'weighted_score': 0.48369047619047617}
+Iteration 10: New valset pareto front scores: {0: 1.0, 1: 0.2857142857142857, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.25, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 10: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 10: Valset pareto front aggregate score: 0.9226190476190476
+Iteration 10: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {5}, 2: {1, 2, 4}, 3: {0, 1, 2, 4, 5, 7}, 4: {8, 2, 4}, 5: {0, 2, 4, 5, 7}, 6: {0, 1, 2, 3, 4, 6, 7}, 7: {0, 1, 2, 3, 4, 5, 6, 8}, 8: {0, 1, 2, 4}, 9: {0, 1, 2, 4, 5, 7}, 10: {0, 2}, 11: {0, 1, 4}, 12: {0, 1, 2, 3, 6, 8}, 13: {0, 1, 5, 7}, 14: {1, 2, 3, 4, 5, 6, 7, 8}, 15: {0, 2, 4, 5, 7}, 16: {0, 1, 2, 4, 5, 7}, 17: {0, 1}, 18: {0, 1, 2, 3, 5, 6, 8}, 19: {7}, 20: {2, 4}, 21: {1, 3, 4, 7, 8}, 22: {8}, 23: {4}, 24: {6}, 25: {1}, 26: {0, 1, 2, 3, 5, 6, 8}, 27: {3, 5, 6}, 28: {3, 6}, 29: {4}}
+Iteration 10: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 10: Best valset aggregate score so far: 0.6971520146520146
+Iteration 10: Best program as per aggregate score on valset: 1
+Iteration 10: Best score on valset: 0.6971520146520146
+Iteration 10: Linear pareto front program index: 1
+Iteration 10: New program candidate index: 8
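The Iteration 10 lines above report both a per-example valset score map and a single aggregate. The aggregate appears to be the unweighted mean of the per-example scores. The sketch below checks that reading against the logged numbers; the names `individual_scores` and `aggregate` are illustrative and are not taken from GEPA itself.

```python
# Per-example valset scores copied from the Iteration 10 log line.
individual_scores = {
    0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25, 4: 1.0, 5: 0.25,
    6: 0.2857142857142857, 7: 1.0, 8: 0.5, 9: 0.5, 10: 0.2, 11: 0.25,
    12: 1.0, 13: 0.25, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.5, 18: 1.0,
    19: 0.2, 20: 0.2, 21: 1.0, 22: 1.0, 23: 0.25, 24: 0.5, 25: 0.0,
    26: 1.0, 27: 0.125, 28: 0.5, 29: 0.25,
}


def aggregate(scores):
    """Unweighted mean over all validation examples (assumed aggregation rule)."""
    return sum(scores.values()) / len(scores)


agg = aggregate(individual_scores)
print(f"examples={len(individual_scores)} aggregate={agg:.6f}")
```

Under this assumption the result agrees with the logged `Valset score for new program` to within floating-point rounding, and the example count matches the logged coverage of 30 / 30.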
+Iteration 11: Selected program 5 score: 0.5029318488529015
+Iteration 11: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Always produce the required structured output. If `final_json` exists, calling it is mandatory.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Important correction for ACP / ACPX / sessions:
+- Titles about “Explicit ACP agent runs” are centrally about `acpx`.
+- When the same title also names ACP behavior, embedded ACP backend visibility, ACP agent backend state, ACP run results, or ACP final JSON state, include `acp`.
+- Do not add `sessions` just because the title mentions `sessions_yield`, yielding, stale state after a yield, or a session-related event boundary.
+- Treat `sessions_yield` as an implementation/event-boundary clue unless the actual subject is session lifecycle, persistence, resume, listing, storage, or session management.
+- Example: “Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield” => `acpx`, `acp`; not `sessions`.
+- “final JSON state” in an ACP/ACPX title is usually part of the ACP/ACPX result contract, not a general sessions issue.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications` and `chat_integrations`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.
+- Suppress `reliability` when the failure mode is merely incidental to a more specific named feature, such as cron force-run deadlock.
+- Do include `reliability` when a provider catalog or integration issue causes silent failure instead of a proper error.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Example: “Policy: add model, network, and MCP conformance checks” => `config`, `security`, `mcp_tooling`.
+
+Local model providers:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- Missing models in a provider catalog are `local_model_providers`.
+- If the missing provider metadata causes a silent failure or poor error behavior, add `reliability`.
+- Example: “google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error” => `local_model_providers`, `reliability`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, contract, session, sessions_yield, yield, or final JSON. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Final output requirement:
+- If a structured output tool exists, call it with exactly:
+ {"topics_of_interest":[...]}
+- If no structured output tool exists, print raw JSON only.
+- Do not include explanations, Markdown, comments, or extra fields.
+Iteration 11: New subsample score 3.5 is better than old score 3.25. Continue to full eval and add to candidate pool.
+Iteration 11: Valset score for new program: 0.4776587301587301 (coverage 30 / 30)
+Iteration 11: Val aggregate for new program: 0.4776587301587301
+Iteration 11: Individual valset scores for new program: {0: 0.25, 1: 0.0, 2: 0.14285714285714285, 3: 1.0, 4: 0.25, 5: 0.5, 6: 0.25, 7: 1.0, 8: 0.25, 9: 1.0, 10: 0.3333333333333333, 11: 1.0, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 0.2, 18: 1.0, 19: 0.2, 20: 0.2, 21: 0.5, 22: 0.25, 23: 0.14285714285714285, 24: 0.5, 25: 0.0, 26: 0.25, 27: 0.125, 28: 0.2, 29: 1.0}
+Iteration 11: Objective aggregate scores for new program: {'weighted_score': 0.47765873015873}
+Iteration 11: New valset pareto front scores: {0: 1.0, 1: 0.2857142857142857, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.25, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 11: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 11: Valset pareto front aggregate score: 0.9226190476190476
+Iteration 11: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {5}, 2: {1, 2, 4}, 3: {0, 1, 2, 4, 5, 7, 9}, 4: {8, 2, 4}, 5: {0, 2, 4, 5, 7}, 6: {0, 1, 2, 3, 4, 6, 7}, 7: {0, 1, 2, 3, 4, 5, 6, 8, 9}, 8: {0, 1, 2, 4}, 9: {0, 1, 2, 4, 5, 7, 9}, 10: {0, 2}, 11: {0, 1, 4, 9}, 12: {0, 1, 2, 3, 6, 8}, 13: {0, 1, 5, 7, 9}, 14: {1, 2, 3, 4, 5, 6, 7, 8, 9}, 15: {0, 2, 4, 5, 7}, 16: {0, 1, 2, 4, 5, 7, 9}, 17: {0, 1}, 18: {0, 1, 2, 3, 5, 6, 8, 9}, 19: {7}, 20: {2, 4}, 21: {1, 3, 4, 7, 8}, 22: {8}, 23: {4}, 24: {6}, 25: {1}, 26: {0, 1, 2, 3, 5, 6, 8}, 27: {3, 5, 6}, 28: {3, 6}, 29: {9, 4}}
+Iteration 11: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 11: Best valset aggregate score so far: 0.6971520146520146
+Iteration 11: Best program as per aggregate score on valset: 1
+Iteration 11: Best score on valset: 0.6971520146520146
+Iteration 11: Linear pareto front program index: 1
+Iteration 11: New program candidate index: 9
+Iteration 12: Selected program 1 score: 0.6971520146520146
+Iteration 12: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Output JSON only:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, packaging_deployment
+
+Classification rules:
+- Read the title first. Use body or summary only if the title is ambiguous.
+- Choose one primary topic.
+- Add secondary topics only when they are explicit central maintainer-owned subjects.
+- Use the minimum complete topic set. Do not add topics for labels, examples, paths, tests, implementation details, or broad ownership words.
+- Never invent topic ids.
+
+Important edge cases:
+- If a provider catalog/model catalog issue says a missing model causes “silent failure”, “fails silently”, “hang”, “crash”, “no error”, or similar robustness behavior, include `reliability` in addition to the provider/catalog topic.
+ Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.
+
+- Use `acpx` when ACPX is explicitly named as the subject. Do not normalize ACPX to `acp`.
+ Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` => `acpx`, `codex`, `skills_plugins`.
+ Do not include `acp` there unless ACP itself is separately central.
+
+- Use `skills_plugins` when user-installed plugins, plugin inheritance, Superpowers, plugin discovery, plugin installation, or plugin availability is central.
+
+- Use `codex` when Codex is named as a central product/runtime subject, not merely as bracket metadata.
+
+- Use `browser_automation` for Chrome/Chromium session reuse, browser profile attach, browser diagnostics, browser runtime behavior, Google sign-in through browser automation, or browser tooling issues.
+
+- Use `packaging_deployment` for install/runtime environment problems such as missing executables, PATH/package manager availability, `npx`/Node spawn failures, desktop app packaging, bundled runtime problems, or deployment/startup environment failures.
+ Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.
+
+Topic reminders:
+- `local_model_providers`: provider-specific model catalogs, known model metadata, provider-backed model support.
+- `model_serving`: serving endpoints, OpenAI-compatible protocol behavior, request/response streaming, usage chunks, endpoint compatibility, model request routing.
+- `local_models`: LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local model apps/providers.
+- `reliability`: deadlock, hang, crash, race, liveness, timeout, stuck state, silent failure, robustness failure, or missing error behavior.
+- `acp`: ACP is centrally named.
+- `acpx`: ACPX is explicitly named or clearly ACPX-specific.
+- `approvals`: approval mode, permission mode, user approval behavior, permission policy.
+- `sessions`: session identity, lifecycle, routing, state, persistent process identity.
+- `gateway`: gateway routing, guarded gateway behavior, gateway-owned behavior, gateway send denial.
+- `exec_tools`: exec command/tool behavior, PATH fallback, exec contract behavior.
+- `sandboxing`: sandbox policy or sandbox runtime behavior.
+- `security`: vulnerabilities, credentials, network policy, access restrictions, outbound rules.
+- `notifications`: notification delivery, announce messages, heartbeat pushes.
+- `chat_integrations`: Slack, WhatsApp, chat app delivery, chat history, target channels.
+- `cron_automation`: cron scheduling, force-run, lifecycle, execution, or cron deadlock.
+- `api_surface`: exposed API/contract, reader contract, request/response shape, public integration behavior.
+- `ui_tui`: webchat, TUI, terminal UI, user-facing chat interface.
+- `docs`: documentation-only changes, tutorials, README, guides, examples, showcase additions.
+
+Final suppression check:
+Before outputting, remove any topic added only because a related word appeared. Keep it only if it is actually a central user-visible issue, feature, policy, or contract.
+Iteration 12: New subsample score 4.0 is better than old score 2.25. Continue to full eval and add to candidate pool.
+Iteration 12: Valset score for new program: 0.56900327742433 (coverage 30 / 30)
+Iteration 12: Val aggregate for new program: 0.56900327742433
+Iteration 12: Individual valset scores for new program: {0: 0.25, 1: 0.25, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 0.5, 6: 1.0, 7: 1.0, 8: 1.0, 9: 0.5, 10: 0.2, 11: 1.0, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.2, 17: 0.5, 18: 0.2857142857142857, 19: 0.2, 20: 0.2, 21: 0.5, 22: 0.5, 23: 0.15384615384615385, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.10526315789473684, 28: 0.5, 29: 0.15384615384615385}
+Iteration 12: Objective aggregate scores for new program: {'weighted_score': 0.56900327742433}
+Iteration 12: New valset pareto front scores: {0: 1.0, 1: 0.2857142857142857, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.25, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 12: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 12: Valset pareto front aggregate score: 0.9226190476190476
+Iteration 12: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {5}, 2: {1, 2, 10, 4}, 3: {0, 1, 2, 4, 5, 7, 9, 10}, 4: {8, 2, 4}, 5: {0, 2, 4, 5, 7}, 6: {0, 1, 2, 3, 4, 6, 7, 10}, 7: {0, 1, 2, 3, 4, 5, 6, 8, 9, 10}, 8: {0, 1, 2, 4, 10}, 9: {0, 1, 2, 4, 5, 7, 9}, 10: {0, 2}, 11: {0, 1, 4, 9, 10}, 12: {0, 1, 2, 3, 6, 8}, 13: {0, 1, 5, 7, 9, 10}, 14: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 15: {0, 2, 4, 5, 7}, 16: {0, 1, 2, 4, 5, 7, 9}, 17: {0, 1}, 18: {0, 1, 2, 3, 5, 6, 8, 9}, 19: {7}, 20: {2, 4}, 21: {1, 3, 4, 7, 8}, 22: {8}, 23: {4}, 24: {10, 6}, 25: {1, 10}, 26: {0, 1, 2, 3, 5, 6, 8, 10}, 27: {3, 5, 6}, 28: {3, 6}, 29: {9, 4}}
+Iteration 12: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 12: Best valset aggregate score so far: 0.6971520146520146
+Iteration 12: Best program as per aggregate score on valset: 1
+Iteration 12: Best score on valset: 0.6971520146520146
+Iteration 12: Linear pareto front program index: 1
+Iteration 12: New program candidate index: 10
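The Iteration 12 candidate above asks for raw JSON of the form `{"topics_of_interest":[...]}` restricted to its allowed topic ids. A minimal sketch of how such output could be checked before scoring follows. The function name `parse_topics` is hypothetical, and the id list is copied from the Iteration 12 prompt only; other candidates in this log list slightly different ids, so a real check would presumably use the task's canonical list.

```python
import json

# Allowed ids as listed in the Iteration 12 candidate prompt (assumption:
# this is the list to validate against; other candidates differ).
ALLOWED_TOPIC_IDS = {
    "queueing", "docs", "notifications", "sessions", "gateway", "reliability",
    "memory", "open_weight_models", "local_model_providers", "codex",
    "api_surface", "ui_tui", "chat_integrations", "skills_plugins", "acp",
    "acpx", "approvals", "agent_runtime", "model_serving", "local_models",
    "self_hosted_inference", "telemetry_usage", "exec_tools", "sandboxing",
    "browser_automation", "cron_automation", "config", "security",
    "mcp_tooling", "tool_calling", "packaging_deployment",
}


def parse_topics(raw, allowed=ALLOWED_TOPIC_IDS):
    """Parse model output and reject extra fields or unknown topic ids."""
    obj = json.loads(raw)
    if not isinstance(obj, dict) or set(obj) != {"topics_of_interest"}:
        raise ValueError("expected exactly one key: topics_of_interest")
    topics = obj["topics_of_interest"]
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("topics_of_interest must be a list of strings")
    unknown = [t for t in topics if t not in allowed]
    if unknown:
        raise ValueError(f"unknown topic ids: {unknown}")
    return topics


ok = parse_topics('{"topics_of_interest":["local_model_providers","reliability"]}')
print(ok)
```

An empty list is accepted, which matches the prompts' rule that zero topics is a valid answer when no allowed topic is central.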
+Iteration 13: Selected program 4 score: 0.6616269841269842
+Iteration 13: All subsample scores perfect. Skipping.
+Iteration 13: Reflective mutation did not propose a new candidate
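The log lines so far suggest a two-stage gate: a proposal is skipped when the selected program already scores perfectly on the sampled minibatch (Iteration 13), and a proposed candidate goes to full valset evaluation only when its subsample score beats the parent's (for example 3.5 versus 3.25 in Iteration 11). The sketch below restates that reading. The function name, the per-example values, and the treatment of ties are assumptions; the log reports only the summed subsample scores.

```python
def gate(parent_scores, candidate_scores=None, perfect=1.0):
    """Return the next action for one iteration (illustrative, not GEPA's code)."""
    # Nothing to learn from a minibatch the parent already solves.
    if all(s >= perfect for s in parent_scores):
        return "skip"
    if candidate_scores is None:
        raise ValueError("candidate scores are required when the parent is not perfect")
    # Assumption: only a strict improvement in the summed score is accepted.
    if sum(candidate_scores) > sum(parent_scores):
        return "full_eval"
    return "reject"


# Hypothetical per-example scores whose sums match the Iteration 11 log line.
parent = [1.0, 1.0, 1.0, 0.25]      # sums to 3.25
candidate = [1.0, 1.0, 1.0, 0.5]    # sums to 3.5
print(gate(parent, candidate))
print(gate([1.0, 1.0, 1.0, 1.0]))
```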
+Iteration 14: Selected program 7 score: 0.5265246449456976
+Iteration 14: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents, api_surface,
+ui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,
+agent_runtime, model_serving, local_models, self_hosted_inference,
+telemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,
+config, security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Process:
+1. Read the title first. The title is the primary signal.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Always produce the required structured output.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a label, symptom, path, implementation detail, example, internal hook, broad ownership area, or incidental UI wording.
+
+Important anti-label-spam rules:
+- Do NOT use `ui_tui` merely because the title says showing, display, stale counts, diagnostics, messages, or visible output.
+- Use `ui_tui` only when the terminal UI, UI rendering behavior, TUI layout, or interactive UI surface is itself the central subject.
+- “showing stale token counts” is not automatically `ui_tui`; classify the underlying token/session usage contract instead.
+- “diagnostics” is not automatically `ui_tui`; classify the subsystem whose diagnostics are central.
+
+Specific mapping rules learned from examples:
+- `totalTokensFresh`, stale token counts, fresh token counts, session token accounting, or token-count freshness should usually include `sessions` and `telemetry_usage`.
+ Example: “fix: respect totalTokensFresh flag to avoid showing stale token counts” => `sessions`, `telemetry_usage`, not `ui_tui`.
+
+- Exec PATH fallback or command execution PATH resolution is `exec_tools`.
+ Example facet: “resolve exec PATH fallback” => `exec_tools`.
+
+- Browser diagnostics, browser automation diagnostics, or layered browser diagnostic behavior is `browser_automation`, not `ui_tui`.
+ Example facet: “layered browser diagnostics” => `browser_automation`.
+
+- Cron force-runs, cron scheduling, cron lifecycle, cron-triggered workflows, or cron execution are `cron_automation`.
+ Example facet: “cron force-run deadlock” => `cron_automation`.
+ Do not add `reliability` for deadlock wording when the title is already a multi-facet component fix and the maintainer bucket is the named component.
+
+- ACP protocol/session behavior is `acp`.
+- Gateway-owned behavior is `gateway`.
+- Node-backed runtime, runtime ownership, runtime lifecycle, embedded runtime behavior, or runtime setup is `agent_runtime`.
+ Example: “ACP: add gateway-owned node-backed runtime” => `acp`, `gateway`, `agent_runtime`, not `acpx`.
+
+- Do not use `acpx` just because ACP is present. Use `acpx` only when ACPX, ACPX adapters/extensions, or per-binding/per-agent ACPX behavior is explicitly central.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.
+- Suppress `reliability` when the failure wording is only a symptom of a more specific maintainer-owned area and the title’s real subject is that area’s contract, initialization, or execution behavior.
+- If the title names three concrete subsystem facets, prefer those three concrete buckets over replacing one with `reliability`.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Do not add `notifications` merely because the title mentions announce, delivery, message sending, send denial, pushed messages, or delivery plumbing.
+
+Sessions:
+- Use `sessions` when active sessions, session takeover, embedded session takeover, session spawning, session identity, session lifecycle, user actively chatting in a session, or session token accounting is central.
+
+Telemetry and usage:
+- Use `telemetry_usage` for token counts, usage accounting, usage display backed by counters, metrics, usage statistics, and token freshness/staleness contracts.
+- If token counts are tied to a session state flag such as `totalTokensFresh`, include both `sessions` and `telemetry_usage`.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.
+- Use `tool_calling` when tool call arguments, parameter coercion, schemas, serialization, invocation contracts, or actual tool calls are central.
+- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.
+
+ACP, ACPX, and approvals:
+- Use `acp` for ACP protocol/session behavior.
+- Use `acpx` for ACPX extension/adapter behavior, especially per-binding or per-agent ACP session behavior.
+- Use `approvals` for permissionMode, approval modes, permission prompts, permission policy, or approval-gated execution.
+- ACP session permissionMode changes with per-binding or per-agent scope are a three-way concern: `acp`, `approvals`, and `acpx`.
+
+Coding agents and agent runtime:
+- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.
+- Use `agent_runtime` for agent initialization, runtime lifecycle, spawning, embedded runtime behavior, node-backed runtime behavior, or internal runtime setup.
+- Use `sandboxing` when sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes are both notification behavior and chat integration behavior when the title is about pushed Slack messages.
+- Do not classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling, lifecycle, or execution is itself central.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.
+- Policy or conformance checks are `config` when they define or enforce governed behavior.
+- Network conformance checks should include `security`.
+- MCP conformance checks should include `mcp_tooling`.
+
+Local model providers:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- Missing models in a provider catalog are `local_model_providers`.
+- If missing provider metadata causes a silent failure or poor error behavior, add `reliability`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, UI, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Final output requirement:
+- If a structured output tool exists, call it with exactly:
+ {"topics_of_interest":[...]}
+- If no structured output tool exists, print raw JSON only.
+- Do not include explanations, Markdown, comments, or extra fields.
+Iteration 14: New subsample score 3.25 is better than old score 1.7. Continue to full eval and add to candidate pool.
+Iteration 14: Valset score for new program: 0.5957763532763533 (coverage 30 / 30)
+Iteration 14: Val aggregate for new program: 0.5957763532763533
+Iteration 14: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 1.0, 3: 0.2857142857142857, 4: 0.1111111111111111, 5: 1.0, 6: 1.0, 7: 0.2857142857142857, 8: 0.5, 9: 1.0, 10: 0.2, 11: 0.25, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 0.2, 18: 1.0, 19: 0.5, 20: 1.0, 21: 1.0, 22: 0.3333333333333333, 23: 0.14285714285714285, 24: 0.5, 25: 1.0, 26: 0.2857142857142857, 27: 0.125, 28: 0.25, 29: 0.15384615384615385}
+Iteration 14: Objective aggregate scores for new program: {'weighted_score': 0.5957763532763531}
+Iteration 14: New valset pareto front scores: {0: 1.0, 1: 0.2857142857142857, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 14: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 14: Valset pareto front aggregate score: 0.9476190476190476
+Iteration 14: Updated valset pareto front programs: {0: {0, 1, 2, 4, 11}, 1: {5}, 2: {1, 2, 4, 10, 11}, 3: {0, 1, 2, 4, 5, 7, 9, 10}, 4: {8, 2, 4}, 5: {0, 2, 4, 5, 7, 11}, 6: {0, 1, 2, 3, 4, 6, 7, 10, 11}, 7: {0, 1, 2, 3, 4, 5, 6, 8, 9, 10}, 8: {0, 1, 2, 4, 10}, 9: {0, 1, 2, 4, 5, 7, 9, 11}, 10: {0, 2}, 11: {0, 1, 4, 9, 10}, 12: {0, 1, 2, 3, 6, 8, 11}, 13: {0, 1, 5, 7, 9, 10, 11}, 14: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, 15: {0, 2, 4, 5, 7}, 16: {0, 1, 2, 4, 5, 7, 9, 11}, 17: {0, 1}, 18: {0, 1, 2, 3, 5, 6, 8, 9, 11}, 19: {7}, 20: {11}, 21: {1, 3, 4, 7, 8, 11}, 22: {8}, 23: {4}, 24: {10, 6}, 25: {1, 10, 11}, 26: {0, 1, 2, 3, 5, 6, 8, 10}, 27: {3, 5, 6}, 28: {3, 6}, 29: {9, 4}}
+Iteration 14: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 14: Best valset aggregate score so far: 0.6971520146520146
+Iteration 14: Best program as per aggregate score on valset: 1
+Iteration 14: Best score on valset: 0.6971520146520146
+Iteration 14: Linear pareto front program index: 1
+Iteration 14: New program candidate index: 11
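The Pareto-front lines above can be read as a per-example maximum over all candidates, together with the set of candidates that reach that maximum. The sketch below replays the Iteration 14 update for three validation examples using values taken from the surrounding log lines. The function name `update_front` is illustrative, and the update rule is inferred from the log rather than from GEPA's source.

```python
def update_front(front_scores, front_programs, new_scores, new_idx):
    """Fold one new candidate's per-example scores into the per-example front."""
    for ex, score in new_scores.items():
        best = front_scores[ex]
        if score > best:
            # Strictly better: the new candidate replaces the previous holders.
            front_scores[ex] = score
            front_programs[ex] = {new_idx}
        elif score == best:
            # Tie: the new candidate joins the existing holders.
            front_programs[ex].add(new_idx)
    return front_scores, front_programs


# Front state after Iteration 12 for examples 0, 20, and 27 (from the log).
front_scores = {0: 1.0, 20: 0.25, 27: 0.14285714285714285}
front_programs = {0: {0, 1, 2, 4}, 20: {2, 4}, 27: {3, 5, 6}}

# Candidate 11's scores on those examples (Iteration 14 individual scores).
candidate_11 = {0: 1.0, 20: 1.0, 27: 0.125}

update_front(front_scores, front_programs, candidate_11, new_idx=11)
print(front_scores)
print(front_programs)
```

This reproduces the logged change for example 20, whose front score rises from 0.25 to 1.0 with program set `{11}`, while example 27 is unchanged. It is also consistent with the front aggregate moving from 0.9226 to 0.9476, since one of thirty examples gained 0.75, and with the best single program remaining candidate 1.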
+Iteration 15: Selected program 11 score: 0.5957763532763533
+Iteration 15: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents, api_surface,
+ui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,
+agent_runtime, model_serving, local_models, self_hosted_inference,
+telemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,
+config, security, mcp_tooling, tool_calling, hooks
+
+Process:
+1. Read the title first. The title is the primary signal.
+2. Identify the central user-visible bug, feature, policy, contract, or documentation change.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicitly central maintainer-owned subjects.
+5. Use 3 topics only when the title explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output raw JSON only. No Markdown, explanations, comments, or extra fields.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a label, symptom, path, implementation detail, example, internal hook, transport mode, broad ownership area, or incidental wording.
+
+Ignore label prefixes:
+- Ignore title prefixes like `[codex]`, `fix:`, `Policy:`, `[Bug]:`, etc. Do not classify as `codex` merely because `[codex]` appears.
+
+Model and serving rules:
+- Use `model_serving` when the serving/runtime response path, model output handling, response streaming, thinking block handling, Responses API serving behavior, or served model behavior is central.
+- Use `local_models` when local models or local model behavior are central.
+- Use `local_model_providers` only when provider catalog metadata, provider-specific integration, missing provider models, provider configuration, or known provider/model-family metadata is central.
+- Do NOT use `local_model_providers` merely because a provider name appears as context.
+- “LM Studio thinking blocks invisible with Responses API” => `model_serving`, `local_models`, not `local_model_providers`.
+- Do not use `api_surface` merely because “Responses API” appears as the transport or compatibility mode. Use `api_surface` only when the API contract, endpoint behavior, reader API, schema, or externally consumed API surface is central.
+
+Policy, config, and security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.
+- “Policy: add model, network, and MCP conformance checks” => `config`, `security`, `mcp_tooling`.
+- Do NOT use `local_model_providers` merely because a policy title says “model”.
+
+Gateway, API, and UI:
+- Use `gateway` when gateway-owned behavior, gateway routing, webchat gateway behavior, or gateway-backed readers are central.
+- Use `api_surface` when an externally consumed API, reader contract, endpoint, request/response contract, or API-visible behavior is central.
+- Use `ui_tui` when a terminal UI, webchat UI, UI rendering behavior, full-message reader UI, TUI layout, interactive UI surface, or message reader surface is itself central.
+- "webchat full-message reader for truncated history" => `gateway`, `api_surface`, `ui_tui`.
+- Do NOT suppress `ui_tui` when the title's central subject is a reader/viewer/message-reading UI surface.
+- Do NOT use `ui_tui` merely because the title says showing, display, stale counts, diagnostics, messages, or visible output.
+
+Notifications and chat integrations:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes are both `notifications` and `chat_integrations` when the title is about pushed Slack messages.
+- Do NOT classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling, lifecycle, force-run behavior, deadlock, or execution is itself central.
+- "Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes" => `notifications`, `chat_integrations`, not `cron_automation`.
+
+Sessions and telemetry:
+- Use `sessions` when active sessions, session takeover, embedded session takeover, session spawning, session identity, session lifecycle, user actively chatting in a session, or session token accounting is central.
+- Use `telemetry_usage` for token counts, usage accounting, usage display backed by counters, metrics, usage statistics, and token freshness/staleness contracts.
+- `totalTokensFresh`, stale token counts, fresh token counts, session token accounting, or token-count freshness usually include both `sessions` and `telemetry_usage`.
+- "respect totalTokensFresh flag to avoid showing stale token counts" => `sessions`, `telemetry_usage`, not `ui_tui`.
+
+Execution, browser, and cron:
+- Exec PATH fallback or command execution PATH resolution => `exec_tools`.
+- Browser diagnostics, browser automation diagnostics, or layered browser diagnostic behavior => `browser_automation`, not `ui_tui`.
+- Cron force-runs, cron scheduling, cron lifecycle, cron-triggered workflows, or cron execution => `cron_automation`.
+- Do not add `reliability` for deadlock wording when the title is already a multi-facet component fix and the maintainer bucket is the named component.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.
+- Suppress `reliability` when the failure wording is only a symptom of a more specific maintainer-owned area and the title's real subject is that area's contract, initialization, rendering, or execution behavior.
+- If missing provider metadata causes a silent failure or poor error behavior, add `reliability`.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.
+- Use `tool_calling` when tool call arguments, parameter coercion, schemas, serialization, invocation contracts, or actual tool calls are central.
+- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.
+
+ACP, ACPX, approvals, and runtime:
+- Use `acp` for ACP protocol/session behavior.
+- Use `acpx` only when ACPX, ACPX adapters/extensions, or per-binding/per-agent ACPX behavior is explicitly central.
+- Do not use `acpx` just because ACP is present.
+- Use `approvals` for permissionMode, approval modes, permission prompts, permission policy, or approval-gated execution.
+- ACP session permissionMode changes with per-binding or per-agent scope => `acp`, `approvals`, `acpx`.
+- Use `agent_runtime` for node-backed runtime, runtime ownership, runtime lifecycle, embedded runtime behavior, runtime setup, agent initialization, or spawning.
+- "ACP: add gateway-owned node-backed runtime" => `acp`, `gateway`, `agent_runtime`, not `acpx`.
+
+Coding agents and sandboxing:
+- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.
+- Use `sandboxing` when sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, provider, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, UI, API, Responses API, LM Studio, or contract. Keep the topic only if it is actually a central maintainer-owned subject.
+Iteration 15: New subsample score 4.0 is better than old score 1.2357142857142858. Continue to full eval and add to candidate pool.
+Iteration 15: Valset score for new program: 0.6629365079365079 (coverage 30 / 30)
+Iteration 15: Val aggregate for new program: 0.6629365079365079
+Iteration 15: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.25, 4: 0.0, 5: 1.0, 6: 1.0, 7: 0.5, 8: 1.0, 9: 0.5, 10: 0.2, 11: 0.2, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0, 18: 0.2857142857142857, 19: 0.5, 20: 1.0, 21: 1.0, 22: 0.5, 23: 0.25, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.16666666666666666, 28: 0.25, 29: 0.2857142857142857}
+Iteration 15: Objective aggregate scores for new program: {'weighted_score': 0.6629365079365079}
+Iteration 15: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.16666666666666666, 28: 1.0, 29: 1.0}
+Iteration 15: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 15: Valset pareto front aggregate score: 0.9722222222222222
+Iteration 15: Updated valset pareto front programs: {0: {0, 1, 2, 4, 11, 12}, 1: {12}, 2: {1, 2, 4, 10, 11, 12}, 3: {0, 1, 2, 4, 5, 7, 9, 10}, 4: {8, 2, 4}, 5: {0, 2, 4, 5, 7, 11, 12}, 6: {0, 1, 2, 3, 4, 6, 7, 10, 11, 12}, 7: {0, 1, 2, 3, 4, 5, 6, 8, 9, 10}, 8: {0, 1, 2, 4, 10, 12}, 9: {0, 1, 2, 4, 5, 7, 9, 11}, 10: {0, 2}, 11: {0, 1, 4, 9, 10}, 12: {0, 1, 2, 3, 6, 8, 11, 12}, 13: {0, 1, 5, 7, 9, 10, 11, 12}, 14: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, 15: {0, 2, 4, 5, 7}, 16: {0, 1, 2, 4, 5, 7, 9, 11, 12}, 17: {0, 1, 12}, 18: {0, 1, 2, 3, 5, 6, 8, 9, 11}, 19: {7}, 20: {11, 12}, 21: {1, 3, 4, 7, 8, 11, 12}, 22: {8}, 23: {4}, 24: {10, 6}, 25: {1, 10, 11, 12}, 26: {0, 1, 2, 3, 5, 6, 8, 10, 12}, 27: {12}, 28: {3, 6}, 29: {9, 4}}
+Iteration 15: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 15: Best valset aggregate score so far: 0.6971520146520146
+Iteration 15: Best program as per aggregate score on valset: 1
+Iteration 15: Best score on valset: 0.6971520146520146
+Iteration 15: Linear pareto front program index: 1
+Iteration 15: New program candidate index: 12
+Iteration 16: Selected program 2 score: 0.6569444444444444
+Iteration 16: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks, auth_identity
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only, or call the structured JSON output tool if one exists.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Important correction:
+For `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path`, classify as:
+`reliability`, `auth_identity`, and `sessions`.
+
+Why:
+- `self-heal lane wedges` is a reliability/liveness/robustness issue.
+- Do NOT classify `lane wedges` as `queueing`; "lane" is an implementation detail unless queueing, queued execution, queue lifecycle, or scheduling behavior is explicitly the user-visible subject.
+- `OAuth` is an auth and identity concern, so use `auth_identity`.
+- `openai-codex OAuth` does NOT imply `codex`; here "codex" is part of the OAuth/provider naming, not the central maintainer-owned subject.
+- `embedded path` makes the OAuth restoration session/path-specific, so include `sessions`.
+
+Auth and identity:
+- Use `auth_identity` for OAuth, login identity, account identity, authentication flows, token identity, credential identity, or restoring auth behavior.
+- OAuth fixes should usually include `auth_identity`.
+- Add `sessions` when the auth problem is tied to a session path, embedded path, outbound session identity, session lifecycle, session routing, or session-specific behavior.
+- Do not add `codex` merely because an OAuth provider or path name contains "codex".
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, wedge, stuck state, race, timeout, liveness issue, self-healing behavior, or robustness failure.
+- Words like `self-heal`, `wedge`, `wedged`, `stuck`, and `recover` are strong reliability signals when the title is about restoring forward progress.
+- Suppress `reliability` only when the failure mode clearly belongs to a more specific named feature owner, such as `cron force-run deadlock` being `cron_automation`.
+
+Queueing:
+- Use `queueing` only when queue, queueing, queued execution, queue lifecycle, queue scheduling, or steering in queues is central.
+- Do not add `queueing` for incidental words like lane, main lane, worker, subagent, execution path, or internal routing path.
+- A lane wedge is not automatically queueing.
+
+Codex:
+- Use `codex` when Codex itself is the central user-visible subject, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title contains "codex" inside a provider name, OAuth label, bracketed metadata, or path name.
+- If the real subject is OAuth, classify as `auth_identity`, not `codex`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like lane, queue, codex, OAuth, embedded, path, wedge, session, usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, deadlock, security, diagnostics, display, showing, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 16: New subsample score 4.0 is better than old score 2.25. Continue to full eval and add to candidate pool.
+Iteration 16: Valset score for new program: 0.49142857142857144 (coverage 30 / 30)
+Iteration 16: Val aggregate for new program: 0.49142857142857144
+Iteration 16: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.5, 9: 0.5, 10: 0.2, 11: 1.0, 12: 1.0, 13: 0.0, 14: 1.0, 15: 0.5, 16: 0.0, 17: 0.25, 18: 1.0, 19: 0.2, 20: 0.2, 21: 0.5, 22: 0.25, 23: 0.14285714285714285, 24: 0.25, 25: 0.5, 26: 1.0, 27: 1.0, 28: 0.5, 29: 0.25}
+Iteration 16: Objective aggregate scores for new program: {'weighted_score': 0.4914285714285713}
+Iteration 16: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 16: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 16: Valset pareto front aggregate score: 1.0
+Iteration 16: Updated valset pareto front programs: {0: {0, 1, 2, 4, 11, 12}, 1: {12}, 2: {1, 2, 4, 10, 11, 12}, 3: {0, 1, 2, 4, 5, 7, 9, 10}, 4: {8, 2, 4}, 5: {0, 2, 4, 5, 7, 11, 12}, 6: {0, 1, 2, 3, 4, 6, 7, 10, 11, 12, 13}, 7: {0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 13}, 8: {0, 1, 2, 4, 10, 12}, 9: {0, 1, 2, 4, 5, 7, 9, 11}, 10: {0, 2}, 11: {0, 1, 4, 9, 10, 13}, 12: {0, 1, 2, 3, 6, 8, 11, 12, 13}, 13: {0, 1, 5, 7, 9, 10, 11, 12}, 14: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13}, 15: {0, 2, 4, 5, 7}, 16: {0, 1, 2, 4, 5, 7, 9, 11, 12}, 17: {0, 1, 12}, 18: {0, 1, 2, 3, 5, 6, 8, 9, 11, 13}, 19: {7}, 20: {11, 12}, 21: {1, 3, 4, 7, 8, 11, 12}, 22: {8}, 23: {4}, 24: {10, 6}, 25: {1, 10, 11, 12}, 26: {0, 1, 2, 3, 5, 6, 8, 10, 12, 13}, 27: {13}, 28: {3, 6}, 29: {9, 4}}
+Iteration 16: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 16: Best valset aggregate score so far: 0.6971520146520146
+Iteration 16: Best program as per aggregate score on valset: 1
+Iteration 16: Best score on valset: 0.6971520146520146
+Iteration 16: Linear pareto front program index: 1
+Iteration 16: New program candidate index: 13
+{
+ "best_idx": 1,
+ "best_prompt_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z/best.prompt.md",
+ "best_routing_policy_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z/best.routing_policy.md",
+ "best_score": 0.6971520146520146,
+ "config": {
+ "harness": {
+ "base_url": null,
+ "concurrency": 2,
+ "context_window": null,
+ "max_tokens": 1536,
+ "model": "gemma-12b-q4km-reason",
+ "state_dir": null,
+ "timeout_ms": 900000
+ },
+ "max_candidate_proposals": 16,
+ "max_metric_calls": 720,
+ "output_dir": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z",
+ "reflection_minibatch_size": 4,
+ "row_limit": 30,
+ "seed": 0,
+ "seed_routing_policy_chars": 11180,
+ "seed_routing_policy_sha256": "288c716272a98543b9c5d7c191ac9389173adddefd17a7fba409a2e9d6b9ff65"
+ },
+ "created_at": "2026-06-13T21:47:03.393070+00:00",
+ "num_candidates": 14,
+ "num_full_val_evals": 14,
+ "result_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z/gepa-result.json",
+ "total_metric_calls": 544
+}
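The aggregate figures in the Iteration 15 and Iteration 16 log lines above can be cross-checked by hand. The following is a minimal sketch, not GEPA's own code: it assumes the valset aggregate is the plain mean of the per-example scores and that the valset pareto front keeps, for each example, the best score seen across all candidates so far. Both assumptions reproduce the logged values. The names `update_front`, `candidate_13`, and `front_before` are hypothetical.

```python
from statistics import fmean

# Per-example valset scores for candidate 13, copied from the Iteration 16 log line.
candidate_13 = [
    0.5, 0.25, 0.25, 0.25, 0.25, 0.5, 1.0, 1.0, 0.5, 0.5,
    0.2, 1.0, 1.0, 0.0, 1.0, 0.5, 0.0, 0.25, 1.0, 0.2,
    0.2, 0.5, 0.25, 0.14285714285714285, 0.25, 0.5, 1.0, 1.0, 0.5, 0.25,
]

# Pareto front after Iteration 15: 1.0 on every example except example 27.
front_before = [1.0] * 30
front_before[27] = 0.16666666666666666


def update_front(front, scores):
    """Assumed rule: keep the per-example maximum of the front and the new scores."""
    return [max(f, s) for f, s in zip(front, scores)]


front_after = update_front(front_before, candidate_13)

candidate_mean = fmean(candidate_13)      # logged as 0.49142857142857144
front_before_mean = fmean(front_before)   # logged as 0.9722222222222222
front_after_mean = fmean(front_after)     # logged as 1.0

print(f"candidate 13 valset mean: {candidate_mean:.6f}")
print(f"front aggregate before:   {front_before_mean:.6f}")
print(f"front aggregate after:    {front_after_mean:.6f}")
```

Under these assumptions, candidate 13 has a low mean of its own yet still lifts the front aggregate to 1.0, because it is the first candidate to score 1.0 on example 27. The best program by aggregate score remains candidate 1.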
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.controller.log b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.controller.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.controller.pid b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.controller.pid
new file mode 100644
index 0000000000000000000000000000000000000000..a3b2846391cc8e82ecb7168eb3f2d05fd3ecc6c0
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.controller.pid
@@ -0,0 +1 @@
+572348
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.log b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.pid b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.pid
new file mode 100644
index 0000000000000000000000000000000000000000..b234dbe708f59c9dc833d0077db56cf33b426f9e
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.pid
@@ -0,0 +1 @@
+35480
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.sh b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.sh
new file mode 100644
index 0000000000000000000000000000000000000000..63197c7b954e65e6b5d408c0a2807371dd6a0d8d
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.sh
@@ -0,0 +1,154 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ROOT=/home/bob/repos/localpager
+cd "$ROOT"
+
+PY=prompt-optimizer/.venv/bin/python
+export PYTHONPATH=prompt-optimizer/src
+
+CURRENT_RUN_DIR=/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214937Z
+MAX_CONTINUATIONS=${MAX_CONTINUATIONS:-3}
+CONTINUATIONS=0
+
+write_status() {
+ local run_dir=$1
+ shift
+ {
+ printf 'current_run_dir=%s\n' "$run_dir"
+ printf '%s\n' "$@"
+ } > "$run_dir/post_run_queue.status"
+}
+
+wait_for_run() {
+ local run_dir=$1
+ while [[ ! -s "$run_dir/summary.json" ]]; do
+ if [[ -f "$run_dir/optimize.pid" ]] && kill -0 "$(cat "$run_dir/optimize.pid")" 2>/dev/null; then
+ "$PY" -m prompt_optimizer.cli plot-run --run-dir "$run_dir" >/dev/null || true
+ sleep 300
+ else
+ break
+ fi
+ done
+}
+
+summarize_run() {
+ local run_dir=$1
+ "$PY" -m prompt_optimizer.cli plot-run --run-dir "$run_dir" >/dev/null || true
+ "$PY" -m prompt_optimizer.cli report-run --run-dir "$run_dir" > "$run_dir/final_run_report.json"
+}
+
+read_gate_counts() {
+ local report_path=$1
+ "$PY" - "$report_path" <<'PY'
+import json
+import sys
+
+with open(sys.argv[1], encoding="utf-8") as fh:
+ report = json.load(fh)
+run_log = report.get("run_log") or {}
+summary = report.get("summary") or {}
+result = report.get("result") or {}
+print(
+ run_log.get("proposal_attempts") or 0,
+ run_log.get("accepted_full_eval_candidates") or 0,
+ summary.get("num_candidates") or result.get("num_candidates") or 0,
+)
+PY
+}
+
+launch_continuation() {
+ local previous_run_dir=$1
+ local stamp
+ stamp=$(date -u +%Y%m%dT%H%M%SZ)
+ local new_run_dir=/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop20-continuation-"$stamp"
+ mkdir -p "$new_run_dir"
+ printf '%s\n' "$new_run_dir" > /home/bob/repos/localpager/prompt-optimizer/out/latest-gepa-proper-continuation.txt
+ cat > "$new_run_dir/command.sh" < "$new_run_dir/optimize.stdout.json" 2> "$new_run_dir/optimize.stderr.log" &
+ printf '%s\n' "$!" > "$new_run_dir/optimize.pid"
+ printf '%s\n' "$new_run_dir"
+}
+
+run_validation() {
+ local run_dir=$1
+ local stamp
+ stamp=$(date -u +%Y%m%dT%H%M%SZ)
+ local val_dir=/home/bob/repos/localpager/prompt-optimizer/out/validation-12b-row30-prop16-best-"$stamp"
+ mkdir -p "$val_dir"
+ "$PY" -m prompt_optimizer.cli evaluate-candidate \
+ --harness localpager-agent \
+ --model gemma-12b-q4km-reason \
+ --max-tokens 1536 \
+ --concurrency 2 \
+ --routing-policy "$run_dir/best.routing_policy.md" \
+ --candidate-name gepa-12b-row30-prop16-best \
+ --limit 60 \
+ --offset 0 \
+ > "$val_dir/gepa-12b-row30-prop16-best-limit60.json" \
+ 2> "$val_dir/validation.stderr.log" &
+ printf '%s\n' "$!" > "$val_dir/validation.pid"
+ wait "$(cat "$val_dir/validation.pid")"
+ "$PY" -m prompt_optimizer.cli summarize-evaluation \
+ --evaluation "$val_dir/gepa-12b-row30-prop16-best-limit60.json" \
+ > "$val_dir/summary.json"
+ write_status "$run_dir" \
+ 'stopped_without_validation=0' \
+ "validation_dir=$val_dir"
+}
+
+while true; do
+ wait_for_run "$CURRENT_RUN_DIR"
+ summarize_run "$CURRENT_RUN_DIR"
+ read -r PROPOSALS ACCEPTED CANDIDATES < <(read_gate_counts "$CURRENT_RUN_DIR/final_run_report.json")
+
+ if (( PROPOSALS >= 16 && ACCEPTED >= 6 && CANDIDATES >= 8 )); then
+ run_validation "$CURRENT_RUN_DIR"
+ exit 0
+ fi
+
+ if (( CONTINUATIONS >= MAX_CONTINUATIONS )); then
+ write_status "$CURRENT_RUN_DIR" \
+ 'stopped_without_validation=1' \
+ 'reason=max_continuations_reached' \
+ "proposal_attempts=$PROPOSALS" \
+ "accepted_full_eval_candidates=$ACCEPTED" \
+ "num_candidates=$CANDIDATES"
+ exit 0
+ fi
+
+ if [[ ! -s "$CURRENT_RUN_DIR/best.routing_policy.md" ]]; then
+ write_status "$CURRENT_RUN_DIR" \
+ 'stopped_without_validation=1' \
+ 'reason=missing_best_routing_policy' \
+ "proposal_attempts=$PROPOSALS" \
+ "accepted_full_eval_candidates=$ACCEPTED" \
+ "num_candidates=$CANDIDATES"
+ exit 0
+ fi
+
+ CONTINUATIONS=$((CONTINUATIONS + 1))
+ write_status "$CURRENT_RUN_DIR" \
+ 'stopped_without_validation=1' \
+ 'reason=launched_continuation' \
+ "proposal_attempts=$PROPOSALS" \
+ "accepted_full_eval_candidates=$ACCEPTED" \
+ "num_candidates=$CANDIDATES" \
+ "continuation_number=$CONTINUATIONS"
+ CURRENT_RUN_DIR=$(launch_continuation "$CURRENT_RUN_DIR")
+done
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.status b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.status
new file mode 100644
index 0000000000000000000000000000000000000000..41cc39be1d8c646762c4eeccf6e59aacaceef381
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/post_run_queue.status
@@ -0,0 +1,7 @@
+current_run_dir=/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z
+stopped_without_validation=1
+reason=launched_continuation
+proposal_attempts=15
+accepted_full_eval_candidates=13
+num_candidates=14
+continuation_number=1
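The decision order in the `while true` loop of `post_run_queue.sh`, together with the counts recorded in `post_run_queue.status`, can be restated as a small pure function. This is a sketch for reading the gate logic only, not part of the run artifacts. The thresholds (16, 6, 8) and the default of 3 continuations are taken from the script; the names `GateCounts` and `next_action` and the returned strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateCounts:
    proposals: int   # run_log.proposal_attempts
    accepted: int    # run_log.accepted_full_eval_candidates
    candidates: int  # summary.num_candidates


def next_action(counts, continuations, max_continuations=3, has_best_policy=True):
    """Mirror the order of checks in the controller loop."""
    # 1. Gate: enough proposals, accepted candidates, and total candidates.
    if counts.proposals >= 16 and counts.accepted >= 6 and counts.candidates >= 8:
        return "run_validation"
    # 2. Give up once the continuation budget is spent.
    if continuations >= max_continuations:
        return "stop:max_continuations_reached"
    # 3. A continuation needs a best routing policy to start from.
    if not has_best_policy:
        return "stop:missing_best_routing_policy"
    # 4. Otherwise start another continuation run.
    return "launch_continuation"


# Counts recorded in post_run_queue.status for this run.
status_counts = GateCounts(proposals=15, accepted=13, candidates=14)
print(next_action(status_counts, continuations=0))
```

With 15 proposal attempts the first check fails by one proposal, which matches the `reason=launched_continuation` line in the status file even though the accepted and candidate counts are well above their thresholds.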
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/run_log.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/run_log.json
new file mode 100644
index 0000000000000000000000000000000000000000..66755d9a6ef3d16cb309128270229e63f156cc89
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/run_log.json
@@ -0,0 +1,777 @@
+[
+ {
+ "i": 0,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 3,
+ 10,
+ 19,
+ 17
+ ],
+ "subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.2,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 0.25,
+ 0.2,
+ 1.0,
+ 0.2
+ ]
+ },
+ {
+ "i": 1,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 5,
+ 0,
+ 7,
+ 27
+ ],
+ "subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 0.14285714285714285
+ ],
+ "new_subsample_scores": [
+ 0.25,
+ 0.5,
+ 0.5,
+ 1.0
+ ]
+ },
+ {
+ "i": 2,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 25,
+ 14,
+ 21,
+ 26
+ ],
+ "subsample_scores": [
+ 0.0,
+ 0.2857142857142857,
+ 0.5,
+ 0.2857142857142857
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 1,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 3,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 22,
+ 2,
+ 20,
+ 4
+ ],
+ "subsample_scores": [
+ 0.0,
+ 0.14285714285714285,
+ 0.2,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.25,
+ 1.0
+ ],
+ "new_program_idx": 2,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 4,
+ "selected_program_candidate": 2,
+ "subsample_ids": [
+ 28,
+ 6,
+ 18,
+ 11
+ ],
+ "subsample_scores": [
+ 0.3333333333333333,
+ 1.0,
+ 0.2857142857142857,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 3,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 5,
+ "selected_program_candidate": 2,
+ "subsample_ids": [
+ 23,
+ 9,
+ 29,
+ 15
+ ],
+ "subsample_scores": [
+ 0.10526315789473684,
+ 1.0,
+ 0.2857142857142857,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 4,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 6,
+ "selected_program_candidate": 2,
+ "subsample_ids": [
+ 16,
+ 8,
+ 1,
+ 13
+ ],
+ "subsample_scores": [
+ 0.5,
+ 1.0,
+ 0.0,
+ 0.5
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 0.25,
+ 0.2857142857142857,
+ 1.0
+ ],
+ "new_program_idx": 5,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 7,
+ "selected_program_candidate": 3,
+ "subsample_ids": [
+ 24,
+ 12,
+ 12,
+ 24
+ ],
+ "subsample_scores": [
+ 0.25,
+ 0.2857142857142857,
+ 0.2857142857142857,
+ 0.25
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 6,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 8,
+ "selected_program_candidate": 5,
+ "subsample_ids": [
+ 28,
+ 21,
+ 5,
+ 19
+ ],
+ "subsample_scores": [
+ 0.25,
+ 0.5,
+ 0.5,
+ 0.2
+ ],
+ "new_subsample_scores": [
+ 0.5,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 7,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 9,
+ "selected_program_candidate": 7,
+ "subsample_ids": [
+ 26,
+ 4,
+ 6,
+ 22
+ ],
+ "subsample_scores": [
+ 0.2857142857142857,
+ 0.0,
+ 1.0,
+ 0.5
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 8,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 10,
+ "selected_program_candidate": 5,
+ "subsample_ids": [
+ 18,
+ 29,
+ 9,
+ 3
+ ],
+ "subsample_scores": [
+ 1.0,
+ 0.25,
+ 1.0,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.5,
+ 1.0
+ ],
+ "new_program_idx": 9,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 11,
+ "selected_program_candidate": 1,
+ "subsample_ids": [
+ 25,
+ 13,
+ 11,
+ 24
+ ],
+ "subsample_scores": [
+ 1.0,
+ 0.5,
+ 0.25,
+ 0.5
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 10,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 12,
+ "selected_program_candidate": 4,
+ "subsample_ids": [
+ 7,
+ 10,
+ 15,
+ 23
+ ],
+ "subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ]
+ },
+ {
+ "i": 13,
+ "selected_program_candidate": 7,
+ "subsample_ids": [
+ 20,
+ 12,
+ 2,
+ 0
+ ],
+ "subsample_scores": [
+ 0.2,
+ 1.0,
+ 0.25,
+ 0.25
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 0.25,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 11,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 14,
+ "selected_program_candidate": 11,
+ "subsample_ids": [
+ 17,
+ 1,
+ 8,
+ 16
+ ],
+ "subsample_scores": [
+ 0.2,
+ 0.25,
+ 0.5,
+ 0.2857142857142857
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 12,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 15,
+ "selected_program_candidate": 2,
+ "subsample_ids": [
+ 27,
+ 14,
+ 14,
+ 27
+ ],
+ "subsample_scores": [
+ 0.125,
+ 1.0,
+ 1.0,
+ 0.125
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 13,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ }
+]
\ No newline at end of file
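Each `run_log.json` entry above can be matched to the "New subsample score ... is better than old score ..." lines in `run_log.txt`. The sketch below assumes the logged subsample score is the sum of the four per-example scores and that a proposal proceeds to full evaluation only when the new sum is strictly greater than the old one. This is inferred from the log wording and is not taken from GEPA's source. The function name `classify_step` and the returned labels are hypothetical; the three entries are abbreviated copies of records `i` = 0, 12, and 15.

```python
# Abbreviated copies of three run_log.json records (only the fields used here).
entries = [
    {"i": 0, "subsample_scores": [1.0, 1.0, 0.2, 1.0],
     "new_subsample_scores": [0.25, 0.2, 1.0, 0.2]},
    {"i": 12, "subsample_scores": [1.0, 1.0, 1.0, 1.0]},
    {"i": 15, "subsample_scores": [0.125, 1.0, 1.0, 0.125],
     "new_subsample_scores": [1.0, 1.0, 1.0, 1.0], "new_program_idx": 13},
]


def classify_step(entry):
    """Label a step as accepted, skipped, or no_proposal (assumed rule)."""
    new_scores = entry.get("new_subsample_scores")
    if new_scores is None:
        # No reflected prompt was scored on this subsample.
        return "no_proposal"
    old_total = sum(entry["subsample_scores"])
    new_total = sum(new_scores)
    return "accepted" if new_total > old_total else "skipped"


results = [classify_step(e) for e in entries]
for entry, label in zip(entries, results):
    print(entry["i"], label)
```

Applied to all 16 records, this rule labels exactly the records that carry a `new_program_idx` as accepted (13 of them) and finds one record with no proposal, consistent with `accepted_full_eval_candidates=13` and `proposal_attempts=15` in the status file.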
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/run_log.txt b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/run_log.txt
new file mode 100644
index 0000000000000000000000000000000000000000..02f677299e1f77fcef4a2c6788d728cc645d9a06
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/run_log.txt
@@ -0,0 +1,1624 @@
+Iteration 0: Base program full valset score: 0.6395802005012532 over 30 / 30 examples
+Iteration 1: Selected program 0 score: 0.6395802005012532
+Iteration 1: Proposed new text for routing_policy: Classify GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents
+
+Use the title as the primary signal. Use the body or summary only when the title is ambiguous.
+
+Pick the minimum topic set that routes the item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Important domain rules:
+
+- Use `coding_agents` when the central subject is coding agents, subagents, delegated agents, agent spawning, worker agents, or coding-agent initialization behavior.
+- Use `agent_runtime` when the central subject is agent runtime behavior, runtime ownership, runtime startup, runtime initialization, node-backed runtimes, or execution-runtime lifecycle.
+- Use `sandboxing` when sandbox modes, sandbox policy, sandbox execution, or sandbox-related breakage is central.
+- A title like `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` should be classified as `coding_agents`, `sandboxing`, and `agent_runtime`.
+
+Suppression rules:
+
+- Do not classify `sessions` merely because a term contains the word `sessions`, such as `sessions_spawn`.
+- Treat `sessions_spawn` as a subagent/coding-agent spawning mechanism unless the title is explicitly about user-visible session identity, session lifecycle, session routing, or session state.
+- Do not add `sessions` for subagent initialization bugs unless session behavior itself is the central user-visible issue.
+- Do not add a topic just because a related word appears in an internal function name, path, label, implementation detail, or compound identifier.
+
+General process:
+
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+Iteration 1: New subsample score 1.65 is not better than old score 3.2, skipping
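The output contract in the proposed prompt above (JSON only, ids taken from the allowed list, never invented) can be checked mechanically. The following is a minimal sketch rather than the project's actual scorer: the function name `parse_topics` is hypothetical, and the allowed set is assumed to be the 30 ids that the proposals in this log have in common, so an id that only one proposal lists (for example `coding_agents`) is rejected here.

```python
import json

# Assumption: the 30 topic ids common to the allowed lists in this log.
ALLOWED_TOPICS = frozenset({
    "queueing", "docs", "notifications", "sessions", "gateway", "reliability",
    "memory", "open_weight_models", "local_model_providers", "codex",
    "api_surface", "ui_tui", "chat_integrations", "skills_plugins", "acp",
    "acpx", "approvals", "agent_runtime", "model_serving", "local_models",
    "self_hosted_inference", "telemetry_usage", "exec_tools", "sandboxing",
    "browser_automation", "cron_automation", "config", "security",
    "mcp_tooling", "tool_calling",
})


def parse_topics(raw: str) -> list[str]:
    """Parse a classifier reply and return its topic ids, or raise ValueError."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not JSON: {exc}") from exc
    # The prompt asks for exactly one key holding a list of topic ids.
    if not isinstance(payload, dict) or set(payload) != {"topics_of_interest"}:
        raise ValueError("expected exactly one key: topics_of_interest")
    topics = payload["topics_of_interest"]
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("topics_of_interest must be a list of strings")
    # "Never invent topic ids": reject anything outside the allowed set.
    unknown = [t for t in topics if t not in ALLOWED_TOPICS]
    if unknown:
        raise ValueError(f"unknown topic ids: {unknown}")
    return topics
```

An empty list is accepted, matching the "Use 0 topics when no allowed topic is central" rule.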
+Iteration 2: Selected program 0 score: 0.6395802005012532
+Iteration 2: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return only final JSON:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids include:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Core task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Process:
+1. Read the title first. It is the primary signal.
+2. Identify the user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Suppress topics triggered only by labels, component names, paths, examples, internal implementation details, or broad ownership words.
+6. Output JSON only.
+
+Important domain rules:
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, stuck state, timeout, wedge, lane wedge, self-healing failure, or robustness/liveness issue.
+- A title like `self-heal lane wedges` is `reliability`.
+- Do not add `queueing` merely because the title mentions lane, main lane, worker, subagent, queue internals, or execution paths.
+- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be central and user-visible.
+
+Auth identity:
+- Use `auth_identity` when OAuth, login identity, account identity, credentials identity, authentication state, authorization identity, or identity restoration is central.
+- A title like `restore openai-codex OAuth on embedded path` includes `auth_identity`.
+- OAuth is not merely a security topic unless the issue is about vulnerabilities, access restrictions, network boundaries, or credential exposure policy.
+
+Sessions:
+- Use `sessions` when session identity, embedded session paths, session lifecycle, session routing, session state, or session-specific behavior is central.
+- If an OAuth/login issue is specifically about an embedded path or restoring identity in an embedded/session flow, include `sessions`.
+- `restore openai-codex OAuth on embedded path` should include `sessions`.
+
+Codex:
+- Use `codex` only when Codex behavior itself is the central subject: Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` merely because a title contains `openai-codex`, `[codex]`, or a Codex-related auth/provider label.
+- In `restore openai-codex OAuth on embedded path`, `openai-codex` is an OAuth/identity label, not enough by itself to classify as `codex`.
+
+Composite titles:
+- If a title lists independent central fixes joined by `+`, `and`, commas, or semicolons, classify each central user-visible item.
+- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path`
+ should be:
+ {"topics_of_interest":["reliability","auth_identity","sessions"]}
+- Do not add `queueing` for `lane wedges`.
+- Do not add `codex` for `openai-codex OAuth` unless the title clearly says the Codex product/runtime/setup behavior itself is the subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like lane, queue, Codex, path, OAuth, security, status, send, gateway, model, usage, tool, cron, plugin, or contract. Keep a topic only if it is actually a central maintainer-owned subject.
+
+Output only JSON.
+Iteration 2: New subsample score 2.25 is not better than old score 3.142857142857143, skipping
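The "skipping" lines for iterations 1 and 2 suggest a simple gate: a proposed prompt only proceeds to full validation when its score on the reflection subsample beats the score of the prompt it was derived from. The sketch below is an assumption about that gate, not GEPA's source code. The function name is hypothetical, and a strict comparison is assumed because the log does not show how a tie is handled.

```python
def should_run_full_eval(old_subsample_score: float, new_subsample_score: float) -> bool:
    """Return True when the proposal earns a full validation run.

    Assumption: both values are totals over the same subsample, as printed
    in the log, and a tie counts as "not better".
    """
    return new_subsample_score > old_subsample_score


# Values copied from the iteration 1 and iteration 2 log lines.
for iteration, old, new in [(1, 3.2, 1.65), (2, 3.142857142857143, 2.25)]:
    verdict = "full eval" if should_run_full_eval(old, new) else "skipping"
    print(f"Iteration {iteration}: {verdict}")
```

Both logged proposals fail the gate, which matches the "skipping" outcome recorded for them.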
+Iteration 3: Selected program 0 score: 0.6395802005012532
+Iteration 3: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- A title like `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` is only `docs`.
+- Do not add non-allowed or broad demo/showcase labels such as `agent_demos`.
+- Documentation-only PRs should usually include `docs` plus the central documented area only when that area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add `tool_calling` just because docs mention "tool boundaries" unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- A title like `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.
+- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.
+- A title like `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` is `acp` and `sessions`, not `acpx`.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and only `acpx` if ACPX is explicitly named or clearly the binding system in question.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- A title like `[Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` is `local_model_providers` and `model_serving`.
+- Do not add `model_serving` merely because a title says "model", "provider", "catalog", or names a model unless serving/routing/protocol behavior is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says "openai-compatible", "provider", llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about "cron --announce and heartbeat target-channel Slack pushes" is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
+- "Outbound session identity" is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map "model" in "model policy", "model conformance", or "model checks" to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 3: New subsample score 4.0 is better than old score 1.0714285714285714. Continue to full eval and add to candidate pool.
+Iteration 3: Found a better program on the valset with score 0.6971520146520146.
+Iteration 3: Valset score for new program: 0.6971520146520146 (coverage 30 / 30)
+Iteration 3: Val aggregate for new program: 0.6971520146520146
+Iteration 3: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 0.25, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 0.5, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.2, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.15384615384615385, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.125, 28: 0.2, 29: 0.25}
+Iteration 3: Objective aggregate scores for new program: {'weighted_score': 0.6971520146520145}
+Iteration 3: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.2, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.15384615384615385, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.125, 28: 0.2, 29: 0.2857142857142857}
+Iteration 3: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 3: Valset pareto front aggregate score: 0.7566758241758241
+Iteration 3: Updated valset pareto front programs: {0: {0, 1}, 1: {0, 1}, 2: {1}, 3: {0, 1}, 4: {0, 1}, 5: {0}, 6: {0, 1}, 7: {0, 1}, 8: {0, 1}, 9: {0, 1}, 10: {0}, 11: {0, 1}, 12: {0, 1}, 13: {0, 1}, 14: {1}, 15: {0}, 16: {0, 1}, 17: {0, 1}, 18: {0, 1}, 19: {0, 1}, 20: {0, 1}, 21: {1}, 22: {1}, 23: {1}, 24: {0, 1}, 25: {1}, 26: {0, 1}, 27: {0, 1}, 28: {0, 1}, 29: {0}}
+Iteration 3: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 3: Best valset aggregate score so far: 0.6971520146520146
+Iteration 3: Best program as per aggregate score on valset: 1
+Iteration 3: Best score on valset: 0.6971520146520146
+Iteration 3: Linear pareto front program index: 1
+Iteration 3: New program candidate index: 1
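The iteration 3 lines above are consistent with two simple aggregations: the valset aggregate is the mean of the per-example scores, and the Pareto front keeps, for each example, the best score any candidate has achieved along with the set of candidates that reach it. The sketch below illustrates that reading under stated assumptions. The helper names are hypothetical, and the scores for candidate 0 are not printed in this part of the log, so the front is demonstrated on a toy pair of candidates.

```python
from statistics import fmean

# Per-example valset scores for candidate 1, copied from the iteration 3 log line.
CANDIDATE_1 = {
    0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 0.25,
    6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 0.5, 11: 1.0, 12: 1.0, 13: 1.0,
    14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.2, 20: 0.2,
    21: 1.0, 22: 0.5, 23: 0.15384615384615385, 24: 0.5, 25: 1.0,
    26: 1.0, 27: 0.125, 28: 0.2, 29: 0.25,
}


def aggregate(scores: dict[int, float]) -> float:
    """Mean per-example score (assumed to be how the log aggregates)."""
    return fmean(scores.values())


def pareto_front(candidates: list[dict[int, float]]):
    """Per-example best score and the set of candidate indices achieving it."""
    best, winners = {}, {}
    for example_id in candidates[0]:
        top = max(c[example_id] for c in candidates)
        best[example_id] = top
        winners[example_id] = {k for k, c in enumerate(candidates) if c[example_id] == top}
    return best, winners


# Should land very close to the 0.6971520146520146 reported in the log.
print(aggregate(CANDIDATE_1))

# Toy illustration: ties keep both candidates on the front for that example.
toy_best, toy_winners = pareto_front([{0: 1.0, 1: 0.5}, {0: 1.0, 1: 0.25}])
print(toy_best, toy_winners)
```

Because the front takes a per-example maximum, its aggregate can exceed the aggregate of every single candidate, which is why the log reports 0.7567 for the front while the best program scores 0.6972.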
+Iteration 4: Selected program 0 score: 0.6395802005012532
+Iteration 4: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only, or call the structured JSON output tool if one exists.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Important routing corrections:
+- `layered browser diagnostics` is `browser_automation`, not `ui_tui`.
+- `cron force-run deadlock` is `cron_automation`. Do not add `reliability` just because the word deadlock appears when the named feature owner is the better maintainer bucket.
+- `totalTokensFresh`, stale token counts, fresh/stale token-count flags, and per-session usage freshness are `sessions` plus `telemetry_usage`.
+- Do not use `ui_tui` merely because a title says "showing", "display", "reader", or "diagnostics". Use `ui_tui` only when webchat, TUI, UI views, terminal UI, or user-facing chat interface behavior is explicitly central.
+- Security PRs that list vulnerable areas should include `security` plus each central named affected subsystem when that subsystem is an allowed topic. For example, skill scanner vulnerability work includes `skills_plugins`; hook priority vulnerability work includes `hooks`.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+- For `cron force-run deadlock`, classify as `cron_automation` without `reliability` unless the title separately makes reliability/liveness a central cross-cutting subject.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:
+ - `exec_tools`
+ - `sandboxing`
+ - `approvals`
+- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Suppress `reliability` when deadlock/hang/crash is just the failure mode of a more specific named central feature, such as cron force-run.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, session freshness, or session-specific behavior is central.
+- "Outbound session identity" is `sessions`.
+- `totalTokensFresh` and stale per-session token counts are session-state/freshness concerns and should include `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, token counts, stale usage values, or status reporting surfaces are themselves the feature or bug.
+- Stale token counts are `telemetry_usage`.
+- Do not add `model_serving` merely because usage chunks or token counts are mentioned in a model-serving protocol bug unless serving protocol behavior is central.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- `layered browser diagnostics` is `browser_automation`.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Hooks:
+- Use `hooks` when hook behavior, hook priority, hook ordering, hook execution, or hook security is central.
+- In composite security titles, include `hooks` when hook priority or hook execution is one of the named vulnerable areas.
+
+Open-weight and local model providers:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, plugin availability, or skill scanner behavior is central.
+- Security fixes in a skill scanner should include `skills_plugins`.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+ACP, ACPX, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Model serving:
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks unless usage reporting itself is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says "openai-compatible", "provider", llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map "model" in "model policy", "model conformance", or "model checks" to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+- Do not add `tool_calling` just because docs mention "tool boundaries" unless tool-call behavior itself is central.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of "tool boundaries" in docs about another system are usually context, not `tool_calling`.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `skills_plugins`, `hooks`.
+- Do not substitute broad infrastructure topics like `gateway`, `ui_tui`, or `reliability` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 4: New subsample score 3.25 is better than old score 1.342857142857143. Continue to full eval and add to candidate pool.
+Iteration 4: Valset score for new program: 0.6569444444444444 (coverage 30 / 30)
+Iteration 4: Val aggregate for new program: 0.6569444444444444
+Iteration 4: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 0.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0, 16: 1.0, 17: 0.2857142857142857, 18: 1.0, 19: 0.3333333333333333, 20: 0.25, 21: 0.5, 22: 0.2857142857142857, 23: 0.14285714285714285, 24: 0.5, 25: 0.0, 26: 1.0, 27: 0.125, 28: 0.25, 29: 0.2857142857142857}
+Iteration 4: Objective aggregate scores for new program: {'weighted_score': 0.6569444444444443}
+Iteration 4: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.3333333333333333, 20: 0.25, 21: 1.0, 22: 0.5, 23: 0.15384615384615385, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.125, 28: 0.25, 29: 0.2857142857142857}
+Iteration 4: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 4: Valset pareto front aggregate score: 0.7882631257631257
+Iteration 4: Updated valset pareto front programs: {0: {0, 1, 2}, 1: {0, 1, 2}, 2: {1, 2}, 3: {0, 1, 2}, 4: {2}, 5: {0, 2}, 6: {0, 1, 2}, 7: {0, 1, 2}, 8: {0, 1, 2}, 9: {0, 1, 2}, 10: {0, 2}, 11: {0, 1}, 12: {0, 1, 2}, 13: {0, 1}, 14: {1, 2}, 15: {0, 2}, 16: {0, 1, 2}, 17: {0, 1}, 18: {0, 1, 2}, 19: {2}, 20: {2}, 21: {1}, 22: {1}, 23: {1}, 24: {0, 1, 2}, 25: {1}, 26: {0, 1, 2}, 27: {0, 1, 2}, 28: {2}, 29: {0, 2}}
+Iteration 4: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 4: Best valset aggregate score so far: 0.6971520146520146
+Iteration 4: Best program as per aggregate score on valset: 1
+Iteration 4: Best score on valset: 0.6971520146520146
+Iteration 4: Linear pareto front program index: 1
+Iteration 4: New program candidate index: 2
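The Iteration 4 numbers above can be checked by hand. The following is a minimal Python sketch, assuming (consistent with the logged values, but not taken from the GEPA source) that the valset aggregate is the plain mean of the per-example scores and that the Pareto front keeps the best score seen for each example across all candidates. The names `aggregate` and `pareto_front` are hypothetical.

```python
from statistics import fmean

# Per-example valset scores for candidate 2, copied from the Iteration 4 log line.
ITER4_SCORES = {
    0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0,
    8: 1.0, 9: 1.0, 10: 1.0, 11: 0.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 1.0,
    16: 1.0, 17: 0.2857142857142857, 18: 1.0, 19: 0.3333333333333333,
    20: 0.25, 21: 0.5, 22: 0.2857142857142857, 23: 0.14285714285714285,
    24: 0.5, 25: 0.0, 26: 1.0, 27: 0.125, 28: 0.25, 29: 0.2857142857142857,
}


def aggregate(scores):
    """Mean of per-example scores (assumed to correspond to 'Val aggregate')."""
    return fmean(scores.values())


def pareto_front(candidates):
    """Per-example best score and the set of candidates that reach it.

    candidates: {candidate_index: {example_id: score}}
    """
    example_ids = next(iter(candidates.values())).keys()
    best, winners = {}, {}
    for ex in example_ids:
        top = max(scores[ex] for scores in candidates.values())
        best[ex] = top
        winners[ex] = {c for c, scores in candidates.items() if scores[ex] == top}
    return best, winners


print(round(aggregate(ITER4_SCORES), 4))  # 0.6569
```

Under these assumptions, a candidate can raise the Pareto front aggregate by winning a few examples even when its own mean stays below the best candidate's, which matches the pattern in the log.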
+Iteration 5: Selected program 2 score: 0.6569444444444444
+Iteration 5: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Use only allowed topic ids. Never invent a topic id, even if a title suggests a missing bucket such as packaging, deployment, CI, infrastructure, or supply chain. If no allowed topic is central, return an empty array.
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Decision process:
+1. Read the title first.
+2. Identify the user-visible subject of the issue or PR.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned concerns in the title or first clear summary.
+5. Use 3 topics only when the title explicitly contains three central concerns.
+6. Remove any topic added only because of incidental words, labels, implementation details, tests, file paths, examples, or broad impact.
+
+Cron/session/reliability correction:
+- `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting` is `cron_automation`, `sessions`, and `reliability`.
+- Use `cron_automation` when cron announce, cron scheduling, cron force-run, cron lifecycle, or cron execution is the named feature.
+- Use `sessions` when the bug involves active chat sessions, session takeover, embedded attempt session takeover, session ownership, session identity, or session lifecycle.
+- Use `reliability` when the central failure is an exception, crash, deadlock, hang, stuck state, race, timeout, or robustness failure.
+- Do not suppress `reliability` when the title names a concrete runtime exception such as `EmbeddedAttemptSessionTakeoverError`.
+
+Security/CI correction:
+- `ci: harden GitHub Actions supply-chain boundaries` is `security` only.
+- GitHub Actions supply-chain hardening, supply-chain boundaries, workflow trust boundaries, dependency provenance, and CI hardening are security concerns when no more specific allowed topic is central.
+- Do not add nonexistent or label-like topics such as `packaging_deployment`.
+- Do not add deployment, packaging, CI, or release topics unless they are in the allowed list; they are not allowed here.
+
+General suppression rules:
+- Do not add `notifications` merely because delivery, announce, pushed message, or message sending appears. Use it only when notification behavior itself is the central user-visible subject.
+- Do not add `chat_integrations` merely because a user is actively chatting. Use it only for Slack, WhatsApp, chat app delivery, chat history, target channels, or chat push behavior.
+- Do not add `reliability` for every bug. Add it when the title’s central failure mode is an exception, crash, hang, deadlock, stuck state, race, timeout, liveness, or robustness issue.
+- Do not add `security` for every policy, boundary, or CI title. Add it when the title is about hardening, supply-chain boundaries, vulnerabilities, access restrictions, credentials, SSRF, token verification, network policy, or trust boundaries.
+- Never output labels outside the allowed topic ids.
+
+Output raw JSON only.
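The output contract in the prompt above (a single `topics_of_interest` key, raw JSON, allowed ids only) lends itself to a mechanical check. The following is a minimal Python sketch of such a check; `parse_topics` is a hypothetical helper, not part of the logged pipeline, and the allowed set is copied from the prompt text.

```python
import json

# Allowed topic ids, copied from the prompt text above.
ALLOWED_TOPICS = frozenset({
    "queueing", "docs", "notifications", "sessions", "gateway", "reliability",
    "memory", "open_weight_models", "local_model_providers", "codex",
    "api_surface", "ui_tui", "chat_integrations", "skills_plugins", "acp",
    "acpx", "approvals", "agent_runtime", "model_serving", "local_models",
    "self_hosted_inference", "telemetry_usage", "exec_tools", "sandboxing",
    "browser_automation", "cron_automation", "config", "security",
    "mcp_tooling", "tool_calling", "hooks",
})


def parse_topics(raw):
    """Return the topic list if `raw` satisfies the contract, else raise ValueError."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not raw JSON: {exc}") from exc
    if not isinstance(obj, dict) or set(obj) != {"topics_of_interest"}:
        raise ValueError("expected exactly one key: topics_of_interest")
    topics = obj["topics_of_interest"]
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("topics_of_interest must be a list of strings")
    unknown = [t for t in topics if t not in ALLOWED_TOPICS]
    if unknown:
        raise ValueError(f"topic ids outside the allowed list: {unknown}")
    return topics


print(parse_topics('{"topics_of_interest":["security"]}'))  # ['security']
```

An invented id such as `packaging_deployment`, which the prompt explicitly forbids, would be rejected by this check, while an empty array passes.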
+Iteration 5: New subsample score 4.0 is better than old score 2.619047619047619. Continue to full eval and add to candidate pool.
+Iteration 5: Valset score for new program: 0.46397435897435896 (coverage 30 / 30)
+Iteration 5: Val aggregate for new program: 0.46397435897435896
+Iteration 5: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.15384615384615385, 3: 0.2857142857142857, 4: 0.15384615384615385, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.25, 9: 0.0, 10: 0.2, 11: 0.2857142857142857, 12: 1.0, 13: 0.5, 14: 1.0, 15: 0.5, 16: 0.14285714285714285, 17: 0.5, 18: 1.0, 19: 0.14285714285714285, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.15384615384615385, 24: 0.15384615384615385, 25: 0.25, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 0.15384615384615385}
+Iteration 5: Objective aggregate scores for new program: {'weighted_score': 0.46397435897435885}
+Iteration 5: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.3333333333333333, 20: 0.25, 21: 1.0, 22: 0.5, 23: 0.15384615384615385, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 0.2857142857142857}
+Iteration 5: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 5: Valset pareto front aggregate score: 0.8138583638583639
+Iteration 5: Updated valset pareto front programs: {0: {0, 1, 2}, 1: {0, 1, 2, 3}, 2: {1, 2}, 3: {0, 1, 2}, 4: {2}, 5: {0, 2}, 6: {0, 1, 2, 3}, 7: {0, 1, 2, 3}, 8: {0, 1, 2}, 9: {0, 1, 2}, 10: {0, 2}, 11: {0, 1}, 12: {0, 1, 2, 3}, 13: {0, 1}, 14: {1, 2, 3}, 15: {0, 2}, 16: {0, 1, 2}, 17: {0, 1}, 18: {0, 1, 2, 3}, 19: {2}, 20: {2}, 21: {1, 3}, 22: {1, 3}, 23: {1, 3}, 24: {0, 1, 2}, 25: {1}, 26: {0, 1, 2, 3}, 27: {3}, 28: {3}, 29: {0, 2}}
+Iteration 5: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 5: Best valset aggregate score so far: 0.6971520146520146
+Iteration 5: Best program as per aggregate score on valset: 1
+Iteration 5: Best score on valset: 0.6971520146520146
+Iteration 5: Linear pareto front program index: 1
+Iteration 5: New program candidate index: 3
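Iteration 5 shows a candidate that passes the subsample gate (4.0 against 2.619) yet scores lower on the full valset (0.464) than the current best (0.697). The following is a minimal Python sketch of that two-stage pattern, assuming a strict "greater than" comparison on the subsample and a simple argmax over valset aggregates; both function names are hypothetical and the real GEPA selection logic may differ.

```python
def passes_subsample_gate(new_score, old_score):
    """Assumed gate: the proposal must strictly beat its parent on the subsample."""
    return new_score > old_score


def best_candidate(aggregates):
    """Index of the candidate with the highest valset aggregate."""
    return max(aggregates, key=aggregates.get)


# Valset aggregates copied from the log for candidates 1 and 2.
aggregates = {1: 0.6971520146520146, 2: 0.6569444444444444}

# Iteration 5: subsample improved, so the proposal gets a full eval and joins the pool.
if passes_subsample_gate(4.0, 2.619047619047619):
    aggregates[3] = 0.46397435897435896

print(best_candidate(aggregates))  # 1
```

Candidate 3 enters the pool but does not displace candidate 1, which agrees with the "Best program as per aggregate score on valset: 1" line.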
+Iteration 6: Selected program 2 score: 0.6569444444444444
+Iteration 6: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+You may receive a GitHub target URL, title, and sometimes a body or summary.
+The title is the primary signal. Use the body only when the title is ambiguous.
+Ignore examples, tests, labels, file paths, implementation details, symptoms, broad impact, and incidental words unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, docs change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicitly central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only, or call the structured JSON output tool if one exists.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, label-spam keyword, or security-hardening mechanism.
+
+High-priority corrections:
+- Local Ollama embeddings failing when a proxy is enabled, including wording like “SSRF defenses ignore NO_PROXY”, should be classified as `local_models` and `self_hosted_inference`.
+- For that Ollama/proxy/NO_PROXY case, do NOT add `security`, `local_model_providers`, or `memory`. “SSRF defenses” is the mechanism or label-like context, not the central maintainer bucket, unless the title is actually about a vulnerability or security policy.
+- “Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield” should be classified as `acp` and `acpx`.
+- For that ACP/sessions_yield case, do NOT add `sessions`. `sessions_yield` is an ACP/ACPX workflow term or implementation detail, not necessarily a sessions maintainer concern.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including Ollama, LM Studio, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model systems.
+- Use `self_hosted_inference` when the issue is about using self-hosted inference servers such as Ollama, llama.cpp, vLLM, TGI, or LocalAI as inference providers.
+- Ollama is a strong signal for both `local_models` and `self_hosted_inference` when the issue is about using Ollama locally or through local inference.
+- Do not use `local_model_providers` for Ollama/local runtime failures unless the title is specifically about provider catalog metadata, provider-specific open-weight integration, known context windows, or named provider/model-family support.
+- Do not add `memory` merely because embeddings are mentioned. Add `memory` only when memory behavior, active-memory recall, vector stores, memory providers, or embedding-provider memory behavior is the central subject.
+
+Security:
+- Use `security` for network policy, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior when that is the main user-visible subject.
+- Do not add `security` merely because a bug mentions SSRF defenses, proxy bypass behavior, NO_PROXY, blocked requests, or a security-related implementation detail.
+- Security PRs that list vulnerable areas should include `security` plus each central named affected subsystem. For example, skill scanner vulnerability work includes `skills_plugins`; hook priority vulnerability work includes `hooks`.
+
+ACP and ACPX:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings, explicit ACP agent runs, embedded backend visibility, final JSON state in ACP workflow, or ACPX session workflow concepts.
+- Do not add `sessions` merely because an ACP/ACPX title contains `sessions_yield`, “session”, or “state” if the central issue is ACP/ACPX workflow behavior.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, session freshness, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- `totalTokensFresh`, stale token counts, fresh/stale token-count flags, and per-session usage freshness are `sessions` plus `telemetry_usage`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+- Suppress `reliability` when deadlock/hang/crash is just the failure mode of a more specific named central feature, such as cron force-run.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings as a memory subsystem, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+- Suppress `memory` when embeddings are only part of a local model provider failure and the user-visible subject is local/self-hosted inference.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- `layered browser diagnostics` is `browser_automation`.
+- Do not add `gateway` or `ui_tui` for browser diagnostics unless gateway or UI behavior is explicitly the subject.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- `cron force-run deadlock` is `cron_automation`; do not add `reliability` unless the title separately makes reliability/liveness a central cross-cutting subject.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Notifications and chat integrations:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Do not add `notifications` merely because a title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, token counts, stale usage values, or status reporting surfaces are themselves the feature or bug.
+- Stale token counts are `telemetry_usage`.
+- Do not add `model_serving` merely because usage chunks or token counts are mentioned in a model-serving protocol bug unless serving protocol behavior is central.
+
+Open-weight and local model providers:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.
+
+Codex:
+- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, plugin availability, or skill scanner behavior is central.
+- Security fixes in a skill scanner should include `skills_plugins`.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Model serving:
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks unless usage reporting itself is central.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Policy, config, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Documentation:
+- Documentation-only PRs should usually include `docs` plus the central documented area.
+- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+Queueing:
+- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+
+Tool calling:
+- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+Hooks:
+- Use `hooks` when hook behavior, hook priority, hook ordering, hook execution, or hook security is central.
+- In composite security titles, include `hooks` when hook priority or hook execution is one of the named vulnerable areas.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `skills_plugins`, `hooks`.
+- Do not substitute broad infrastructure topics like `gateway`, `ui_tui`, `reliability`, or `security` for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, contract, embeddings, SSRF, NO_PROXY, sessions_yield, or state. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 6: New subsample score 4.0 is better than old score 2.3909774436090228. Continue to full eval and add to candidate pool.
+Iteration 6: Valset score for new program: 0.6616269841269842 (coverage 30 / 30)
+Iteration 6: Val aggregate for new program: 0.6616269841269842
+Iteration 6: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 0.3333333333333333, 11: 1.0, 12: 0.25, 13: 0.5, 14: 1.0, 15: 1.0, 16: 1.0, 17: 0.2857142857142857, 18: 0.2857142857142857, 19: 0.3333333333333333, 20: 0.25, 21: 1.0, 22: 0.2857142857142857, 23: 1.0, 24: 0.5, 25: 0.0, 26: 0.25, 27: 0.125, 28: 0.2, 29: 1.0}
+Iteration 6: Objective aggregate scores for new program: {'weighted_score': 0.6616269841269842}
+Iteration 6: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.3333333333333333, 20: 0.25, 21: 1.0, 22: 0.5, 23: 1.0, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 6: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 6: Valset pareto front aggregate score: 0.8658730158730158
+Iteration 6: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {0, 1, 2, 3, 4}, 2: {1, 2, 4}, 3: {0, 1, 2, 4}, 4: {2, 4}, 5: {0, 2, 4}, 6: {0, 1, 2, 3, 4}, 7: {0, 1, 2, 3, 4}, 8: {0, 1, 2, 4}, 9: {0, 1, 2, 4}, 10: {0, 2}, 11: {0, 1, 4}, 12: {0, 1, 2, 3}, 13: {0, 1}, 14: {1, 2, 3, 4}, 15: {0, 2, 4}, 16: {0, 1, 2, 4}, 17: {0, 1}, 18: {0, 1, 2, 3}, 19: {2, 4}, 20: {2, 4}, 21: {1, 3, 4}, 22: {1, 3}, 23: {4}, 24: {0, 1, 2, 4}, 25: {1}, 26: {0, 1, 2, 3}, 27: {3}, 28: {3}, 29: {4}}
+Iteration 6: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 6: Best valset aggregate score so far: 0.6971520146520146
+Iteration 6: Best program as per aggregate score on valset: 1
+Iteration 6: Best score on valset: 0.6971520146520146
+Iteration 6: Linear pareto front program index: 1
+Iteration 6: New program candidate index: 4
+Iteration 7: Selected program 2 score: 0.6569444444444444
+Iteration 7: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Always produce the required structured output. If `final_json` exists, calling it is mandatory.
+
+Important corrections learned from prior failures:
+- Slack target-channel pushes are both notification behavior and chat integration behavior when the title is about pushed Slack messages.
+ - Example: “Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes” => `notifications`, `chat_integrations`.
+ - Do not stop at `notifications` when Slack, WhatsApp, chat history, chat app delivery, or chat target channels are central.
+- Policy or conformance checks are `config` when they define or enforce governed behavior.
+ - “network conformance checks” should include `security`.
+ - “MCP conformance checks” should include `mcp_tooling`.
+ - Example: “Policy: add model, network, and MCP conformance checks” => `config`, `security`, `mcp_tooling`.
+ - Do not map “model conformance checks” to `model_serving` unless serving endpoints, streaming, routing, or model-server compatibility are central.
+- Provider catalog metadata is `local_model_providers`.
+ - If a missing provider catalog entry causes a silent failure instead of a clear error, also include `reliability`.
+ - Example: “google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error” => `local_model_providers`, `reliability`.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications` and `chat_integrations`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.
+- Suppress `reliability` when the failure mode is merely incidental to a more specific named feature, such as cron force-run deadlock.
+- Do include `reliability` when a provider catalog or integration issue causes silent failure instead of a proper error.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Local model providers:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- Missing models in a provider catalog are `local_model_providers`.
+- If the missing provider metadata causes a silent failure or poor error behavior, add `reliability`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Final output requirement:
+- If a structured output tool exists, call it with exactly:
+ {"topics_of_interest":[...]}
+- If no structured output tool exists, print raw JSON only.
+- Do not include explanations, Markdown, comments, or extra fields.
+Iteration 7: New subsample score 2.5357142857142856 is better than old score 2.0. Continue to full eval and add to candidate pool.
+Iteration 7: Valset score for new program: 0.5029318488529015 (coverage 30 / 30)
+Iteration 7: Val aggregate for new program: 0.5029318488529015
+Iteration 7: Individual valset scores for new program: {0: 0.5, 1: 0.2857142857142857, 2: 0.15384615384615385, 3: 1.0, 4: 0.15384615384615385, 5: 1.0, 6: 0.25, 7: 1.0, 8: 0.0, 9: 1.0, 10: 0.2, 11: 0.25, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 0.2, 18: 1.0, 19: 0.2, 20: 0.2, 21: 0.5, 22: 0.25, 23: 0.10526315789473684, 24: 0.5, 25: 0.5, 26: 1.0, 27: 0.14285714285714285, 28: 0.125, 29: 0.2857142857142857}
+Iteration 7: Objective aggregate scores for new program: {'weighted_score': 0.5029318488529014}
+Iteration 7: New valset pareto front scores: {0: 1.0, 1: 0.2857142857142857, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.3333333333333333, 20: 0.25, 21: 1.0, 22: 0.5, 23: 1.0, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 7: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 7: Valset pareto front aggregate score: 0.8670634920634921
+Iteration 7: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {5}, 2: {1, 2, 4}, 3: {0, 1, 2, 4, 5}, 4: {2, 4}, 5: {0, 2, 4, 5}, 6: {0, 1, 2, 3, 4}, 7: {0, 1, 2, 3, 4, 5}, 8: {0, 1, 2, 4}, 9: {0, 1, 2, 4, 5}, 10: {0, 2}, 11: {0, 1, 4}, 12: {0, 1, 2, 3}, 13: {0, 1, 5}, 14: {1, 2, 3, 4, 5}, 15: {0, 2, 4, 5}, 16: {0, 1, 2, 4, 5}, 17: {0, 1}, 18: {0, 1, 2, 3, 5}, 19: {2, 4}, 20: {2, 4}, 21: {1, 3, 4}, 22: {1, 3}, 23: {4}, 24: {0, 1, 2, 4, 5}, 25: {1}, 26: {0, 1, 2, 3, 5}, 27: {3, 5}, 28: {3}, 29: {4}}
+Iteration 7: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 7: Best valset aggregate score so far: 0.6971520146520146
+Iteration 7: Best program as per aggregate score on valset: 1
+Iteration 7: Best score on valset: 0.6971520146520146
+Iteration 7: Linear pareto front program index: 1
+Iteration 7: New program candidate index: 5
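The aggregate lines in the Iteration 7 block appear to be plain unweighted means over the 30 validation examples. The mean of the individual scores reproduces the logged valset score, and the mean of the per-example front scores reproduces the logged Pareto front aggregate. A minimal Python check follows, with values copied from the log lines above. The helper name `mean_score` is hypothetical, and this is a reading of the log, not GEPA's own code.

```python
from statistics import fmean

# Candidate 5's per-example validation scores, copied from the Iteration 7 log line.
individual = {
    0: 0.5, 1: 0.2857142857142857, 2: 0.15384615384615385, 3: 1.0,
    4: 0.15384615384615385, 5: 1.0, 6: 0.25, 7: 1.0, 8: 0.0, 9: 1.0,
    10: 0.2, 11: 0.25, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 1.0,
    16: 1.0, 17: 0.2, 18: 1.0, 19: 0.2, 20: 0.2, 21: 0.5, 22: 0.25,
    23: 0.10526315789473684, 24: 0.5, 25: 0.5, 26: 1.0,
    27: 0.14285714285714285, 28: 0.125, 29: 0.2857142857142857,
}

# Per-example front scores from the same block: 1.0 everywhere except six examples.
front = {example_id: 1.0 for example_id in range(30)}
front.update({
    1: 0.2857142857142857, 19: 0.3333333333333333, 20: 0.25,
    22: 0.5, 24: 0.5, 27: 0.14285714285714285,
})


def mean_score(scores):
    """Unweighted mean over validation examples (hypothetical helper)."""
    return fmean(scores.values())


val_aggregate = mean_score(individual)    # logged as 0.5029318488529015
front_aggregate = mean_score(front)       # logged as 0.8670634920634921
print(round(val_aggregate, 6), round(front_aggregate, 6))
```

The gap between the two numbers is the point of the Pareto view: no single candidate reaches the front aggregate, which credits the best score any candidate achieved on each example.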
+Iteration 8: Selected program 3 score: 0.46397435897435896
+Iteration 8: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Use only allowed topic ids. Never invent topic ids. If no allowed topic is central, return an empty array.
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks, packaging_deployment
+
+Decision process:
+1. Read the title first.
+2. Identify the user-visible subject of the issue or PR.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned concerns in the title or first clear summary.
+5. Use 3 topics only when the title explicitly contains three central concerns.
+6. Remove any topic added only because of incidental words, labels, implementation details, tests, file paths, examples, or broad impact.
+
+General rules:
+- Prefer the smallest complete topic set.
+- Do not add topics because a word appears incidentally.
+- Do not add broad umbrella topics when a narrower central topic already captures the issue.
+- Do not add implementation-detail topics from examples, stack traces, package names, file paths, labels, or test names.
+- If a topic is not allowed, do not output it.
+
+Known mappings and corrections:
+- Desktop Chrome automation, browser profiles, browser attach, Google sign-in through Chrome, and use of existing Chrome browser sessions are `browser_automation`.
+- In browser automation titles, the word “sessions” often means Chrome/user-profile browser sessions. Do not classify that as `sessions` unless OpenClaw chat/session ownership or lifecycle is central.
+- `Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` is `browser_automation` and `packaging_deployment`.
+- `spawn npx ENOENT`, missing executables, broken runtime packaging, missing installed command, desktop packaging failures, dependency launch failures, installation/runtime environment failures, and app deployment breakage are `packaging_deployment`.
+- Do not add `reliability` merely because something “fails”. Add `reliability` only for central exception/crash/hang/deadlock/stuck/race/timeout/robustness failures.
+- A title naming `spawn npx ENOENT` in a desktop/browser attach flow is primarily packaging/deployment, not reliability.
+
+Memory and self-hosted inference:
+- `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` is `memory` and `self_hosted_inference`.
+- Use `memory` when memory, embeddings, recall, vector storage, or memory retrieval is the central feature.
+- Use `self_hosted_inference` when the title is about OpenAI-compatible providers for self-hosted servers or self-hosted inference endpoints.
+- Do not add `local_models` just because examples include llama.cpp, Ollama, vLLM, TGI, or LocalAI. Treat those names as examples unless local model management itself is the central user-visible subject.
+- Do not add `local_model_providers` or `model_serving` when the title’s central concern is an OpenAI-compatible provider for self-hosted inference and another explicit feature like memory already accounts for the context.
+
+Cron/session/reliability correction:
+- `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting` is `cron_automation`, `sessions`, and `reliability`.
+- Use `cron_automation` when cron announce, cron scheduling, cron force-run, cron lifecycle, or cron execution is the named feature.
+- Use `sessions` when the bug involves active chat sessions, session takeover, embedded attempt session takeover, session ownership, session identity, or session lifecycle.
+- Use `reliability` when the central failure is an exception, crash, deadlock, hang, stuck state, race, timeout, or robustness failure.
+- Do not suppress `reliability` when the title names a concrete runtime exception such as `EmbeddedAttemptSessionTakeoverError`.
+
+Security/CI correction:
+- `ci: harden GitHub Actions supply-chain boundaries` is `security` only.
+- GitHub Actions supply-chain hardening, supply-chain boundaries, workflow trust boundaries, dependency provenance, and CI hardening are security concerns when no more specific allowed topic is central.
+- Do not add deployment, packaging, CI, or release topics unless an allowed topic is directly central. `packaging_deployment` is allowed only for packaging/deployment/install/runtime environment concerns, not generic CI hardening.
+
+Suppression rules:
+- Do not add `notifications` merely because delivery, announce, pushed message, or message sending appears. Use it only when notification behavior itself is the central user-visible subject.
+- Do not add `chat_integrations` merely because a user is actively chatting. Use it only for Slack, WhatsApp, chat app delivery, chat history, target channels, or chat push behavior.
+- Do not add `sessions` for browser sessions, Chrome sessions, login sessions, profile attachment, or user profiles unless OpenClaw session ownership/lifecycle is the central subject.
+- Do not add `reliability` for every bug. Add it only when the title’s central failure mode is an exception, crash, hang, deadlock, stuck state, race, timeout, liveness, or robustness issue.
+- Do not add `security` for every policy, boundary, or CI title. Add it when the title is about hardening, supply-chain boundaries, vulnerabilities, access restrictions, credentials, SSRF, token verification, network policy, or trust boundaries.
+
+Output raw JSON only.
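The output contract stated in the prompt above (a single `topics_of_interest` array, only allowed ids, raw JSON with no extra fields) can be checked mechanically. The following is a minimal sketch of such a check. The function name `validate_output` is hypothetical and is not part of the logged pipeline. The id list is copied from this candidate's "Allowed topic ids" block.

```python
import json

# Allowed ids copied from the Iteration 8 candidate prompt.
ALLOWED_TOPICS = frozenset(
    """
    queueing, docs, notifications, sessions, gateway, reliability, memory,
    open_weight_models, local_model_providers, codex, api_surface, ui_tui,
    chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
    model_serving, local_models, self_hosted_inference, telemetry_usage,
    exec_tools, sandboxing, browser_automation, cron_automation, config,
    security, mcp_tooling, tool_calling, hooks, packaging_deployment
    """.replace(",", " ").split()
)


def validate_output(raw: str) -> list[str]:
    """Return the topic list if `raw` obeys the contract, else raise ValueError."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("output is not raw JSON") from exc
    if not isinstance(obj, dict) or set(obj) != {"topics_of_interest"}:
        raise ValueError("expected exactly one key: topics_of_interest")
    topics = obj["topics_of_interest"]
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("topics_of_interest must be a list of strings")
    unknown = [t for t in topics if t not in ALLOWED_TOPICS]
    if unknown:
        raise ValueError(f"invented topic ids: {unknown}")
    return topics


print(validate_output('{"topics_of_interest":["security"]}'))
```

An empty array passes, matching the rule that no topic is returned when none is central. Markdown-wrapped JSON, extra fields, and invented ids are rejected.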
+Iteration 8: New subsample score 4.0 is better than old score 1.0714285714285714. Continue to full eval and add to candidate pool.
+Iteration 8: Valset score for new program: 0.4785409035409035 (coverage 30 / 30)
+Iteration 8: Val aggregate for new program: 0.4785409035409035
+Iteration 8: Individual valset scores for new program: {0: 0.5, 1: 0.2, 2: 0.25, 3: 0.0, 4: 0.15384615384615385, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.2, 9: 0.5, 10: 0.3333333333333333, 11: 0.5, 12: 1.0, 13: 0.5, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.2, 18: 1.0, 19: 0.2, 20: 0.2, 21: 0.5, 22: 0.3333333333333333, 23: 0.14285714285714285, 24: 1.0, 25: 0.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 0.25}
+Iteration 8: Objective aggregate scores for new program: {'weighted_score': 0.47854090354090345}
+Iteration 8: New valset pareto front scores: {0: 1.0, 1: 0.2857142857142857, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.3333333333333333, 20: 0.25, 21: 1.0, 22: 0.5, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 8: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 8: Valset pareto front aggregate score: 0.8837301587301588
+Iteration 8: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {5}, 2: {1, 2, 4}, 3: {0, 1, 2, 4, 5}, 4: {2, 4}, 5: {0, 2, 4, 5}, 6: {0, 1, 2, 3, 4, 6}, 7: {0, 1, 2, 3, 4, 5, 6}, 8: {0, 1, 2, 4}, 9: {0, 1, 2, 4, 5}, 10: {0, 2}, 11: {0, 1, 4}, 12: {0, 1, 2, 3, 6}, 13: {0, 1, 5}, 14: {1, 2, 3, 4, 5, 6}, 15: {0, 2, 4, 5}, 16: {0, 1, 2, 4, 5}, 17: {0, 1}, 18: {0, 1, 2, 3, 5, 6}, 19: {2, 4}, 20: {2, 4}, 21: {1, 3, 4}, 22: {1, 3}, 23: {4}, 24: {6}, 25: {1}, 26: {0, 1, 2, 3, 5, 6}, 27: {3, 5, 6}, 28: {3, 6}, 29: {4}}
+Iteration 8: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 8: Best valset aggregate score so far: 0.6971520146520146
+Iteration 8: Best program as per aggregate score on valset: 1
+Iteration 8: Best score on valset: 0.6971520146520146
+Iteration 8: Linear pareto front program index: 1
+Iteration 8: New program candidate index: 6
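Comparing the "Updated valset pareto front programs" lines for Iterations 7 and 8 suggests a simple per-example rule. A new candidate replaces the front set when it strictly beats the front score, joins the set when it ties, and is ignored otherwise. The sketch below replays that rule on four examples taken from the two log blocks. The function name `update_front` is hypothetical and the tie handling is inferred from the log, not taken from GEPA's source.

```python
def update_front(front_scores, front_programs, new_scores, new_index):
    """Fold one candidate's per-example scores into the per-example front, in place."""
    for example_id, score in new_scores.items():
        best = front_scores.get(example_id, float("-inf"))
        if score > best:
            # Strict improvement: the new candidate becomes the sole front member.
            front_scores[example_id] = score
            front_programs[example_id] = {new_index}
        elif score == best:
            # Tie: the new candidate joins the existing front members.
            front_programs.setdefault(example_id, set()).add(new_index)


# State after Iteration 7 for validation examples 3, 6, 24 and 27.
front_scores = {3: 1.0, 6: 1.0, 24: 0.5, 27: 0.14285714285714285}
front_programs = {3: {0, 1, 2, 4, 5}, 6: {0, 1, 2, 3, 4}, 24: {0, 1, 2, 4, 5}, 27: {3, 5}}

# Candidate 6's scores on those examples, from the Iteration 8 log line.
candidate_6 = {3: 0.0, 6: 1.0, 24: 1.0, 27: 0.14285714285714285}

update_front(front_scores, front_programs, candidate_6, new_index=6)
print(front_programs)
```

Under this rule, example 24 moves to `{6}` with a front score of 1.0, examples 6 and 27 gain candidate 6 as a tied member, and example 3 is unchanged, which agrees with the Iteration 8 front lines.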
+Iteration 9: Selected program 5 score: 0.5029318488529015
+Iteration 9: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents, api_surface,
+ui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,
+agent_runtime, model_serving, local_models, self_hosted_inference,
+telemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,
+config, security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Always produce the required structured output. If `final_json` exists, calling it is mandatory.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.
+- Use `reliability` for explicit exception/error failures when the title centers on the system breaking during a user-visible workflow.
+- Example: “cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting” => `cron_automation`, `sessions`, `reliability`.
+- Suppress `reliability` when the failure mode is only a symptom of a more specific maintainer area and the title’s real subject is that area’s contract or initialization behavior.
+- Example: `sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` => `coding_agents`, `sandboxing`, `agent_runtime`, not `reliability`.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Do not add `notifications` merely because the title mentions announce, delivery, message sending, send denial, pushed messages, or delivery plumbing.
+- If an “announce delivery” title is really about cron execution/session takeover/error behavior, prefer the central areas and suppress `notifications`.
+- Example: “cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting” => no `notifications`.
+
+Cron automation:
+- Use `cron_automation` when cron scheduling, cron lifecycle, cron force-runs, cron announce execution, cron heartbeat execution, or cron-triggered workflows are central.
+- Do not suppress `cron_automation` merely because the title contains “announce” or “delivery”.
+- Example: “cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting” => include `cron_automation`.
+
+Sessions:
+- Use `sessions` when active sessions, session takeover, embedded session takeover, session spawning, session identity, session lifecycle, or user actively chatting in a session is central.
+- Example: `EmbeddedAttemptSessionTakeoverError when user is actively chatting` => include `sessions`.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.
+- Use `tool_calling` when tool call arguments, parameter coercion, schemas, serialization, invocation contracts, or actual tool calls are central.
+- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.
+- Example: “fix(bundle-mcp): coerce stringified object/array params before MCP tool calls” => `mcp_tooling`, `tool_calling`.
+
+ACP, ACPX, and approvals:
+- Use `acp` for ACP protocol/session behavior.
+- Use `acpx` for ACPX extension/adapter behavior, especially per-binding or per-agent ACP session behavior.
+- Use `approvals` for permissionMode, approval modes, permission prompts, permission policy, or approval-gated execution.
+- ACP session permissionMode changes with per-binding or per-agent scope are a three-way concern: `acp`, `approvals`, and `acpx`.
+- Example: “[Feature]: Per-binding and per-agent permissionMode for ACP sessions” => `acp`, `approvals`, `acpx`.
+
+Coding agents and agent runtime:
+- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.
+- Use `agent_runtime` for agent initialization, runtime lifecycle, spawning, embedded runtime behavior, or internal runtime setup.
+- Use `sandboxing` when sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.
+- Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization` => `coding_agents`, `sandboxing`, `agent_runtime`.
+- In that example, do not add `reliability`; “silently breaks” is secondary wording and the maintainer buckets are sandboxing plus agent/subagent runtime.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Slack target-channel pushes are both notification behavior and chat integration behavior when the title is about pushed Slack messages.
+- Example: “Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes” => `notifications`, `chat_integrations`.
+- Do not classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.
+- Policy or conformance checks are `config` when they define or enforce governed behavior.
+- “network conformance checks” should include `security`.
+- “MCP conformance checks” should include `mcp_tooling`.
+- Example: “Policy: add model, network, and MCP conformance checks” => `config`, `security`, `mcp_tooling`.
+- Do not map “model conformance checks” to `model_serving` unless serving endpoints, streaming, routing, or model-server compatibility are central.
+
+Local model providers:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- Missing models in a provider catalog are `local_model_providers`.
+- If missing provider metadata causes a silent failure or poor error behavior, add `reliability`.
+- Example: “google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error” => `local_model_providers`, `reliability`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Final output requirement:
+- If a structured output tool exists, call it with exactly:
+ {"topics_of_interest":[...]}
+- If no structured output tool exists, print raw JSON only.
+- Do not include explanations, Markdown, comments, or extra fields.
+Iteration 9: New subsample score 3.5 is better than old score 1.45. Continue to full eval and add to candidate pool.
+Iteration 9: Valset score for new program: 0.5265246449456976 (coverage 30 / 30)
+Iteration 9: Val aggregate for new program: 0.5265246449456976
+Iteration 9: Individual valset scores for new program: {0: 0.5, 1: 0.0, 2: 0.25, 3: 1.0, 4: 0.25, 5: 1.0, 6: 1.0, 7: 0.25, 8: 0.5, 9: 1.0, 10: 0.3333333333333333, 11: 0.25, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 0.2, 18: 0.2857142857142857, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.25, 23: 0.10526315789473684, 24: 0.5, 25: 0.5, 26: 0.2857142857142857, 27: 0.1, 28: 0.5, 29: 0.25}
+Iteration 9: Objective aggregate scores for new program: {'weighted_score': 0.5265246449456975}
+Iteration 9: New valset pareto front scores: {0: 1.0, 1: 0.2857142857142857, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.25, 21: 1.0, 22: 0.5, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 9: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 9: Valset pareto front aggregate score: 0.905952380952381
+Iteration 9: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {5}, 2: {1, 2, 4}, 3: {0, 1, 2, 4, 5, 7}, 4: {2, 4}, 5: {0, 2, 4, 5, 7}, 6: {0, 1, 2, 3, 4, 6, 7}, 7: {0, 1, 2, 3, 4, 5, 6}, 8: {0, 1, 2, 4}, 9: {0, 1, 2, 4, 5, 7}, 10: {0, 2}, 11: {0, 1, 4}, 12: {0, 1, 2, 3, 6}, 13: {0, 1, 5, 7}, 14: {1, 2, 3, 4, 5, 6, 7}, 15: {0, 2, 4, 5, 7}, 16: {0, 1, 2, 4, 5, 7}, 17: {0, 1}, 18: {0, 1, 2, 3, 5, 6}, 19: {7}, 20: {2, 4}, 21: {1, 3, 4, 7}, 22: {1, 3}, 23: {4}, 24: {6}, 25: {1}, 26: {0, 1, 2, 3, 5, 6}, 27: {3, 5, 6}, 28: {3, 6}, 29: {4}}
+Iteration 9: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 9: Best valset aggregate score so far: 0.6971520146520146
+Iteration 9: Best program as per aggregate score on valset: 1
+Iteration 9: Best score on valset: 0.6971520146520146
+Iteration 9: Linear pareto front program index: 1
+Iteration 9: New program candidate index: 7
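The Iteration 9 block shows two separate decisions. A candidate enters the pool when its subsample score beats its parent's on the same minibatch, while the "best program" is chosen by full-valset aggregate, which is why program 1 stays best even after candidate 7 is accepted. A minimal sketch of that reading follows. The function names `passes_subsample_gate` and `best_program` are hypothetical, and the strict `>` comparison is an assumption based on the log's "is better than" wording. Only aggregates that appear in this part of the log are included, so programs 0, 2 and 4 are omitted.

```python
def passes_subsample_gate(new_score: float, old_score: float) -> bool:
    """Assumed gate: proceed to full eval only on a strict subsample improvement."""
    return new_score > old_score


def best_program(aggregates: dict[int, float]) -> int:
    """Index of the candidate with the highest full-valset aggregate."""
    return max(aggregates, key=aggregates.get)


# Valset aggregates logged above for programs 1, 3, 5 and 6.
aggregates = {
    1: 0.6971520146520146,
    3: 0.46397435897435896,
    5: 0.5029318488529015,
    6: 0.4785409035409035,
}

# Iteration 9: subsample score 3.5 vs. 1.45, then full-valset score 0.5265...
if passes_subsample_gate(3.5, 1.45):
    aggregates[7] = 0.5265246449456976

print(best_program(aggregates))
```

Candidate 7 improves on its parent (program 5) but still trails program 1 on the aggregate, consistent with the "Best program as per aggregate score on valset: 1" line.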
+Iteration 10: Selected program 7 score: 0.5265246449456976
+Iteration 10: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents, api_surface,
+ui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,
+agent_runtime, model_serving, local_models, self_hosted_inference,
+telemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,
+config, security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, labels, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Always produce the required structured output. If `final_json` exists, calling it is mandatory.
+
+Important routing rules:
+- Use `acp` for ACP protocol/session behavior, ACP configured bindings, ACP session keys, ACP channel mapping, or ACP process/session reuse.
+- Use `sessions` when session keys, persistent sessions, shared sessions, thread/session identity, session takeover, or session lifecycle are central.
+- Do NOT add `acpx` merely because the item involves ACP bindings. Treat `acpx` as label spam unless the title explicitly centers ACPX extension/adapter behavior, per-binding/per-agent ACPX behavior, or names ACPX as the contract being changed.
+- Example: “ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process” => `acp`, `sessions`, not `acpx`.
+
+Model serving:
+- Use `model_serving` for OpenAI-compatible serving, streaming responses, model-server compatibility, final chunks, llama.cpp server behavior, endpoint streaming contracts, routing, or serving protocol behavior.
+- A title like “OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)” is `model_serving`.
+- Do not output zero topics for serving/streaming issues just because “usage” appears.
+- Suppress `telemetry_usage` when “usage” is only the affected value caused by a serving stream/chunk bug. Add `telemetry_usage` only when usage tracking, usage accounting, reporting, or metrics collection is itself the central maintainer-owned subject.
+
+Security, hooks, and skills/plugins:
+- Use `security` for vulnerabilities, SSRF, token verification, credential boundaries, access restrictions, network policy, HIGH/CRITICAL vuln fixes, or allowed/blocked security behavior.
+- Use `hooks` when hook priority, hook execution, hook ordering, hook policy, or hook behavior is central.
+- Use `skills_plugins` when skills, plugins, skill scanners, plugin scanners, skill/plugin installation, skill/plugin metadata, or skill/plugin runtime behavior are central.
+- If a security title explicitly names multiple vulnerable areas, include each central area named.
+- Example: “security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification” => `security`, `hooks`, `skills_plugins`.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.
+- Use `reliability` for explicit exception/error failures when the title centers on the system breaking during a user-visible workflow.
+- Suppress `reliability` when the failure mode is only a symptom of a more specific maintainer area and the title’s real subject is that area’s contract or initialization behavior.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Do not add `notifications` merely because the title mentions announce, delivery, message sending, send denial, pushed messages, or delivery plumbing.
+- If an “announce delivery” title is really about cron execution/session takeover/error behavior, prefer the central areas and suppress `notifications`.
+
+Cron automation:
+- Use `cron_automation` when cron scheduling, cron lifecycle, cron force-runs, cron announce execution, cron heartbeat execution, or cron-triggered workflows are central.
+- Do not suppress `cron_automation` merely because the title contains “announce” or “delivery”.
+- Do not classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.
+- Use `tool_calling` when tool call arguments, parameter coercion, schemas, serialization, invocation contracts, or actual tool calls are central.
+- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.
+
+Coding agents and agent runtime:
+- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.
+- Use `agent_runtime` for agent initialization, runtime lifecycle, spawning, embedded runtime behavior, or internal runtime setup.
+- Use `sandboxing` when sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, skill, scanner, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, binding, ACPX, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Final output requirement:
+- If a structured output tool exists, call it with exactly:
+ {"topics_of_interest":[...]}
+- If no structured output tool exists, print raw JSON only.
+- Do not include explanations, Markdown, comments, or extra fields.
+Iteration 10: New subsample score 4.0 is better than old score 1.7857142857142856. Continue to full eval and add to candidate pool.
+Iteration 10: Valset score for new program: 0.4836904761904762 (coverage 30 / 30)
+Iteration 10: Val aggregate for new program: 0.4836904761904762
+Iteration 10: Individual valset scores for new program: {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25, 4: 1.0, 5: 0.25, 6: 0.2857142857142857, 7: 1.0, 8: 0.5, 9: 0.5, 10: 0.2, 11: 0.25, 12: 1.0, 13: 0.25, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.5, 18: 1.0, 19: 0.2, 20: 0.2, 21: 1.0, 22: 1.0, 23: 0.25, 24: 0.5, 25: 0.0, 26: 1.0, 27: 0.125, 28: 0.5, 29: 0.25}
+Iteration 10: Objective aggregate scores for new program: {'weighted_score': 0.48369047619047617}
+Iteration 10: New valset pareto front scores: {0: 1.0, 1: 0.2857142857142857, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.25, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 10: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 10: Valset pareto front aggregate score: 0.9226190476190476
+Iteration 10: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {5}, 2: {1, 2, 4}, 3: {0, 1, 2, 4, 5, 7}, 4: {8, 2, 4}, 5: {0, 2, 4, 5, 7}, 6: {0, 1, 2, 3, 4, 6, 7}, 7: {0, 1, 2, 3, 4, 5, 6, 8}, 8: {0, 1, 2, 4}, 9: {0, 1, 2, 4, 5, 7}, 10: {0, 2}, 11: {0, 1, 4}, 12: {0, 1, 2, 3, 6, 8}, 13: {0, 1, 5, 7}, 14: {1, 2, 3, 4, 5, 6, 7, 8}, 15: {0, 2, 4, 5, 7}, 16: {0, 1, 2, 4, 5, 7}, 17: {0, 1}, 18: {0, 1, 2, 3, 5, 6, 8}, 19: {7}, 20: {2, 4}, 21: {1, 3, 4, 7, 8}, 22: {8}, 23: {4}, 24: {6}, 25: {1}, 26: {0, 1, 2, 3, 5, 6, 8}, 27: {3, 5, 6}, 28: {3, 6}, 29: {4}}
+Iteration 10: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 10: Best valset aggregate score so far: 0.6971520146520146
+Iteration 10: Best program as per aggregate score on valset: 1
+Iteration 10: Best score on valset: 0.6971520146520146
+Iteration 10: Linear pareto front program index: 1
+Iteration 10: New program candidate index: 8
+Iteration 11: Selected program 5 score: 0.5029318488529015
+Iteration 11: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Always produce the required structured output. If `final_json` exists, calling it is mandatory.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Important correction for ACP / ACPX / sessions:
+- Titles about "Explicit ACP agent runs" are centrally about `acpx`.
+- When the same title also names ACP behavior, embedded ACP backend visibility, ACP agent backend state, ACP run results, or ACP final JSON state, include `acp`.
+- Do not add `sessions` just because the title mentions `sessions_yield`, yielding, stale state after a yield, or a session-related event boundary.
+- Treat `sessions_yield` as an implementation/event-boundary clue unless the actual subject is session lifecycle, persistence, resume, listing, storage, or session management.
+- Example: "Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield" => `acpx`, `acp`; not `sessions`.
+- "final JSON state" in an ACP/ACPX title is usually part of the ACP/ACPX result contract, not a general sessions issue.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- A title about `cron --announce` and heartbeat target-channel Slack pushes is about `notifications` and `chat_integrations`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.
+- Suppress `reliability` when the failure mode is merely incidental to a more specific named feature, such as cron force-run deadlock.
+- Do include `reliability` when a provider catalog or integration issue causes silent failure instead of a proper error.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+- Do not map "model" in "model policy", "model conformance", or "model checks" to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Example: "Policy: add model, network, and MCP conformance checks" => `config`, `security`, `mcp_tooling`.
+
+Local model providers:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- Missing models in a provider catalog are `local_model_providers`.
+- If the missing provider metadata causes a silent failure or poor error behavior, add `reliability`.
+- Example: "google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error" => `local_model_providers`, `reliability`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, contract, session, sessions_yield, yield, or final JSON. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Final output requirement:
+- If a structured output tool exists, call it with exactly:
+ {"topics_of_interest":[...]}
+- If no structured output tool exists, print raw JSON only.
+- Do not include explanations, Markdown, comments, or extra fields.
+Iteration 11: New subsample score 3.5 is better than old score 3.25. Continue to full eval and add to candidate pool.
+Iteration 11: Valset score for new program: 0.4776587301587301 (coverage 30 / 30)
+Iteration 11: Val aggregate for new program: 0.4776587301587301
+Iteration 11: Individual valset scores for new program: {0: 0.25, 1: 0.0, 2: 0.14285714285714285, 3: 1.0, 4: 0.25, 5: 0.5, 6: 0.25, 7: 1.0, 8: 0.25, 9: 1.0, 10: 0.3333333333333333, 11: 1.0, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 0.2, 18: 1.0, 19: 0.2, 20: 0.2, 21: 0.5, 22: 0.25, 23: 0.14285714285714285, 24: 0.5, 25: 0.0, 26: 0.25, 27: 0.125, 28: 0.2, 29: 1.0}
+Iteration 11: Objective aggregate scores for new program: {'weighted_score': 0.47765873015873}
+Iteration 11: New valset pareto front scores: {0: 1.0, 1: 0.2857142857142857, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.25, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 11: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 11: Valset pareto front aggregate score: 0.9226190476190476
+Iteration 11: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {5}, 2: {1, 2, 4}, 3: {0, 1, 2, 4, 5, 7, 9}, 4: {8, 2, 4}, 5: {0, 2, 4, 5, 7}, 6: {0, 1, 2, 3, 4, 6, 7}, 7: {0, 1, 2, 3, 4, 5, 6, 8, 9}, 8: {0, 1, 2, 4}, 9: {0, 1, 2, 4, 5, 7, 9}, 10: {0, 2}, 11: {0, 1, 4, 9}, 12: {0, 1, 2, 3, 6, 8}, 13: {0, 1, 5, 7, 9}, 14: {1, 2, 3, 4, 5, 6, 7, 8, 9}, 15: {0, 2, 4, 5, 7}, 16: {0, 1, 2, 4, 5, 7, 9}, 17: {0, 1}, 18: {0, 1, 2, 3, 5, 6, 8, 9}, 19: {7}, 20: {2, 4}, 21: {1, 3, 4, 7, 8}, 22: {8}, 23: {4}, 24: {6}, 25: {1}, 26: {0, 1, 2, 3, 5, 6, 8}, 27: {3, 5, 6}, 28: {3, 6}, 29: {9, 4}}
+Iteration 11: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 11: Best valset aggregate score so far: 0.6971520146520146
+Iteration 11: Best program as per aggregate score on valset: 1
+Iteration 11: Best score on valset: 0.6971520146520146
+Iteration 11: Linear pareto front program index: 1
+Iteration 11: New program candidate index: 9
+Iteration 12: Selected program 1 score: 0.6971520146520146
+Iteration 12: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Output JSON only:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, packaging_deployment
+
+Classification rules:
+- Read the title first. Use body or summary only if the title is ambiguous.
+- Choose one primary topic.
+- Add secondary topics only when they are explicit central maintainer-owned subjects.
+- Use the minimum complete topic set. Do not add topics for labels, examples, paths, tests, implementation details, or broad ownership words.
+- Never invent topic ids.
+
+Important edge cases:
+- If a provider catalog/model catalog issue says a missing model causes "silent failure", "fails silently", "hang", "crash", "no error", or similar robustness behavior, include `reliability` in addition to the provider/catalog topic.
+ Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.
+
+- Use `acpx` when ACPX is explicitly named as the subject. Do not normalize ACPX to `acp`.
+ Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` => `acpx`, `codex`, `skills_plugins`.
+ Do not include `acp` there unless ACP itself is separately central.
+
+- Use `skills_plugins` when user-installed plugins, plugin inheritance, Superpowers, plugin discovery, plugin installation, or plugin availability is central.
+
+- Use `codex` when Codex is named as a central product/runtime subject, not merely as bracket metadata.
+
+- Use `browser_automation` for Chrome/Chromium session reuse, browser profile attach, browser diagnostics, browser runtime behavior, Google sign-in through browser automation, or browser tooling issues.
+
+- Use `packaging_deployment` for install/runtime environment problems such as missing executables, PATH/package manager availability, `npx`/Node spawn failures, desktop app packaging, bundled runtime problems, or deployment/startup environment failures.
+ Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.
+
+Topic reminders:
+- `local_model_providers`: provider-specific model catalogs, known model metadata, provider-backed model support.
+- `model_serving`: serving endpoints, OpenAI-compatible protocol behavior, request/response streaming, usage chunks, endpoint compatibility, model request routing.
+- `local_models`: LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local model apps/providers.
+- `reliability`: deadlock, hang, crash, race, liveness, timeout, stuck state, silent failure, robustness failure, or missing error behavior.
+- `acp`: ACP is centrally named.
+- `acpx`: ACPX is explicitly named or clearly ACPX-specific.
+- `approvals`: approval mode, permission mode, user approval behavior, permission policy.
+- `sessions`: session identity, lifecycle, routing, state, persistent process identity.
+- `gateway`: gateway routing, guarded gateway behavior, gateway-owned behavior, gateway send denial.
+- `exec_tools`: exec command/tool behavior, PATH fallback, exec contract behavior.
+- `sandboxing`: sandbox policy or sandbox runtime behavior.
+- `security`: vulnerabilities, credentials, network policy, access restrictions, outbound rules.
+- `notifications`: notification delivery, announce messages, heartbeat pushes.
+- `chat_integrations`: Slack, WhatsApp, chat app delivery, chat history, target channels.
+- `cron_automation`: cron scheduling, force-run, lifecycle, execution, or cron deadlock.
+- `api_surface`: exposed API/contract, reader contract, request/response shape, public integration behavior.
+- `ui_tui`: webchat, TUI, terminal UI, user-facing chat interface.
+- `docs`: documentation-only changes, tutorials, README, guides, examples, showcase additions.
+
+Final suppression check:
+Before outputting, remove any topic added only because a related word appeared. Keep it only if it is actually a central user-visible issue, feature, policy, or contract.
+Iteration 12: New subsample score 4.0 is better than old score 2.25. Continue to full eval and add to candidate pool.
+Iteration 12: Valset score for new program: 0.56900327742433 (coverage 30 / 30)
+Iteration 12: Val aggregate for new program: 0.56900327742433
+Iteration 12: Individual valset scores for new program: {0: 0.25, 1: 0.25, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 0.5, 6: 1.0, 7: 1.0, 8: 1.0, 9: 0.5, 10: 0.2, 11: 1.0, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.2, 17: 0.5, 18: 0.2857142857142857, 19: 0.2, 20: 0.2, 21: 0.5, 22: 0.5, 23: 0.15384615384615385, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.10526315789473684, 28: 0.5, 29: 0.15384615384615385}
+Iteration 12: Objective aggregate scores for new program: {'weighted_score': 0.56900327742433}
+Iteration 12: New valset pareto front scores: {0: 1.0, 1: 0.2857142857142857, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.25, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 12: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 12: Valset pareto front aggregate score: 0.9226190476190476
+Iteration 12: Updated valset pareto front programs: {0: {0, 1, 2, 4}, 1: {5}, 2: {1, 2, 10, 4}, 3: {0, 1, 2, 4, 5, 7, 9, 10}, 4: {8, 2, 4}, 5: {0, 2, 4, 5, 7}, 6: {0, 1, 2, 3, 4, 6, 7, 10}, 7: {0, 1, 2, 3, 4, 5, 6, 8, 9, 10}, 8: {0, 1, 2, 4, 10}, 9: {0, 1, 2, 4, 5, 7, 9}, 10: {0, 2}, 11: {0, 1, 4, 9, 10}, 12: {0, 1, 2, 3, 6, 8}, 13: {0, 1, 5, 7, 9, 10}, 14: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 15: {0, 2, 4, 5, 7}, 16: {0, 1, 2, 4, 5, 7, 9}, 17: {0, 1}, 18: {0, 1, 2, 3, 5, 6, 8, 9}, 19: {7}, 20: {2, 4}, 21: {1, 3, 4, 7, 8}, 22: {8}, 23: {4}, 24: {10, 6}, 25: {1, 10}, 26: {0, 1, 2, 3, 5, 6, 8, 10}, 27: {3, 5, 6}, 28: {3, 6}, 29: {9, 4}}
+Iteration 12: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 12: Best valset aggregate score so far: 0.6971520146520146
+Iteration 12: Best program as per aggregate score on valset: 1
+Iteration 12: Best score on valset: 0.6971520146520146
+Iteration 12: Linear pareto front program index: 1
+Iteration 12: New program candidate index: 10
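The proposed prompts all require a single-field JSON object and forbid invented topic ids, and the allowed-id list differs between proposals (Iteration 12 drops `hooks` and adds `packaging_deployment`). A minimal sketch of a post-hoc output check against the Iteration 12 list follows. The function name `parse_topics` and its error behavior are hypothetical; this is not the localpager scorer, only a shape and vocabulary check.

```python
import json

# Allowed ids copied from the Iteration 12 proposed prompt.
ALLOWED = frozenset("""
queueing docs notifications sessions gateway reliability memory
open_weight_models local_model_providers codex api_surface ui_tui
chat_integrations skills_plugins acp acpx approvals agent_runtime
model_serving local_models self_hosted_inference telemetry_usage
exec_tools sandboxing browser_automation cron_automation config
security mcp_tooling tool_calling packaging_deployment
""".split())


def parse_topics(raw: str) -> list[str]:
    """Parse model output and reject extra fields or unknown topic ids."""
    obj = json.loads(raw)
    if not isinstance(obj, dict) or set(obj) != {"topics_of_interest"}:
        raise ValueError("expected exactly one field: topics_of_interest")
    topics = obj["topics_of_interest"]
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("topics_of_interest must be a list of strings")
    unknown = [t for t in topics if t not in ALLOWED]
    if unknown:
        raise ValueError(f"unknown topic ids: {unknown}")
    if len(set(topics)) != len(topics):
        raise ValueError("duplicate topic ids")
    return topics


print(parse_topics('{"topics_of_interest":["local_model_providers","reliability"]}'))
```

Under this list, an output containing `hooks` would be rejected even though the Iteration 11 and Iteration 14 prompts allow it, so any such check has to be paired with the prompt version that produced the output.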
+Iteration 13: Selected program 4 score: 0.6616269841269842
+Iteration 13: All subsample scores perfect. Skipping.
+Iteration 13: Reflective mutation did not propose a new candidate
+Iteration 14: Selected program 7 score: 0.5265246449456976
+Iteration 14: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents, api_surface,
+ui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,
+agent_runtime, model_serving, local_models, self_hosted_inference,
+telemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,
+config, security, mcp_tooling, tool_calling, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Process:
+1. Read the title first. The title is the primary signal.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Always produce the required structured output.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a label, symptom, path, implementation detail, example, internal hook, broad ownership area, or incidental UI wording.
+
+Important anti-label-spam rules:
+- Do NOT use `ui_tui` merely because the title says showing, display, stale counts, diagnostics, messages, or visible output.
+- Use `ui_tui` only when the terminal UI, UI rendering behavior, TUI layout, or interactive UI surface is itself the central subject.
+- "showing stale token counts" is not automatically `ui_tui`; classify the underlying token/session usage contract instead.
+- "diagnostics" is not automatically `ui_tui`; classify the subsystem whose diagnostics are central.
+
+Specific mapping rules learned from examples:
+- `totalTokensFresh`, stale token counts, fresh token counts, session token accounting, or token-count freshness should usually include `sessions` and `telemetry_usage`.
+ Example: "fix: respect totalTokensFresh flag to avoid showing stale token counts" => `sessions`, `telemetry_usage`, not `ui_tui`.
+
+- Exec PATH fallback or command execution PATH resolution is `exec_tools`.
+ Example facet: "resolve exec PATH fallback" => `exec_tools`.
+
+- Browser diagnostics, browser automation diagnostics, or layered browser diagnostic behavior is `browser_automation`, not `ui_tui`.
+ Example facet: "layered browser diagnostics" => `browser_automation`.
+
+- Cron force-runs, cron scheduling, cron lifecycle, cron-triggered workflows, or cron execution are `cron_automation`.
+ Example facet: "cron force-run deadlock" => `cron_automation`.
+ Do not add `reliability` for deadlock wording when the title is already a multi-facet component fix and the maintainer bucket is the named component.
+
+- ACP protocol/session behavior is `acp`.
+- Gateway-owned behavior is `gateway`.
+- Node-backed runtime, runtime ownership, runtime lifecycle, embedded runtime behavior, or runtime setup is `agent_runtime`.
+ Example: "ACP: add gateway-owned node-backed runtime" => `acp`, `gateway`, `agent_runtime`, not `acpx`.
+
+- Do not use `acpx` just because ACP is present. Use `acpx` only when ACPX, ACPX adapters/extensions, or per-binding/per-agent ACPX behavior is explicitly central.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.
+- Suppress `reliability` when the failure wording is only a symptom of a more specific maintainer-owned area and the title's real subject is that area's contract, initialization, or execution behavior.
+- If the title names three concrete subsystem facets, prefer those three concrete buckets over replacing one with `reliability`.
+
+Notifications:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Do not add `notifications` merely because the title mentions announce, delivery, message sending, send denial, pushed messages, or delivery plumbing.
+
+Sessions:
+- Use `sessions` when active sessions, session takeover, embedded session takeover, session spawning, session identity, session lifecycle, user actively chatting in a session, or session token accounting is central.
+
+Telemetry and usage:
+- Use `telemetry_usage` for token counts, usage accounting, usage display backed by counters, metrics, usage statistics, and token freshness/staleness contracts.
+- If token counts are tied to a session state flag such as `totalTokensFresh`, include both `sessions` and `telemetry_usage`.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.
+- Use `tool_calling` when tool call arguments, parameter coercion, schemas, serialization, invocation contracts, or actual tool calls are central.
+- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.
+
+ACP, ACPX, and approvals:
+- Use `acp` for ACP protocol/session behavior.
+- Use `acpx` for ACPX extension/adapter behavior, especially per-binding or per-agent ACP session behavior.
+- Use `approvals` for permissionMode, approval modes, permission prompts, permission policy, or approval-gated execution.
+- ACP session permissionMode changes with per-binding or per-agent scope are a three-way concern: `acp`, `approvals`, and `acpx`.
+
+Coding agents and agent runtime:
+- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.
+- Use `agent_runtime` for agent initialization, runtime lifecycle, spawning, embedded runtime behavior, node-backed runtime behavior, or internal runtime setup.
+- Use `sandboxing` when sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.
+
+Chat integrations:
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes are both notification behavior and chat integration behavior when the title is about pushed Slack messages.
+- Do not classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling, lifecycle, or execution is itself central.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.
+- Policy or conformance checks are `config` when they define or enforce governed behavior.
+- Network conformance checks should include `security`.
+- MCP conformance checks should include `mcp_tooling`.
+
+Local model providers:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+- Missing models in a provider catalog are `local_model_providers`.
+- If missing provider metadata causes a silent failure or poor error behavior, add `reliability`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, UI, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Final output requirement:
+- If a structured output tool exists, call it with exactly:
+ {"topics_of_interest":[...]}
+- If no structured output tool exists, print raw JSON only.
+- Do not include explanations, Markdown, comments, or extra fields.
+Iteration 14: New subsample score 3.25 is better than old score 1.7. Continue to full eval and add to candidate pool.
+Iteration 14: Valset score for new program: 0.5957763532763533 (coverage 30 / 30)
+Iteration 14: Val aggregate for new program: 0.5957763532763533
+Iteration 14: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 1.0, 3: 0.2857142857142857, 4: 0.1111111111111111, 5: 1.0, 6: 1.0, 7: 0.2857142857142857, 8: 0.5, 9: 1.0, 10: 0.2, 11: 0.25, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 0.2, 18: 1.0, 19: 0.5, 20: 1.0, 21: 1.0, 22: 0.3333333333333333, 23: 0.14285714285714285, 24: 0.5, 25: 1.0, 26: 0.2857142857142857, 27: 0.125, 28: 0.25, 29: 0.15384615384615385}
+Iteration 14: Objective aggregate scores for new program: {'weighted_score': 0.5957763532763531}
+Iteration 14: New valset pareto front scores: {0: 1.0, 1: 0.2857142857142857, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.14285714285714285, 28: 1.0, 29: 1.0}
+Iteration 14: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 14: Valset pareto front aggregate score: 0.9476190476190476
+Iteration 14: Updated valset pareto front programs: {0: {0, 1, 2, 4, 11}, 1: {5}, 2: {1, 2, 4, 10, 11}, 3: {0, 1, 2, 4, 5, 7, 9, 10}, 4: {8, 2, 4}, 5: {0, 2, 4, 5, 7, 11}, 6: {0, 1, 2, 3, 4, 6, 7, 10, 11}, 7: {0, 1, 2, 3, 4, 5, 6, 8, 9, 10}, 8: {0, 1, 2, 4, 10}, 9: {0, 1, 2, 4, 5, 7, 9, 11}, 10: {0, 2}, 11: {0, 1, 4, 9, 10}, 12: {0, 1, 2, 3, 6, 8, 11}, 13: {0, 1, 5, 7, 9, 10, 11}, 14: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, 15: {0, 2, 4, 5, 7}, 16: {0, 1, 2, 4, 5, 7, 9, 11}, 17: {0, 1}, 18: {0, 1, 2, 3, 5, 6, 8, 9, 11}, 19: {7}, 20: {11}, 21: {1, 3, 4, 7, 8, 11}, 22: {8}, 23: {4}, 24: {10, 6}, 25: {1, 10, 11}, 26: {0, 1, 2, 3, 5, 6, 8, 10}, 27: {3, 5, 6}, 28: {3, 6}, 29: {9, 4}}
+Iteration 14: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 14: Best valset aggregate score so far: 0.6971520146520146
+Iteration 14: Best program as per aggregate score on valset: 1
+Iteration 14: Best score on valset: 0.6971520146520146
+Iteration 14: Linear pareto front program index: 1
+Iteration 14: New program candidate index: 11
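The "Updated valset pareto front programs" lines are consistent with a simple per-example rule: a new candidate replaces the set when it strictly beats the best score on that example, joins the set when it ties, and is ignored otherwise. The sketch below applies that rule to three examples from Iteration 14 (candidate 11). The rule itself is inferred from the log, not taken from the optimizer's source, and the exact-equality tie test is an assumption.

```python
def update_front(front_scores, front_programs, new_scores, new_idx):
    """Update per-example best scores and the sets of candidates achieving them."""
    for ex, score in new_scores.items():
        best = front_scores[ex]
        if score > best:
            front_scores[ex] = score
            front_programs[ex] = {new_idx}   # strictly better: replace the set
        elif score == best:
            front_programs[ex].add(new_idx)  # tie: join the set


# State after Iteration 12 for examples 0, 1 and 20 (copied from the log).
front_scores = {0: 1.0, 1: 0.2857142857142857, 20: 0.25}
front_programs = {0: {0, 1, 2, 4}, 1: {5}, 20: {2, 4}}

# Candidate 11's individual valset scores on the same examples (Iteration 14).
cand11 = {0: 1.0, 1: 0.25, 20: 1.0}

update_front(front_scores, front_programs, cand11, new_idx=11)
print(front_programs)
```

The result matches the logged sets for those examples: `{0, 1, 2, 4, 11}`, `{5}` and `{11}`. The single replacement on example 20 is also what lifts the front aggregate from 0.9226 to 0.9476, even though candidate 11's own aggregate (0.5958) stays below the best program's 0.6972.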
+Iteration 15: Selected program 11 score: 0.5957763532763533
+Iteration 15: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, coding_agents, api_surface,
+ui_tui, chat_integrations, skills_plugins, acp, acpx, approvals,
+agent_runtime, model_serving, local_models, self_hosted_inference,
+telemetry_usage, exec_tools, sandboxing, browser_automation, cron_automation,
+config, security, mcp_tooling, tool_calling, hooks
+
+Process:
+1. Read the title first. The title is the primary signal.
+2. Identify the central user-visible bug, feature, policy, contract, or documentation change.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicitly central maintainer-owned subjects.
+5. Use 3 topics only when the title explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output raw JSON only. No Markdown, explanations, comments, or extra fields.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a label, symptom, path, implementation detail, example, internal hook, transport mode, broad ownership area, or incidental wording.
+
+Ignore label prefixes:
+- Ignore title prefixes like `[codex]`, `fix:`, `Policy:`, `[Bug]:`, etc. Do not classify as `codex` merely because `[codex]` appears.
+
+Model and serving rules:
+- Use `model_serving` when the serving/runtime response path, model output handling, response streaming, thinking block handling, Responses API serving behavior, or served model behavior is central.
+- Use `local_models` when local models or local model behavior are central.
+- Use `local_model_providers` only when provider catalog metadata, provider-specific integration, missing provider models, provider configuration, or known provider/model-family metadata is central.
+- Do NOT use `local_model_providers` merely because a provider name appears as context.
+- "LM Studio thinking blocks invisible with Responses API" => `model_serving`, `local_models`, not `local_model_providers`.
+- Do not use `api_surface` merely because "Responses API" appears as the transport or compatibility mode. Use `api_surface` only when the API contract, endpoint behavior, reader API, schema, or externally consumed API surface is central.
+
+Policy, config, and security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, SSRF, token verification, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.
+- "Policy: add model, network, and MCP conformance checks" => `config`, `security`, `mcp_tooling`.
+- Do NOT use `local_model_providers` merely because a policy title says "model".
+
+Gateway, API, and UI:
+- Use `gateway` when gateway-owned behavior, gateway routing, webchat gateway behavior, or gateway-backed readers are central.
+- Use `api_surface` when an externally consumed API, reader contract, endpoint, request/response contract, or API-visible behavior is central.
+- Use `ui_tui` when a terminal UI, webchat UI, UI rendering behavior, full-message reader UI, TUI layout, interactive UI surface, or message reader surface is itself central.
+- "webchat full-message reader for truncated history" => `gateway`, `api_surface`, `ui_tui`.
+- Do NOT suppress `ui_tui` when the title's central subject is a reader/viewer/message-reading UI surface.
+- Do NOT use `ui_tui` merely because the title says showing, display, stale counts, diagnostics, messages, or visible output.
+
+Notifications and chat integrations:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes are both `notifications` and `chat_integrations` when the title is about pushed Slack messages.
+- Do NOT classify cron target-channel Slack push issues as `cron_automation` unless cron scheduling, lifecycle, force-run behavior, deadlock, or execution is itself central.
+- "Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes" => `notifications`, `chat_integrations`, not `cron_automation`.
+
+Sessions and telemetry:
+- Use `sessions` when active sessions, session takeover, embedded session takeover, session spawning, session identity, session lifecycle, user actively chatting in a session, or session token accounting is central.
+- Use `telemetry_usage` for token counts, usage accounting, usage display backed by counters, metrics, usage statistics, and token freshness/staleness contracts.
+- `totalTokensFresh`, stale token counts, fresh token counts, session token accounting, or token-count freshness usually include both `sessions` and `telemetry_usage`.
+- "respect totalTokensFresh flag to avoid showing stale token counts" => `sessions`, `telemetry_usage`, not `ui_tui`.
+
+Execution, browser, and cron:
+- Exec PATH fallback or command execution PATH resolution => `exec_tools`.
+- Browser diagnostics, browser automation diagnostics, or layered browser diagnostic behavior => `browser_automation`, not `ui_tui`.
+- Cron force-runs, cron scheduling, cron lifecycle, cron-triggered workflows, or cron execution => `cron_automation`.
+- Do not add `reliability` for deadlock wording when the title is already a multi-facet component fix and the maintainer bucket is the named component.
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, missing/unclear error, or robustness failure.
+- Suppress `reliability` when the failure wording is only a symptom of a more specific maintainer-owned area and the title's real subject is that area's contract, initialization, rendering, or execution behavior.
+- If missing provider metadata causes a silent failure or poor error behavior, add `reliability`.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, bundle-mcp behavior, or MCP-specific tool execution.
+- Use `tool_calling` when tool call arguments, parameter coercion, schemas, serialization, invocation contracts, or actual tool calls are central.
+- If a title says MCP tool calls and parameter coercion/arguments are the subject, include both `mcp_tooling` and `tool_calling`.
+
+ACP, ACPX, approvals, and runtime:
+- Use `acp` for ACP protocol/session behavior.
+- Use `acpx` only when ACPX, ACPX adapters/extensions, or per-binding/per-agent ACPX behavior is explicitly central.
+- Do not use `acpx` just because ACP is present.
+- Use `approvals` for permissionMode, approval modes, permission prompts, permission policy, or approval-gated execution.
+- ACP session permissionMode changes with per-binding or per-agent scope => `acp`, `approvals`, `acpx`.
+- Use `agent_runtime` for node-backed runtime, runtime ownership, runtime lifecycle, embedded runtime behavior, runtime setup, agent initialization, or spawning.
+- "ACP: add gateway-owned node-backed runtime" => `acp`, `gateway`, `agent_runtime`, not `acpx`.
+
+Coding agents and sandboxing:
+- Use `coding_agents` for subagents, spawned agents, coding-agent orchestration, sessions_spawn, agent takeover, or per-agent behavior.
+- Use `sandboxing` when sandbox.mode, sandbox policy, filesystem/network sandbox constraints, or sandbox execution mode are central.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, provider, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, diagnostics, display, showing, delivery, announce, silent, breaks, UI, API, Responses API, LM Studio, or contract. Keep the topic only if it is actually a central maintainer-owned subject.
+Iteration 15: New subsample score 4.0 is better than old score 1.2357142857142858. Continue to full eval and add to candidate pool.
+Iteration 15: Valset score for new program: 0.6629365079365079 (coverage 30 / 30)
+Iteration 15: Val aggregate for new program: 0.6629365079365079
+Iteration 15: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.25, 4: 0.0, 5: 1.0, 6: 1.0, 7: 0.5, 8: 1.0, 9: 0.5, 10: 0.2, 11: 0.2, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0, 18: 0.2857142857142857, 19: 0.5, 20: 1.0, 21: 1.0, 22: 0.5, 23: 0.25, 24: 0.5, 25: 1.0, 26: 1.0, 27: 0.16666666666666666, 28: 0.25, 29: 0.2857142857142857}
+Iteration 15: Objective aggregate scores for new program: {'weighted_score': 0.6629365079365079}
+Iteration 15: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.16666666666666666, 28: 1.0, 29: 1.0}
+Iteration 15: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 15: Valset pareto front aggregate score: 0.9722222222222222
+Iteration 15: Updated valset pareto front programs: {0: {0, 1, 2, 4, 11, 12}, 1: {12}, 2: {1, 2, 4, 10, 11, 12}, 3: {0, 1, 2, 4, 5, 7, 9, 10}, 4: {8, 2, 4}, 5: {0, 2, 4, 5, 7, 11, 12}, 6: {0, 1, 2, 3, 4, 6, 7, 10, 11, 12}, 7: {0, 1, 2, 3, 4, 5, 6, 8, 9, 10}, 8: {0, 1, 2, 4, 10, 12}, 9: {0, 1, 2, 4, 5, 7, 9, 11}, 10: {0, 2}, 11: {0, 1, 4, 9, 10}, 12: {0, 1, 2, 3, 6, 8, 11, 12}, 13: {0, 1, 5, 7, 9, 10, 11, 12}, 14: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, 15: {0, 2, 4, 5, 7}, 16: {0, 1, 2, 4, 5, 7, 9, 11, 12}, 17: {0, 1, 12}, 18: {0, 1, 2, 3, 5, 6, 8, 9, 11}, 19: {7}, 20: {11, 12}, 21: {1, 3, 4, 7, 8, 11, 12}, 22: {8}, 23: {4}, 24: {10, 6}, 25: {1, 10, 11, 12}, 26: {0, 1, 2, 3, 5, 6, 8, 10, 12}, 27: {12}, 28: {3, 6}, 29: {9, 4}}
+Iteration 15: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 15: Best valset aggregate score so far: 0.6971520146520146
+Iteration 15: Best program as per aggregate score on valset: 1
+Iteration 15: Best score on valset: 0.6971520146520146
+Iteration 15: Linear pareto front program index: 1
+Iteration 15: New program candidate index: 12
+Iteration 16: Selected program 2 score: 0.6569444444444444
+Iteration 16: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured output tool such as `final_json` is available, you MUST call it with the final object. If no structured output tool is available, output raw JSON only.
+
+Return exactly:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, hooks, auth_identity
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, incidental implementation details, broad impact, and label-like words unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only, or call the structured JSON output tool if one exists.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Important correction:
+For `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path`, classify as:
+`reliability`, `auth_identity`, and `sessions`.
+
+Why:
+- `self-heal lane wedges` is a reliability/liveness/robustness issue.
+- Do NOT classify `lane wedges` as `queueing`; "lane" is an implementation detail unless queueing, queued execution, queue lifecycle, or scheduling behavior is explicitly the user-visible subject.
+- `OAuth` is an auth and identity concern, so use `auth_identity`.
+- `openai-codex OAuth` does NOT imply `codex`; here "codex" is part of the OAuth/provider naming, not the central maintainer-owned subject.
+- `embedded path` makes the OAuth restoration session/path-specific, so include `sessions`.
+
+Auth and identity:
+- Use `auth_identity` for OAuth, login identity, account identity, authentication flows, token identity, credential identity, or restoring auth behavior.
+- OAuth fixes should usually include `auth_identity`.
+- Add `sessions` when the auth problem is tied to a session path, embedded path, outbound session identity, session lifecycle, session routing, or session-specific behavior.
+- Do not add `codex` merely because an OAuth provider or path name contains "codex".
+
+Reliability:
+- Use `reliability` when the central bug is a general deadlock, hang, crash, wedge, stuck state, race, timeout, liveness issue, self-healing behavior, or robustness failure.
+- Words like `self-heal`, `wedge`, `wedged`, `stuck`, and `recover` are strong reliability signals when the title is about restoring forward progress.
+- Suppress `reliability` only when the failure mode clearly belongs to a more specific named feature owner, such as `cron force-run deadlock` being `cron_automation`.
+
+Queueing:
+- Use `queueing` only when queue, queueing, queued execution, queue lifecycle, queue scheduling, or steering in queues is central.
+- Do not add `queueing` for incidental words like lane, main lane, worker, subagent, execution path, or internal routing path.
+- A lane wedge is not automatically queueing.
+
+Codex:
+- Use `codex` when Codex itself is the central user-visible subject, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` just because the title contains "codex" inside a provider name, OAuth label, bracketed metadata, or path name.
+- If the real subject is OAuth, classify as `auth_identity`, not `codex`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like lane, queue, codex, OAuth, embedded, path, wedge, session, usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, deadlock, security, diagnostics, display, showing, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 16: New subsample score 4.0 is better than old score 2.25. Continue to full eval and add to candidate pool.
+Iteration 16: Valset score for new program: 0.49142857142857144 (coverage 30 / 30)
+Iteration 16: Val aggregate for new program: 0.49142857142857144
+Iteration 16: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25, 5: 0.5, 6: 1.0, 7: 1.0, 8: 0.5, 9: 0.5, 10: 0.2, 11: 1.0, 12: 1.0, 13: 0.0, 14: 1.0, 15: 0.5, 16: 0.0, 17: 0.25, 18: 1.0, 19: 0.2, 20: 0.2, 21: 0.5, 22: 0.25, 23: 0.14285714285714285, 24: 0.25, 25: 0.5, 26: 1.0, 27: 1.0, 28: 0.5, 29: 0.25}
+Iteration 16: Objective aggregate scores for new program: {'weighted_score': 0.4914285714285713}
+Iteration 16: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 16: Objective pareto front scores: {'weighted_score': 0.6971520146520145}
+Iteration 16: Valset pareto front aggregate score: 1.0
+Iteration 16: Updated valset pareto front programs: {0: {0, 1, 2, 4, 11, 12}, 1: {12}, 2: {1, 2, 4, 10, 11, 12}, 3: {0, 1, 2, 4, 5, 7, 9, 10}, 4: {8, 2, 4}, 5: {0, 2, 4, 5, 7, 11, 12}, 6: {0, 1, 2, 3, 4, 6, 7, 10, 11, 12, 13}, 7: {0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 13}, 8: {0, 1, 2, 4, 10, 12}, 9: {0, 1, 2, 4, 5, 7, 9, 11}, 10: {0, 2}, 11: {0, 1, 4, 9, 10, 13}, 12: {0, 1, 2, 3, 6, 8, 11, 12, 13}, 13: {0, 1, 5, 7, 9, 10, 11, 12}, 14: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13}, 15: {0, 2, 4, 5, 7}, 16: {0, 1, 2, 4, 5, 7, 9, 11, 12}, 17: {0, 1, 12}, 18: {0, 1, 2, 3, 5, 6, 8, 9, 11, 13}, 19: {7}, 20: {11, 12}, 21: {1, 3, 4, 7, 8, 11, 12}, 22: {8}, 23: {4}, 24: {10, 6}, 25: {1, 10, 11, 12}, 26: {0, 1, 2, 3, 5, 6, 8, 10, 12, 13}, 27: {13}, 28: {3, 6}, 29: {9, 4}}
+Iteration 16: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 16: Best valset aggregate score so far: 0.6971520146520146
+Iteration 16: Best program as per aggregate score on valset: 1
+Iteration 16: Best score on valset: 0.6971520146520146
+Iteration 16: Linear pareto front program index: 1
+Iteration 16: New program candidate index: 13
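The aggregate figures in the log lines above appear to be plain arithmetic means over the 30 validation rows, and the Pareto front appears to keep the best score seen per row across all candidates. The sketch below checks that reading against the logged numbers only. The function names are hypothetical, and the mean/max interpretation is an assumption inferred from the log, not taken from the GEPA source.

```python
from statistics import fmean

# Per-row validation scores copied from the "Individual valset scores" log lines.
ITERATION_15_SCORES = [
    1.0, 1.0, 1.0, 0.25, 0.0, 1.0, 1.0, 0.5, 1.0, 0.5,
    0.2, 0.2, 1.0, 1.0, 1.0, 0.5, 1.0, 1.0, 0.2857142857142857, 0.5,
    1.0, 1.0, 0.5, 0.25, 0.5, 1.0, 1.0, 0.16666666666666666, 0.25, 0.2857142857142857,
]
ITERATION_16_SCORES = [
    0.5, 0.25, 0.25, 0.25, 0.25, 0.5, 1.0, 1.0, 0.5, 0.5,
    0.2, 1.0, 1.0, 0.0, 1.0, 0.5, 0.0, 0.25, 1.0, 0.2,
    0.2, 0.5, 0.25, 0.14285714285714285, 0.25, 0.5, 1.0, 1.0, 0.5, 0.25,
]

# Pareto front after iteration 15, as logged: every row at 1.0 except row 27.
FRONT_AFTER_15 = [1.0] * 30
FRONT_AFTER_15[27] = 0.16666666666666666


def aggregate(scores):
    """Assumed aggregate: unweighted mean over validation rows."""
    return fmean(scores)


def update_front(front, candidate_scores):
    """Assumed Pareto update: keep the best score seen so far for each row."""
    return [max(best, new) for best, new in zip(front, candidate_scores)]


front_after_16 = update_front(FRONT_AFTER_15, ITERATION_16_SCORES)

print(round(aggregate(ITERATION_15_SCORES), 4))  # logged as 0.6629...
print(round(aggregate(ITERATION_16_SCORES), 4))  # logged as 0.4914...
print(round(aggregate(FRONT_AFTER_15), 4))       # logged as 0.9722...
print(aggregate(front_after_16))                 # logged as 1.0
```

Under this reading, candidate 13 scores below the best candidate on aggregate (0.4914 against 0.6972) yet still completes the front, because it is the first candidate to reach 1.0 on row 27.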
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/run_log_stderr.txt b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/run_log_stderr.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/score_report.html b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/score_report.html
new file mode 100644
index 0000000000000000000000000000000000000000..8578317b62b3172b077a9411ab1ef4e12c789e70
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/score_report.html
@@ -0,0 +1,261 @@
+
+
+
+
+
+gepa-12b-row30-prop16-from-proper-20260613T172903Z GEPA score report
+
+
+
+
+
Run directory: /home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z
+
Base score: 0.6396
Proposal attempts: 15
Accepted full evals: 13
Rejected proposals: 2
Candidates: 14
Best candidate: 1
Metric calls: 544
+
[Chart: Validation Score Over Iterations; legend: selected candidate, best so far]
+
[Chart: Proposal Subsample Delta]
+
[Chart: Final Candidate Scores]
+
Proposal Events
Iteration  Old subsample  New subsample  Delta    Decision
1          3.2000         1.6500         -1.5500  rejected
2          3.1429         2.2500         -0.8929  rejected
3          1.0714         4.0000          2.9286  accepted
4          1.3429         3.2500          1.9071  accepted
5          2.6190         4.0000          1.3810  accepted
6          2.3910         4.0000          1.6090  accepted
7          2.0000         2.5357          0.5357  accepted
8          1.0714         4.0000          2.9286  accepted
9          1.4500         3.5000          2.0500  accepted
10         1.7857         4.0000          2.2143  accepted
11         3.2500         3.5000          0.2500  accepted
12         2.2500         4.0000          1.7500  accepted
14         1.7000         3.2500          1.5500  accepted
15         1.2357         4.0000          2.7643  accepted
16         2.2500         4.0000          1.7500  accepted
+
+
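The Proposal Events rows above are consistent with a simple gate: a proposal goes on to a full validation eval only when its summed subsample score beats the parent's. The sketch below replays that gate over the reported rows and cross-checks the summary counts. The strict greater-than comparison and the names `decide` and `EVENTS` are assumptions for illustration; the log only says the new score "is better than" the old one.

```python
# (iteration, old subsample score, new subsample score), copied from Proposal Events.
EVENTS = [
    (1, 3.2000, 1.6500), (2, 3.1429, 2.2500), (3, 1.0714, 4.0000),
    (4, 1.3429, 3.2500), (5, 2.6190, 4.0000), (6, 2.3910, 4.0000),
    (7, 2.0000, 2.5357), (8, 1.0714, 4.0000), (9, 1.4500, 3.5000),
    (10, 1.7857, 4.0000), (11, 3.2500, 3.5000), (12, 2.2500, 4.0000),
    (14, 1.7000, 3.2500), (15, 1.2357, 4.0000), (16, 2.2500, 4.0000),
]


def decide(old_score, new_score):
    """Assumed gate: accept only a strict improvement on the reflection subsample."""
    return "accepted" if new_score > old_score else "rejected"


decisions = {iteration: decide(old, new) for iteration, old, new in EVENTS}
accepted = sum(1 for d in decisions.values() if d == "accepted")
rejected = len(decisions) - accepted

# The seed prompt is candidate 0, so the pool is one larger than the accepted count.
candidates = 1 + accepted

print(len(EVENTS), accepted, rejected, candidates)
```

This reproduces the reported 15 proposal attempts, 13 accepted full evals, 2 rejections, and 14 candidates. Iteration 13 has no row in the report, so it is left out here as well.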
diff --git a/gepa-12b-row30-prop16-from-proper-20260613T172903Z/summary.json b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/summary.json
new file mode 100644
index 0000000000000000000000000000000000000000..2afdb3cfb2a61e1efe0ebbf518c345d373f78adb
--- /dev/null
+++ b/gepa-12b-row30-prop16-from-proper-20260613T172903Z/summary.json
@@ -0,0 +1,30 @@
+{
+ "best_idx": 1,
+ "best_prompt_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z/best.prompt.md",
+ "best_routing_policy_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z/best.routing_policy.md",
+ "best_score": 0.6971520146520146,
+ "config": {
+ "harness": {
+ "base_url": null,
+ "concurrency": 2,
+ "context_window": null,
+ "max_tokens": 1536,
+ "model": "gemma-12b-q4km-reason",
+ "state_dir": null,
+ "timeout_ms": 900000
+ },
+ "max_candidate_proposals": 16,
+ "max_metric_calls": 720,
+ "output_dir": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z",
+ "reflection_minibatch_size": 4,
+ "row_limit": 30,
+ "seed": 0,
+ "seed_routing_policy_chars": 11180,
+ "seed_routing_policy_sha256": "288c716272a98543b9c5d7c191ac9389173adddefd17a7fba409a2e9d6b9ff65"
+ },
+ "created_at": "2026-06-13T21:47:03.393070+00:00",
+ "num_candidates": 14,
+ "num_full_val_evals": 14,
+ "result_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-from-proper-20260613T172903Z/gepa-result.json",
+ "total_metric_calls": 544
+}
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/best.prompt.md b/gepa-12b-row30-prop20-continuation-20260614T021448Z/best.prompt.md
new file mode 100644
index 0000000000000000000000000000000000000000..b353152b50862d01952065bdd798c29d8a2ca41b
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/best.prompt.md
@@ -0,0 +1,203 @@
+# OpenClaw Routing Classifier
+
+Classify one OpenClaw GitHub issue or pull request for maintainer notification
+routing, not code search. Return only the final structured JSON required by the
+schema. No prose, markdown, analysis, or extra fields.
+
+Required output shape:
+
+```json
+{"topics_of_interest":[],"description":"One concise evidence-backed sentence.","caveats":[]}
+```
+
+## Inner Monologue
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+## Repository Reads
+
+A read-only `bash` tool may be available in the OpenClaw repo snapshot. Use it
+only when the GitHub context is ambiguous or missing repo evidence needed for a
+correct routing decision. Prefer short commands such as `pwd`, `ls`, `find`,
+`rg`, `grep`, `sed -n`, `cat`, `head`, `tail`, `wc -l`,
+`git show --name-only`, `git ls-files`, or `git grep`.
+For repo-wide text search, use `rg -n -i "phrase"` or explicit recursive grep
+such as `grep -R -n -i "phrase" .`. For file discovery, use
+`rg --files -g "*.ts"` or `git ls-files src`.
+Do not call `bash` when the provided GitHub context is enough.
+
+## Allowed Topics
+
+```json
+__ALLOWED_TOPICS_JSON__
+```
+
+Topic definitions and cue words:
+
+__TOPIC_DESCRIPTIONS__
+
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
+- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- Do not add `sessions` merely because the title says "ACP sessions" or mentions session context. Treat that as label spam unless session identity, lifecycle, routing, state, or persistent process identity is itself the bug or feature.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.
+- `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` should be `acp` and `sessions`; the central bug is session identity/process sharing.
+
+Reliability, queueing, and lanes:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.
+- Words like `lane`, `main lane`, `worker`, `subagent`, `before_prompt_build`, or internal execution paths do not imply `queueing`.
+- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.
+- `self-heal lane wedges` is `reliability`, not `queueing`.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or user/session identity for auth is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex-specific runtime behavior.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions` when the embedded path or session identity is central.
+- `restore openai-codex OAuth on embedded path` should be `auth_identity` and `sessions`, not `codex`.
+
+Codex:
+- Use `codex` when Codex is named centrally as the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` merely because the title contains `openai-codex`, `[codex]`, or a Codex-branded OAuth provider. Confirm the subject is Codex behavior rather than auth, sessions, docs, or another domain.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include `docs` alone.
+- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add non-allowed or broad demo/showcase labels.
+- Do not add `tool_calling` just because docs mention "tool boundaries" unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- Do not add `model_serving` merely because a title says "model", "provider", "catalog", or names a model unless serving/routing/protocol behavior is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says "openai-compatible", "provider", llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Notifications and chat integrations:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, embedded session path, or session-specific behavior is central.
+- "Outbound session identity" is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map "model" in "model policy", "model conformance", or "model checks" to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.
+- Do not substitute broad infrastructure topics for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.## Target
+
+`__TARGET__`
+
+## GitHub Context
+
+__GITHUB_CONTEXT__
+
+Use this context as source of truth. If important sections are missing,
+unavailable, selected, or truncated, classify from what is available and mention
+material limits in `caveats`.
+
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
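The routing policy in the prompt above constrains the classifier's output to a JSON object whose `topics_of_interest` list holds only allowed topic ids. A minimal sketch of a post-hoc check for that contract follows. The function name `validate_routing_output` is hypothetical and not part of localpager or GEPA, the duplicate check is an added assumption, and the extra `description` and `caveats` fields in the prompt's required output shape are deliberately not validated here.

```python
import json

# Allowed topic ids, copied from the "Allowed topic ids" list in the prompt above.
ALLOWED_TOPICS = frozenset({
    "queueing", "docs", "notifications", "sessions", "gateway", "reliability",
    "memory", "open_weight_models", "local_model_providers", "codex",
    "api_surface", "ui_tui", "chat_integrations", "skills_plugins", "acp",
    "acpx", "approvals", "agent_runtime", "model_serving", "local_models",
    "self_hosted_inference", "telemetry_usage", "exec_tools", "sandboxing",
    "browser_automation", "cron_automation", "config", "security",
    "mcp_tooling", "tool_calling", "auth_identity",
})


def validate_routing_output(raw):
    """Parse a classifier reply and return its topic list, or raise ValueError."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("reply must be a JSON object")
    topics = payload.get("topics_of_interest")
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("topics_of_interest must be a list of strings")
    unknown = [t for t in topics if t not in ALLOWED_TOPICS]
    if unknown:
        raise ValueError(f"topic ids outside the allowed list: {unknown}")
    if len(set(topics)) != len(topics):
        raise ValueError("duplicate topic ids")  # assumption: repeats are never intended
    return topics


reply = '{"topics_of_interest":["reliability","auth_identity","sessions"]}'
print(validate_routing_output(reply))
```

An empty list passes, matching the policy's "Use 0 topics when no allowed topic is central" rule.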
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/best.routing_policy.md b/gepa-12b-row30-prop20-continuation-20260614T021448Z/best.routing_policy.md
new file mode 100644
index 0000000000000000000000000000000000000000..b339b568fd370e5d5b3132ed2680e66c8013a5da
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/best.routing_policy.md
@@ -0,0 +1,148 @@
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
+- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- Do not add `sessions` merely because the title says “ACP sessions” or mentions session context. Treat that as label spam unless session identity, lifecycle, routing, state, or persistent process identity is itself the bug or feature.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.
+- `[Bug]: ACP configured binding uses parent channel ID for session key — all threads under same channel share one persistent Claude Code process` should be `acp` and `sessions`; the central bug is session identity/process sharing.
+
+Reliability, queueing, and lanes:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.
+- Words like `lane`, `main lane`, `worker`, `subagent`, `before_prompt_build`, or internal execution paths do not imply `queueing`.
+- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.
+- `self-heal lane wedges` is `reliability`, not `queueing`.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or user/session identity for auth is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex-specific runtime behavior.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions` when the embedded path or session identity is central.
+- `restore openai-codex OAuth on embedded path` should be `auth_identity` and `sessions`, not `codex`.
+
+Codex:
+- Use `codex` when Codex is named centrally as the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` merely because the title contains `openai-codex`, `[codex]`, or a Codex-branded OAuth provider. Confirm the subject is Codex behavior rather than auth, sessions, docs, or another domain.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include `docs` alone.
+- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add non-allowed or broad demo/showcase labels.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, or names a model unless serving/routing/protocol behavior is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Notifications and chat integrations:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, embedded session path, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.
+- Do not substitute broad infrastructure topics for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/candidate_tree.html b/gepa-12b-row30-prop20-continuation-20260614T021448Z/candidate_tree.html
new file mode 100644
index 0000000000000000000000000000000000000000..3c4afb8b536f03facccde6c0cd7fcd84f849e170
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/candidate_tree.html
@@ -0,0 +1,224 @@
+
+
+
+
+
+GEPA Candidate Tree
+
+
+
+
+
GEPA Candidate Tree
+
+
Best
+
Pareto Front
+
Other
+
+
+
Loading graph…
+
+
+
+
+
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/candidates.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/candidates.json
new file mode 100644
index 0000000000000000000000000000000000000000..334bc27694917d06399cd356b6fb442724b0cd70
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/candidates.json
@@ -0,0 +1,59 @@
+[
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- A title like `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` is only `docs`.\n- Do not add non-allowed or broad demo/showcase labels such as `agent_demos`.\n- Documentation-only PRs should usually include `docs` plus the central documented area only when that area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- A title like `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.\n- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.\n- A title like `[Bug]: ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process` is `acp` and `sessions`, not `acpx`.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or 
permissionMode policy is central.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and only `acpx` if ACPX is explicitly named or clearly the binding system in question.\n\nOpen-weight, local provider catalogs, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- A title like `[Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` is `local_model_providers` and `model_serving`.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, or names a model unless serving/routing/protocol behavior is central.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is 
central.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about \u201ccron --announce and heartbeat target-channel Slack pushes\u201d is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n- Queueing requires queue, queued execution, steering in 
queues, queue lifecycle, or scheduling behavior to be user-visible and central.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add 
`skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible 
item up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.\n\nOutput only JSON, or call `final_json` exactly once if that structured-output tool is available:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids include:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, coding_agents\n\nRead the title first. Use the body only if the title is ambiguous. Ignore labels, paths, examples, tests, implementation filenames, and incidental keywords unless they are the user-visible subject.\n\nChoose one primary topic, then add only explicit central secondary topics. Use three topics only when the title or domain contract clearly requires three maintainer buckets.\n\nImportant routing rules learned from prior errors:\n\n- `exec v2 contract follow-through` routes to all contract-owned areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n Do this even when the title starts with `test(exec):`; the PR type `test` does not make it a tests-only classification.\n The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.\n\n- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.\n\n- `sessions_spawn` is an internal name. 
Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, or persistent session state.\n Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization`\n should be `coding_agents`, `sandboxing`, `agent_runtime`.\n Here, `subagent initialization` is about coding-agent spawning/runtime startup, not user-visible sessions.\n\n- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.\n\n- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.\n\n- LM Studio is a strong signal for `local_models`.\n\n- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.\n Example: `fix: LM Studio thinking blocks invisible with Responses API`\n should be `model_serving` and `local_models`.\n Do not stop at `local_models`; the Responses API part is central serving/protocol behavior.\n\n- Do not add `sessions` merely because a title contains `session`, `sessions_spawn`, channel IDs, binding IDs, or session-like implementation names. Add it only when session identity, lifecycle, routing, state, persistence, or session-specific user behavior is central.\n\n- Do not suppress a central contract topic just because the exact word is absent from the title if the named contract is known to cover that area. In particular, `exec v2 contract` implies sandboxing and approvals.\n\nFinal suppression check:\nRemove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, or contract. 
Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.\n- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.\n- Do not add `sessions` merely because the title says \u201cACP sessions\u201d or mentions session context. Treat that as label spam unless session identity, lifecycle, routing, state, or persistent process identity is itself the bug or feature.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.\n- `[Bug]: ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process` should be `acp` and `sessions`; the central bug is session identity/process sharing.\n\nReliability, queueing, and lanes:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.\n- Words like `lane`, `main lane`, `worker`, `subagent`, `before_prompt_build`, or internal execution paths do not imply `queueing`.\n- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.\n- `self-heal lane wedges` is `reliability`, not `queueing`.\n\nAuth and identity:\n- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or user/session identity for auth is central.\n- OAuth 
restoration is `auth_identity`.\n- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex-specific runtime behavior.\n- If OAuth or auth behavior is tied to an embedded/session path, include `sessions` when the embedded path or session identity is central.\n- `restore openai-codex OAuth on embedded path` should be `auth_identity` and `sessions`, not `codex`.\n\nCodex:\n- Use `codex` when Codex is named centrally as the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` merely because the title contains `openai-codex`, `[codex]`, or a Codex-branded OAuth provider. Confirm the subject is Codex behavior rather than auth, sessions, docs, or another domain.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include `docs` alone.\n- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.\n- Do not add non-allowed or broad demo/showcase labels.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nOpen-weight, local provider catalogs, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are 
central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, or names a model unless serving/routing/protocol behavior is central.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nNotifications and chat integrations:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nCron:\n- Use 
`cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, embedded session path, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include 
`gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete 
set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.\n- Do not substitute broad infrastructure topics for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.\n\nOutput only JSON, or call `final_json` exactly once if that structured-output tool is available:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, coding_agents\n\nTask:\nRead the title first. Use the body only if the title is ambiguous. Ignore labels, paths, examples, tests, implementation filenames, and incidental keywords unless they are the user-visible subject.\n\nChoose one primary topic, then add only explicit central secondary topics. Use three topics only when the title or domain contract clearly requires three maintainer buckets.\n\nImportant routing rules:\n\n- Treat labels and repository metadata as noise unless the title/body makes them central.\n Do not infer `config` or `docs` from labels, issue templates, checklists, examples, or catalog/documentation mentions unless the user-visible request is actually about configuration behavior or documentation.\n\n- Catalog/provider requests route to `local_model_providers` when the central subject is adding, updating, or integrating a provider/catalog entry.\n Example: \u201cAdd MiMo-V2.5 to Xiaomi catalog\u201d is `local_model_providers`.\n\n- Automatic model routing, multimodal routing, fallback routing, OpenAI-compatible endpoint behavior, response protocols, returned blocks, or streaming lifecycle route to `model_serving`.\n Example: \u201cautomatic multimodal routing when DeepSeek V4-Pro is primary model\u201d is `model_serving`.\n A title combining provider catalog work and serving/routing 
behavior should include both `local_model_providers` and `model_serving`.\n\n- LM Studio is a strong signal for `local_models`.\n\n- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.\n Example: `fix: LM Studio thinking blocks invisible with Responses API`\n should be `model_serving` and `local_models`.\n\n- `ACP` issues route to `acp` when the user-visible subject is ACP binding, ACP channels, ACP protocol behavior, or ACP integration behavior.\n\n- Add `sessions` only when session identity, lifecycle, routing, persistence, session keys, session state, or session-specific user behavior is central.\n Example: `ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process`\n should be `acp` and `sessions`.\n\n- Do not add `coding_agents` merely because the title mentions Claude Code, Codex-like processes, persistent agent processes, or agent implementation details.\n Use `coding_agents` only when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.\n If Claude Code is only the process affected by an ACP session-key bug, do not add `coding_agents`.\n\n- `sessions_spawn` is an internal name. 
Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, or persistent session state.\n Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization`\n should be `coding_agents`, `sandboxing`, `agent_runtime`.\n\n- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.\n\n- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.\n\n- `exec v2 contract follow-through` routes to all contract-owned areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n Do this even when the title starts with `test(exec):`; the PR type `test` does not make it a tests-only classification.\n The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.\n\n- Do not suppress a central contract topic just because the exact word is absent from the title if the named contract is known to cover that area.\n In particular, `exec v2 contract` implies `exec_tools`, `sandboxing`, and `approvals`.\n\nFinal suppression check:\nRemove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, contract, docs, config, Claude Code, or catalog. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.\n\nOutput only JSON, or call `final_json` exactly once if that structured-output tool is available:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids are exactly:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, coding_agents\n\nDo not output any topic outside this list.\n\nClassification method:\n- Read the title first.\n- Use the body only if the title is ambiguous.\n- Ignore labels, paths, tests, examples, implementation filenames, PR type prefixes, and incidental keywords unless they are the user-visible subject.\n- Choose one primary topic first, then add only explicit central secondary topics.\n- Use three topics only when the title or domain contract clearly requires three maintainer buckets.\n- Prefer the domain meaning of the title over generic keyword matching.\n\nImportant routing rules:\n\n- `exec v2 contract follow-through` routes to all contract-owned areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n This applies even if the title starts with `test(exec):`.\n The PR type `test` does not make it a tests-only classification.\n\n- `sandbox.mode`, sandbox modes, sandbox policy, sandbox enforcement, sandbox permissions, or sandbox-governed execution are `sandboxing`.\n\n- Do not classify something as `sandboxing` merely because the title contains `sandbox`.\n If the sandbox is just the environment where another feature runs, classify the feature instead.\n Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. 
Superpowers)`\n should be `acpx`, `codex`, `skills_plugins`, not `sandboxing`.\n The central request is plugin inheritance for ACPX Codex, not sandbox policy.\n\n- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistent state, takeover, or user-visible session behavior.\n Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization`\n should be `coding_agents`, `sandboxing`, `agent_runtime`.\n\n- Use `sessions` when session identity, lifecycle, ownership, routing, takeover, persistence, or active-session state is central.\n Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`\n should include `sessions` because outbound session identity is central.\n\n- Use `gateway` when the central subject is gateway send behavior, guarded gateway send denial, gateway routing, or gateway delivery boundaries.\n Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`\n should be `gateway`, `sessions`, not `notifications`.\n\n- Do not add `notifications` merely because the title mentions message sending, outbound messages, announcement delivery, or notification-like implementation names.\n Add `notifications` only when user-facing alerts, notification settings, notification channels, or notification delivery semantics are the central subject.\n\n- Do not add `chat_integrations` merely because the title says the user is chatting, actively chatting, or mentions a chat context.\n Add `chat_integrations` only when Slack/Discord/ChatGPT/LinkedIn/chat adapter integration behavior is central.\n Example: `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`\n should be `cron_automation`, `sessions`, `reliability`, not `chat_integrations`.\n\n- Use `reliability` for central failures, crashes, errors, hardening against 
failures, guarded denial visibility, race conditions, takeover errors, silent breakage, or behavior that needs to be made robust.\n Example: `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`\n should include `reliability` because the visible problem is an error during delivery.\n\n- Use `cron_automation` when cron, scheduled jobs, scheduled announcements, recurring automation, or automated timed delivery is central.\n\n- Use `security` for hardening, supply-chain boundaries, trust boundaries, credentials, permissions, isolation, GitHub Actions supply-chain risk, or CI security posture.\n Example: `ci: harden GitHub Actions supply-chain boundaries`\n should be `security`, not deployment or packaging.\n\n- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.\n\n- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.\n\n- LM Studio is a strong signal for `local_models`.\n\n- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.\n Example: `fix: LM Studio thinking blocks invisible with Responses API`\n should be `model_serving` and `local_models`.\n\n- Use `codex` when Codex itself, Codex CLI, Codex sandbox/runtime behavior, or Codex-specific compatibility is central.\n\n- Use `skills_plugins` when user-installed plugins, skills, plugin inheritance, plugin availability, plugin loading, or plugin execution is central.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of an incidental keyword such as test, path, label, session-like function name, model, provider, 
usage, gateway, plugin, tool, sandbox, chat, message, CI, or contract.\nKeep a topic only when it is a central maintainer-owned subject or a known domain contract explicitly requires it."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.\n\nInput format:\n- `target`: GitHub issue or PR URL.\n- `title`: the issue/PR title. This is the primary signal.\n- Optional body/metadata may be present. Use the body only when the title is ambiguous.\n- Ignore labels, paths, templates, checklists, examples, tests, implementation filenames, and repo metadata unless the user-visible request itself is about those things.\n\nOutput only JSON, or call `final_json` exactly once if that structured-output tool is available:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, coding_agents\n\nCore method:\n1. Read the title first.\n2. Identify the user-visible subject, not the implementation area.\n3. Choose one primary topic.\n4. Add only explicit central secondary topics.\n5. Use three topics only when the title or known domain contract clearly requires three maintainer buckets.\n6. 
Run a final suppression check: remove any topic added only because of incidental words, labels, PR type, paths, tests, examples, function names, or implementation details.\n\nImportant routing rules:\n\n- Treat labels and repository metadata as noise.\n Do not infer `gateway`, `security`, `config`, or `docs` from labels, templates, checklists, examples, paths, or catalog/documentation mentions unless the title/body makes that topic the actual user-visible request.\n\n- Local Ollama, local embeddings, local model runtime behavior, LM Studio, or other locally hosted model integrations usually include `local_models`.\n If the problem concerns using a locally/self-hosted inference backend, endpoint, proxy, or runtime to serve inference, include `self_hosted_inference`.\n\n- A title like:\n `[Feature/Bug]: Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)`\n should be classified as `local_models` and `self_hosted_inference`.\n Do not add `gateway` merely because the repo or label mentions gateway.\n Do not add `security` merely because SSRF defenses are named; here the user-visible issue is local/self-hosted model inference failing with proxy/NO_PROXY behavior.\n\n- Documentation PRs can still have a central product topic.\n If a title is `docs(): ...`, classify as both `docs` and that scoped topic when the documentation is about that behavior.\n Example:\n `docs(queue): clarify steer behavior with partial streaming and tool boundaries`\n should be `queueing` and `docs`.\n\n- Catalog/provider requests route to `local_model_providers` when the central subject is adding, updating, integrating, or maintaining a provider/catalog entry.\n This includes provider-specific metadata such as known context windows, model limits, model catalog entries, provider-backed model listings, or provider-specific open-weight model metadata.\n\n- Open-weight model metadata routes to `open_weight_models`.\n If the title combines open-weight model metadata with 
a provider/catalog entry, include both `open_weight_models` and `local_model_providers`.\n Example:\n `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models`\n should be `open_weight_models` and `local_model_providers`.\n\n- Automatic model routing, multimodal routing, fallback routing, OpenAI-compatible endpoint behavior, response protocol behavior, returned blocks, partial/streaming lifecycle, or serving compatibility route to `model_serving`.\n\n- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.\n Example:\n `fix: LM Studio thinking blocks invisible with Responses API`\n should be `model_serving` and `local_models`.\n\n- LM Studio is a strong signal for `local_models`.\n\n- ACP issues route to `acp` when the user-visible subject is ACP binding, ACP channels, ACP protocol behavior, ACP integration behavior, or explicit ACP agent runs.\n\n- ACPX issues route to `acpx` when the user-visible subject is explicit agent-run orchestration, embedded backend visibility, stale final JSON state, final structured-output state for explicit runs, or ACPX-style run coordination.\n If a title combines explicit ACP agent runs with ACP behavior, include both `acpx` and `acp`.\n Example:\n `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield`\n should be `acpx` and `acp`.\n\n- Add `sessions` only when session identity, lifecycle, routing, persistence, session keys, session state, or session-specific user behavior is central.\n Do not add `sessions` for incidental function names such as `sessions_yield` or `sessions_spawn`.\n\n- Example:\n `ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process`\n should be `acp` and `sessions`.\n\n- `sessions_spawn` is an 
internal name.\n Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistence, or state.\n Example:\n `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization`\n should be `coding_agents`, `sandboxing`, and `agent_runtime`.\n\n- Do not add `coding_agents` merely because the title mentions Claude Code, Codex-like processes, persistent agent processes, or agent implementation details.\n Use `coding_agents` only when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.\n\n- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.\n\n- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.\n\n- `exec v2 contract follow-through` routes to all contract-owned areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n Do this even when the title starts with `test(exec):`; the PR type `test` does not make it a tests-only classification.\n The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.\n\n- Do not suppress a central contract topic just because the exact word is absent from the title if the named contract is known to cover that area.\n In particular, `exec v2 contract` implies `exec_tools`, `sandboxing`, and `approvals`.\n\nFinal suppression check:\nRemove any topic added only due to an incidental keyword such as test, path, label, template, checklist, session-like function name, model, provider, usage, gateway, plugin, tool, contract, docs, config, security, Claude Code, or catalog. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn JSON only:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput:\nYou may receive a target URL, title, body, summary, generated output, gold topics, or feedback. Classify the GitHub item itself, not the previous assistant response. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous.\n\nCore process:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only.\n\nSuppression rule:\nDo not add a topic just because a related word appears. 
Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, or incidental phrase.\n\nSpecific routing rules:\n\nCodex:\n- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, or Codex-specific runtime behavior.\n- Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.\n- Example: `[codex] fix webchat full-message reader for truncated history` is not `codex`.\n\nGateway, API surface, UI/TUI:\n- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.\n- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.\n- Example: `[codex] fix webchat full-message reader for truncated history` => `gateway`, `api_surface`, `ui_tui`.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving. 
Do not classify it as `local_model_providers`, `open_weight_models`, or `model_serving` unless provider catalogs, open-weight model metadata, or serving endpoints are actually central.\n- Example: `Policy: add model, network, and MCP conformance checks` => `config`, `security`, `mcp_tooling`.\n\nLocal model providers, open-weight models, model serving:\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.\n- Use `model_serving` only when serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, base URL behavior, endpoint compatibility, request routing, or model-server compatibility is central.\n- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.\n- Provider catalog missing-model issues are `local_model_providers`; add `reliability` when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.\n- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.\n- Do not use `reliability` for every bug. 
Use it when the failure mode itself is robustness/liveness/error-handling.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.\n- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.\n- Use `sessions` only when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.\n- Do not add `sessions` merely because the title says `ACP sessions`.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, `acpx`.\n- `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` => `acp`, `sessions`.\n\nAuth and identity:\n- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.\n- OAuth restoration is `auth_identity`.\n- `openai-codex OAuth` is not automatically `codex`.\n- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.\n- `restore openai-codex OAuth on embedded path` => `auth_identity`, `sessions`.\n\nQueueing:\n- Use `queueing` only when queue, queued execution, queue lifecycle, queue steering, scheduling behavior, or user-visible queued work is central.\n- Words like lane, main lane, worker, subagent, or internal execution paths do not imply `queueing`.\n- `self-heal lane wedges` => `reliability`, not `queueing`.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- 
Documentation-only PRs usually return `docs` alone.\n- Add a second topic only when the documented area is explicitly central, such as docs for queueing.\n\nTools:\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.\n- Use `mcp_tooling` for MCP-specific behavior.\n- MCP tool-call parameter coercion can be both `mcp_tooling` and `tool_calling`.\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract enforcement may include `exec_tools`, `sandboxing`, and `approvals` when all are named central contract areas.\n\nNotifications and chat:\n- Use `notifications` when notification behavior itself is central, such as announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, or notification delivery.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nOther topics:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, or memory providers. 
Active-memory recall deadlocks are usually `memory` plus `reliability`.\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or cron deadlock is central.\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is central.\n- Use `agent_runtime` when runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership are central.\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item, up to the smallest complete set.\n- Do not substitute a broad infrastructure topic for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "Update the classifier rules with these corrections.\n\nThe task is to classify a GitHub issue or pull request into the smallest complete set of allowed topic ids. The input may include a target URL, title, body, summary, generated output, gold topics, or feedback. Always classify the GitHub item itself, not the previous assistant response or feedback text.\n\nReturn JSON only:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAdd `packaging_deployment` to the allowed topic ids.\n\nPackaging / deployment:\n- Use `packaging_deployment` when the central issue involves install/distribution packaging, desktop app setup, bundled runtime availability, PATH/runtime resolution, executable discovery, npm/npx/node availability, spawned process availability, deployment artifacts, or app startup environment problems.\n- If a browser automation feature fails because the desktop package cannot find or launch a required tool such as `npx`, include both the user-visible feature topic and `packaging_deployment`.\n- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.\n- In that example, Chrome sessions, Google sign-in, and profile attach are browser automation concerns; `spawn npx ENOENT` is a packaging/deployment/runtime availability concern.\n\nSelf-hosted inference vs local model providers:\n- Use `self_hosted_inference` when the central subject is supporting self-hosted inference servers, OpenAI-compatible self-hosted endpoints, or server families such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference backends.\n- Use `memory` as well when the self-hosted inference support is specifically for memory, embeddings, vector search, or embedding providers.\n- Do not use `local_model_providers` merely because the title names Ollama, vLLM, llama.cpp, TGI, LocalAI, OpenAI-compatible, provider, or model. 
Treat those as label spam unless the actual subject is provider catalog metadata, named provider/model-family support, provider-backed model context windows, or missing provider catalog entries.\n- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`, not `local_model_providers`.\n\nGeneral strategy:\n- Read the title first and identify the main user-visible bug, feature, contract change, or documentation change.\n- Pick one primary topic.\n- Add secondary topics only when they are explicitly central maintainer-owned concerns.\n- If a title lists multiple independent central concerns, include each one, but keep the set minimal.\n- Suppress topics that appear only as labels, implementation details, examples, fixture names, broad ownership areas, or incidental words."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.\n\nOutput only JSON, or call `final_json` exactly once if that structured-output tool is available:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids are exactly:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, coding_agents, hooks\n\nClassification method:\n- Read the title first.\n- Use the body only if the title is ambiguous.\n- Ignore labels, paths, tests, examples, implementation filenames, PR type prefixes, and incidental keywords unless they are the user-visible subject.\n- Choose one primary topic first, then add only explicit central secondary topics.\n- Use three topics only when the title or domain contract clearly requires three maintainer buckets.\n- Prefer the domain meaning of the title over generic keyword matching.\n- Final answer must contain only allowed topic ids.\n\nImportant routing rules:\n\n- `exec v2 contract follow-through` routes to all contract-owned areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n This applies even if the title starts with `test(exec):`.\n The PR type `test` does not make it a tests-only classification.\n\n- `sandbox.mode`, sandbox modes, sandbox policy, sandbox enforcement, sandbox permissions, or sandbox-governed execution are `sandboxing`.\n\n- Do not classify something as `sandboxing` merely because the title contains `sandbox`.\n If the sandbox is just the environment where another feature runs, classify the feature instead.\n Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. 
Superpowers)`\n should be `acpx`, `codex`, `skills_plugins`, not `sandboxing`.\n\n- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistent state, takeover, or user-visible session behavior.\n Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization`\n should be `coding_agents`, `sandboxing`, `agent_runtime`.\n\n- Use `sessions` when session identity, lifecycle, ownership, routing, takeover, persistence, or active-session state is central.\n Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`\n should include `sessions` because outbound session identity is central.\n\n- Use `gateway` when the central subject is gateway send behavior, guarded gateway send denial, gateway routing, or gateway delivery boundaries.\n Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`\n should be `gateway`, `sessions`, not `notifications`.\n\n- Use `notifications` when user-facing alerts, notification settings, notification channels, notification delivery semantics, announcement pushes, heartbeat pushes, target-channel pushes, or notification identity/metadata on pushed messages are central.\n\n- Do not add `notifications` merely because the title mentions message sending, outbound messages, or notification-like implementation names.\n Add it only when the delivered alert/push/announcement/notification is the user-visible subject.\n\n- Use `chat_integrations` when Slack, Discord, ChatGPT, LinkedIn, chat adapters, or target-channel chat delivery behavior is central.\n A title about Slack pushes or target-channel Slack delivery should include `chat_integrations`.\n\n- Do not add `chat_integrations` merely because the title says the user is chatting, actively chatting, or mentions a chat context.\n Add it only when the integration/channel behavior 
itself is central.\n\n- Treat cron-related words as incidental when cron is only the delivery path for an announcement or push.\n Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes`\n should be `notifications`, `chat_integrations`, not `cron_automation` or `reliability`.\n\n- Use `cron_automation` when cron, scheduled jobs, scheduled announcements, recurring automation, or automated timed delivery is the central product surface, not merely the path where another feature failed.\n\n- Use `reliability` for central failures, crashes, errors, hardening against failures, guarded denial visibility, race conditions, takeover errors, silent breakage, or behavior that needs to be made robust.\n Do not add `reliability` only because the title says `Bug`, `fix`, `dropped`, or mentions a previously fixed path. If the title is mainly about a feature\u2019s metadata/identity/channel behavior, classify that feature instead.\n\n- Use `security` for hardening, supply-chain boundaries, trust boundaries, credentials, permissions, isolation, SSRF, token verification, high/critical vulnerabilities, GitHub Actions supply-chain risk, or CI security posture.\n Example: `ci: harden GitHub Actions supply-chain boundaries`\n should be `security`.\n\n- Use `hooks` when hook behavior, hook priority, hook execution order, hook registration, or hook security is a central subject.\n Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification`\n should include `security`, `hooks`, and `skills_plugins`.\n\n- Use `skills_plugins` when user-installed plugins, skills, skill scanners, plugin inheritance, plugin availability, plugin loading, or plugin execution is central.\n\n- Do not add `local_model_providers` from unrelated scanner/provider/security wording unless local model provider behavior is explicitly central.\n\n- Use `coding_agents` when the central subject is subagents, coding agents, agent 
spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.\n\n- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.\n\n- LM Studio is a strong signal for `local_models`.\n\n- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.\n Example: `fix: LM Studio thinking blocks invisible with Responses API`\n should be `model_serving` and `local_models`.\n\n- Use `codex` when Codex itself, Codex CLI, Codex sandbox/runtime behavior, or Codex-specific compatibility is central.\n\n- Use `acp` when ACP is explicitly central.\n\n- Use `acpx` when ACPX is explicitly central, or when the title describes ACP session behavior involving per-binding/per-agent configuration that belongs to the ACPX integration surface.\n Example: `[Feature]: Per-binding and per-agent permissionMode for ACP sessions`\n should be `acp`, `approvals`, `acpx`.\n\n- Use `approvals` when permission mode, approval mode, permission gating, or approval policy is central.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, sandbox, chat, message, CI, cron, bug, fix, or contract.\nKeep a topic only when it is a central maintainer-owned subject or a known domain contract explicitly requires it."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn JSON only:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not a previous assistant response, generated output, label, or feedback text.\n\nInput format:\nYou may receive a target URL, title, body, summary, generated output, gold topics, or feedback. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous.\n\nCore process:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only.\n\nSuppression rule:\nDo not add a topic just because a related word appears. 
Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, or incidental phrase.\n\nImportant feedback-derived rules:\n\nACP / ACPX / sessions:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.\n- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.\n- Treat `sessions_yield` as label/path-like noise unless the title is truly about session identity, lifecycle, routing, persistent process identity, embedded session path, or session-specific state.\n- Example: `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield` => `acp`, `acpx`, not `sessions`.\n\nStreaming usage:\n- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.\n- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.\n- In streaming/model-server titles, usage chunks are usually protocol fields, not telemetry/accounting.\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.\n- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving` only.\n\nGateway / sessions / API surface:\n- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway 
send denial, or gateway ownership is central.\n- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.\n- Passing outbound session identity is a central `sessions` concern.\n- Do not add `api_surface` just because a title mentions `message_sending`, `surface`, send denial, or plumbing. Add `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n- Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`, not `api_surface`.\n\nBrowser automation / packaging:\n- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.\n- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.\n- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.\n- Do not add `auth_identity` just because Google sign-in fails when the actual central failure is Chrome profile attach or missing packaged executable.\n- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.\n\nCodex:\n- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, or Codex-specific runtime behavior.\n- Do not add `codex` merely because the title starts 
with `[codex]`, contains `openai-codex`, or has Codex branding.\n\nGateway, API surface, UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving.\n\nLocal model providers, open-weight models, model serving:\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.\n- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.\n- Provider catalog missing-model issues are `local_model_providers`; add `reliability` when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.\n\nReliability:\n- Use `reliability` when the central 
bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.\n- Do not use `reliability` for every bug.\n\nAuth and identity:\n- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.\n- OAuth restoration is `auth_identity`.\n- `openai-codex OAuth` is not automatically `codex`.\n- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs usually return `docs` alone.\n\nTools:\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.\n- Use `mcp_tooling` for MCP-specific behavior.\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, sessions_yield, showcase, tutorial, demo, sign-in, or OAuth. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or pull request into the smallest complete set of allowed topic ids.\n\nReturn JSON only:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity\n\nCore task:\nClassify the GitHub item itself, not any previous assistant response, feedback text, labels, or generated output. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicitly central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary clearly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only.\n\nSuppression rule:\nDo not add a topic just because a related word appears. Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, environment name, release-note fragment, or incidental phrase.\n\nComposite titles:\nIf a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify only the central user-visible items needed for maintainer routing. Do not add every topic hinted by every noun. 
Treat weak fragments in grab-bag titles as label spam unless they clearly describe a central behavior change.\n\nImportant learned example:\n`fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth`\n=> `codex`, `chat_integrations`\nDo not add `auth_identity` here. Although `OAuth` appears, it is not central enough for routing; treat it as label spam unless the item is specifically about OAuth/auth/login/token behavior. Also do not add `skills_plugins` merely because `plugins` appears in a Codex startup phrase.\n\nTopic rules:\n\nCodex:\nUse `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, or Codex-specific runtime behavior.\nDo not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.\nCodex startup and Docker Codex setup are `codex`.\n\nAuth and identity:\nUse `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is the central subject.\nDo not add `auth_identity` merely because `OAuth` appears in a composite title, environment phrase, or product/setup fragment.\nAdd `auth_identity` only when the item is specifically about auth behavior, restoration, account identity, credentials, login, or tokens.\n\nChat integrations:\nUse `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\nWhatsApp history is `chat_integrations`.\n\nSkills/plugins:\nUse `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is central.\nDo not add it merely because `plugins` appears as part of Codex startup wording.\n\nGateway/API/UI:\nUse `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is 
central.\nUse `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\nUse `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.\nWebchat full-message reader behavior is both `api_surface` and `ui_tui`; include `gateway` when gateway-backed or gateway-owned.\n\nReliability:\nUse `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.\nDo not use `reliability` for every bug.\n\nPolicy/config/security/MCP:\nUse `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\nUse `security` for network policy, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\nUse `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\nIn policy/conformance titles, `model` usually means a policy area, not model providers or serving.\n\nLocal models/providers/serving:\nUse `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\nUse `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.\nUse `model_serving` only when serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, base URL behavior, endpoint compatibility, request routing, or model-server compatibility is central.\nDo not add `model_serving` merely because a title says model, provider, catalog, Gemini, 
Ollama, vLLM, llama.cpp, or OpenAI-compatible.\n\nACP/ACPX/sessions/approvals:\nUse `acp` when ACP is named centrally.\nUse `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.\nUse `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.\nUse `sessions` only when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.\nDo not add `sessions` merely because the title says `ACP sessions`.\n\nDocumentation:\nUse `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\nDocumentation-only PRs usually return `docs` alone.\n\nTools:\nUse `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.\nUse `mcp_tooling` for MCP-specific behavior.\nUse `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n\nOther topics:\nUse `queueing` only when queue, queued execution, queue lifecycle, queue steering, scheduling behavior, or user-visible queued work is central.\nUse `notifications` when notification behavior itself is central.\nUse `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, or memory providers.\nUse `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or cron deadlock is central.\nUse `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\nUse `agent_runtime` when runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership are central.\nUse `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.\n\nFinal suppression check:\nBefore outputting, remove 
any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, OAuth, or Docker. Keep a topic only if it is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.\n\nInput format:\n- target: GitHub issue or PR URL\n- title: issue/PR title\n- body: optional; read only if the title is ambiguous\n- labels, paths, examples, tests, implementation filenames, and incidental keywords may appear, but should not drive classification unless they describe the user-visible subject\n\nOutput only JSON, or call `final_json` exactly once if available:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, coding_agents\n\nClassification method:\n1. Read the title first.\n2. Use the body only if the title is ambiguous.\n3. Choose one primary topic.\n4. Add only explicit central secondary topics.\n5. Use three topics only when the title or known domain contract clearly requires three maintainer buckets.\n6. 
Remove any topic added only because of an incidental keyword.\n\nDomain-specific routing rules:\n\n- `exec v2 contract follow-through` routes to all contract-owned areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n Do this even when the title starts with `test(exec):`.\n The PR type `test` does not make it tests-only.\n The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.\n\n- `sandbox.mode`, sandbox modes, sandbox policy, sandbox-governed execution, or permission boundaries are `sandboxing`.\n\n- Do not classify `sandboxing` merely because a title says \u201csandbox\u201d when the sandbox is just the environment where another feature runs.\n Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)`\n should be `acpx`, `codex`, and `skills_plugins`, not `sandboxing`.\n The central subject is ACPX/Codex inheriting user-installed plugins.\n\n- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistence, or user-visible session behavior.\n Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization`\n should be `coding_agents`, `sandboxing`, and `agent_runtime`.\n Here, `subagent initialization` is about coding-agent spawning/runtime startup, not user-visible sessions.\n\n- Use `sessions` when the issue is about session identity, lifecycle, takeover, routing, active chat ownership, persistent session state, or user-visible session conflicts.\n Example: `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`\n should include `sessions`, because `EmbeddedAttemptSessionTakeoverError` and \u201cuser is actively chatting\u201d indicate an active session takeover conflict.\n It should also include `cron_automation` for cron announce delivery and `reliability` because an unexpected error is triggered.\n\n- Use 
`coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.\n\n- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.\n\n- LM Studio is a strong signal for `local_models`.\n\n- Named external/local model provider catalogs, provider-specific model listings, and provider adapter behavior are usually `local_model_providers`.\n Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error`\n should be `local_model_providers` and `reliability`.\n Do not classify this as `model_serving`; the central subject is a provider catalog gap and silent failure.\n\n- Use `reliability` when the title describes silent failure, crashes, unexpected errors, broken delivery, missing error reporting, hangs, or behavior that fails instead of producing a clear user-facing outcome.\n\n- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.\n Example: `fix: LM Studio thinking blocks invisible with Responses API`\n should be `model_serving` and `local_models`.\n Do not stop at `local_models`; the Responses API part is central serving/protocol behavior.\n\n- Do not use `model_serving` merely because a title mentions a model name, provider name, provider catalog, or model availability. 
Use it only when serving protocol, endpoint compatibility, response structure, streaming, or OpenAI-compatible serving behavior is central.\n\n- Use `codex` when Codex itself, Codex-like behavior, Codex environment integration, or Codex-specific UX/runtime behavior is central.\n In `ACPX Codex sandbox should inherit user-installed plugins`, `codex` is central.\n\n- Use `skills_plugins` when the issue is about plugins, installed plugins, inherited plugins, skills, plugin availability, or plugin runtime integration.\n\nFinal suppression check:\nRemove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, sandbox, or contract. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it."
+ },
+ {
+ "routing_policy": "Refine classification with these additional routing rules:\n\n- Treat UI-looking wording as label spam unless the UI itself is the maintainer-owned subject. A title like \u201crespect totalTokensFresh flag to avoid showing stale token counts\u201d is about session state plus usage/token accounting, not `ui_tui`.\n - Use `sessions` when freshness, lifecycle, state, or per-session values determine whether data is valid.\n - Use `telemetry_usage` when token counts, usage counts, cost/usage display, or accounting/reporting values are central.\n - Do not add `ui_tui` just because the bug is \u201cshown\u201d somewhere.\n\n- For memory embeddings backed by self-hosted OpenAI-compatible servers, classify the self-hosted inference path explicitly.\n - \u201cmemory/embeddings\u201d is `memory`.\n - \u201copenai-compatible provider for self-hosted servers\u201d, including llama.cpp, Ollama, vLLM, TGI, or LocalAI, is `self_hosted_inference`.\n - Do not use `local_model_providers` merely because provider names or model-server names appear. Use `local_model_providers` only for provider catalogs, provider metadata, provider-specific model support, or provider catalog behavior.\n\n- For local Ollama or local embedding failures involving proxy/NO_PROXY behavior, prefer local/self-hosted routing over broad security or memory labels.\n - \u201cLocal Ollama embeddings fail when proxy is enabled\u201d routes to `local_models` and `self_hosted_inference`.\n - Do not add `memory` just because the word \u201cembeddings\u201d appears, unless the memory subsystem, memory provider, vector store, or active-memory recall is central.\n - Do not add `security` just because the title mentions SSRF defenses, proxy, or NO_PROXY. 
Use `security` only when the security policy, access restriction, credential boundary, vulnerability, or allowed/blocked network behavior is the central user-visible subject.\n\n- Apply a final \u201clabel spam\u201d suppression pass:\n - Remove `ui_tui` if the title only implies data is displayed.\n - Remove `local_model_providers` if the title is about using self-hosted inference servers rather than catalog/provider metadata.\n - Remove `memory` if embeddings are only the model capability involved and the memory system itself is not central.\n - Remove `security` if SSRF/proxy wording is implementation context for local model connectivity rather than the central policy/security concern.\n\nCanonical examples:\n- `fix: respect totalTokensFresh flag to avoid showing stale token counts` => `sessions`, `telemetry_usage`\n- `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`\n- `[Feature/Bug]: Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)` => `local_models`, `self_hosted_inference`"
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity\n\nGoal:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore target URL path, labels, examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. Never output labels outside the allowed list.\n8. 
Output JSON only, or use the `final_json` tool if available.\n\nComposite title rule:\nIf a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n\nImportant examples:\n- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock`\n => `exec_tools`, `browser_automation`, `cron_automation`\n Do not add `ui_tui` just because browser diagnostics might be visible somewhere.\n Do not add generic `reliability` when the deadlock is specifically a cron force-run/lifecycle bug owned by `cron_automation`.\n\n- `fix: LM Studio thinking blocks invisible with Responses API`\n => `local_models`, `model_serving`\n LM Studio is a local model provider/runtime signal.\n Responses API behavior, response block handling, streaming, endpoint compatibility, or request/response protocol behavior is `model_serving`.\n Do not add `ui_tui` merely because something is \u201cinvisible\u201d unless webchat, TUI, UI views, or a UI reader is explicitly central.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nTopic rules:\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.\n- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.\n- Do not add `sessions` merely because the title says \u201cACP sessions\u201d or mentions session context. 
Treat that as label spam unless session identity, lifecycle, routing, state, or persistent process identity is itself the bug or feature.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, `acpx`\n- `[Bug]: ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process` => `acp`, `sessions`\n\nReliability, queueing, and lanes:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.\n- Prefer a more specific owning topic when the title names a domain-specific lifecycle bug, such as `cron force-run deadlock` => `cron_automation`.\n- Words like `lane`, `main lane`, `worker`, `subagent`, `before_prompt_build`, or internal execution paths do not imply `queueing`.\n- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.\n- `self-heal lane wedges` => `reliability`, not `queueing`.\n\nAuth and identity:\n- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or user/session identity for auth is central.\n- OAuth restoration is `auth_identity`.\n- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex-specific runtime behavior.\n- If OAuth or auth behavior is tied to an embedded/session path, include `sessions` when the embedded path or session identity is central.\n- `restore openai-codex OAuth on embedded path` => `auth_identity`, `sessions`, not `codex`.\n\nCodex:\n- Use `codex` when Codex is named centrally as the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` merely because the title contains 
`openai-codex`, `[codex]`, or a Codex-branded OAuth provider.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include `docs` alone.\n- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.\n\nOpen-weight, local providers, local models, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, response block 
handling, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- If a title combines a local provider/runtime with Responses API or OpenAI-compatible protocol behavior, include both `local_models` and `model_serving`.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, or names a model unless serving/routing/protocol behavior is central.\n\nNotifications and chat integrations:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- A cron force-run deadlock is `cron_automation`; do not add generic `reliability` unless reliability/self-healing is separately named as a central concern.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nMemory:\n- Use `memory` for 
memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, embedded session path, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`, not `notifications`.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Do not add `ui_tui` merely because diagnostics are layered, output is invisible, or a result is user-visible. 
UI/TUI must be explicitly the owned surface.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- `layered browser diagnostics` is `browser_automation`.\n- Do not add `ui_tui` for browser diagnostics unless the title explicitly names UI/TUI/webchat display behavior.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel 
checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, invisible, diagnostics, or browser. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.\n\nOutput only JSON, or call `final_json` exactly once if that structured-output tool is available:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, coding_agents\n\nRead the title first. Use the body only if the title is ambiguous. Ignore labels, paths, tests, implementation filenames, examples, and incidental keywords unless they are the user-visible subject.\n\nChoose one primary topic, then add only explicit central secondary topics. Use three topics only when the title or domain contract clearly requires three maintainer buckets.\n\nGeneral routing rules:\n- Treat bracketed prefixes such as `[codex]`, PR types such as `test(...)`, and labels as routing noise unless the actual user-visible subject is that topic.\n- Do not add `codex` merely because the title starts with `[codex]`.\n- Do not add local model topics merely because the title contains `model` in a policy, conformance, config, or security context.\n- Do not add provider/model topics from label spam or incidental words.\n\nSpecific learned mappings:\n- `exec v2 contract follow-through` routes to all contract-owned areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n Do this even when the title starts with `test(exec):`; the PR type `test` does not make it tests-only.\n The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.\n\n- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.\n\n- `sessions_spawn` is an internal 
name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistence, or user-visible session state.\n Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization`\n should be `coding_agents`, `sandboxing`, `agent_runtime`.\n Here, `subagent initialization` is about coding-agent spawning/runtime startup, not user-visible sessions.\n\n- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.\n\n- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.\n\n- LM Studio is a strong signal for `local_models`.\n\n- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.\n Example: `fix: LM Studio thinking blocks invisible with Responses API`\n should be `model_serving` and `local_models`.\n\n- `Policy: add model, network, and MCP conformance checks` should be classified as:\n `mcp_tooling`, `config`, `security`.\n In this title, `model` is not `local_models` or `local_model_providers`; it is part of policy/conformance coverage. `network` and conformance/policy checks are security/config concerns.\n\n- `[codex] fix webchat full-message reader for truncated history` should be classified as:\n `gateway`, `api_surface`, `ui_tui`.\n Do not include `codex`; the prefix is label noise. 
`webchat` points to gateway/UI behavior, and `full-message reader` / `truncated history` is centrally about the API surface used to retrieve complete message content.\n\nFinal suppression check:\nRemove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, contract, or bracketed prefix. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn JSON only:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not labels, generated output, prior assistant responses, feedback text, or incidental words.\n\nProcess:\n1. Read the title first. Treat it as the primary signal.\n2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned concerns.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only.\n\nSuppression rule:\nDo not add a topic just because a related word appears. 
Confirm the word is the actual subject, not a label prefix, path, symptom, implementation detail, test fixture, example, or incidental phrase.\n\nImportant domain rules:\n\nLocal models / self-hosted inference / providers / security:\n- Use `local_models` when local model behavior is central, including local embeddings and local model runtime behavior.\n- Use `self_hosted_inference` when the issue concerns a locally hosted inference stack such as Ollama-backed inference or embeddings.\n- For titles like `Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)`, classify as `local_models` and `self_hosted_inference`.\n- Do not add `local_model_providers` merely because Ollama, Gemini, vLLM, llama.cpp, or another provider name appears. Use `local_model_providers` only when provider-specific catalog metadata, provider integration, known context windows, named provider support, or provider/model-family support is the central subject.\n- Do not add `security` merely because the title mentions SSRF, proxy, NO_PROXY, defenses, network, or policy language. 
Add `security` only when the central issue is actually security policy, access restriction, credential boundary, vulnerability, or allowed/blocked security behavior.\n\nReliability / auth / sessions:\n- Use `reliability` when the central bug is a wedge, deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, self-healing behavior, or robustness failure.\n- Use `auth_identity` when OAuth, login, sign-in, tokens, authentication, account identity, credential identity, or identity restoration is central.\n- Use `sessions` when session identity, lifecycle, routing, persistent process identity, embedded session path, session-specific state, or embedded path behavior is central.\n- If OAuth restoration is tied to an embedded path, include both `auth_identity` and `sessions`.\n- For titles like `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path`, classify as `reliability`, `auth_identity`, and `sessions`.\n- Do not add `codex` merely because the title says `openai-codex` or starts with `[codex]`. 
Use `codex` only when Codex itself is the central product/runtime/setup being changed.\n\nACP / ACPX / sessions:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.\n- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.\n- Treat `sessions_yield` as label/path-like noise unless the title is truly about session identity, lifecycle, routing, persistent process identity, embedded session path, or session-specific state.\n\nStreaming usage:\n- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.\n- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.\n\nGateway / sessions / API surface / UI:\n- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.\n- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.\n- Passing outbound session identity is a central `sessions` concern.\n- Use `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n- Do not add `api_surface` just because a title 
mentions `message_sending`, `surface`, send denial, or plumbing.\n- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`. If gateway-backed or gateway-owned, also include `gateway`.\n\nBrowser automation / packaging:\n- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.\n- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.\n- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.\n- Do not add `auth_identity` just because Google sign-in fails when the actual central failure is Chrome profile attach or missing packaged executable.\n\nDocs:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs usually return `docs` alone.\n\nTools / MCP / exec:\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.\n- Use `mcp_tooling` for MCP-specific behavior, MCP conformance, MCP policy, MCP tool behavior, protocol checks, or MCP integrations.\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, 
sessions_yield, showcase, tutorial, demo, sign-in, OAuth, proxy, SSRF, NO_PROXY, Ollama, or provider. Keep the topic only if it is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn JSON only, with no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not a previous assistant response, generated output, label, feedback text, or gold topic list.\n\nInput format:\nYou may receive a target URL, title, body, summary, generated output, gold topics, feedback, score, or error. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous. Ignore generated outputs, feedback, score, and error except when they explain how to improve the classification rule.\n\nCore process:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics when the title explicitly names three central facets.\n6. Use 0 topics only when no allowed topic is central.\n7. Never invent topic ids.\n8. Always emit the required JSON object. 
Never omit the final JSON.\n\nImportant topic rules:\n\nACP / ACPX / sessions:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.\n- If a title says \u201cper-binding\u201d, \u201cbinding behavior\u201d, or ACPX-related binding, that is usually `acpx`.\n- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.\n- Treat `sessions_yield` as label/path-like noise unless the title is truly about session identity, lifecycle, routing, persistent process identity, embedded session path, or session-specific state.\n- Example: `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield` => `acp`, `acpx`.\n\nApprovals / permission modes:\n- Use `approvals` when approval policy, permission modes, `permissionMode`, approval gates, allow/deny behavior, or user permission decisions are central.\n- If ACP or ACPX work is specifically about per-agent or per-binding `permissionMode`, include `approvals`.\n- Example: `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `acpx`, `approvals`.\n\nSkills / plugins:\n- Use `skills_plugins` when plugins, skills, Superpowers, plugin inheritance, installed plugins, plugin availability, or skill/plugin runtime integration is central.\n- If ACPX/Codex sandbox behavior is about inheriting or exposing user-installed plugins, include `skills_plugins`.\n- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. 
Superpowers)` => `acpx`, `codex`, `skills_plugins`.\n\nCodex:\n- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, Codex sandbox behavior, or Codex-specific runtime behavior.\n- Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving.\n- Example: `Policy: add model, network, and MCP conformance checks` => `config`, `security`, `mcp_tooling`.\n\nStreaming usage:\n- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.\n- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.\n- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving`.\n\nGateway / sessions / API surface:\n- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway 
ownership is central.\n- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.\n- Passing outbound session identity is a central `sessions` concern.\n- Do not add `api_surface` just because a title mentions `message_sending`, `surface`, send denial, or plumbing.\n- Add `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n\nBrowser automation / packaging:\n- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.\n- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.\n- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.\n- Do not add `auth_identity` just because Google sign-in fails when the central failure is Chrome profile attach or missing packaged executable.\n\nAPI surface / UI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.\n\nLocal model providers / open-weight models / model serving:\n- Use 
`local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.\n- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.\n- Provider catalog missing-model issues are `local_model_providers`; add `reliability` when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.\n- Do not use `reliability` for every bug.\n\nAuth and identity:\n- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.\n- OAuth restoration is `auth_identity`.\n- `openai-codex OAuth` is not automatically `codex`.\n- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs usually return `docs` alone.\n\nTools:\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.\n- Use `mcp_tooling` for MCP-specific behavior.\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n\nSuppression rule:\nDo not add a topic just because a related word appears. 
Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, or incidental phrase.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, sessions_yield, showcase, tutorial, demo, sign-in, OAuth, permission, or sandbox. Keep it only if that topic is actually a central maintainer-owned subject.\n\nFinal output:\nReturn exactly one JSON object:\n{\"topics_of_interest\":[...]}"
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn JSON only, with no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not generated output, feedback, score, label text, or gold topic lists.\n\nInput handling:\n1. Read the title first. It is the primary signal.\n2. Use the first clear body or summary only if the title is ambiguous.\n3. Ignore generated outputs, score, error, and gold topics.\n4. Use feedback only as rule-learning context, not as the item to classify.\n5. Pick one primary topic.\n6. Add secondary topics only when they are explicit central maintainer-owned subjects.\n7. Use 3 topics only when the title explicitly names three central facets.\n8. Use 0 topics only when no allowed topic is central.\n9. Never invent topic ids.\n10. Always emit the required JSON object.\n\nGeneral suppression rule:\nDo not add a topic just because a related word appears. 
Confirm the word is the actual subject, not a label prefix, path, symptom, implementation detail, test fixture, example, broad ownership area, or incidental phrase.\n\nACP / ACPX / Agent runtime:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` only when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.\n- Do not infer `acpx` from `ACP` alone.\n- Use `agent_runtime` when runtime execution, backend runtime, node-backed runtime, agent process/runtime behavior, or runtime ownership is central.\n- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n- If a title says `per-binding`, `binding behavior`, or ACPX-related binding, that is usually `acpx`.\n- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.\n\nApprovals / permission modes:\n- Use `approvals` when approval policy, permission modes, `permissionMode`, approval gates, allow/deny behavior, or user permission decisions are central.\n- If ACP or ACPX work is specifically about per-agent or per-binding `permissionMode`, include `approvals`.\n\nNotifications / chat integrations / cron:\n- Use `notifications` when announces, alerts, heartbeat pushes, notification delivery, target channels, or notification routing are central.\n- Use `chat_integrations` when Slack, Discord, Teams, chat channels, chat pushes, or chat-specific identity/target-channel behavior is central.\n- Do not use `cron_automation` merely because a title mentions `cron`, `cron --announce`, scheduled heartbeat, or an announce path. 
Treat cron as label/path-like noise unless the core issue is scheduling, recurring jobs, cron execution, or automation timing itself.\n- Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes` => `notifications`, `chat_integrations`.\n\nSkills / plugins:\n- Use `skills_plugins` when plugins, skills, Superpowers, plugin inheritance, installed plugins, plugin availability, or skill/plugin runtime integration is central.\n- If ACPX/Codex sandbox behavior is about inheriting or exposing user-installed plugins, include `skills_plugins`.\n\nCodex:\n- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, Codex sandbox behavior, or Codex-specific runtime behavior.\n- Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving.\n\nStreaming usage:\n- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.\n- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.\n- Use `telemetry_usage` only when metric collection, usage 
accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are central.\n\nGateway / sessions / API surface:\n- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.\n- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.\n- Passing outbound session identity is a central `sessions` concern.\n- Do not add `api_surface` just because a title mentions `message_sending`, `surface`, send denial, or plumbing.\n- Add `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n\nBrowser automation / packaging:\n- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.\n- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.\n- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.\n- Do not add `auth_identity` just because Google sign-in fails when the central failure is Chrome profile attach or missing packaged executable.\n\nAPI surface / UI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface 
behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.\n\nLocal model providers / open-weight models / model serving:\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.\n- Provider-specific open-weight model metadata usually needs both `local_model_providers` and `open_weight_models`.\n- Example: `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` => `open_weight_models`, `local_model_providers`.\n- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.\n- Provider catalog missing-model issues are `local_model_providers`; add `reliability` only when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.\n- Do not use `reliability` for every bug.\n\nAuth and identity:\n- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.\n- OAuth restoration is `auth_identity`.\n- `openai-codex OAuth` is not automatically `codex`.\n- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase 
additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs usually return `docs` alone.\n\nTools:\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.\n- Use `mcp_tooling` for MCP-specific behavior.\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, sessions_yield, showcase, tutorial, demo, sign-in, OAuth, permission, sandbox, runtime, provider, or Slack. Keep it only if that topic is actually a central maintainer-owned subject.\n\nFinal output:\nReturn exactly one JSON object:\n{\"topics_of_interest\":[...]}"
+ },
+ {
+ "routing_policy": "Additional classification rules learned from prior errors:\n\nAllowed topic ids must include `hooks` when hook behavior is part of the maintainer taxonomy.\n\nToken counts and usage freshness:\n- Titles about stale token counts, token usage counts, total token freshness flags, or usage display correctness are `telemetry_usage` when the user-visible subject is usage/count accuracy.\n- If the token count freshness is tied to session state, session identity, or per-session accounting, also include `sessions`.\n- Do not classify these as `ui_tui` merely because stale counts are shown in a UI. UI display is incidental unless the title is centrally about UI/TUI behavior.\n- Example: `fix: respect totalTokensFresh flag to avoid showing stale token counts` => `sessions`, `telemetry_usage`.\n\nSecurity bundles:\n- For composite security titles, classify each explicitly named central vulnerable subsystem, but do not over-expand from generic security words.\n- `token verification` in a vulnerability-fix title usually stays under `security`; do not add `auth_identity` unless authentication, OAuth, login, account identity, credential identity, or identity propagation is itself the central user-visible subject.\n- `skill scanner` or plugin/skill scanning vulnerability fixes should include `skills_plugins`.\n- `hook priority`, hook ordering, hook execution, or hook lifecycle should include `hooks` when hooks are an allowed topic.\n- `SSRF`, HIGH/CRITICAL vulnerabilities, network exploit prevention, and vulnerability remediation are `security`.\n- Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `hooks`, `skills_plugins`, not `auth_identity`.\n\nFinal suppression check additions:\n- Remove `ui_tui` if the UI is only where telemetry, tokens, or stale values appear.\n- Remove `auth_identity` if the only auth-like phrase is token verification inside a security vulnerability bundle.\n- Keep 
`hooks` when hook priority/order/behavior is explicitly named as one of the central fixes."
+ }
+]
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/command.sh b/gepa-12b-row30-prop20-continuation-20260614T021448Z/command.sh
new file mode 100644
index 0000000000000000000000000000000000000000..303875e3cb798c884b1761e9e48b020120abe927
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/command.sh
@@ -0,0 +1,13 @@
+#!/usr/bin/env bash
+set -euo pipefail
+cd "/home/bob/repos/localpager"
+PYTHONPATH=prompt-optimizer/src "prompt-optimizer/.venv/bin/python" -m prompt_optimizer.cli optimize \
+ --model gemma-12b-q4km-reason \
+ --max-tokens 1536 \
+ --concurrency 2 \
+ --seed-routing-policy "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop16-continuation-20260613T214937Z/best.routing_policy.md" \
+ --row-limit 30 \
+ --reflection-minibatch-size 4 \
+ --max-candidate-proposals 20 \
+ --max-metric-calls 720 \
+ --output-dir "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop20-continuation-20260614T021448Z"
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..66ad0d47ef828d0321af55f3310a8a752ec5d0b6
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'gateway', 'agent_runtime'), description='This PR implements an ACP protocol feature involving gateway-owned durable state for a node-backed agent runtime.', caveats=('The PR title explicitly mentions ACP, gateway, and a node-backed runtime (agent_runtime).',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..ddce23fa5368dac935826b66f40551ab99508244
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('local_model_providers', 'security', 'mcp_tooling'), description='This PR expands the Policy plugin by adding conformance checks for model providers, network SSRF settings, and managed MCP servers.', caveats=('The PR adds conformance checks for three distinct systems: model providers, network security (SSRF), and MCP servers, each corresponding to specific maintainer-owned domains.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..92b0aad1854c27e0af1cffac76b1aa47595e1f5a
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('exec_tools', 'sandboxing', 'approvals'), description='This pull request implements follow-through for Exec V2 contracts, involving execution tools, sandbox boundaries (safeBins), and approval logic.', caveats=('The PR title explicitly mentions \"exec v2 contract\", triggering the multi-topic requirement for exec_tools, sandboxing, and approvals as per instructions.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..73dcfac96a3e8e36b33774097e75e57d5909fe0f
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acpx', 'codex', 'skills_plugins'), description='The feature request addresses plugin inheritance for Superpowers within an ACPX Codex sandbox environment.', caveats=('The issue involves a specific interaction between ACPX sandbox isolation and the availability of user-installed plugins (Superpowers).',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_12/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_12/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..cf9da1c63c47d213dcc5d5ff4c86f7d838c89cc1
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_12/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('memory', 'self_hosted_inference'), description='Adds an openai-compatible provider for local and self-hosted embedding servers like Ollama and vLLM within the memory system.', caveats=('The PR focuses on memory embedding providers for self-hosted infrastructure.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_13/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_13/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..5b3a5de78ebf9932472f1a9019b503bc93a7b818
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_13/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('local_model_providers',), description='The issue identifies a missing model entry within the google-vertex provider catalog which leads to silent failures during resolution.', caveats=(\"The report mentions both a missing entry in the provider catalog and a failure in the fallback logic; however, the primary cause is a missing item in the specific provider's metadata.\",), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_14/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_14/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..cf26c75cfc56ca10113babab7c332d7ad6e46e95
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_14/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('docs',), description='The pull request is a documentation update that adds an external tutorial link and video embed to the showcase page.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_15/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_15/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..39d15cc933b346013ec2c92089237e96e84f7335
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_15/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('open_weight_models', 'local_model_providers'), description='The PR addresses incorrect context windows for various open-weight models specifically integrated via the Mantle provider, satisfying both metadata and provider-specific integration criteria.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_16/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_16/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..e75623826e2aef219f553165083266bc0789c17e
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_16/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('chat_integrations', 'notifications'), description='The issue describes a bug in identity overlay during announcement and heartbeat pushes specifically to Slack.', caveats=('The title explicitly mentions \"announce\" and \"heartbeat\", which are primary signals for notifications, while the destination is Slack, falling under chat_integrations.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_17/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_17/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..96127bb0bdea1196b23e4d8273abee4720eaad47
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_17/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('local_models', 'model_serving'), description=\"This PR fixes the rendering of reasoning blocks from LM Studio by correctly parsing the 'response.reasoning_text.done' event in the OpenAI-compatible Responses API stream.\", caveats=('The issue specifically addresses LM Studio (a local model tool) and its handling of the OpenAI-compatible \"Responses API\" stream protocol.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_18/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_18/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..efd6df7a8f50c0800b4ec01781c55cd6a486bbe8
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_18/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('security',), description='The PR focuses on hardening supply-chain boundaries, implementing OIDC for trusted publishing, and restricting permissions/access to sensitive release assets.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_19/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_19/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..d1634dbc28d0286e2493c7cc0be771391f8c70ac
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_19/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('sandboxing', 'sessions'), description=\"The issue reports a bug where specific 'sandbox.mode' configurations cause failures in 'sessions_spawn' during sub-agent initialization.\", caveats=('The issue involves both sandbox configuration and sub-agent session initiation, necessitating both tags for proper routing.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..2034cff2af2a57a6ea0c86466e8f3c0e9b27f32b
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('exec_tools', 'browser_automation', 'cron_automation'), description='The PR resolves a variety of fixes involving shell execution paths, multi-layered browser diagnostic reporting, and resolving deadlocks in cron job processing.', caveats=('The pull request explicitly addresses three distinct issues mentioned in the title and summarized in the body: exec path fallbacks, browser status diagnostics, and cron execution lanes.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_20/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_20/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..9d277ae5fc962a777598ae4701d15b03e8e09dc0
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_20/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('ui_tui',), description='The PR corrects the display of token counts in both the TUI and Web UI (webchat) by properly checking a freshness flag.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_21/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_21/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..4fa0dc2bb38f212d77ec31beab08f614355ffed7
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_21/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('mcp_tooling', 'tool_calling'), description='The PR addresses an issue where LLMs provide stringified JSON for tool parameters, requiring coercion before being processed by MCP tools.', caveats=('The issue involves both MCP-specific logic and general tool-call parameter coercion.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_22/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_22/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..6dae1e03d88edb13217a657af2960a997ded083c
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_22/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=(), description='', caveats=(), error='classifier exit 2: prompt: /home/bob/.local/state/localpager/classifier/prompts/20260614T022432Z-342942.md\\nschema: /home/bob/.local/state/localpager/classifier/schemas/20260614T022432Z-342942.json\\nsession: /home/bob/.local/state/localpager/classifier/sessions/20260614T022432Z-342942\\nlocalpager-agent: final_json was not called; no structured output was captured')"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_23/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_23/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..f13b4f601cccc08a8d01c7cef655968b426ee5f2
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_23/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('memory', 'security', 'config'), description='The issue addresses the failure of local embeddings due to SSRF-related proxy routing, requiring a balance between memory functionality, security policies, and configuration updates.', caveats=('The issue involves local embedding failures (memory), SSRF protection logic (security), and proposed configuration schema changes (config).',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_24/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_24/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..88d830b46d4cdc94bba5dcc4ece5ba8e0dff7b3b
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_24/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('browser_automation', 'exec_tools'), description='The issue reports failures in connecting to existing Chrome sessions, involving both browser integration issues and shell command execution errors.', caveats=('The issue identifies two separate failures for existing Chrome sessions: one related to browser automation/integration (EasyClaw) and one related to shell execution/tool invocation (npx ENOENT).',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_25/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_25/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..5d315e0805cf901dca2a8944941bfd5b551f5f46
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_25/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('local_model_providers', 'model_serving'), description='The request involves adding a model to a provider-specific catalog and implementing automated routing for multimodal inputs.', caveats=('The issue contains two distinct requirements: updating the provider catalog and implementing automatic multimodal routing.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_26/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_26/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..9ab3c5bd5fcaea54d6f265ec1622220f740473ea
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_26/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'sessions'), description='The bug involves incorrect session key generation for ACP bindings, leading to shared persistent processes across different threads within the same channel.', caveats=('The issue specifically describes a failure in session identity logic within the ACP framework.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_27/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_27/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..cd1cfadc576e951022b7a322f735ebad37125de7
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_27/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('queueing', 'reliability', 'codex'), description='Fixes for stuck lanes in the queue system (queueing) and recovery of OpenAI-Codex authentication (codex).', caveats=('The PR addresses two distinct areas: queue/lane management for gateway reliability and specific OAuth fixes for the Codex provider.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_28/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_28/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..36a15a53515d80438e8cee93fd50da600199d655
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_28/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('reliability', 'notifications'), description='The issue describes a race condition and session takeover error during the execution of an announcement delivery via a cron job.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_29/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_29/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..2eb0704ca906fae3c911a6e997613f74f8fb9d1d
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_29/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'sessions'), description='The issue addresses failures in explicit ACP agent runs, specifically regarding backend visibility and state persistence during sessions_yield.', caveats=('The issue mentions \"embedded acpx\" but follows the rule that only explicitly named \"acp\" or \"acpx\" should be selected; since it\\'s about an ACP backend, acp is the primary identifier.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..34995c19cfd9fc9759b3385066571139e8b5147a
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('codex', 'chat_integrations'), description='The PR addresses issues with Codex startup plugins, adds history to the WhatsApp integration, and updates Docker/Codex authentication configuration.', caveats=('The request covers a multi-faceted fix involving both specialized infrastructure (Codex) and external integrations (WhatsApp).',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..07a36fada5d8d90b8715f7171f74557947b721c8
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('model_serving', 'local_models'), description='A bug in processing OpenAI-compatible streaming chunks causes usage data from llama.cpp to be lost, affecting session status and context tracking.', caveats=('The issue involves both the specific implementation for local models (llama.cpp) and the broader handling of OpenAI-compatible streaming protocols.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..69f0f5dd48f0b53371597208cb35e2137d107262
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'approvals', 'sessions'), description='The request implements granular permission modes for ACP sessions, involving both ACP protocol features and core approval policy logic.', caveats=(\"The issue mentions 'acpx' in the body but the primary context is ACP session logic and permission mode policies.\",), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..01e20808d2077a1e8ea26d0373fa949c36163220
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('gateway', 'sessions'), description='The PR addresses outbound session identity and guarded gateway send denial during the message sending process.', caveats=('The input title matches a specific example in the documentation providing guidance for multiple topics.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..ebc28b76699503d7875b174922f75cc767f69e01
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('memory', 'reliability'), description='The PR fixes a deadlock in the active-memory recall subagent, involving both memory functionality and system reliability.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..e8a1308f23235cc43f8277cdfd24851825f19415
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('api_surface', 'ui_tui', 'gateway'), description=\"This PR implements a new gateway RPC and UI updates for a 'full-message reader' on webchat, involving both API/Gateway contract changes and UI interaction improvements.\", caveats=('The [codex] tag in the title is ignored as it does not appear to represent a specific Codex-only feature but rather a project context or label.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..5bd2cbc19b6d2838efde55beff316a4cf66b9337
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('docs', 'queueing'), description=\"This PR updates documentation specifically concerning the queueing system's steer behavior and interaction with streaming.\", caveats=('The PR is primarily a documentation update regarding the queue system logic (steer behavior).',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/gepa-result.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/gepa-result.json
new file mode 100644
index 0000000000000000000000000000000000000000..25e0910e35407bb167c83d4cb9724ae421559d08
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/gepa-result.json
@@ -0,0 +1,1125 @@
+{
+ "_str_candidate_key": null,
+ "best_idx": 2,
+ "best_outputs_valset": null,
+ "candidates": [
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- A title like `Docs: add freeCodeCamp OpenClaw full tutorial to showcase` is only `docs`.\n- Do not add non-allowed or broad demo/showcase labels such as `agent_demos`.\n- Documentation-only PRs should usually include `docs` plus the central documented area only when that area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- A title like `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` only when ACPX is explicitly named or the title clearly says the subject is ACPX-specific.\n- Do not add `acpx` merely because an ACP item mentions configured bindings, binding IDs, session keys, channels, or session workflow.\n- A title like `[Bug]: ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process` is `acp` and `sessions`, not `acpx`.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, or session-specific behavior is central.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or 
permissionMode policy is central.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and only `acpx` if ACPX is explicitly named or clearly the binding system in question.\n\nOpen-weight, local provider catalogs, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- A title like `[Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model` is `local_model_providers` and `model_serving`.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, or names a model unless serving/routing/protocol behavior is central.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is 
central.\n\nNotifications:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- A title about \u201ccron --announce and heartbeat target-channel Slack pushes\u201d is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself central.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nChat integrations:\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.\n- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.\n- Queueing requires queue, queued execution, steering in 
queues, queue lifecycle, or scheduling behavior to be user-visible and central.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nCodex:\n- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.\n- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add 
`skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nQueueing:\n- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.\n- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible 
item up to the smallest complete set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.\n- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.\n\nOutput only JSON, or call `final_json` exactly once if that structured-output tool is available:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids include:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, coding_agents\n\nRead the title first. Use the body only if the title is ambiguous. Ignore labels, paths, examples, tests, implementation filenames, and incidental keywords unless they are the user-visible subject.\n\nChoose one primary topic, then add only explicit central secondary topics. Use three topics only when the title or domain contract clearly requires three maintainer buckets.\n\nImportant routing rules learned from prior errors:\n\n- `exec v2 contract follow-through` routes to all contract-owned areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n Do this even when the title starts with `test(exec):`; the PR type `test` does not make it a tests-only classification.\n The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.\n\n- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.\n\n- `sessions_spawn` is an internal name. 
Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, or persistent session state.\n Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization`\n should be `coding_agents`, `sandboxing`, `agent_runtime`.\n Here, `subagent initialization` is about coding-agent spawning/runtime startup, not user-visible sessions.\n\n- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.\n\n- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.\n\n- LM Studio is a strong signal for `local_models`.\n\n- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.\n Example: `fix: LM Studio thinking blocks invisible with Responses API`\n should be `model_serving` and `local_models`.\n Do not stop at `local_models`; the Responses API part is central serving/protocol behavior.\n\n- Do not add `sessions` merely because a title contains `session`, `sessions_spawn`, channel IDs, binding IDs, or session-like implementation names. Add it only when session identity, lifecycle, routing, state, persistence, or session-specific user behavior is central.\n\n- Do not suppress a central contract topic just because the exact word is absent from the title if the named contract is known to cover that area. In particular, `exec v2 contract` implies sandboxing and approvals.\n\nFinal suppression check:\nRemove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, or contract. 
Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. Never output labels outside the allowed list.\n8. Output JSON only, or use the `final_json` tool if available.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. 
Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.\n- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.\n- Do not add `sessions` merely because the title says \u201cACP sessions\u201d or mentions session context. Treat that as label spam unless session identity, lifecycle, routing, state, or persistent process identity is itself the bug or feature.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.\n- `[Bug]: ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process` should be `acp` and `sessions`; the central bug is session identity/process sharing.\n\nReliability, queueing, and lanes:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.\n- Words like `lane`, `main lane`, `worker`, `subagent`, `before_prompt_build`, or internal execution paths do not imply `queueing`.\n- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.\n- `self-heal lane wedges` is `reliability`, not `queueing`.\n\nAuth and identity:\n- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or user/session identity for auth is central.\n- OAuth 
restoration is `auth_identity`.\n- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex-specific runtime behavior.\n- If OAuth or auth behavior is tied to an embedded/session path, include `sessions` when the embedded path or session identity is central.\n- `restore openai-codex OAuth on embedded path` should be `auth_identity` and `sessions`, not `codex`.\n\nCodex:\n- Use `codex` when Codex is named centrally as the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` merely because the title contains `openai-codex`, `[codex]`, or a Codex-branded OAuth provider. Confirm the subject is Codex behavior rather than auth, sessions, docs, or another domain.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include `docs` alone.\n- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.\n- Do not add non-allowed or broad demo/showcase labels.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.\n\nOpen-weight, local provider catalogs, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are 
central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, or names a model unless serving/routing/protocol behavior is central.\n\nLocal models and self-hosted inference:\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Do not add `model_serving` merely because a title says \u201copenai-compatible\u201d, \u201cprovider\u201d, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.\n\nNotifications and chat integrations:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nCron:\n- Use 
`cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nMemory:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, embedded session path, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include 
`gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete 
set.\n- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.\n- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.\n- Do not substitute broad infrastructure topics for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.\n\nOutput only JSON, or call `final_json` exactly once if that structured-output tool is available:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, coding_agents\n\nTask:\nRead the title first. Use the body only if the title is ambiguous. Ignore labels, paths, examples, tests, implementation filenames, and incidental keywords unless they are the user-visible subject.\n\nChoose one primary topic, then add only explicit central secondary topics. Use three topics only when the title or domain contract clearly requires three maintainer buckets.\n\nImportant routing rules:\n\n- Treat labels and repository metadata as noise unless the title/body makes them central.\n Do not infer `config` or `docs` from labels, issue templates, checklists, examples, or catalog/documentation mentions unless the user-visible request is actually about configuration behavior or documentation.\n\n- Catalog/provider requests route to `local_model_providers` when the central subject is adding, updating, or integrating a provider/catalog entry.\n Example: \u201cAdd MiMo-V2.5 to Xiaomi catalog\u201d is `local_model_providers`.\n\n- Automatic model routing, multimodal routing, fallback routing, OpenAI-compatible endpoint behavior, response protocols, returned blocks, or streaming lifecycle route to `model_serving`.\n Example: \u201cautomatic multimodal routing when DeepSeek V4-Pro is primary model\u201d is `model_serving`.\n A title combining provider catalog work and serving/routing 
behavior should include both `local_model_providers` and `model_serving`.\n\n- LM Studio is a strong signal for `local_models`.\n\n- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.\n Example: `fix: LM Studio thinking blocks invisible with Responses API`\n should be `model_serving` and `local_models`.\n\n- `ACP` issues route to `acp` when the user-visible subject is ACP binding, ACP channels, ACP protocol behavior, or ACP integration behavior.\n\n- Add `sessions` only when session identity, lifecycle, routing, persistence, session keys, session state, or session-specific user behavior is central.\n Example: `ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process`\n should be `acp` and `sessions`.\n\n- Do not add `coding_agents` merely because the title mentions Claude Code, Codex-like processes, persistent agent processes, or agent implementation details.\n Use `coding_agents` only when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.\n If Claude Code is only the process affected by an ACP session-key bug, do not add `coding_agents`.\n\n- `sessions_spawn` is an internal name. 
Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, or persistent session state.\n Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization`\n should be `coding_agents`, `sandboxing`, `agent_runtime`.\n\n- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.\n\n- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.\n\n- `exec v2 contract follow-through` routes to all contract-owned areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n Do this even when the title starts with `test(exec):`; the PR type `test` does not make it a tests-only classification.\n The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.\n\n- Do not suppress a central contract topic just because the exact word is absent from the title if the named contract is known to cover that area.\n In particular, `exec v2 contract` implies `exec_tools`, `sandboxing`, and `approvals`.\n\nFinal suppression check:\nRemove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, contract, docs, config, Claude Code, or catalog. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.\n\nOutput only JSON, or call `final_json` exactly once if that structured-output tool is available:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids are exactly:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, coding_agents\n\nDo not output any topic outside this list.\n\nClassification method:\n- Read the title first.\n- Use the body only if the title is ambiguous.\n- Ignore labels, paths, tests, examples, implementation filenames, PR type prefixes, and incidental keywords unless they are the user-visible subject.\n- Choose one primary topic first, then add only explicit central secondary topics.\n- Use three topics only when the title or domain contract clearly requires three maintainer buckets.\n- Prefer the domain meaning of the title over generic keyword matching.\n\nImportant routing rules:\n\n- `exec v2 contract follow-through` routes to all contract-owned areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n This applies even if the title starts with `test(exec):`.\n The PR type `test` does not make it a tests-only classification.\n\n- `sandbox.mode`, sandbox modes, sandbox policy, sandbox enforcement, sandbox permissions, or sandbox-governed execution are `sandboxing`.\n\n- Do not classify something as `sandboxing` merely because the title contains `sandbox`.\n If the sandbox is just the environment where another feature runs, classify the feature instead.\n Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. 
Superpowers)`\n should be `acpx`, `codex`, `skills_plugins`, not `sandboxing`.\n The central request is plugin inheritance for ACPX Codex, not sandbox policy.\n\n- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistent state, takeover, or user-visible session behavior.\n Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization`\n should be `coding_agents`, `sandboxing`, `agent_runtime`.\n\n- Use `sessions` when session identity, lifecycle, ownership, routing, takeover, persistence, or active-session state is central.\n Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`\n should include `sessions` because outbound session identity is central.\n\n- Use `gateway` when the central subject is gateway send behavior, guarded gateway send denial, gateway routing, or gateway delivery boundaries.\n Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`\n should be `gateway`, `sessions`, not `notifications`.\n\n- Do not add `notifications` merely because the title mentions message sending, outbound messages, announcement delivery, or notification-like implementation names.\n Add `notifications` only when user-facing alerts, notification settings, notification channels, or notification delivery semantics are the central subject.\n\n- Do not add `chat_integrations` merely because the title says the user is chatting, actively chatting, or mentions a chat context.\n Add `chat_integrations` only when Slack/Discord/ChatGPT/LinkedIn/chat adapter integration behavior is central.\n Example: `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`\n should be `cron_automation`, `sessions`, `reliability`, not `chat_integrations`.\n\n- Use `reliability` for central failures, crashes, errors, hardening against 
failures, guarded denial visibility, race conditions, takeover errors, silent breakage, or behavior that needs to be made robust.\n Example: `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`\n should include `reliability` because the visible problem is an error during delivery.\n\n- Use `cron_automation` when cron, scheduled jobs, scheduled announcements, recurring automation, or automated timed delivery is central.\n\n- Use `security` for hardening, supply-chain boundaries, trust boundaries, credentials, permissions, isolation, GitHub Actions supply-chain risk, or CI security posture.\n Example: `ci: harden GitHub Actions supply-chain boundaries`\n should be `security`, not deployment or packaging.\n\n- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.\n\n- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.\n\n- LM Studio is a strong signal for `local_models`.\n\n- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.\n Example: `fix: LM Studio thinking blocks invisible with Responses API`\n should be `model_serving` and `local_models`.\n\n- Use `codex` when Codex itself, Codex CLI, Codex sandbox/runtime behavior, or Codex-specific compatibility is central.\n\n- Use `skills_plugins` when user-installed plugins, skills, plugin inheritance, plugin availability, plugin loading, or plugin execution is central.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of an incidental keyword such as test, path, label, session-like function name, model, provider, 
usage, gateway, plugin, tool, sandbox, chat, message, CI, or contract.\nKeep a topic only when it is a central maintainer-owned subject or a known domain contract explicitly requires it."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.\n\nInput format:\n- `target`: GitHub issue or PR URL.\n- `title`: the issue/PR title. This is the primary signal.\n- Optional body/metadata may be present. Use the body only when the title is ambiguous.\n- Ignore labels, paths, templates, checklists, examples, tests, implementation filenames, and repo metadata unless the user-visible request itself is about those things.\n\nOutput only JSON, or call `final_json` exactly once if that structured-output tool is available:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, coding_agents\n\nCore method:\n1. Read the title first.\n2. Identify the user-visible subject, not the implementation area.\n3. Choose one primary topic.\n4. Add only explicit central secondary topics.\n5. Use three topics only when the title or known domain contract clearly requires three maintainer buckets.\n6. 
Run a final suppression check: remove any topic added only because of incidental words, labels, PR type, paths, tests, examples, function names, or implementation details.\n\nImportant routing rules:\n\n- Treat labels and repository metadata as noise.\n Do not infer `gateway`, `security`, `config`, or `docs` from labels, templates, checklists, examples, paths, or catalog/documentation mentions unless the title/body makes that topic the actual user-visible request.\n\n- Local Ollama, local embeddings, local model runtime behavior, LM Studio, or other locally hosted model integrations usually include `local_models`.\n If the problem concerns using a locally/self-hosted inference backend, endpoint, proxy, or runtime to serve inference, include `self_hosted_inference`.\n\n- A title like:\n `[Feature/Bug]: Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)`\n should be classified as `local_models` and `self_hosted_inference`.\n Do not add `gateway` merely because the repo or label mentions gateway.\n Do not add `security` merely because SSRF defenses are named; here the user-visible issue is local/self-hosted model inference failing with proxy/NO_PROXY behavior.\n\n- Documentation PRs can still have a central product topic.\n If a title is `docs(scope): ...`, classify as both `docs` and that scoped topic when the documentation is about that behavior.\n Example:\n `docs(queue): clarify steer behavior with partial streaming and tool boundaries`\n should be `queueing` and `docs`.\n\n- Catalog/provider requests route to `local_model_providers` when the central subject is adding, updating, integrating, or maintaining a provider/catalog entry.\n This includes provider-specific metadata such as known context windows, model limits, model catalog entries, provider-backed model listings, or provider-specific open-weight model metadata.\n\n- Open-weight model metadata routes to `open_weight_models`.\n If the title combines open-weight model metadata with 
a provider/catalog entry, include both `open_weight_models` and `local_model_providers`.\n Example:\n `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models`\n should be `open_weight_models` and `local_model_providers`.\n\n- Automatic model routing, multimodal routing, fallback routing, OpenAI-compatible endpoint behavior, response protocol behavior, returned blocks, partial/streaming lifecycle, or serving compatibility route to `model_serving`.\n\n- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.\n Example:\n `fix: LM Studio thinking blocks invisible with Responses API`\n should be `model_serving` and `local_models`.\n\n- LM Studio is a strong signal for `local_models`.\n\n- ACP issues route to `acp` when the user-visible subject is ACP binding, ACP channels, ACP protocol behavior, ACP integration behavior, or explicit ACP agent runs.\n\n- ACPX issues route to `acpx` when the user-visible subject is explicit agent-run orchestration, embedded backend visibility, stale final JSON state, final structured-output state for explicit runs, or ACPX-style run coordination.\n If a title combines explicit ACP agent runs with ACP behavior, include both `acpx` and `acp`.\n Example:\n `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield`\n should be `acpx` and `acp`.\n\n- Add `sessions` only when session identity, lifecycle, routing, persistence, session keys, session state, or session-specific user behavior is central.\n Do not add `sessions` for incidental function names such as `sessions_yield` or `sessions_spawn`.\n\n- Example:\n `ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process`\n should be `acp` and `sessions`.\n\n- `sessions_spawn` is an 
internal name.\n Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistence, or state.\n Example:\n `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization`\n should be `coding_agents`, `sandboxing`, and `agent_runtime`.\n\n- Do not add `coding_agents` merely because the title mentions Claude Code, Codex-like processes, persistent agent processes, or agent implementation details.\n Use `coding_agents` only when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.\n\n- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.\n\n- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.\n\n- `exec v2 contract follow-through` routes to all contract-owned areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n Do this even when the title starts with `test(exec):`; the PR type `test` does not make it a tests-only classification.\n The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.\n\n- Do not suppress a central contract topic just because the exact word is absent from the title if the named contract is known to cover that area.\n In particular, `exec v2 contract` implies `exec_tools`, `sandboxing`, and `approvals`.\n\nFinal suppression check:\nRemove any topic added only due to an incidental keyword such as test, path, label, template, checklist, session-like function name, model, provider, usage, gateway, plugin, tool, contract, docs, config, security, Claude Code, or catalog. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn JSON only:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput:\nYou may receive a target URL, title, body, summary, generated output, gold topics, or feedback. Classify the GitHub item itself, not the previous assistant response. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous.\n\nCore process:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only.\n\nSuppression rule:\nDo not add a topic just because a related word appears. 
Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, or incidental phrase.\n\nSpecific routing rules:\n\nCodex:\n- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, or Codex-specific runtime behavior.\n- Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.\n- Example: `[codex] fix webchat full-message reader for truncated history` is not `codex`.\n\nGateway, API surface, UI/TUI:\n- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.\n- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.\n- Example: `[codex] fix webchat full-message reader for truncated history` => `gateway`, `api_surface`, `ui_tui`.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving. 
Do not classify it as `local_model_providers`, `open_weight_models`, or `model_serving` unless provider catalogs, open-weight model metadata, or serving endpoints are actually central.\n- Example: `Policy: add model, network, and MCP conformance checks` => `config`, `security`, `mcp_tooling`.\n\nLocal model providers, open-weight models, model serving:\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.\n- Use `model_serving` only when serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, base URL behavior, endpoint compatibility, request routing, or model-server compatibility is central.\n- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.\n- Provider catalog missing-model issues are `local_model_providers`; add `reliability` when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.\n- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.\n- Do not use `reliability` for every bug. 
Use it when the failure mode itself is robustness/liveness/error-handling.\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.\n- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.\n- Use `sessions` only when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.\n- Do not add `sessions` merely because the title says `ACP sessions`.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, `acpx`.\n- `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` => `acp`, `sessions`.\n\nAuth and identity:\n- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.\n- OAuth restoration is `auth_identity`.\n- `openai-codex OAuth` is not automatically `codex`.\n- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.\n- `restore openai-codex OAuth on embedded path` => `auth_identity`, `sessions`.\n\nQueueing:\n- Use `queueing` only when queue, queued execution, queue lifecycle, queue steering, scheduling behavior, or user-visible queued work is central.\n- Words like lane, main lane, worker, subagent, or internal execution paths do not imply `queueing`.\n- `self-heal lane wedges` => `reliability`, not `queueing`.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- 
Documentation-only PRs usually return `docs` alone.\n- Add a second topic only when the documented area is explicitly central, such as docs for queueing.\n\nTools:\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.\n- Use `mcp_tooling` for MCP-specific behavior.\n- MCP tool-call parameter coercion can be both `mcp_tooling` and `tool_calling`.\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract enforcement may include `exec_tools`, `sandboxing`, and `approvals` when all are named central contract areas.\n\nNotifications and chat:\n- Use `notifications` when notification behavior itself is central, such as announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, or notification delivery.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nOther topics:\n- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, or memory providers. 
Active-memory recall deadlocks are usually `memory` plus `reliability`.\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or cron deadlock is central.\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is central.\n- Use `agent_runtime` when runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership are central.\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.\n\nComposite titles:\n- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item, up to the smallest complete set.\n- Do not substitute a broad infrastructure topic for a listed user-visible subject.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "Update the classifier rules with these corrections.\n\nThe task is to classify a GitHub issue or pull request into the smallest complete set of allowed topic ids. The input may include a target URL, title, body, summary, generated output, gold topics, or feedback. Always classify the GitHub item itself, not the previous assistant response or feedback text.\n\nReturn JSON only:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAdd `packaging_deployment` to the allowed topic ids.\n\nPackaging / deployment:\n- Use `packaging_deployment` when the central issue involves install/distribution packaging, desktop app setup, bundled runtime availability, PATH/runtime resolution, executable discovery, npm/npx/node availability, spawned process availability, deployment artifacts, or app startup environment problems.\n- If a browser automation feature fails because the desktop package cannot find or launch a required tool such as `npx`, include both the user-visible feature topic and `packaging_deployment`.\n- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.\n- In that example, Chrome sessions, Google sign-in, and profile attach are browser automation concerns; `spawn npx ENOENT` is a packaging/deployment/runtime availability concern.\n\nSelf-hosted inference vs local model providers:\n- Use `self_hosted_inference` when the central subject is supporting self-hosted inference servers, OpenAI-compatible self-hosted endpoints, or server families such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference backends.\n- Use `memory` as well when the self-hosted inference support is specifically for memory, embeddings, vector search, or embedding providers.\n- Do not use `local_model_providers` merely because the title names Ollama, vLLM, llama.cpp, TGI, LocalAI, OpenAI-compatible, provider, or model. 
Treat those as label spam unless the actual subject is provider catalog metadata, named provider/model-family support, provider-backed model context windows, or missing provider catalog entries.\n- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`, not `local_model_providers`.\n\nGeneral strategy:\n- Read the title first and identify the main user-visible bug, feature, contract change, or documentation change.\n- Pick one primary topic.\n- Add secondary topics only when they are explicitly central maintainer-owned concerns.\n- If a title lists multiple independent central concerns, include each one, but keep the set minimal.\n- Suppress topics that appear only as labels, implementation details, examples, fixture names, broad ownership areas, or incidental words."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.\n\nOutput only JSON, or call `final_json` exactly once if that structured-output tool is available:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids are exactly:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, coding_agents, hooks\n\nClassification method:\n- Read the title first.\n- Use the body only if the title is ambiguous.\n- Ignore labels, paths, tests, examples, implementation filenames, PR type prefixes, and incidental keywords unless they are the user-visible subject.\n- Choose one primary topic first, then add only explicit central secondary topics.\n- Use three topics only when the title or domain contract clearly requires three maintainer buckets.\n- Prefer the domain meaning of the title over generic keyword matching.\n- Final answer must contain only allowed topic ids.\n\nImportant routing rules:\n\n- `exec v2 contract follow-through` routes to all contract-owned areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n This applies even if the title starts with `test(exec):`.\n The PR type `test` does not make it a tests-only classification.\n\n- `sandbox.mode`, sandbox modes, sandbox policy, sandbox enforcement, sandbox permissions, or sandbox-governed execution are `sandboxing`.\n\n- Do not classify something as `sandboxing` merely because the title contains `sandbox`.\n If the sandbox is just the environment where another feature runs, classify the feature instead.\n Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. 
Superpowers)`\n should be `acpx`, `codex`, `skills_plugins`, not `sandboxing`.\n\n- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistent state, takeover, or user-visible session behavior.\n Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization`\n should be `coding_agents`, `sandboxing`, `agent_runtime`.\n\n- Use `sessions` when session identity, lifecycle, ownership, routing, takeover, persistence, or active-session state is central.\n Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`\n should include `sessions` because outbound session identity is central.\n\n- Use `gateway` when the central subject is gateway send behavior, guarded gateway send denial, gateway routing, or gateway delivery boundaries.\n Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`\n should be `gateway`, `sessions`, not `notifications`.\n\n- Use `notifications` when user-facing alerts, notification settings, notification channels, notification delivery semantics, announcement pushes, heartbeat pushes, target-channel pushes, or notification identity/metadata on pushed messages are central.\n\n- Do not add `notifications` merely because the title mentions message sending, outbound messages, or notification-like implementation names.\n Add it only when the delivered alert/push/announcement/notification is the user-visible subject.\n\n- Use `chat_integrations` when Slack, Discord, ChatGPT, LinkedIn, chat adapters, or target-channel chat delivery behavior is central.\n A title about Slack pushes or target-channel Slack delivery should include `chat_integrations`.\n\n- Do not add `chat_integrations` merely because the title says the user is chatting, actively chatting, or mentions a chat context.\n Add it only when the integration/channel behavior 
itself is central.\n\n- Treat cron-related words as incidental when cron is only the delivery path for an announcement or push.\n Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes`\n should be `notifications`, `chat_integrations`, not `cron_automation` or `reliability`.\n\n- Use `cron_automation` when cron, scheduled jobs, scheduled announcements, recurring automation, or automated timed delivery is the central product surface, not merely the path where another feature failed.\n\n- Use `reliability` for central failures, crashes, errors, hardening against failures, guarded denial visibility, race conditions, takeover errors, silent breakage, or behavior that needs to be made robust.\n Do not add `reliability` only because the title says `Bug`, `fix`, `dropped`, or mentions a previously fixed path. If the title is mainly about a feature\u2019s metadata/identity/channel behavior, classify that feature instead.\n\n- Use `security` for hardening, supply-chain boundaries, trust boundaries, credentials, permissions, isolation, SSRF, token verification, high/critical vulnerabilities, GitHub Actions supply-chain risk, or CI security posture.\n Example: `ci: harden GitHub Actions supply-chain boundaries`\n should be `security`.\n\n- Use `hooks` when hook behavior, hook priority, hook execution order, hook registration, or hook security is a central subject.\n Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification`\n should include `security`, `hooks`, and `skills_plugins`.\n\n- Use `skills_plugins` when user-installed plugins, skills, skill scanners, plugin inheritance, plugin availability, plugin loading, or plugin execution is central.\n\n- Do not add `local_model_providers` from unrelated scanner/provider/security wording unless local model provider behavior is explicitly central.\n\n- Use `coding_agents` when the central subject is subagents, coding agents, agent 
spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.\n\n- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.\n\n- LM Studio is a strong signal for `local_models`.\n\n- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.\n Example: `fix: LM Studio thinking blocks invisible with Responses API`\n should be `model_serving` and `local_models`.\n\n- Use `codex` when Codex itself, Codex CLI, Codex sandbox/runtime behavior, or Codex-specific compatibility is central.\n\n- Use `acp` when ACP is explicitly central.\n\n- Use `acpx` when ACPX is explicitly central, or when the title describes ACP session behavior involving per-binding/per-agent configuration that belongs to the ACPX integration surface.\n Example: `[Feature]: Per-binding and per-agent permissionMode for ACP sessions`\n should be `acp`, `approvals`, `acpx`.\n\n- Use `approvals` when permission mode, approval mode, permission gating, or approval policy is central.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, sandbox, chat, message, CI, cron, bug, fix, or contract.\nKeep a topic only when it is a central maintainer-owned subject or a known domain contract explicitly requires it."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn JSON only:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not a previous assistant response, generated output, label, or feedback text.\n\nInput format:\nYou may receive a target URL, title, body, summary, generated output, gold topics, or feedback. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous.\n\nCore process:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only.\n\nSuppression rule:\nDo not add a topic just because a related word appears. 
Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, or incidental phrase.\n\nImportant feedback-derived rules:\n\nACP / ACPX / sessions:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.\n- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.\n- Treat `sessions_yield` as label/path-like noise unless the title is truly about session identity, lifecycle, routing, persistent process identity, embedded session path, or session-specific state.\n- Example: `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield` => `acp`, `acpx`, not `sessions`.\n\nStreaming usage:\n- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.\n- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.\n- In streaming/model-server titles, usage chunks are usually protocol fields, not telemetry/accounting.\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.\n- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving` only.\n\nGateway / sessions / API surface:\n- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway 
send denial, or gateway ownership is central.\n- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.\n- Passing outbound session identity is a central `sessions` concern.\n- Do not add `api_surface` just because a title mentions `message_sending`, `surface`, send denial, or plumbing. Add `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n- Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`, not `api_surface`.\n\nBrowser automation / packaging:\n- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.\n- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.\n- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.\n- Do not add `auth_identity` just because Google sign-in fails when the actual central failure is Chrome profile attach or missing packaged executable.\n- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.\n\nCodex:\n- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, or Codex-specific runtime behavior.\n- Do not add `codex` merely because the title starts 
with `[codex]`, contains `openai-codex`, or has Codex branding.\n\nGateway, API surface, UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving.\n\nLocal model providers, open-weight models, model serving:\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.\n- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.\n- Provider catalog missing-model issues are `local_model_providers`; add `reliability` when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.\n\nReliability:\n- Use `reliability` when the central 
bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.\n- Do not use `reliability` for every bug.\n\nAuth and identity:\n- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.\n- OAuth restoration is `auth_identity`.\n- `openai-codex OAuth` is not automatically `codex`.\n- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs usually return `docs` alone.\n\nTools:\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.\n- Use `mcp_tooling` for MCP-specific behavior.\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, sessions_yield, showcase, tutorial, demo, sign-in, or OAuth. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or pull request into the smallest complete set of allowed topic ids.\n\nReturn JSON only:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity\n\nCore task:\nClassify the GitHub item itself, not any previous assistant response, feedback text, labels, or generated output. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicitly central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary clearly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only.\n\nSuppression rule:\nDo not add a topic just because a related word appears. Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, environment name, release-note fragment, or incidental phrase.\n\nComposite titles:\nIf a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify only the central user-visible items needed for maintainer routing. Do not add every topic hinted by every noun. 
Treat weak fragments in grab-bag titles as label spam unless they clearly describe a central behavior change.\n\nImportant learned example:\n`fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth`\n=> `codex`, `chat_integrations`\nDo not add `auth_identity` here. Although `OAuth` appears, it is not central enough for routing; treat it as label spam unless the item is specifically about OAuth/auth/login/token behavior. Also do not add `skills_plugins` merely because `plugins` appears in a Codex startup phrase.\n\nTopic rules:\n\nCodex:\nUse `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, or Codex-specific runtime behavior.\nDo not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.\nCodex startup and Docker Codex setup are `codex`.\n\nAuth and identity:\nUse `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is the central subject.\nDo not add `auth_identity` merely because `OAuth` appears in a composite title, environment phrase, or product/setup fragment.\nAdd `auth_identity` only when the item is specifically about auth behavior, restoration, account identity, credentials, login, or tokens.\n\nChat integrations:\nUse `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\nWhatsApp history is `chat_integrations`.\n\nSkills/plugins:\nUse `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is central.\nDo not add it merely because `plugins` appears as part of Codex startup wording.\n\nGateway/API/UI:\nUse `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is 
central.\nUse `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\nUse `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.\nWebchat full-message reader behavior is both `api_surface` and `ui_tui`; include `gateway` when gateway-backed or gateway-owned.\n\nReliability:\nUse `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.\nDo not use `reliability` for every bug.\n\nPolicy/config/security/MCP:\nUse `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\nUse `security` for network policy, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\nUse `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\nIn policy/conformance titles, `model` usually means a policy area, not model providers or serving.\n\nLocal models/providers/serving:\nUse `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\nUse `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.\nUse `model_serving` only when serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, base URL behavior, endpoint compatibility, request routing, or model-server compatibility is central.\nDo not add `model_serving` merely because a title says model, provider, catalog, Gemini, 
Ollama, vLLM, llama.cpp, or OpenAI-compatible.\n\nACP/ACPX/sessions/approvals:\nUse `acp` when ACP is named centrally.\nUse `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.\nUse `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.\nUse `sessions` only when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.\nDo not add `sessions` merely because the title says `ACP sessions`.\n\nDocumentation:\nUse `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\nDocumentation-only PRs usually return `docs` alone.\n\nTools:\nUse `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.\nUse `mcp_tooling` for MCP-specific behavior.\nUse `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n\nOther topics:\nUse `queueing` only when queue, queued execution, queue lifecycle, queue steering, scheduling behavior, or user-visible queued work is central.\nUse `notifications` when notification behavior itself is central.\nUse `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, or memory providers.\nUse `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or cron deadlock is central.\nUse `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\nUse `agent_runtime` when runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership are central.\nUse `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.\n\nFinal suppression check:\nBefore outputting, remove 
any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, OAuth, or Docker. Keep a topic only if it is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.\n\nInput format:\n- target: GitHub issue or PR URL\n- title: issue/PR title\n- body: optional; read only if the title is ambiguous\n- labels, paths, examples, tests, implementation filenames, and incidental keywords may appear, but should not drive classification unless they describe the user-visible subject\n\nOutput only JSON, or call `final_json` exactly once if available:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, coding_agents\n\nClassification method:\n1. Read the title first.\n2. Use the body only if the title is ambiguous.\n3. Choose one primary topic.\n4. Add only explicit central secondary topics.\n5. Use three topics only when the title or known domain contract clearly requires three maintainer buckets.\n6. 
Remove any topic added only because of an incidental keyword.\n\nDomain-specific routing rules:\n\n- `exec v2 contract follow-through` routes to all contract-owned areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n Do this even when the title starts with `test(exec):`.\n The PR type `test` does not make it tests-only.\n The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.\n\n- `sandbox.mode`, sandbox modes, sandbox policy, sandbox-governed execution, or permission boundaries are `sandboxing`.\n\n- Do not classify `sandboxing` merely because a title says \u201csandbox\u201d when the sandbox is just the environment where another feature runs.\n Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)`\n should be `acpx`, `codex`, and `skills_plugins`, not `sandboxing`.\n The central subject is ACPX/Codex inheriting user-installed plugins.\n\n- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistence, or user-visible session behavior.\n Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization`\n should be `coding_agents`, `sandboxing`, and `agent_runtime`.\n Here, `subagent initialization` is about coding-agent spawning/runtime startup, not user-visible sessions.\n\n- Use `sessions` when the issue is about session identity, lifecycle, takeover, routing, active chat ownership, persistent session state, or user-visible session conflicts.\n Example: `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`\n should include `sessions`, because `EmbeddedAttemptSessionTakeoverError` and \u201cuser is actively chatting\u201d indicate an active session takeover conflict.\n It should also include `cron_automation` for cron announce delivery and `reliability` because an unexpected error is triggered.\n\n- Use 
`coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.\n\n- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.\n\n- LM Studio is a strong signal for `local_models`.\n\n- Named external/local model provider catalogs, provider-specific model listings, and provider adapter behavior are usually `local_model_providers`.\n Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error`\n should be `local_model_providers` and `reliability`.\n Do not classify this as `model_serving`; the central subject is a provider catalog gap and silent failure.\n\n- Use `reliability` when the title describes silent failure, crashes, unexpected errors, broken delivery, missing error reporting, hangs, or behavior that fails instead of producing a clear user-facing outcome.\n\n- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.\n Example: `fix: LM Studio thinking blocks invisible with Responses API`\n should be `model_serving` and `local_models`.\n Do not stop at `local_models`; the Responses API part is central serving/protocol behavior.\n\n- Do not use `model_serving` merely because a title mentions a model name, provider name, provider catalog, or model availability. 
Use it only when serving protocol, endpoint compatibility, response structure, streaming, or OpenAI-compatible serving behavior is central.\n\n- Use `codex` when Codex itself, Codex-like behavior, Codex environment integration, or Codex-specific UX/runtime behavior is central.\n In `ACPX Codex sandbox should inherit user-installed plugins`, `codex` is central.\n\n- Use `skills_plugins` when the issue is about plugins, installed plugins, inherited plugins, skills, plugin availability, or plugin runtime integration.\n\nFinal suppression check:\nRemove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, sandbox, or contract. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it."
+ },
+ {
+ "routing_policy": "Refine classification with these additional routing rules:\n\n- Treat UI-looking wording as label spam unless the UI itself is the maintainer-owned subject. A title like \u201crespect totalTokensFresh flag to avoid showing stale token counts\u201d is about session state plus usage/token accounting, not `ui_tui`.\n - Use `sessions` when freshness, lifecycle, state, or per-session values determine whether data is valid.\n - Use `telemetry_usage` when token counts, usage counts, cost/usage display, or accounting/reporting values are central.\n - Do not add `ui_tui` just because the bug is \u201cshown\u201d somewhere.\n\n- For memory embeddings backed by self-hosted OpenAI-compatible servers, classify the self-hosted inference path explicitly.\n - \u201cmemory/embeddings\u201d is `memory`.\n - \u201copenai-compatible provider for self-hosted servers\u201d, including llama.cpp, Ollama, vLLM, TGI, or LocalAI, is `self_hosted_inference`.\n - Do not use `local_model_providers` merely because provider names or model-server names appear. Use `local_model_providers` only for provider catalogs, provider metadata, provider-specific model support, or provider catalog behavior.\n\n- For local Ollama or local embedding failures involving proxy/NO_PROXY behavior, prefer local/self-hosted routing over broad security or memory labels.\n - \u201cLocal Ollama embeddings fail when proxy is enabled\u201d routes to `local_models` and `self_hosted_inference`.\n - Do not add `memory` just because the word \u201cembeddings\u201d appears, unless the memory subsystem, memory provider, vector store, or active-memory recall is central.\n - Do not add `security` just because the title mentions SSRF defenses, proxy, or NO_PROXY. 
Use `security` only when the security policy, access restriction, credential boundary, vulnerability, or allowed/blocked network behavior is the central user-visible subject.\n\n- Apply a final \u201clabel spam\u201d suppression pass:\n - Remove `ui_tui` if the title only implies data is displayed.\n - Remove `local_model_providers` if the title is about using self-hosted inference servers rather than catalog/provider metadata.\n - Remove `memory` if embeddings are only the model capability involved and the memory system itself is not central.\n - Remove `security` if SSRF/proxy wording is implementation context for local model connectivity rather than the central policy/security concern.\n\nCanonical examples:\n- `fix: respect totalTokensFresh flag to avoid showing stale token counts` => `sessions`, `telemetry_usage`\n- `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`\n- `[Feature/Bug]: Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)` => `local_models`, `self_hosted_inference`"
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nIf a structured-output tool named `final_json` is available, call it exactly once with:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nIf no such tool is available, return only this final JSON object and no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity\n\nGoal:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.\n\nInput format:\n- You may receive a GitHub target URL, title, and sometimes a body or summary.\n- The title is the primary signal.\n- Use the first clear body summary only when the title is ambiguous.\n- Ignore target URL path, labels, examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.\n\nProcess:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids. Never output labels outside the allowed list.\n8. 
Output JSON only, or use the `final_json` tool if available.\n\nComposite title rule:\nIf a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.\n\nImportant examples:\n- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock`\n => `exec_tools`, `browser_automation`, `cron_automation`\n Do not add `ui_tui` just because browser diagnostics might be visible somewhere.\n Do not add generic `reliability` when the deadlock is specifically a cron force-run/lifecycle bug owned by `cron_automation`.\n\n- `fix: LM Studio thinking blocks invisible with Responses API`\n => `local_models`, `model_serving`\n LM Studio is a local model provider/runtime signal.\n Responses API behavior, response block handling, streaming, endpoint compatibility, or request/response protocol behavior is `model_serving`.\n Do not add `ui_tui` merely because something is \u201cinvisible\u201d unless webchat, TUI, UI views, or a UI reader is explicitly central.\n\nCore suppression rule:\nDo not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.\n\nTopic rules:\n\nACP, ACPX, sessions, approvals:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.\n- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.\n- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.\n- Do not add `sessions` merely because the title says \u201cACP sessions\u201d or mentions session context. 
Treat that as label spam unless session identity, lifecycle, routing, state, or persistent process identity is itself the bug or feature.\n- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, `acpx`\n- `[Bug]: ACP configured binding uses parent channel ID for session key \u2014 all threads under same channel share one persistent Claude Code process` => `acp`, `sessions`\n\nReliability, queueing, and lanes:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.\n- Prefer a more specific owning topic when the title names a domain-specific lifecycle bug, such as `cron force-run deadlock` => `cron_automation`.\n- Words like `lane`, `main lane`, `worker`, `subagent`, `before_prompt_build`, or internal execution paths do not imply `queueing`.\n- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.\n- `self-heal lane wedges` => `reliability`, not `queueing`.\n\nAuth and identity:\n- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or user/session identity for auth is central.\n- OAuth restoration is `auth_identity`.\n- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex-specific runtime behavior.\n- If OAuth or auth behavior is tied to an embedded/session path, include `sessions` when the embedded path or session identity is central.\n- `restore openai-codex OAuth on embedded path` => `auth_identity`, `sessions`, not `codex`.\n\nCodex:\n- Use `codex` when Codex is named centrally as the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.\n- Do not add `codex` merely because the title contains 
`openai-codex`, `[codex]`, or a Codex-branded OAuth provider.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs should usually include `docs` alone.\n- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.\n- Do not add `tool_calling` just because docs mention \u201ctool boundaries\u201d unless tool-call behavior itself is central.\n\nMCP and tool calling:\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.\n- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.\n\nOpen-weight, local providers, local models, and model serving:\n- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.\n- LM Studio is a strong signal for `local_models`.\n- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.\n- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, response block 
handling, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.\n- If a title combines a local provider/runtime with Responses API or OpenAI-compatible protocol behavior, include both `local_models` and `model_serving`.\n- Do not add `model_serving` merely because a title says \u201cmodel\u201d, \u201cprovider\u201d, \u201ccatalog\u201d, or names a model unless serving/routing/protocol behavior is central.\n\nNotifications and chat integrations:\n- Use `notifications` when notification behavior itself is central.\n- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.\n- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.\n- Slack target-channel pushes and WhatsApp history are `chat_integrations`.\n- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.\n\nCron:\n- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.\n- A cron force-run deadlock is `cron_automation`; do not add generic `reliability` unless reliability/self-healing is separately named as a central concern.\n- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.\n\nExec, sandboxing, approvals:\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.\n- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.\n\nMemory:\n- Use `memory` for 
memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.\n- Active-memory recall deadlocks should usually be `memory` plus `reliability`.\n\nGateway and sessions:\n- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.\n- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, embedded session path, or session-specific behavior is central.\n- \u201cOutbound session identity\u201d is `sessions`.\n- `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`, not `notifications`.\n\nAPI surface and UI/TUI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.\n- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.\n- Do not add `ui_tui` merely because diagnostics are layered, output is invisible, or a result is user-visible. 
UI/TUI must be explicitly the owned surface.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If that reader is gateway-backed or gateway-owned, also include `gateway`.\n\nSkills and plugins:\n- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.\n- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.\n\nGateway and runtime:\n- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.\n- `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n\nTelemetry and usage:\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.\n- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.\n\nBrowser automation:\n- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.\n- `layered browser diagnostics` is `browser_automation`.\n- Do not add `ui_tui` for browser diagnostics unless the title explicitly names UI/TUI/webchat display behavior.\n- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.\n\nPolicy, config, security:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Do not map \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel 
checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, invisible, diagnostics, or browser. Keep it only if that topic is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.\n\nOutput only JSON, or call `final_json` exactly once if that structured-output tool is available:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, coding_agents\n\nRead the title first. Use the body only if the title is ambiguous. Ignore labels, paths, tests, implementation filenames, examples, and incidental keywords unless they are the user-visible subject.\n\nChoose one primary topic, then add only explicit central secondary topics. Use three topics only when the title or domain contract clearly requires three maintainer buckets.\n\nGeneral routing rules:\n- Treat bracketed prefixes such as `[codex]`, PR types such as `test(...)`, and labels as routing noise unless the actual user-visible subject is that topic.\n- Do not add `codex` merely because the title starts with `[codex]`.\n- Do not add local model topics merely because the title contains `model` in a policy, conformance, config, or security context.\n- Do not add provider/model topics from label spam or incidental words.\n\nSpecific learned mappings:\n- `exec v2 contract follow-through` routes to all contract-owned areas:\n `exec_tools`, `sandboxing`, and `approvals`.\n Do this even when the title starts with `test(exec):`; the PR type `test` does not make it tests-only.\n The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.\n\n- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.\n\n- `sessions_spawn` is an internal 
name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistence, or user-visible session state.\n Example: `Bug: sandbox.mode: \"non-main\" silently breaks sessions_spawn subagent initialization`\n should be `coding_agents`, `sandboxing`, `agent_runtime`.\n Here, `subagent initialization` is about coding-agent spawning/runtime startup, not user-visible sessions.\n\n- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.\n\n- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.\n\n- LM Studio is a strong signal for `local_models`.\n\n- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.\n Example: `fix: LM Studio thinking blocks invisible with Responses API`\n should be `model_serving` and `local_models`.\n\n- `Policy: add model, network, and MCP conformance checks` should be classified as:\n `mcp_tooling`, `config`, `security`.\n In this title, `model` is not `local_models` or `local_model_providers`; it is part of policy/conformance coverage. `network` and conformance/policy checks are security/config concerns.\n\n- `[codex] fix webchat full-message reader for truncated history` should be classified as:\n `gateway`, `api_surface`, `ui_tui`.\n Do not include `codex`; the prefix is label noise. 
`webchat` points to gateway/UI behavior, and `full-message reader` / `truncated history` is centrally about the API surface used to retrieve complete message content.\n\nFinal suppression check:\nRemove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, contract, or bracketed prefix. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn JSON only:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not labels, generated output, prior assistant responses, feedback text, or incidental words.\n\nProcess:\n1. Read the title first. Treat it as the primary signal.\n2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned concerns.\n5. Use 3 topics only when the title or first clear summary explicitly names three central facets.\n6. Use 0 topics when no allowed topic is central.\n7. Never invent topic ids.\n8. Output JSON only.\n\nSuppression rule:\nDo not add a topic just because a related word appears. 
Confirm the word is the actual subject, not a label prefix, path, symptom, implementation detail, test fixture, example, or incidental phrase.\n\nImportant domain rules:\n\nLocal models / self-hosted inference / providers / security:\n- Use `local_models` when local model behavior is central, including local embeddings and local model runtime behavior.\n- Use `self_hosted_inference` when the issue concerns a locally hosted inference stack such as Ollama-backed inference or embeddings.\n- For titles like `Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)`, classify as `local_models` and `self_hosted_inference`.\n- Do not add `local_model_providers` merely because Ollama, Gemini, vLLM, llama.cpp, or another provider name appears. Use `local_model_providers` only when provider-specific catalog metadata, provider integration, known context windows, named provider support, or provider/model-family support is the central subject.\n- Do not add `security` merely because the title mentions SSRF, proxy, NO_PROXY, defenses, network, or policy language. 
Add `security` only when the central issue is actually security policy, access restriction, credential boundary, vulnerability, or allowed/blocked security behavior.\n\nReliability / auth / sessions:\n- Use `reliability` when the central bug is a wedge, deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, self-healing behavior, or robustness failure.\n- Use `auth_identity` when OAuth, login, sign-in, tokens, authentication, account identity, credential identity, or identity restoration is central.\n- Use `sessions` when session identity, lifecycle, routing, persistent process identity, embedded session path, session-specific state, or embedded path behavior is central.\n- If OAuth restoration is tied to an embedded path, include both `auth_identity` and `sessions`.\n- For titles like `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path`, classify as `reliability`, `auth_identity`, and `sessions`.\n- Do not add `codex` merely because the title says `openai-codex` or starts with `[codex]`. 
Use `codex` only when Codex itself is the central product/runtime/setup being changed.\n\nACP / ACPX / sessions:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.\n- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.\n- Treat `sessions_yield` as label/path-like noise unless the title is truly about session identity, lifecycle, routing, persistent process identity, embedded session path, or session-specific state.\n\nStreaming usage:\n- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.\n- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.\n\nGateway / sessions / API surface / UI:\n- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.\n- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.\n- Passing outbound session identity is a central `sessions` concern.\n- Use `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n- Do not add `api_surface` just because a title 
mentions `message_sending`, `surface`, send denial, or plumbing.\n- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`. If gateway-backed or gateway-owned, also include `gateway`.\n\nBrowser automation / packaging:\n- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.\n- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.\n- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.\n- Do not add `auth_identity` just because Google sign-in fails when the actual central failure is Chrome profile attach or missing packaged executable.\n\nDocs:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs usually return `docs` alone.\n\nTools / MCP / exec:\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.\n- Use `mcp_tooling` for MCP-specific behavior, MCP conformance, MCP policy, MCP tool behavior, protocol checks, or MCP integrations.\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, 
sessions_yield, showcase, tutorial, demo, sign-in, OAuth, proxy, SSRF, NO_PROXY, Ollama, or provider. Keep the topic only if it is actually a central maintainer-owned subject."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn JSON only, with no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not a previous assistant response, generated output, label, feedback text, or gold topic list.\n\nInput format:\nYou may receive a target URL, title, body, summary, generated output, gold topics, feedback, score, or error. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous. Ignore generated outputs, feedback, score, and error except when they explain how to improve the classification rule.\n\nCore process:\n1. Read the title first.\n2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.\n3. Pick one primary topic.\n4. Add secondary topics only when they are explicit central maintainer-owned subjects.\n5. Use 3 topics when the title explicitly names three central facets.\n6. Use 0 topics only when no allowed topic is central.\n7. Never invent topic ids.\n8. Always emit the required JSON object. 
Never omit the final JSON.\n\nImportant topic rules:\n\nACP / ACPX / sessions:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.\n- If a title says \u201cper-binding\u201d, \u201cbinding behavior\u201d, or ACPX-related binding, that is usually `acpx`.\n- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.\n- Treat `sessions_yield` as label/path-like noise unless the title is truly about session identity, lifecycle, routing, persistent process identity, embedded session path, or session-specific state.\n- Example: `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield` => `acp`, `acpx`.\n\nApprovals / permission modes:\n- Use `approvals` when approval policy, permission modes, `permissionMode`, approval gates, allow/deny behavior, or user permission decisions are central.\n- If ACP or ACPX work is specifically about per-agent or per-binding `permissionMode`, include `approvals`.\n- Example: `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `acpx`, `approvals`.\n\nSkills / plugins:\n- Use `skills_plugins` when plugins, skills, Superpowers, plugin inheritance, installed plugins, plugin availability, or skill/plugin runtime integration is central.\n- If ACPX/Codex sandbox behavior is about inheriting or exposing user-installed plugins, include `skills_plugins`.\n- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. 
Superpowers)` => `acpx`, `codex`, `skills_plugins`.\n\nCodex:\n- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, Codex sandbox behavior, or Codex-specific runtime behavior.\n- Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving.\n- Example: `Policy: add model, network, and MCP conformance checks` => `config`, `security`, `mcp_tooling`.\n\nStreaming usage:\n- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.\n- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.\n- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.\n- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving`.\n\nGateway / sessions / API surface:\n- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway 
ownership is central.\n- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.\n- Passing outbound session identity is a central `sessions` concern.\n- Do not add `api_surface` just because a title mentions `message_sending`, `surface`, send denial, or plumbing.\n- Add `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n\nBrowser automation / packaging:\n- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.\n- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.\n- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.\n- Do not add `auth_identity` just because Google sign-in fails when the central failure is Chrome profile attach or missing packaged executable.\n\nAPI surface / UI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.\n\nLocal model providers / open-weight models / model serving:\n- Use 
`local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.\n- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.\n- Provider catalog missing-model issues are `local_model_providers`; add `reliability` when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.\n- Do not use `reliability` for every bug.\n\nAuth and identity:\n- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.\n- OAuth restoration is `auth_identity`.\n- `openai-codex OAuth` is not automatically `codex`.\n- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs usually return `docs` alone.\n\nTools:\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.\n- Use `mcp_tooling` for MCP-specific behavior.\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n\nSuppression rule:\nDo not add a topic just because a related word appears. 
Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, or incidental phrase.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, sessions_yield, showcase, tutorial, demo, sign-in, OAuth, permission, or sandbox. Keep it only if that topic is actually a central maintainer-owned subject.\n\nFinal output:\nReturn exactly one JSON object:\n{\"topics_of_interest\":[...]}"
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nReturn JSON only, with no prose:\n{\"topics_of_interest\":[\"topic_id\"]}\n\nAllowed topic ids:\nqueueing, docs, notifications, sessions, gateway, reliability, memory,\nopen_weight_models, local_model_providers, codex, api_surface, ui_tui,\nchat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,\nmodel_serving, local_models, self_hosted_inference, telemetry_usage,\nexec_tools, sandboxing, browser_automation, cron_automation, config,\nsecurity, mcp_tooling, tool_calling, auth_identity, packaging_deployment\n\nTask:\nChoose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not generated output, feedback, score, label text, or gold topic lists.\n\nInput handling:\n1. Read the title first. It is the primary signal.\n2. Use the first clear body or summary only if the title is ambiguous.\n3. Ignore generated outputs, score, error, and gold topics.\n4. Use feedback only as rule-learning context, not as the item to classify.\n5. Pick one primary topic.\n6. Add secondary topics only when they are explicit central maintainer-owned subjects.\n7. Use 3 topics only when the title explicitly names three central facets.\n8. Use 0 topics only when no allowed topic is central.\n9. Never invent topic ids.\n10. Always emit the required JSON object.\n\nGeneral suppression rule:\nDo not add a topic just because a related word appears. 
Confirm the word is the actual subject, not a label prefix, path, symptom, implementation detail, test fixture, example, broad ownership area, or incidental phrase.\n\nACP / ACPX / Agent runtime:\n- Use `acp` when ACP is named centrally.\n- Use `acpx` only when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.\n- Do not infer `acpx` from `ACP` alone.\n- Use `agent_runtime` when runtime execution, backend runtime, node-backed runtime, agent process/runtime behavior, or runtime ownership is central.\n- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.\n- If a title says `per-binding`, `binding behavior`, or ACPX-related binding, that is usually `acpx`.\n- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.\n\nApprovals / permission modes:\n- Use `approvals` when approval policy, permission modes, `permissionMode`, approval gates, allow/deny behavior, or user permission decisions are central.\n- If ACP or ACPX work is specifically about per-agent or per-binding `permissionMode`, include `approvals`.\n\nNotifications / chat integrations / cron:\n- Use `notifications` when announces, alerts, heartbeat pushes, notification delivery, target channels, or notification routing are central.\n- Use `chat_integrations` when Slack, Discord, Teams, chat channels, chat pushes, or chat-specific identity/target-channel behavior is central.\n- Do not use `cron_automation` merely because a title mentions `cron`, `cron --announce`, scheduled heartbeat, or an announce path. 
Treat cron as label/path-like noise unless the core issue is scheduling, recurring jobs, cron execution, or automation timing itself.\n- Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes` => `notifications`, `chat_integrations`.\n\nSkills / plugins:\n- Use `skills_plugins` when plugins, skills, Superpowers, plugin inheritance, installed plugins, plugin availability, or skill/plugin runtime integration is central.\n- If ACPX/Codex sandbox behavior is about inheriting or exposing user-installed plugins, include `skills_plugins`.\n\nCodex:\n- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, Codex sandbox behavior, or Codex-specific runtime behavior.\n- Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.\n\nPolicy, config, security, MCP:\n- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.\n- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.\n- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.\n- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving.\n\nStreaming usage:\n- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.\n- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.\n- Use `telemetry_usage` only when metric collection, usage 
accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are central.\n\nGateway / sessions / API surface:\n- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.\n- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.\n- Passing outbound session identity is a central `sessions` concern.\n- Do not add `api_surface` just because a title mentions `message_sending`, `surface`, send denial, or plumbing.\n- Add `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n\nBrowser automation / packaging:\n- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.\n- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.\n- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.\n- Do not add `auth_identity` just because Google sign-in fails when the central failure is Chrome profile attach or missing packaged executable.\n\nAPI surface / UI:\n- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.\n- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface 
behavior.\n- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.\n- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.\n\nLocal model providers / open-weight models / model serving:\n- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.\n- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.\n- Provider-specific open-weight model metadata usually needs both `local_model_providers` and `open_weight_models`.\n- Example: `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` => `open_weight_models`, `local_model_providers`.\n- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.\n- Provider catalog missing-model issues are `local_model_providers`; add `reliability` only when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.\n\nReliability:\n- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.\n- Do not use `reliability` for every bug.\n\nAuth and identity:\n- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.\n- OAuth restoration is `auth_identity`.\n- `openai-codex OAuth` is not automatically `codex`.\n- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.\n\nDocumentation:\n- Use `docs` for documentation-only PRs, tutorials, showcase 
additions, README changes, guides, examples, and docs pages.\n- Documentation-only PRs usually return `docs` alone.\n\nTools:\n- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.\n- Use `mcp_tooling` for MCP-specific behavior.\n- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.\n\nFinal suppression check:\nBefore outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, sessions_yield, showcase, tutorial, demo, sign-in, OAuth, permission, sandbox, runtime, provider, or Slack. Keep it only if that topic is actually a central maintainer-owned subject.\n\nFinal output:\nReturn exactly one JSON object:\n{\"topics_of_interest\":[...]}"
+ },
+ {
+ "routing_policy": "Additional classification rules learned from prior errors:\n\nAllowed topic ids must include `hooks` when hook behavior is part of the maintainer taxonomy.\n\nToken counts and usage freshness:\n- Titles about stale token counts, token usage counts, total token freshness flags, or usage display correctness are `telemetry_usage` when the user-visible subject is usage/count accuracy.\n- If the token count freshness is tied to session state, session identity, or per-session accounting, also include `sessions`.\n- Do not classify these as `ui_tui` merely because stale counts are shown in a UI. UI display is incidental unless the title is centrally about UI/TUI behavior.\n- Example: `fix: respect totalTokensFresh flag to avoid showing stale token counts` => `sessions`, `telemetry_usage`.\n\nSecurity bundles:\n- For composite security titles, classify each explicitly named central vulnerable subsystem, but do not over-expand from generic security words.\n- `token verification` in a vulnerability-fix title usually stays under `security`; do not add `auth_identity` unless authentication, OAuth, login, account identity, credential identity, or identity propagation is itself the central user-visible subject.\n- `skill scanner` or plugin/skill scanning vulnerability fixes should include `skills_plugins`.\n- `hook priority`, hook ordering, hook execution, or hook lifecycle should include `hooks` when hooks are an allowed topic.\n- `SSRF`, HIGH/CRITICAL vulnerabilities, network exploit prevention, and vulnerability remediation are `security`.\n- Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `hooks`, `skills_plugins`, not `auth_identity`.\n\nFinal suppression check additions:\n- Remove `ui_tui` if the UI is only where telemetry, tokens, or stale values appear.\n- Remove `auth_identity` if the only auth-like phrase is token verification inside a security vulnerability bundle.\n- Keep 
`hooks` when hook priority/order/behavior is explicitly named as one of the central fixes."
+ }
+ ],
+ "discovery_eval_counts": [
+ 0,
+ 38,
+ 76,
+ 114,
+ 160,
+ 198,
+ 236,
+ 274,
+ 312,
+ 350,
+ 388,
+ 426,
+ 472,
+ 510,
+ 548,
+ 586,
+ 624,
+ 662,
+ 700
+ ],
+ "num_full_val_evals": 19,
+ "objective_pareto_front": {
+ "weighted_score": 0.7403968253968253
+ },
+ "parents": [
+ [
+ null
+ ],
+ [
+ 0
+ ],
+ [
+ 0
+ ],
+ [
+ 1
+ ],
+ [
+ 1
+ ],
+ [
+ 3
+ ],
+ [
+ 2
+ ],
+ [
+ 6
+ ],
+ [
+ 4
+ ],
+ [
+ 6
+ ],
+ [
+ 6
+ ],
+ [
+ 1
+ ],
+ [
+ 6
+ ],
+ [
+ 2
+ ],
+ [
+ 1
+ ],
+ [
+ 9
+ ],
+ [
+ 9
+ ],
+ [
+ 16
+ ],
+ [
+ 2
+ ]
+ ],
+ "per_objective_best_candidates": {
+ "weighted_score": [
+ 2
+ ]
+ },
+ "per_val_instance_best_candidates": {
+ "0": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 5,
+ 8,
+ 10,
+ 13,
+ 14,
+ 17
+ ],
+ "1": [
+ 16,
+ 17,
+ 6,
+ 10,
+ 13,
+ 14
+ ],
+ "2": [
+ 0,
+ 16,
+ 2,
+ 13
+ ],
+ "3": [
+ 0,
+ 8,
+ 2,
+ 10
+ ],
+ "4": [
+ 16,
+ 17,
+ 2,
+ 6,
+ 9,
+ 10,
+ 15
+ ],
+ "5": [
+ 16,
+ 17,
+ 2,
+ 5,
+ 6,
+ 8,
+ 9
+ ],
+ "6": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 5,
+ 6,
+ 8,
+ 9,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17
+ ],
+ "7": [
+ 0,
+ 2,
+ 4,
+ 6,
+ 9,
+ 11,
+ 15,
+ 17,
+ 18
+ ],
+ "8": [
+ 0,
+ 1,
+ 2,
+ 6,
+ 9,
+ 10,
+ 13,
+ 14,
+ 15,
+ 16
+ ],
+ "9": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 12,
+ 13,
+ 18
+ ],
+ "10": [
+ 0,
+ 1,
+ 3,
+ 4,
+ 5,
+ 8,
+ 11,
+ 13,
+ 14
+ ],
+ "11": [
+ 0,
+ 2,
+ 4,
+ 8,
+ 11,
+ 13,
+ 16,
+ 17
+ ],
+ "12": [
+ 0,
+ 1,
+ 2,
+ 18,
+ 4,
+ 7,
+ 12
+ ],
+ "13": [
+ 1,
+ 2,
+ 6,
+ 9,
+ 10,
+ 11,
+ 14,
+ 16,
+ 17,
+ 18
+ ],
+ "14": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17
+ ],
+ "15": [
+ 0,
+ 17,
+ 5,
+ 6,
+ 9,
+ 13,
+ 15
+ ],
+ "16": [
+ 0,
+ 17,
+ 2,
+ 3,
+ 8,
+ 14
+ ],
+ "17": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 8,
+ 11,
+ 13,
+ 14
+ ],
+ "18": [
+ 0,
+ 4,
+ 5,
+ 6,
+ 8,
+ 9,
+ 10,
+ 13,
+ 15,
+ 16,
+ 17
+ ],
+ "19": [
+ 1,
+ 3,
+ 14
+ ],
+ "20": [
+ 18,
+ 12
+ ],
+ "21": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 6,
+ 8,
+ 13,
+ 18
+ ],
+ "22": [
+ 8
+ ],
+ "23": [
+ 12,
+ 5,
+ 15
+ ],
+ "24": [
+ 1,
+ 6,
+ 7,
+ 9,
+ 15,
+ 16,
+ 17,
+ 18
+ ],
+ "25": [
+ 0,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 13
+ ],
+ "26": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 5,
+ 7,
+ 12,
+ 13,
+ 15,
+ 17
+ ],
+ "27": [
+ 2,
+ 13
+ ],
+ "28": [
+ 18,
+ 11,
+ 4
+ ],
+ "29": [
+ 16,
+ 9,
+ 5
+ ]
+ },
+ "run_dir": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop20-continuation-20260614T021448Z",
+ "seed": 0,
+ "total_metric_calls": 730,
+ "val_aggregate_scores": [
+ 0.6877944862155388,
+ 0.585970695970696,
+ 0.7403968253968254,
+ 0.5383333333333333,
+ 0.5166544566544566,
+ 0.584047619047619,
+ 0.6160651629072682,
+ 0.4290659340659341,
+ 0.5842551892551893,
+ 0.5863186813186814,
+ 0.5487301587301587,
+ 0.4592735042735043,
+ 0.4111029711029711,
+ 0.6716666666666666,
+ 0.530018315018315,
+ 0.5569841269841269,
+ 0.6092857142857143,
+ 0.6176678876678877,
+ 0.4475383553644423
+ ],
+ "val_aggregate_subscores": [
+ {
+ "weighted_score": 0.6877944862155387
+ },
+ {
+ "weighted_score": 0.585970695970696
+ },
+ {
+ "weighted_score": 0.7403968253968253
+ },
+ {
+ "weighted_score": 0.5383333333333333
+ },
+ {
+ "weighted_score": 0.5166544566544566
+ },
+ {
+ "weighted_score": 0.584047619047619
+ },
+ {
+ "weighted_score": 0.6160651629072681
+ },
+ {
+ "weighted_score": 0.429065934065934
+ },
+ {
+ "weighted_score": 0.5842551892551892
+ },
+ {
+ "weighted_score": 0.5863186813186813
+ },
+ {
+ "weighted_score": 0.5487301587301587
+ },
+ {
+ "weighted_score": 0.4592735042735043
+ },
+ {
+ "weighted_score": 0.41110297110297106
+ },
+ {
+ "weighted_score": 0.6716666666666666
+ },
+ {
+ "weighted_score": 0.530018315018315
+ },
+ {
+ "weighted_score": 0.5569841269841269
+ },
+ {
+ "weighted_score": 0.6092857142857142
+ },
+ {
+ "weighted_score": 0.6176678876678875
+ },
+ {
+ "weighted_score": 0.4475383553644424
+ }
+ ],
+ "val_subscores": [
+ {
+ "0": 1.0,
+ "1": 0.25,
+ "2": 1.0,
+ "3": 1.0,
+ "4": 0.2857142857142857,
+ "5": 0.25,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 1.0,
+ "17": 1.0,
+ "18": 1.0,
+ "19": 0.2,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 0.0,
+ "23": 0.10526315789473684,
+ "24": 0.25,
+ "25": 1.0,
+ "26": 1.0,
+ "27": 0.14285714285714285,
+ "28": 0.2,
+ "29": 0.25
+ },
+ {
+ "0": 1.0,
+ "1": 0.25,
+ "2": 0.25,
+ "3": 0.15384615384615385,
+ "4": 0.2857142857142857,
+ "5": 0.0,
+ "6": 1.0,
+ "7": 0.25,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 0.25,
+ "12": 1.0,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.25,
+ "17": 1.0,
+ "18": 0.2857142857142857,
+ "19": 1.0,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 0.25,
+ "23": 0.15384615384615385,
+ "24": 1.0,
+ "25": 0.5,
+ "26": 1.0,
+ "27": 0.25,
+ "28": 0.5,
+ "29": 0.25
+ },
+ {
+ "0": 1.0,
+ "1": 0.25,
+ "2": 1.0,
+ "3": 1.0,
+ "4": 1.0,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 0.5,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 1.0,
+ "17": 1.0,
+ "18": 0.25,
+ "19": 0.3333333333333333,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 0.5,
+ "23": 0.14285714285714285,
+ "24": 0.25,
+ "25": 1.0,
+ "26": 1.0,
+ "27": 1.0,
+ "28": 0.0,
+ "29": 0.2857142857142857
+ },
+ {
+ "0": 1.0,
+ "1": 0.25,
+ "2": 0.14285714285714285,
+ "3": 0.25,
+ "4": 0.2857142857142857,
+ "5": 0.25,
+ "6": 1.0,
+ "7": 0.25,
+ "8": 0.5,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 0.25,
+ "12": 0.25,
+ "13": 0.25,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 1.0,
+ "17": 1.0,
+ "18": 0.2857142857142857,
+ "19": 1.0,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 0.2,
+ "23": 0.25,
+ "24": 0.2857142857142857,
+ "25": 1.0,
+ "26": 1.0,
+ "27": 0.0,
+ "28": 0.5,
+ "29": 0.25
+ },
+ {
+ "0": 0.25,
+ "1": 0.25,
+ "2": 0.25,
+ "3": 0.15384615384615385,
+ "4": 0.16666666666666666,
+ "5": 0.5,
+ "6": 0.2857142857142857,
+ "7": 1.0,
+ "8": 0.25,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 1.0,
+ "12": 1.0,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 0.25,
+ "16": 0.25,
+ "17": 1.0,
+ "18": 1.0,
+ "19": 0.5,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.25,
+ "23": 0.15384615384615385,
+ "24": 0.25,
+ "25": 1.0,
+ "26": 0.2857142857142857,
+ "27": 0.1,
+ "28": 1.0,
+ "29": 0.15384615384615385
+ },
+ {
+ "0": 1.0,
+ "1": 0.14285714285714285,
+ "2": 0.0,
+ "3": 0.5,
+ "4": 0.2857142857142857,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 0.5,
+ "8": 0.5,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 0.14285714285714285,
+ "12": 0.25,
+ "13": 0.0,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 0.25,
+ "17": 1.0,
+ "18": 1.0,
+ "19": 0.0,
+ "20": 0.5,
+ "21": 0.5,
+ "22": 0.0,
+ "23": 1.0,
+ "24": 0.25,
+ "25": 1.0,
+ "26": 1.0,
+ "27": 0.2,
+ "28": 0.5,
+ "29": 1.0
+ },
+ {
+ "0": 0.5,
+ "1": 1.0,
+ "2": 0.10526315789473684,
+ "3": 0.0,
+ "4": 1.0,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 0.2,
+ "11": 0.25,
+ "12": 0.5,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 0.25,
+ "17": 0.5,
+ "18": 1.0,
+ "19": 0.14285714285714285,
+ "20": 0.14285714285714285,
+ "21": 1.0,
+ "22": 0.5,
+ "23": 0.10526315789473684,
+ "24": 1.0,
+ "25": 1.0,
+ "26": 0.2857142857142857,
+ "27": 0.25,
+ "28": 0.5,
+ "29": 0.25
+ },
+ {
+ "0": 0.5,
+ "1": 0.5,
+ "2": 0.25,
+ "3": 0.25,
+ "4": 0.15384615384615385,
+ "5": 0.5,
+ "6": 0.25,
+ "7": 0.5,
+ "8": 0.5,
+ "9": 0.5,
+ "10": 0.2,
+ "11": 0.2857142857142857,
+ "12": 1.0,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.25,
+ "17": 0.25,
+ "18": 0.2857142857142857,
+ "19": 0.2,
+ "20": 0.25,
+ "21": 0.5,
+ "22": 0.15384615384615385,
+ "23": 0.14285714285714285,
+ "24": 1.0,
+ "25": 0.5,
+ "26": 1.0,
+ "27": 0.2,
+ "28": 0.5,
+ "29": 0.25
+ },
+ {
+ "0": 1.0,
+ "1": 0.25,
+ "2": 0.25,
+ "3": 1.0,
+ "4": 0.16666666666666666,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 0.25,
+ "8": 0.25,
+ "9": 0.5,
+ "10": 1.0,
+ "11": 1.0,
+ "12": 0.2857142857142857,
+ "13": 0.25,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 1.0,
+ "17": 1.0,
+ "18": 1.0,
+ "19": 0.5,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 1.0,
+ "23": 0.15384615384615385,
+ "24": 0.5,
+ "25": 0.5,
+ "26": 0.2857142857142857,
+ "27": 0.2,
+ "28": 0.2,
+ "29": 0.2857142857142857
+ },
+ {
+ "0": 0.5,
+ "1": 0.5,
+ "2": 0.14285714285714285,
+ "3": 0.15384615384615385,
+ "4": 1.0,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 0.5,
+ "10": 0.3333333333333333,
+ "11": 0.3333333333333333,
+ "12": 0.25,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 0.25,
+ "17": 0.5,
+ "18": 1.0,
+ "19": 0.2,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.3333333333333333,
+ "23": 0.14285714285714285,
+ "24": 1.0,
+ "25": 0.5,
+ "26": 0.25,
+ "27": 0.5,
+ "28": 0.5,
+ "29": 1.0
+ },
+ {
+ "0": 1.0,
+ "1": 1.0,
+ "2": 0.25,
+ "3": 1.0,
+ "4": 1.0,
+ "5": 0.5,
+ "6": 0.5,
+ "7": 0.5,
+ "8": 1.0,
+ "9": 0.5,
+ "10": 0.3333333333333333,
+ "11": 0.5,
+ "12": 0.2857142857142857,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.25,
+ "17": 0.5,
+ "18": 1.0,
+ "19": 0.5,
+ "20": 0.25,
+ "21": 0.5,
+ "22": 0.25,
+ "23": 0.14285714285714285,
+ "24": 0.5,
+ "25": 0.0,
+ "26": 0.5,
+ "27": 0.2,
+ "28": 0.5,
+ "29": 0.5
+ },
+ {
+ "0": 0.5,
+ "1": 0.25,
+ "2": 0.25,
+ "3": 0.15384615384615385,
+ "4": 0.16666666666666666,
+ "5": 0.5,
+ "6": 0.2857142857142857,
+ "7": 1.0,
+ "8": 0.2,
+ "9": 0.5,
+ "10": 1.0,
+ "11": 1.0,
+ "12": 0.2857142857142857,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.25,
+ "17": 1.0,
+ "18": 0.25,
+ "19": 0.25,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.25,
+ "23": 0.15384615384615385,
+ "24": 0.25,
+ "25": 0.5,
+ "26": 0.2857142857142857,
+ "27": 0.14285714285714285,
+ "28": 1.0,
+ "29": 0.15384615384615385
+ },
+ {
+ "0": 0.2857142857142857,
+ "1": 0.2,
+ "2": 0.25,
+ "3": 0.2857142857142857,
+ "4": 0.1111111111111111,
+ "5": 0.25,
+ "6": 0.2857142857142857,
+ "7": 0.25,
+ "8": 0.15384615384615385,
+ "9": 1.0,
+ "10": 0.2,
+ "11": 0.2857142857142857,
+ "12": 1.0,
+ "13": 0.25,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.25,
+ "17": 0.25,
+ "18": 0.2857142857142857,
+ "19": 0.2,
+ "20": 1.0,
+ "21": 0.5,
+ "22": 0.25,
+ "23": 1.0,
+ "24": 0.2857142857142857,
+ "25": 0.25,
+ "26": 1.0,
+ "27": 0.1,
+ "28": 0.5,
+ "29": 0.15384615384615385
+ },
+ {
+ "0": 1.0,
+ "1": 1.0,
+ "2": 1.0,
+ "3": 0.2857142857142857,
+ "4": 0.2857142857142857,
+ "5": 0.5,
+ "6": 1.0,
+ "7": 0.5,
+ "8": 1.0,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 1.0,
+ "12": 0.2857142857142857,
+ "13": 0.25,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 0.25,
+ "17": 1.0,
+ "18": 1.0,
+ "19": 0.2,
+ "20": 0.2,
+ "21": 1.0,
+ "22": 0.0,
+ "23": 0.14285714285714285,
+ "24": 0.5,
+ "25": 1.0,
+ "26": 1.0,
+ "27": 1.0,
+ "28": 0.5,
+ "29": 0.25
+ },
+ {
+ "0": 1.0,
+ "1": 1.0,
+ "2": 0.25,
+ "3": 0.25,
+ "4": 0.2857142857142857,
+ "5": 0.5,
+ "6": 1.0,
+ "7": 0.25,
+ "8": 1.0,
+ "9": 0.5,
+ "10": 1.0,
+ "11": 0.25,
+ "12": 0.2857142857142857,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 1.0,
+ "17": 1.0,
+ "18": 0.2857142857142857,
+ "19": 1.0,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.25,
+ "23": 0.15384615384615385,
+ "24": 0.25,
+ "25": 0.5,
+ "26": 0.2857142857142857,
+ "27": 0.0,
+ "28": 0.25,
+ "29": 0.15384615384615385
+ },
+ {
+ "0": 0.25,
+ "1": 0.2,
+ "2": 0.14285714285714285,
+ "3": 0.25,
+ "4": 1.0,
+ "5": 0.25,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 1.0,
+ "9": 0.5,
+ "10": 0.3333333333333333,
+ "11": 0.25,
+ "12": 0.25,
+ "13": 0.5,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 0.25,
+ "17": 0.25,
+ "18": 1.0,
+ "19": 0.3333333333333333,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.25,
+ "23": 1.0,
+ "24": 1.0,
+ "25": 0.5,
+ "26": 1.0,
+ "27": 0.5,
+ "28": 0.5,
+ "29": 0.5
+ },
+ {
+ "0": 0.25,
+ "1": 1.0,
+ "2": 1.0,
+ "3": 0.25,
+ "4": 1.0,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 0.25,
+ "8": 1.0,
+ "9": 0.5,
+ "10": 0.2,
+ "11": 1.0,
+ "12": 0.25,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 0.5,
+ "16": 0.25,
+ "17": 0.5,
+ "18": 1.0,
+ "19": 0.2,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.5,
+ "23": 0.14285714285714285,
+ "24": 1.0,
+ "25": 0.5,
+ "26": 0.2857142857142857,
+ "27": 0.5,
+ "28": 0.5,
+ "29": 1.0
+ },
+ {
+ "0": 1.0,
+ "1": 1.0,
+ "2": 0.0,
+ "3": 0.25,
+ "4": 1.0,
+ "5": 1.0,
+ "6": 1.0,
+ "7": 1.0,
+ "8": 0.5,
+ "9": 0.5,
+ "10": 0.3333333333333333,
+ "11": 1.0,
+ "12": 0.25,
+ "13": 1.0,
+ "14": 1.0,
+ "15": 1.0,
+ "16": 1.0,
+ "17": 0.5,
+ "18": 1.0,
+ "19": 0.5,
+ "20": 0.2,
+ "21": 0.5,
+ "22": 0.15384615384615385,
+ "23": 0.14285714285714285,
+ "24": 1.0,
+ "25": 0.25,
+ "26": 1.0,
+ "27": 0.0,
+ "28": 0.2,
+ "29": 0.25
+ },
+ {
+ "0": 0.15384615384615385,
+ "1": 0.25,
+ "2": 0.15384615384615385,
+ "3": 0.16666666666666666,
+ "4": 0.15384615384615385,
+ "5": 0.25,
+ "6": 0.2857142857142857,
+ "7": 1.0,
+ "8": 0.25,
+ "9": 1.0,
+ "10": 0.14285714285714285,
+ "11": 0.2857142857142857,
+ "12": 1.0,
+ "13": 1.0,
+ "14": 0.2857142857142857,
+ "15": 0.5,
+ "16": 0.15384615384615385,
+ "17": 0.25,
+ "18": 0.2857142857142857,
+ "19": 0.2,
+ "20": 1.0,
+ "21": 1.0,
+ "22": 0.2857142857142857,
+ "23": 0.08695652173913043,
+ "24": 1.0,
+ "25": 0.25,
+ "26": 0.2857142857142857,
+ "27": 0.5,
+ "28": 1.0,
+ "29": 0.25
+ }
+ ],
+ "validation_schema_version": 2
+}
\ No newline at end of file
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/gepa_state.bin b/gepa-12b-row30-prop20-continuation-20260614T021448Z/gepa_state.bin
new file mode 100644
index 0000000000000000000000000000000000000000..f5e529da253f1ac266613118e6ebb292c2f82775
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/gepa_state.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fd4946cd4dc8738eea7d5e59b9e6fb40fe82fa7ddefa1d88851075a66f5a7298
+size 151487
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/optimize.pid b/gepa-12b-row30-prop20-continuation-20260614T021448Z/optimize.pid
new file mode 100644
index 0000000000000000000000000000000000000000..e1285e6699b7a493020d78e3a973c0ceb2fff159
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/optimize.pid
@@ -0,0 +1 @@
+338163
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/optimize.stderr.log b/gepa-12b-row30-prop20-continuation-20260614T021448Z/optimize.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/optimize.stdout.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/optimize.stdout.json
new file mode 100644
index 0000000000000000000000000000000000000000..8f4bf688e3c3375a513c985f7694441bd2bae025
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/optimize.stdout.json
@@ -0,0 +1,2046 @@
+Iteration 0: Base program full valset score: 0.6877944862155388 over 30 / 30 examples
+Iteration 1: Selected program 0 score: 0.6877944862155388
+Iteration 1: Proposed new text for routing_policy: Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.
+
+Output only JSON, or call `final_json` exactly once if that structured-output tool is available:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids include:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents
+
+Read the title first. Use the body only if the title is ambiguous. Ignore labels, paths, examples, tests, implementation filenames, and incidental keywords unless they are the user-visible subject.
+
+Choose one primary topic, then add only explicit central secondary topics. Use three topics only when the title or domain contract clearly requires three maintainer buckets.
+
+Important routing rules learned from prior errors:
+
+- `exec v2 contract follow-through` routes to all contract-owned areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+ Do this even when the title starts with `test(exec):`; the PR type `test` does not make it a tests-only classification.
+ The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.
+
+- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.
+
+- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, or persistent session state.
+ Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization`
+ should be `coding_agents`, `sandboxing`, `agent_runtime`.
+ Here, `subagent initialization` is about coding-agent spawning/runtime startup, not user-visible sessions.
+
+- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.
+
+- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.
+
+- LM Studio is a strong signal for `local_models`.
+
+- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.
+ Example: `fix: LM Studio thinking blocks invisible with Responses API`
+ should be `model_serving` and `local_models`.
+ Do not stop at `local_models`; the Responses API part is central serving/protocol behavior.
+
+- Do not add `sessions` merely because a title contains `session`, `sessions_spawn`, channel IDs, binding IDs, or session-like implementation names. Add it only when session identity, lifecycle, routing, state, persistence, or session-specific user behavior is central.
+
+- Do not suppress a central contract topic just because the exact word is absent from the title if the named contract is known to cover that area. In particular, `exec v2 contract` implies sandboxing and approvals.
+
+Final suppression check:
+Remove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, or contract. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it.
+Iteration 1: New subsample score 3.1538461538461537 is better than old score 2.2. Continue to full eval and add to candidate pool.
+Iteration 1: Valset score for new program: 0.585970695970696 (coverage 30 / 30)
+Iteration 1: Val aggregate for new program: 0.585970695970696
+Iteration 1: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 0.25, 3: 0.15384615384615385, 4: 0.2857142857142857, 5: 0.0, 6: 1.0, 7: 0.25, 8: 1.0, 9: 1.0, 10: 1.0, 11: 0.25, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.25, 17: 1.0, 18: 0.2857142857142857, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.25, 23: 0.15384615384615385, 24: 1.0, 25: 0.5, 26: 1.0, 27: 0.25, 28: 0.5, 29: 0.25}
+Iteration 1: Objective aggregate scores for new program: {'weighted_score': 0.585970695970696}
+Iteration 1: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 0.25, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.25, 23: 0.15384615384615385, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.25, 28: 0.5, 29: 0.25}
+Iteration 1: Objective pareto front scores: {'weighted_score': 0.6877944862155387}
+Iteration 1: Valset pareto front aggregate score: 0.7796520146520147
+Iteration 1: Updated valset pareto front programs: {0: {0, 1}, 1: {0, 1}, 2: {0}, 3: {0}, 4: {0, 1}, 5: {0}, 6: {0, 1}, 7: {0}, 8: {0, 1}, 9: {0, 1}, 10: {0, 1}, 11: {0}, 12: {0, 1}, 13: {1}, 14: {0, 1}, 15: {0}, 16: {0}, 17: {0, 1}, 18: {0}, 19: {1}, 20: {0, 1}, 21: {0, 1}, 22: {1}, 23: {1}, 24: {1}, 25: {0}, 26: {0, 1}, 27: {1}, 28: {1}, 29: {0, 1}}
+Iteration 1: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 1: Best valset aggregate score so far: 0.6877944862155388
+Iteration 1: Best program as per aggregate score on valset: 0
+Iteration 1: Best score on valset: 0.6877944862155388
+Iteration 1: Linear pareto front program index: 0
+Iteration 1: New program candidate index: 1
+Iteration 2: Selected program 0 score: 0.6877944862155388
+Iteration 2: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
+- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- Do not add `sessions` merely because the title says “ACP sessions” or mentions session context. Treat that as label spam unless session identity, lifecycle, routing, state, or persistent process identity is itself the bug or feature.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.
+- `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` should be `acp` and `sessions`; the central bug is session identity/process sharing.
+
+Reliability, queueing, and lanes:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.
+- Words like `lane`, `main lane`, `worker`, `subagent`, `before_prompt_build`, or internal execution paths do not imply `queueing`.
+- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.
+- `self-heal lane wedges` is `reliability`, not `queueing`.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or user/session identity for auth is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex-specific runtime behavior.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions` when the embedded path or session identity is central.
+- `restore openai-codex OAuth on embedded path` should be `auth_identity` and `sessions`, not `codex`.
+
+Codex:
+- Use `codex` when Codex is named centrally as the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` merely because the title contains `openai-codex`, `[codex]`, or a Codex-branded OAuth provider. Confirm the subject is Codex behavior rather than auth, sessions, docs, or another domain.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include `docs` alone.
+- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add non-allowed or broad demo/showcase labels.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, or names a model unless serving/routing/protocol behavior is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Notifications and chat integrations:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, embedded session path, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.
+- Do not substitute broad infrastructure topics for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 2: New subsample score 4.0 is better than old score 2.375. Continue to full eval and add to candidate pool.
+Iteration 2: Found a better program on the valset with score 0.7403968253968254.
+Iteration 2: Valset score for new program: 0.7403968253968254 (coverage 30 / 30)
+Iteration 2: Val aggregate for new program: 0.7403968253968254
+Iteration 2: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 0.5, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0, 18: 0.25, 19: 0.3333333333333333, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.14285714285714285, 24: 0.25, 25: 1.0, 26: 1.0, 27: 1.0, 28: 0.0, 29: 0.2857142857142857}
+Iteration 2: Objective aggregate scores for new program: {'weighted_score': 0.7403968253968253}
+Iteration 2: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.15384615384615385, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 0.5, 29: 0.2857142857142857}
+Iteration 2: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 2: Valset pareto front aggregate score: 0.862985347985348
+Iteration 2: Updated valset pareto front programs: {0: {0, 1, 2}, 1: {0, 1, 2}, 2: {0, 2}, 3: {0, 2}, 4: {2}, 5: {2}, 6: {0, 1, 2}, 7: {0, 2}, 8: {0, 1, 2}, 9: {0, 1, 2}, 10: {0, 1}, 11: {0, 2}, 12: {0, 1, 2}, 13: {1, 2}, 14: {0, 1, 2}, 15: {0}, 16: {0, 2}, 17: {0, 1, 2}, 18: {0}, 19: {1}, 20: {0, 1, 2}, 21: {0, 1, 2}, 22: {2}, 23: {1}, 24: {1}, 25: {0, 2}, 26: {0, 1, 2}, 27: {2}, 28: {1}, 29: {2}}
+Iteration 2: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 2: Best valset aggregate score so far: 0.7403968253968254
+Iteration 2: Best program as per aggregate score on valset: 2
+Iteration 2: Best score on valset: 0.7403968253968254
+Iteration 2: Linear pareto front program index: 2
+Iteration 2: New program candidate index: 2
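The iteration 2 records above are consistent with the valset score being the unweighted mean of the 30 per-instance scores. That reading is inferred from the logged numbers, not from the optimizer's source. A minimal sketch that recomputes it (the function name `aggregate` is illustrative, not a GEPA API):

```python
# Per-instance valset scores for candidate 2, copied from the iteration 2 log record.
individual_scores = {
    0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0,
    8: 1.0, 9: 1.0, 10: 0.5, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5,
    16: 1.0, 17: 1.0, 18: 0.25, 19: 0.3333333333333333, 20: 0.2, 21: 1.0,
    22: 0.5, 23: 0.14285714285714285, 24: 0.25, 25: 1.0, 26: 1.0, 27: 1.0,
    28: 0.0, 29: 0.2857142857142857,
}


def aggregate(scores: dict[int, float]) -> float:
    """Assumed aggregation: plain mean over all validation instances."""
    return sum(scores.values()) / len(scores)


valset_score = aggregate(individual_scores)
print(f"{valset_score:.6f}")
```

Under this assumption the recomputed value agrees with the logged `0.7403968253968254` up to floating-point rounding.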
+Iteration 3: Selected program 1 score: 0.585970695970696
+Iteration 3: Proposed new text for routing_policy: Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.
+
+Output only JSON, or call `final_json` exactly once if that structured-output tool is available:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents
+
+Task:
+Read the title first. Use the body only if the title is ambiguous. Ignore labels, paths, examples, tests, implementation filenames, and incidental keywords unless they are the user-visible subject.
+
+Choose one primary topic, then add only explicit central secondary topics. Use three topics only when the title or domain contract clearly requires three maintainer buckets.
+
+Important routing rules:
+
+- Treat labels and repository metadata as noise unless the title/body makes them central.
+ Do not infer `config` or `docs` from labels, issue templates, checklists, examples, or catalog/documentation mentions unless the user-visible request is actually about configuration behavior or documentation.
+
+- Catalog/provider requests route to `local_model_providers` when the central subject is adding, updating, or integrating a provider/catalog entry.
+ Example: "Add MiMo-V2.5 to Xiaomi catalog" is `local_model_providers`.
+
+- Automatic model routing, multimodal routing, fallback routing, OpenAI-compatible endpoint behavior, response protocols, returned blocks, or streaming lifecycle route to `model_serving`.
+ Example: "automatic multimodal routing when DeepSeek V4-Pro is primary model" is `model_serving`.
+ A title combining provider catalog work and serving/routing behavior should include both `local_model_providers` and `model_serving`.
+
+- LM Studio is a strong signal for `local_models`.
+
+- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.
+ Example: `fix: LM Studio thinking blocks invisible with Responses API`
+ should be `model_serving` and `local_models`.
+
+- `ACP` issues route to `acp` when the user-visible subject is ACP binding, ACP channels, ACP protocol behavior, or ACP integration behavior.
+
+- Add `sessions` only when session identity, lifecycle, routing, persistence, session keys, session state, or session-specific user behavior is central.
+ Example: `ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process`
+ should be `acp` and `sessions`.
+
+- Do not add `coding_agents` merely because the title mentions Claude Code, Codex-like processes, persistent agent processes, or agent implementation details.
+ Use `coding_agents` only when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.
+ If Claude Code is only the process affected by an ACP session-key bug, do not add `coding_agents`.
+
+- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, or persistent session state.
+ Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization`
+ should be `coding_agents`, `sandboxing`, `agent_runtime`.
+
+- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.
+
+- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.
+
+- `exec v2 contract follow-through` routes to all contract-owned areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+ Do this even when the title starts with `test(exec):`; the PR type `test` does not make it a tests-only classification.
+ The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.
+
+- Do not suppress a central contract topic just because the exact word is absent from the title if the named contract is known to cover that area.
+ In particular, `exec v2 contract` implies `exec_tools`, `sandboxing`, and `approvals`.
+
+Final suppression check:
+Remove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, contract, docs, config, Claude Code, or catalog. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it.
+Iteration 3: New subsample score 4.0 is better than old score 2.4395604395604398. Continue to full eval and add to candidate pool.
+Iteration 3: Valset score for new program: 0.5383333333333333 (coverage 30 / 30)
+Iteration 3: Val aggregate for new program: 0.5383333333333333
+Iteration 3: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 0.14285714285714285, 3: 0.25, 4: 0.2857142857142857, 5: 0.25, 6: 1.0, 7: 0.25, 8: 0.5, 9: 1.0, 10: 1.0, 11: 0.25, 12: 0.25, 13: 0.25, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0, 18: 0.2857142857142857, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.2, 23: 0.25, 24: 0.2857142857142857, 25: 1.0, 26: 1.0, 27: 0.0, 28: 0.5, 29: 0.25}
+Iteration 3: Objective aggregate scores for new program: {'weighted_score': 0.5383333333333333}
+Iteration 3: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.25, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 0.5, 29: 0.2857142857142857}
+Iteration 3: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 3: Valset pareto front aggregate score: 0.8661904761904762
+Iteration 3: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {0, 1, 2, 3}, 2: {0, 2}, 3: {0, 2}, 4: {2}, 5: {2}, 6: {0, 1, 2, 3}, 7: {0, 2}, 8: {0, 1, 2}, 9: {0, 1, 2, 3}, 10: {0, 1, 3}, 11: {0, 2}, 12: {0, 1, 2}, 13: {1, 2}, 14: {0, 1, 2, 3}, 15: {0}, 16: {0, 2, 3}, 17: {0, 1, 2, 3}, 18: {0}, 19: {1, 3}, 20: {0, 1, 2, 3}, 21: {0, 1, 2, 3}, 22: {2}, 23: {3}, 24: {1}, 25: {0, 2, 3}, 26: {0, 1, 2, 3}, 27: {2}, 28: {1, 3}, 29: {2}}
+Iteration 3: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 3: Best valset aggregate score so far: 0.7403968253968254
+Iteration 3: Best program as per aggregate score on valset: 2
+Iteration 3: Best score on valset: 0.7403968253968254
+Iteration 3: Linear pareto front program index: 2
+Iteration 3: New program candidate index: 3
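The iteration 3 records show the per-instance Pareto front changing even though candidate 3 did not become the best aggregate program. The logged sets are consistent with a simple rule: a strictly higher score replaces the front set for that instance, and a tie adds the candidate to it. The sketch below is an inference from the log, with hypothetical function and variable names. It replays four instances (1, 8, 23, 28) using values copied from the iteration 2 and iteration 3 records.

```python
def update_front(front_scores, front_programs, new_scores, new_index):
    """Return updated copies of the per-instance best scores and program sets.

    Assumed rule: strictly better replaces the set, an exact tie joins it.
    """
    scores = dict(front_scores)
    programs = {i: set(p) for i, p in front_programs.items()}
    for instance, score in new_scores.items():
        best = scores.get(instance, float("-inf"))
        if score > best:
            scores[instance] = score
            programs[instance] = {new_index}
        elif score == best:
            programs.setdefault(instance, set()).add(new_index)
    return scores, programs


# Front state after iteration 2 (subset of instances, values from the log).
old_scores = {1: 0.25, 8: 1.0, 23: 0.15384615384615385, 28: 0.5}
old_programs = {1: {0, 1, 2}, 8: {0, 1, 2}, 23: {1}, 28: {1}}

# Candidate 3's scores on the same instances (iteration 3 record).
candidate_3 = {1: 0.25, 8: 0.5, 23: 0.25, 28: 0.5}

new_scores, new_programs = update_front(old_scores, old_programs, candidate_3, 3)
print(new_scores)
print(new_programs)
```

For these four instances the result matches the iteration 3 front: instance 23 moves to candidate 3 alone, instances 1 and 28 gain candidate 3 as a tie, and instance 8 is unchanged.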
+Iteration 4: Selected program 2 score: 0.7403968253968254
+Iteration 4: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Read the title first. Use the first clear body summary only when the title is ambiguous. Ignore examples, tests, files changed, labels, URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Core process:
+1. Identify the main user-visible bug, feature, documentation change, policy change, contract, or vulnerability being changed.
+2. Pick one primary topic.
+3. Add secondary topics only when the title or first clear summary explicitly names another central maintainer-owned subject.
+4. Use 3 topics when a composite title lists three central facets.
+5. Use 0 topics when no allowed topic is central.
+6. Never invent topic ids. Never output ids outside the allowed list.
+
+Composite and vulnerability titles:
+- If a title lists several independent fixes or affected areas joined by `+`, `and`, commas, or semicolons, classify each central user-visible item.
+- A security/vulnerability PR can still need additional maintainer buckets when the vulnerable component is named centrally.
+- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` should be `security`, `hooks`, and `skills_plugins`.
+- Use `security` for HIGH/CRITICAL vulnerabilities, SSRF, token verification vulnerabilities, network/access restrictions, credential boundaries, or allowed/blocked security behavior.
+- Do not add `auth_identity` merely because a security title mentions token verification. Use `auth_identity` only when authentication/login/OAuth/account identity behavior is itself the user-facing subject.
+
+Hooks:
+- Use `hooks` when hook behavior, hook priority, hook ordering, hook execution, hook registration, or hook policy is central.
+- Do not suppress `hooks` just because it appears inside a security title; if hook priority or hook behavior is named as an affected area, include `hooks`.
+
+Skills and plugins:
+- Use `skills_plugins` for skill scanners, skill/plugin discovery, user-installed plugins, plugin inheritance, Superpowers, plugin installation, plugin availability, or security issues in skill/plugin components.
+- `skill scanner` is a strong signal for `skills_plugins`.
+- Do not add `skills_plugins` merely because a Codex startup path mentions plugins unless plugin/skill behavior or availability is central.
+
+Telemetry, usage, and token counts:
+- Use `telemetry_usage` when token counts, usage accounting/reporting, cost display, diagnostic counts, traces, metric collection, or status reporting surfaces are themselves central.
+- Stale token counts are `telemetry_usage`.
+- `totalTokensFresh` is a session freshness/accounting signal; classify `respect totalTokensFresh flag to avoid showing stale token counts` as `sessions` and `telemetry_usage`.
+- Do not add `ui_tui` merely because the title says "showing" or "displaying" counts. Use `ui_tui` only when the UI/TUI/webchat/view/reader behavior is itself central.
+
+Sessions:
+- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific freshness/state is central.
+- Token-count freshness tied to session state should include `sessions`.
+- Do not add `sessions` merely because the title says "ACP sessions" unless session identity, lifecycle, routing, state, or persistent process identity is the bug or feature.
+
+ACP, ACPX, and approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
+- In ACP titles, `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.
+
+Reliability, queueing, and lanes:
+- Use `reliability` for deadlocks, hangs, crashes, races, stuck/wedged states, timeouts, self-healing, or robustness failures.
+- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.
+- Words like `lane`, `worker`, `subagent`, or internal execution paths do not imply `queueing`.
+
+Auth and identity:
+- Use `auth_identity` for authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth-related user/session identity.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex runtime behavior.
+- If OAuth/auth behavior is tied to an embedded/session path, include `sessions`.
+
+Codex:
+- Use `codex` when Codex is centrally the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` merely because a title contains `openai-codex`, `[codex]`, or a Codex-branded OAuth provider.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs usually include `docs` alone.
+- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, token, hook, or scanner. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 4: New subsample score 1.0357142857142856 is not better than old score 2.533333333333333, skipping
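Iteration 4 is the only proposal in this stretch of the log that was rejected before full evaluation. The messages suggest a strict comparison of the summed subsample score before and after the proposed prompt. The sketch below encodes that reading; the function name and the strict-inequality assumption are illustrative and not taken from the optimizer's code.

```python
def accept_for_full_eval(old_subsample_score: float, new_subsample_score: float) -> bool:
    """Assumed gate: only a strictly better subsample score proceeds to full valset eval."""
    return new_subsample_score > old_subsample_score


# Values copied from the log records for iterations 3 and 4.
iteration_3 = accept_for_full_eval(2.4395604395604398, 4.0)
iteration_4 = accept_for_full_eval(2.533333333333333, 1.0357142857142856)
print(iteration_3, iteration_4)
```

A gate like this keeps the cost of a full 30-instance evaluation off proposals that already regress on the minibatch they were reflected on.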
+Iteration 5: Selected program 1 score: 0.585970695970696
+Iteration 5: Proposed new text for routing_policy: Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.
+
+Output only JSON, or call `final_json` exactly once if that structured-output tool is available:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids are exactly:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents
+
+Do not output any topic outside this list.
+
+Classification method:
+- Read the title first.
+- Use the body only if the title is ambiguous.
+- Ignore labels, paths, tests, examples, implementation filenames, PR type prefixes, and incidental keywords unless they are the user-visible subject.
+- Choose one primary topic first, then add only explicit central secondary topics.
+- Use three topics only when the title or domain contract clearly requires three maintainer buckets.
+- Prefer the domain meaning of the title over generic keyword matching.
+
+Important routing rules:
+
+- `exec v2 contract follow-through` routes to all contract-owned areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+ This applies even if the title starts with `test(exec):`.
+ The PR type `test` does not make it a tests-only classification.
+
+- `sandbox.mode`, sandbox modes, sandbox policy, sandbox enforcement, sandbox permissions, or sandbox-governed execution are `sandboxing`.
+
+- Do not classify something as `sandboxing` merely because the title contains `sandbox`.
+ If the sandbox is just the environment where another feature runs, classify the feature instead.
+ Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)`
+ should be `acpx`, `codex`, `skills_plugins`, not `sandboxing`.
+ The central request is plugin inheritance for ACPX Codex, not sandbox policy.
+
+- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistent state, takeover, or user-visible session behavior.
+ Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization`
+ should be `coding_agents`, `sandboxing`, `agent_runtime`.
+
+- Use `sessions` when session identity, lifecycle, ownership, routing, takeover, persistence, or active-session state is central.
+ Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`
+ should include `sessions` because outbound session identity is central.
+
+- Use `gateway` when the central subject is gateway send behavior, guarded gateway send denial, gateway routing, or gateway delivery boundaries.
+ Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`
+ should be `gateway`, `sessions`, not `notifications`.
+
+- Do not add `notifications` merely because the title mentions message sending, outbound messages, announcement delivery, or notification-like implementation names.
+ Add `notifications` only when user-facing alerts, notification settings, notification channels, or notification delivery semantics are the central subject.
+
+- Do not add `chat_integrations` merely because the title says the user is chatting, actively chatting, or mentions a chat context.
+ Add `chat_integrations` only when Slack/Discord/ChatGPT/LinkedIn/chat adapter integration behavior is central.
+ Example: `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`
+ should be `cron_automation`, `sessions`, `reliability`, not `chat_integrations`.
+
+- Use `reliability` for central failures, crashes, errors, hardening against failures, guarded denial visibility, race conditions, takeover errors, silent breakage, or behavior that needs to be made robust.
+ Example: `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`
+ should include `reliability` because the visible problem is an error during delivery.
+
+- Use `cron_automation` when cron, scheduled jobs, scheduled announcements, recurring automation, or automated timed delivery is central.
+
+- Use `security` for hardening, supply-chain boundaries, trust boundaries, credentials, permissions, isolation, GitHub Actions supply-chain risk, or CI security posture.
+ Example: `ci: harden GitHub Actions supply-chain boundaries`
+ should be `security`, not deployment or packaging.
+
+- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.
+
+- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.
+
+- LM Studio is a strong signal for `local_models`.
+
+- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.
+ Example: `fix: LM Studio thinking blocks invisible with Responses API`
+ should be `model_serving` and `local_models`.
+
+- Use `codex` when Codex itself, Codex CLI, Codex sandbox/runtime behavior, or Codex-specific compatibility is central.
+
+- Use `skills_plugins` when user-installed plugins, skills, plugin inheritance, plugin availability, plugin loading, or plugin execution is central.
+
+Final suppression check:
+Before outputting, remove any topic added only because of an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, sandbox, chat, message, CI, or contract.
+Keep a topic only when it is a central maintainer-owned subject or a known domain contract explicitly requires it.
+Iteration 5: New subsample score 4.0 is better than old score 1.0. Continue to full eval and add to candidate pool.
+Iteration 5: Valset score for new program: 0.5166544566544566 (coverage 30 / 30)
+Iteration 5: Val aggregate for new program: 0.5166544566544566
+Iteration 5: Individual valset scores for new program: {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.15384615384615385, 4: 0.16666666666666666, 5: 0.5, 6: 0.2857142857142857, 7: 1.0, 8: 0.25, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 0.25, 16: 0.25, 17: 1.0, 18: 1.0, 19: 0.5, 20: 0.2, 21: 0.5, 22: 0.25, 23: 0.15384615384615385, 24: 0.25, 25: 1.0, 26: 0.2857142857142857, 27: 0.1, 28: 1.0, 29: 0.15384615384615385}
+Iteration 5: Objective aggregate scores for new program: {'weighted_score': 0.5166544566544566}
+Iteration 5: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.25, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 0.2857142857142857}
+Iteration 5: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 5: Valset pareto front aggregate score: 0.8828571428571428
+Iteration 5: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {0, 1, 2, 3, 4}, 2: {0, 2}, 3: {0, 2}, 4: {2}, 5: {2}, 6: {0, 1, 2, 3}, 7: {0, 2, 4}, 8: {0, 1, 2}, 9: {0, 1, 2, 3, 4}, 10: {0, 1, 3, 4}, 11: {0, 2, 4}, 12: {0, 1, 2, 4}, 13: {1, 2}, 14: {0, 1, 2, 3, 4}, 15: {0}, 16: {0, 2, 3}, 17: {0, 1, 2, 3, 4}, 18: {0, 4}, 19: {1, 3}, 20: {0, 1, 2, 3, 4}, 21: {0, 1, 2, 3}, 22: {2}, 23: {3}, 24: {1}, 25: {0, 2, 3, 4}, 26: {0, 1, 2, 3}, 27: {2}, 28: {4}, 29: {2}}
+Iteration 5: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 5: Best valset aggregate score so far: 0.7403968253968254
+Iteration 5: Best program as per aggregate score on valset: 2
+Iteration 5: Best score on valset: 0.7403968253968254
+Iteration 5: Linear pareto front program index: 2
+Iteration 5: New program candidate index: 4
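Every candidate prompt in this log asks for a JSON object of the form `{"topics_of_interest": [...]}` restricted to an allowed id list, and the list differs between candidates (the iteration 4 proposal includes `auth_identity` and `hooks`, while the iteration 5 proposal includes `coding_agents`). A downstream check could reject out-of-list ids before scoring. The sketch below is a hypothetical validator, not part of the logged pipeline; it uses the allowed list from the iteration 5 proposal.

```python
import json

# Allowed ids copied from the iteration 5 proposed prompt.
ALLOWED_TOPICS = frozenset({
    "queueing", "docs", "notifications", "sessions", "gateway", "reliability",
    "memory", "open_weight_models", "local_model_providers", "codex",
    "api_surface", "ui_tui", "chat_integrations", "skills_plugins", "acp",
    "acpx", "approvals", "agent_runtime", "model_serving", "local_models",
    "self_hosted_inference", "telemetry_usage", "exec_tools", "sandboxing",
    "browser_automation", "cron_automation", "config", "security",
    "mcp_tooling", "tool_calling", "coding_agents",
})


def parse_topics(raw: str) -> list[str]:
    """Parse classifier output and reject ids outside the allowed list."""
    payload = json.loads(raw)
    topics = payload.get("topics_of_interest") if isinstance(payload, dict) else None
    if not isinstance(topics, list):
        raise ValueError("expected a JSON object with a 'topics_of_interest' list")
    unknown = [t for t in topics if t not in ALLOWED_TOPICS]
    if unknown:
        raise ValueError(f"topic ids outside the allowed list: {unknown}")
    # Drop duplicates while keeping the model's ordering.
    return list(dict.fromkeys(topics))


print(parse_topics('{"topics_of_interest":["acpx","codex","skills_plugins"]}'))
```

Because the allowed list is part of each candidate prompt, a validator like this would need to be parameterized per candidate rather than fixed globally.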
+Iteration 6: Selected program 3 score: 0.5383333333333333
+Iteration 6: Proposed new text for routing_policy: Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.
+
+Input format:
+- `target`: GitHub issue or PR URL.
+- `title`: the issue/PR title. This is the primary signal.
+- Optional body/metadata may be present. Use the body only when the title is ambiguous.
+- Ignore labels, paths, templates, checklists, examples, tests, implementation filenames, and repo metadata unless the user-visible request itself is about those things.
+
+Output only JSON, or call `final_json` exactly once if that structured-output tool is available:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents
+
+Core method:
+1. Read the title first.
+2. Identify the user-visible subject, not the implementation area.
+3. Choose one primary topic.
+4. Add only explicit central secondary topics.
+5. Use three topics only when the title or known domain contract clearly requires three maintainer buckets.
+6. Run a final suppression check: remove any topic added only because of incidental words, labels, PR type, paths, tests, examples, function names, or implementation details.
+
+Important routing rules:
+
+- Treat labels and repository metadata as noise.
+ Do not infer `gateway`, `security`, `config`, or `docs` from labels, templates, checklists, examples, paths, or catalog/documentation mentions unless the title/body makes that topic the actual user-visible request.
+
+- Local Ollama, local embeddings, local model runtime behavior, LM Studio, or other locally hosted model integrations usually include `local_models`.
+ If the problem concerns using a locally/self-hosted inference backend, endpoint, proxy, or runtime to serve inference, include `self_hosted_inference`.
+
+- A title like:
+ `[Feature/Bug]: Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)`
+ should be classified as `local_models` and `self_hosted_inference`.
+ Do not add `gateway` merely because the repo or label mentions gateway.
+ Do not add `security` merely because SSRF defenses are named; here the user-visible issue is local/self-hosted model inference failing with proxy/NO_PROXY behavior.
+
+- Documentation PRs can still have a central product topic.
+ If a title is `docs(<scope>): ...`, classify as both `docs` and that scoped topic when the documentation is about that behavior.
+ Example:
+ `docs(queue): clarify steer behavior with partial streaming and tool boundaries`
+ should be `queueing` and `docs`.
+
+- Catalog/provider requests route to `local_model_providers` when the central subject is adding, updating, integrating, or maintaining a provider/catalog entry.
+ This includes provider-specific metadata such as known context windows, model limits, model catalog entries, provider-backed model listings, or provider-specific open-weight model metadata.
+
+- Open-weight model metadata routes to `open_weight_models`.
+ If the title combines open-weight model metadata with a provider/catalog entry, include both `open_weight_models` and `local_model_providers`.
+ Example:
+ `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models`
+ should be `open_weight_models` and `local_model_providers`.
+
+- Automatic model routing, multimodal routing, fallback routing, OpenAI-compatible endpoint behavior, response protocol behavior, returned blocks, partial/streaming lifecycle, or serving compatibility route to `model_serving`.
+
+- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.
+ Example:
+ `fix: LM Studio thinking blocks invisible with Responses API`
+ should be `model_serving` and `local_models`.
+
+- LM Studio is a strong signal for `local_models`.
+
+- ACP issues route to `acp` when the user-visible subject is ACP binding, ACP channels, ACP protocol behavior, ACP integration behavior, or explicit ACP agent runs.
+
+- ACPX issues route to `acpx` when the user-visible subject is explicit agent-run orchestration, embedded backend visibility, stale final JSON state, final structured-output state for explicit runs, or ACPX-style run coordination.
+ If a title combines explicit ACP agent runs with ACP behavior, include both `acpx` and `acp`.
+ Example:
+ `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield`
+ should be `acpx` and `acp`.
+
+- Add `sessions` only when session identity, lifecycle, routing, persistence, session keys, session state, or session-specific user behavior is central.
+ Do not add `sessions` for incidental function names such as `sessions_yield` or `sessions_spawn`.
+
+- Example:
+ `ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process`
+ should be `acp` and `sessions`.
+
+- `sessions_spawn` is an internal name.
+ Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistence, or state.
+ Example:
+ `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization`
+ should be `coding_agents`, `sandboxing`, and `agent_runtime`.
+
+- Do not add `coding_agents` merely because the title mentions Claude Code, Codex-like processes, persistent agent processes, or agent implementation details.
+ Use `coding_agents` only when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.
+
+- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.
+
+- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.
+
+- `exec v2 contract follow-through` routes to all contract-owned areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+ Do this even when the title starts with `test(exec):`; the PR type `test` does not make it a tests-only classification.
+ The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.
+
+- Do not suppress a central contract topic just because the exact word is absent from the title if the named contract is known to cover that area.
+ In particular, `exec v2 contract` implies `exec_tools`, `sandboxing`, and `approvals`.
+
+Final suppression check:
+Remove any topic added only due to an incidental keyword such as test, path, label, template, checklist, session-like function name, model, provider, usage, gateway, plugin, tool, contract, docs, config, security, Claude Code, or catalog. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it.
+Iteration 6: New subsample score 4.0 is better than old score 1.4038461538461537. Continue to full eval and add to candidate pool.
+Iteration 6: Valset score for new program: 0.584047619047619 (coverage 30 / 30)
+Iteration 6: Val aggregate for new program: 0.584047619047619
+Iteration 6: Individual valset scores for new program: {0: 1.0, 1: 0.14285714285714285, 2: 0.0, 3: 0.5, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 0.5, 8: 0.5, 9: 1.0, 10: 1.0, 11: 0.14285714285714285, 12: 0.25, 13: 0.0, 14: 1.0, 15: 1.0, 16: 0.25, 17: 1.0, 18: 1.0, 19: 0.0, 20: 0.5, 21: 0.5, 22: 0.0, 23: 1.0, 24: 0.25, 25: 1.0, 26: 1.0, 27: 0.2, 28: 0.5, 29: 1.0}
+Iteration 6: Objective aggregate scores for new program: {'weighted_score': 0.584047619047619}
+Iteration 6: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.5, 21: 1.0, 22: 0.5, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 6: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 6: Valset pareto front aggregate score: 0.9416666666666667
+Iteration 6: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5}, 1: {0, 1, 2, 3, 4}, 2: {0, 2}, 3: {0, 2}, 4: {2}, 5: {2, 5}, 6: {0, 1, 2, 3, 5}, 7: {0, 2, 4}, 8: {0, 1, 2}, 9: {0, 1, 2, 3, 4, 5}, 10: {0, 1, 3, 4, 5}, 11: {0, 2, 4}, 12: {0, 1, 2, 4}, 13: {1, 2}, 14: {0, 1, 2, 3, 4, 5}, 15: {0, 5}, 16: {0, 2, 3}, 17: {0, 1, 2, 3, 4, 5}, 18: {0, 4, 5}, 19: {1, 3}, 20: {5}, 21: {0, 1, 2, 3}, 22: {2}, 23: {5}, 24: {1}, 25: {0, 2, 3, 4, 5}, 26: {0, 1, 2, 3, 5}, 27: {2}, 28: {4}, 29: {5}}
+Iteration 6: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 6: Best valset aggregate score so far: 0.7403968253968254
+Iteration 6: Best program as per aggregate score on valset: 2
+Iteration 6: Best score on valset: 0.7403968253968254
+Iteration 6: Linear pareto front program index: 2
+Iteration 6: New program candidate index: 5
+Iteration 7: Selected program 2 score: 0.7403968253968254
+Iteration 7: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return JSON only:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input:
+You may receive a target URL, title, body, summary, generated output, gold topics, or feedback. Classify the GitHub item itself, not the previous assistant response. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous.
+
+Core process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+
+Suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, or incidental phrase.
+
+Specific routing rules:
+
+Codex:
+- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, or Codex-specific runtime behavior.
+- Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.
+- Example: `[codex] fix webchat full-message reader for truncated history` is not `codex`.
+
+Gateway, API surface, UI/TUI:
+- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.
+- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.
+- Example: `[codex] fix webchat full-message reader for truncated history` => `gateway`, `api_surface`, `ui_tui`.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving. Do not classify it as `local_model_providers`, `open_weight_models`, or `model_serving` unless provider catalogs, open-weight model metadata, or serving endpoints are actually central.
+- Example: `Policy: add model, network, and MCP conformance checks` => `config`, `security`, `mcp_tooling`.
+
+Local model providers, open-weight models, model serving:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.
+- Use `model_serving` only when serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, base URL behavior, endpoint compatibility, request routing, or model-server compatibility is central.
+- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.
+- Provider catalog missing-model issues are `local_model_providers`; add `reliability` when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.
+- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.
+- Do not use `reliability` for every bug. Use it when the failure mode itself is robustness/liveness/error-handling.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
+- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- Use `sessions` only when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.
+- Do not add `sessions` merely because the title says `ACP sessions`.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, `acpx`.
+- `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` => `acp`, `sessions`.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.
+- `restore openai-codex OAuth on embedded path` => `auth_identity`, `sessions`.
+
+Queueing:
+- Use `queueing` only when queue, queued execution, queue lifecycle, queue steering, scheduling behavior, or user-visible queued work is central.
+- Words like lane, main lane, worker, subagent, or internal execution paths do not imply `queueing`.
+- `self-heal lane wedges` => `reliability`, not `queueing`.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs usually return `docs` alone.
+- Add a second topic only when the documented area is explicitly central, such as docs for queueing.
+
+Tools:
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.
+- Use `mcp_tooling` for MCP-specific behavior.
+- MCP tool-call parameter coercion can be both `mcp_tooling` and `tool_calling`.
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract enforcement may include `exec_tools`, `sandboxing`, and `approvals` when all are named central contract areas.
+
+Notifications and chat:
+- Use `notifications` when notification behavior itself is central, such as announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, or notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Other topics:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, or memory providers. Active-memory recall deadlocks are usually `memory` plus `reliability`.
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or cron deadlock is central.
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is central.
+- Use `agent_runtime` when runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership are central.
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item, up to the smallest complete set.
+- Do not substitute a broad infrastructure topic for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
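The candidate prompt above fixes an output contract: JSON only, a single `topics_of_interest` key, and no invented topic ids. A minimal sketch of a checker for that contract follows. The function name `parse_topics` and its error messages are hypothetical and are not part of the logged pipeline; the allowed ids are copied from this candidate's list, and other candidates in this log carry different lists.

```python
import json

# Allowed ids copied from the candidate prompt above (31 ids).
# Other candidates in this log use different lists, so pass your own set if needed.
ALLOWED = frozenset("""
queueing docs notifications sessions gateway reliability memory
open_weight_models local_model_providers codex api_surface ui_tui
chat_integrations skills_plugins acp acpx approvals agent_runtime
model_serving local_models self_hosted_inference telemetry_usage
exec_tools sandboxing browser_automation cron_automation config
security mcp_tooling tool_calling auth_identity
""".split())


def parse_topics(raw, allowed=ALLOWED):
    """Return the topic list if `raw` follows the output contract, else raise ValueError.

    Hypothetical helper: it checks JSON shape and topic-id membership only.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != {"topics_of_interest"}:
        raise ValueError("expected exactly one key: topics_of_interest")
    topics = data["topics_of_interest"]
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("topics_of_interest must be a list of strings")
    unknown = [t for t in topics if t not in allowed]
    if unknown:
        raise ValueError(f"topic ids not in the allowed list: {unknown}")
    return topics  # an empty list is valid: "Use 0 topics when no allowed topic is central"
```

Under this candidate's list, an answer containing `packaging_deployment` would be rejected, because that id is only introduced by later candidates.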
+Iteration 7: New subsample score 4.0 is better than old score 2.0357142857142856. Continue to full eval and add to candidate pool.
+Iteration 7: Valset score for new program: 0.6160651629072682 (coverage 30 / 30)
+Iteration 7: Val aggregate for new program: 0.6160651629072682
+Iteration 7: Individual valset scores for new program: {0: 0.5, 1: 1.0, 2: 0.10526315789473684, 3: 0.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 0.2, 11: 0.25, 12: 0.5, 13: 1.0, 14: 1.0, 15: 1.0, 16: 0.25, 17: 0.5, 18: 1.0, 19: 0.14285714285714285, 20: 0.14285714285714285, 21: 1.0, 22: 0.5, 23: 0.10526315789473684, 24: 1.0, 25: 1.0, 26: 0.2857142857142857, 27: 0.25, 28: 0.5, 29: 0.25}
+Iteration 7: Objective aggregate scores for new program: {'weighted_score': 0.6160651629072681}
+Iteration 7: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.5, 21: 1.0, 22: 0.5, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 7: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 7: Valset pareto front aggregate score: 0.9666666666666667
+Iteration 7: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5}, 1: {6}, 2: {0, 2}, 3: {0, 2}, 4: {2, 6}, 5: {2, 5, 6}, 6: {0, 1, 2, 3, 5, 6}, 7: {0, 2, 4, 6}, 8: {0, 1, 2, 6}, 9: {0, 1, 2, 3, 4, 5, 6}, 10: {0, 1, 3, 4, 5}, 11: {0, 2, 4}, 12: {0, 1, 2, 4}, 13: {1, 2, 6}, 14: {0, 1, 2, 3, 4, 5, 6}, 15: {0, 5, 6}, 16: {0, 2, 3}, 17: {0, 1, 2, 3, 4, 5}, 18: {0, 4, 5, 6}, 19: {1, 3}, 20: {5}, 21: {0, 1, 2, 3, 6}, 22: {2, 6}, 23: {5}, 24: {1, 6}, 25: {0, 2, 3, 4, 5, 6}, 26: {0, 1, 2, 3, 5}, 27: {2}, 28: {4}, 29: {5}}
+Iteration 7: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 7: Best valset aggregate score so far: 0.7403968253968254
+Iteration 7: Best program as per aggregate score on valset: 2
+Iteration 7: Best score on valset: 0.7403968253968254
+Iteration 7: Linear pareto front program index: 2
+Iteration 7: New program candidate index: 6
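The Iteration 6 and Iteration 7 blocks above are consistent with a simple per-example front update: a strictly higher score replaces the set of front programs for that example, a tie adds the new program to it, and the aggregate is the plain mean over examples. The sketch below reproduces that behaviour. It is inferred from the logged values only, the names `aggregate` and `update_front` are hypothetical, and it should not be read as GEPA's actual implementation.

```python
from statistics import fmean


def aggregate(scores):
    """Mean of per-example scores (dict of example id -> score)."""
    return fmean(scores.values())


def update_front(front_scores, front_programs, new_scores, new_index):
    """Merge one candidate's per-example scores into the per-example front.

    Assumed rule, inferred from the log: higher score replaces, equal score joins.
    Both dicts are modified in place and also returned.
    """
    for example_id, score in new_scores.items():
        best = front_scores.get(example_id)
        if best is None or score > best:
            front_scores[example_id] = score
            front_programs[example_id] = {new_index}
        elif score == best:
            front_programs[example_id].add(new_index)
    return front_scores, front_programs


# Three examples taken from the log: front state after Iteration 6,
# then candidate 6's scores from Iteration 7.
front_scores = {1: 0.25, 4: 1.0, 22: 0.5}
front_programs = {1: {0, 1, 2, 3, 4}, 4: {2}, 22: {2}}
update_front(front_scores, front_programs, {1: 1.0, 4: 1.0, 22: 0.5}, new_index=6)
print(front_programs)  # example 1 is now held by candidate 6 alone; 4 and 22 gain it as a tie
```

With 28 examples at 1.0 and examples 20 and 22 at 0.5, `aggregate` gives 29/30, which agrees with the logged Iteration 7 front aggregate of 0.9666666666666667.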
+Iteration 8: Selected program 6 score: 0.6160651629072682
+Iteration 8: Proposed new text for routing_policy: Update the classifier rules with these corrections.
+
+The task is to classify a GitHub issue or pull request into the smallest complete set of allowed topic ids. The input may include a target URL, title, body, summary, generated output, gold topics, or feedback. Always classify the GitHub item itself, not the previous assistant response or feedback text.
+
+Return JSON only:
+{"topics_of_interest":["topic_id"]}
+
+Add `packaging_deployment` to the allowed topic ids.
+
+Packaging / deployment:
+- Use `packaging_deployment` when the central issue involves install/distribution packaging, desktop app setup, bundled runtime availability, PATH/runtime resolution, executable discovery, npm/npx/node availability, spawned process availability, deployment artifacts, or app startup environment problems.
+- If a browser automation feature fails because the desktop package cannot find or launch a required tool such as `npx`, include both the user-visible feature topic and `packaging_deployment`.
+- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.
+- In that example, Chrome sessions, Google sign-in, and profile attach are browser automation concerns; `spawn npx ENOENT` is a packaging/deployment/runtime availability concern.
+
+Self-hosted inference vs local model providers:
+- Use `self_hosted_inference` when the central subject is supporting self-hosted inference servers, OpenAI-compatible self-hosted endpoints, or server families such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference backends.
+- Use `memory` as well when the self-hosted inference support is specifically for memory, embeddings, vector search, or embedding providers.
+- Do not use `local_model_providers` merely because the title names Ollama, vLLM, llama.cpp, TGI, LocalAI, OpenAI-compatible, provider, or model. Treat those as label spam unless the actual subject is provider catalog metadata, named provider/model-family support, provider-backed model context windows, or missing provider catalog entries.
+- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`, not `local_model_providers`.
+
+General strategy:
+- Read the title first and identify the main user-visible bug, feature, contract change, or documentation change.
+- Pick one primary topic.
+- Add secondary topics only when they are explicitly central maintainer-owned concerns.
+- If a title lists multiple independent central concerns, include each one, but keep the set minimal.
+- Suppress topics that appear only as labels, implementation details, examples, fixture names, broad ownership areas, or incidental words.
+Iteration 8: New subsample score 4.0 is better than old score 2.75. Continue to full eval and add to candidate pool.
+Iteration 8: Valset score for new program: 0.4290659340659341 (coverage 30 / 30)
+Iteration 8: Val aggregate for new program: 0.4290659340659341
+Iteration 8: Individual valset scores for new program: {0: 0.5, 1: 0.5, 2: 0.25, 3: 0.25, 4: 0.15384615384615385, 5: 0.5, 6: 0.25, 7: 0.5, 8: 0.5, 9: 0.5, 10: 0.2, 11: 0.2857142857142857, 12: 1.0, 13: 0.5, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.25, 18: 0.2857142857142857, 19: 0.2, 20: 0.25, 21: 0.5, 22: 0.15384615384615385, 23: 0.14285714285714285, 24: 1.0, 25: 0.5, 26: 1.0, 27: 0.2, 28: 0.5, 29: 0.25}
+Iteration 8: Objective aggregate scores for new program: {'weighted_score': 0.429065934065934}
+Iteration 8: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.5, 21: 1.0, 22: 0.5, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 8: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 8: Valset pareto front aggregate score: 0.9666666666666667
+Iteration 8: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5}, 1: {6}, 2: {0, 2}, 3: {0, 2}, 4: {2, 6}, 5: {2, 5, 6}, 6: {0, 1, 2, 3, 5, 6}, 7: {0, 2, 4, 6}, 8: {0, 1, 2, 6}, 9: {0, 1, 2, 3, 4, 5, 6}, 10: {0, 1, 3, 4, 5}, 11: {0, 2, 4}, 12: {0, 1, 2, 4, 7}, 13: {1, 2, 6}, 14: {0, 1, 2, 3, 4, 5, 6, 7}, 15: {0, 5, 6}, 16: {0, 2, 3}, 17: {0, 1, 2, 3, 4, 5}, 18: {0, 4, 5, 6}, 19: {1, 3}, 20: {5}, 21: {0, 1, 2, 3, 6}, 22: {2, 6}, 23: {5}, 24: {1, 6, 7}, 25: {0, 2, 3, 4, 5, 6}, 26: {0, 1, 2, 3, 5, 7}, 27: {2}, 28: {4}, 29: {5}}
+Iteration 8: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 8: Best valset aggregate score so far: 0.7403968253968254
+Iteration 8: Best program as per aggregate score on valset: 2
+Iteration 8: Best score on valset: 0.7403968253968254
+Iteration 8: Linear pareto front program index: 2
+Iteration 8: New program candidate index: 7
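Iteration 8 shows two separate decisions. The proposal passed the subsample check (4.0 against 2.75) and joined the pool as candidate 7, yet its validation aggregate of 0.429 left candidate 2 as the best program. A rough sketch of both decisions is given below, assuming the gate is a strict greater-than comparison and the best program is the one with the highest validation aggregate. Function names are hypothetical, tie handling is not visible in the log, and only candidates whose aggregates appear in the log above are included.

```python
def passes_subsample_gate(new_score, old_score):
    """Assumed gate: the proposal must score strictly higher on the reflected subsample."""
    return new_score > old_score


def best_program(val_aggregates):
    """Index of the candidate with the highest validation aggregate."""
    return max(val_aggregates, key=val_aggregates.get)


# Validation aggregates as logged for candidates 2, 5 and 6.
val_aggregates = {
    2: 0.7403968253968254,
    5: 0.584047619047619,
    6: 0.6160651629072682,
}

# Iteration 8: the subsample gate admits the proposal, so it is evaluated and pooled.
if passes_subsample_gate(4.0, 2.75):
    val_aggregates[7] = 0.4290659340659341

print(best_program(val_aggregates))  # candidate 2 stays best despite the new entry
```

The large gap between the subsample gain and the validation result for candidate 7 is a reminder that a high subsample score only reflects the few examples the proposal was reflected on.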
+Iteration 9: Selected program 4 score: 0.5166544566544566
+Iteration 9: Proposed new text for routing_policy: Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.
+
+Output only JSON, or call `final_json` exactly once if that structured-output tool is available:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids are exactly:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents, hooks
+
+Classification method:
+- Read the title first.
+- Use the body only if the title is ambiguous.
+- Ignore labels, paths, tests, examples, implementation filenames, PR type prefixes, and incidental keywords unless they are the user-visible subject.
+- Choose one primary topic first, then add only explicit central secondary topics.
+- Use three topics only when the title or domain contract clearly requires three maintainer buckets.
+- Prefer the domain meaning of the title over generic keyword matching.
+- Final answer must contain only allowed topic ids.
+
+Important routing rules:
+
+- `exec v2 contract follow-through` routes to all contract-owned areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+ This applies even if the title starts with `test(exec):`.
+ The PR type `test` does not make it a tests-only classification.
+
+- `sandbox.mode`, sandbox modes, sandbox policy, sandbox enforcement, sandbox permissions, or sandbox-governed execution are `sandboxing`.
+
+- Do not classify something as `sandboxing` merely because the title contains `sandbox`.
+ If the sandbox is just the environment where another feature runs, classify the feature instead.
+ Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)`
+ should be `acpx`, `codex`, `skills_plugins`, not `sandboxing`.
+
+- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistent state, takeover, or user-visible session behavior.
+ Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization`
+ should be `coding_agents`, `sandboxing`, `agent_runtime`.
+
+- Use `sessions` when session identity, lifecycle, ownership, routing, takeover, persistence, or active-session state is central.
+ Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`
+ should include `sessions` because outbound session identity is central.
+
+- Use `gateway` when the central subject is gateway send behavior, guarded gateway send denial, gateway routing, or gateway delivery boundaries.
+ Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`
+ should be `gateway`, `sessions`, not `notifications`.
+
+- Use `notifications` when user-facing alerts, notification settings, notification channels, notification delivery semantics, announcement pushes, heartbeat pushes, target-channel pushes, or notification identity/metadata on pushed messages are central.
+
+- Do not add `notifications` merely because the title mentions message sending, outbound messages, or notification-like implementation names.
+ Add it only when the delivered alert/push/announcement/notification is the user-visible subject.
+
+- Use `chat_integrations` when Slack, Discord, ChatGPT, LinkedIn, chat adapters, or target-channel chat delivery behavior is central.
+ A title about Slack pushes or target-channel Slack delivery should include `chat_integrations`.
+
+- Do not add `chat_integrations` merely because the title says the user is chatting, actively chatting, or mentions a chat context.
+ Add it only when the integration/channel behavior itself is central.
+
+- Treat cron-related words as incidental when cron is only the delivery path for an announcement or push.
+ Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes`
+ should be `notifications`, `chat_integrations`, not `cron_automation` or `reliability`.
+
+- Use `cron_automation` when cron, scheduled jobs, scheduled announcements, recurring automation, or automated timed delivery is the central product surface, not merely the path where another feature failed.
+
+- Use `reliability` for central failures, crashes, errors, hardening against failures, guarded denial visibility, race conditions, takeover errors, silent breakage, or behavior that needs to be made robust.
+ Do not add `reliability` only because the title says `Bug`, `fix`, `dropped`, or mentions a previously fixed path. If the title is mainly about a feature's metadata/identity/channel behavior, classify that feature instead.
+
+- Use `security` for hardening, supply-chain boundaries, trust boundaries, credentials, permissions, isolation, SSRF, token verification, high/critical vulnerabilities, GitHub Actions supply-chain risk, or CI security posture.
+ Example: `ci: harden GitHub Actions supply-chain boundaries`
+ should be `security`.
+
+- Use `hooks` when hook behavior, hook priority, hook execution order, hook registration, or hook security is a central subject.
+ Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification`
+ should include `security`, `hooks`, and `skills_plugins`.
+
+- Use `skills_plugins` when user-installed plugins, skills, skill scanners, plugin inheritance, plugin availability, plugin loading, or plugin execution is central.
+
+- Do not add `local_model_providers` from unrelated scanner/provider/security wording unless local model provider behavior is explicitly central.
+
+- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.
+
+- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.
+
+- LM Studio is a strong signal for `local_models`.
+
+- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.
+ Example: `fix: LM Studio thinking blocks invisible with Responses API`
+ should be `model_serving` and `local_models`.
+
+- Use `codex` when Codex itself, Codex CLI, Codex sandbox/runtime behavior, or Codex-specific compatibility is central.
+
+- Use `acp` when ACP is explicitly central.
+
+- Use `acpx` when ACPX is explicitly central, or when the title describes ACP session behavior involving per-binding/per-agent configuration that belongs to the ACPX integration surface.
+ Example: `[Feature]: Per-binding and per-agent permissionMode for ACP sessions`
+ should be `acp`, `approvals`, `acpx`.
+
+- Use `approvals` when permission mode, approval mode, permission gating, or approval policy is central.
+
+Final suppression check:
+Before outputting, remove any topic added only because of an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, sandbox, chat, message, CI, cron, bug, fix, or contract.
+Keep a topic only when it is a central maintainer-owned subject or a known domain contract explicitly requires it.
+Iteration 9: New subsample score 3.25 is better than old score 1.8928571428571428. Continue to full eval and add to candidate pool.
+Iteration 9: Valset score for new program: 0.5842551892551893 (coverage 30 / 30)
+Iteration 9: Val aggregate for new program: 0.5842551892551893
+Iteration 9: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 0.25, 3: 1.0, 4: 0.16666666666666666, 5: 1.0, 6: 1.0, 7: 0.25, 8: 0.25, 9: 0.5, 10: 1.0, 11: 1.0, 12: 0.2857142857142857, 13: 0.25, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.5, 20: 0.2, 21: 1.0, 22: 1.0, 23: 0.15384615384615385, 24: 0.5, 25: 0.5, 26: 0.2857142857142857, 27: 0.2, 28: 0.2, 29: 0.2857142857142857}
+Iteration 9: Objective aggregate scores for new program: {'weighted_score': 0.5842551892551892}
+Iteration 9: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.5, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 9: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 9: Valset pareto front aggregate score: 0.9833333333333333
+Iteration 9: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8}, 1: {6}, 2: {0, 2}, 3: {0, 8, 2}, 4: {2, 6}, 5: {8, 2, 5, 6}, 6: {0, 1, 2, 3, 5, 6, 8}, 7: {0, 2, 4, 6}, 8: {0, 1, 2, 6}, 9: {0, 1, 2, 3, 4, 5, 6}, 10: {0, 1, 3, 4, 5, 8}, 11: {0, 8, 2, 4}, 12: {0, 1, 2, 4, 7}, 13: {1, 2, 6}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8}, 15: {0, 5, 6}, 16: {0, 8, 2, 3}, 17: {0, 1, 2, 3, 4, 5, 8}, 18: {0, 4, 5, 6, 8}, 19: {1, 3}, 20: {5}, 21: {0, 1, 2, 3, 6, 8}, 22: {8}, 23: {5}, 24: {1, 6, 7}, 25: {0, 2, 3, 4, 5, 6}, 26: {0, 1, 2, 3, 5, 7}, 27: {2}, 28: {4}, 29: {5}}
+Iteration 9: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 9: Best valset aggregate score so far: 0.7403968253968254
+Iteration 9: Best program as per aggregate score on valset: 2
+Iteration 9: Best score on valset: 0.7403968253968254
+Iteration 9: Linear pareto front program index: 2
+Iteration 9: New program candidate index: 8
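The per-iteration validation scores are easier to compare when pulled out of the surrounding prompt text. The following sketch extracts them with a regular expression written against the `Valset score for new program` lines exactly as they appear in this log. The pattern and the `valset_scores` helper are assumptions for illustration and may need adjusting for other log versions.

```python
import re

# Matches lines such as:
# "+Iteration 8: Valset score for new program: 0.4290659340659341 (coverage 30 / 30)"
VALSET_LINE = re.compile(
    r"Iteration (?P<iteration>\d+): Valset score for new program: "
    r"(?P<score>\d+(?:\.\d+)?) \(coverage (?P<covered>\d+) / (?P<total>\d+)\)"
)


def valset_scores(lines):
    """Map iteration number -> validation score for every matching log line."""
    scores = {}
    for line in lines:
        match = VALSET_LINE.search(line)  # search() tolerates the leading '+' diff marker
        if match:
            scores[int(match["iteration"])] = float(match["score"])
    return scores


sample = [
    "+Iteration 8: New subsample score 4.0 is better than old score 2.75. Continue to full eval and add to candidate pool.",
    "+Iteration 8: Valset score for new program: 0.4290659340659341 (coverage 30 / 30)",
    "+Iteration 8: Val aggregate for new program: 0.4290659340659341",
    "+Iteration 9: Valset score for new program: 0.5842551892551893 (coverage 30 / 30)",
]
print(valset_scores(sample))
```

Only the `Valset score` lines are matched; the `Val aggregate` lines repeat the same number and are skipped to avoid double counting.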
+Iteration 10: Selected program 6 score: 0.6160651629072682
+Iteration 10: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return JSON only:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not a previous assistant response, generated output, label, or feedback text.
+
+Input format:
+You may receive a target URL, title, body, summary, generated output, gold topics, or feedback. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous.
+
+Core process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+
+Suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, or incidental phrase.
+
+Important feedback-derived rules:
+
+ACP / ACPX / sessions:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.
+- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.
+- Treat `sessions_yield` as label/path-like noise unless the title is truly about session identity, lifecycle, routing, persistent process identity, embedded session path, or session-specific state.
+- Example: `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield` => `acp`, `acpx`, not `sessions`.
+
+Streaming usage:
+- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.
+- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.
+- In streaming/model-server titles, usage chunks are usually protocol fields, not telemetry/accounting.
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.
+- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving` only.
+
+Gateway / sessions / API surface:
+- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.
+- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.
+- Passing outbound session identity is a central `sessions` concern.
+- Do not add `api_surface` just because a title mentions `message_sending`, `surface`, send denial, or plumbing. Add `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+- Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`, not `api_surface`.
+
+Browser automation / packaging:
+- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.
+- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.
+- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.
+- Do not add `auth_identity` just because Google sign-in fails when the actual central failure is Chrome profile attach or missing packaged executable.
+- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.
+
+Codex:
+- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, or Codex-specific runtime behavior.
+- Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.
+
+Gateway, API surface, UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving.
+
+Local model providers, open-weight models, model serving:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.
+- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.
+- Provider catalog missing-model issues are `local_model_providers`; add `reliability` when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.
+- Do not use `reliability` for every bug.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs usually return `docs` alone.
+
+Tools:
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.
+- Use `mcp_tooling` for MCP-specific behavior.
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, sessions_yield, showcase, tutorial, demo, sign-in, or OAuth. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 10: New subsample score 4.0 is better than old score 1.3214285714285714. Continue to full eval and add to candidate pool.
+Iteration 10: Valset score for new program: 0.5863186813186814 (coverage 30 / 30)
+Iteration 10: Val aggregate for new program: 0.5863186813186814
+Iteration 10: Individual valset scores for new program: {0: 0.5, 1: 0.5, 2: 0.14285714285714285, 3: 0.15384615384615385, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 0.5, 10: 0.3333333333333333, 11: 0.3333333333333333, 12: 0.25, 13: 1.0, 14: 1.0, 15: 1.0, 16: 0.25, 17: 0.5, 18: 1.0, 19: 0.2, 20: 0.2, 21: 0.5, 22: 0.3333333333333333, 23: 0.14285714285714285, 24: 1.0, 25: 0.5, 26: 0.25, 27: 0.5, 28: 0.5, 29: 1.0}
+Iteration 10: Objective aggregate scores for new program: {'weighted_score': 0.5863186813186813}
+Iteration 10: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.5, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 10: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 10: Valset pareto front aggregate score: 0.9833333333333333
+Iteration 10: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8}, 1: {6}, 2: {0, 2}, 3: {0, 8, 2}, 4: {9, 2, 6}, 5: {2, 5, 6, 8, 9}, 6: {0, 1, 2, 3, 5, 6, 8, 9}, 7: {0, 2, 4, 6, 9}, 8: {0, 1, 2, 6, 9}, 9: {0, 1, 2, 3, 4, 5, 6}, 10: {0, 1, 3, 4, 5, 8}, 11: {0, 8, 2, 4}, 12: {0, 1, 2, 4, 7}, 13: {1, 2, 6, 9}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, 15: {0, 9, 5, 6}, 16: {0, 8, 2, 3}, 17: {0, 1, 2, 3, 4, 5, 8}, 18: {0, 4, 5, 6, 8, 9}, 19: {1, 3}, 20: {5}, 21: {0, 1, 2, 3, 6, 8}, 22: {8}, 23: {5}, 24: {1, 9, 6, 7}, 25: {0, 2, 3, 4, 5, 6}, 26: {0, 1, 2, 3, 5, 7}, 27: {2}, 28: {4}, 29: {9, 5}}
+Iteration 10: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 10: Best valset aggregate score so far: 0.7403968253968254
+Iteration 10: Best program as per aggregate score on valset: 2
+Iteration 10: Best score on valset: 0.7403968253968254
+Iteration 10: Linear pareto front program index: 2
+Iteration 10: New program candidate index: 9
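The aggregate figures in the Iteration 10 lines above appear to be plain arithmetic means over the 30 validation examples. The sketch below re-derives them from the logged per-example scores. It is an illustration only: the variable names are hypothetical, the scores are written as fractions for brevity (1/7, 2/13, 1/3), and the assumption that GEPA aggregates with an unweighted mean is inferred from the numbers, not taken from the GEPA source.

```python
from statistics import fmean

# Per-example valset scores for the Iteration 10 candidate, in index order 0..29,
# transcribed from the "Individual valset scores" log line.
individual = [
    0.5, 0.5, 1 / 7, 2 / 13, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5,
    1 / 3, 1 / 3, 0.25, 1.0, 1.0, 1.0, 0.25, 0.5, 1.0, 0.2,
    0.2, 0.5, 1 / 3, 1 / 7, 1.0, 0.5, 0.25, 0.5, 0.5, 1.0,
]

# Assumption: the valset aggregate is the unweighted mean of these scores.
aggregate = fmean(individual)

# Per-example Pareto front: the best score any candidate has reached on each
# example. In the log, every example is at 1.0 except example 20 at 0.5.
front = [1.0] * 30
front[20] = 0.5
front_aggregate = fmean(front)

print(round(aggregate, 4), round(front_aggregate, 4))  # 0.5863 0.9833
```

Both values match the logged `Valset score for new program` and `Valset pareto front aggregate score`. The gap between them is expected: the front takes the best candidate per example, while the aggregate scores a single candidate.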
+Iteration 11: Selected program 6 score: 0.6160651629072682
+Iteration 11: Proposed new text for routing_policy: Classify each GitHub issue or pull request into the smallest complete set of allowed topic ids.
+
+Return JSON only:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Core task:
+Classify the GitHub item itself, not any previous assistant response, feedback text, labels, or generated output. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicitly central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary clearly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+
+Suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, environment name, release-note fragment, or incidental phrase.
+
+Composite titles:
+If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify only the central user-visible items needed for maintainer routing. Do not add every topic hinted by every noun. Treat weak fragments in grab-bag titles as label spam unless they clearly describe a central behavior change.
+
+Important learned example:
+`fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth`
+=> `codex`, `chat_integrations`
+Do not add `auth_identity` here. Although `OAuth` appears, it is not central enough for routing; treat it as label spam unless the item is specifically about OAuth/auth/login/token behavior. Also do not add `skills_plugins` merely because `plugins` appears in a Codex startup phrase.
+
+Topic rules:
+
+Codex:
+Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, or Codex-specific runtime behavior.
+Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.
+Codex startup and Docker Codex setup are `codex`.
+
+Auth and identity:
+Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is the central subject.
+Do not add `auth_identity` merely because `OAuth` appears in a composite title, environment phrase, or product/setup fragment.
+Add `auth_identity` only when the item is specifically about auth behavior, restoration, account identity, credentials, login, or tokens.
+
+Chat integrations:
+Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+WhatsApp history is `chat_integrations`.
+
+Skills/plugins:
+Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is central.
+Do not add it merely because `plugins` appears as part of Codex startup wording.
+
+Gateway/API/UI:
+Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.
+Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.
+Webchat full-message reader behavior is both `api_surface` and `ui_tui`; include `gateway` when gateway-backed or gateway-owned.
+
+Reliability:
+Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.
+Do not use `reliability` for every bug.
+
+Policy/config/security/MCP:
+Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+Use `security` for network policy, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+In policy/conformance titles, `model` usually means a policy area, not model providers or serving.
+
+Local models/providers/serving:
+Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.
+Use `model_serving` only when serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, base URL behavior, endpoint compatibility, request routing, or model-server compatibility is central.
+Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.
+
+ACP/ACPX/sessions/approvals:
+Use `acp` when ACP is named centrally.
+Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
+Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+Use `sessions` only when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.
+Do not add `sessions` merely because the title says `ACP sessions`.
+
+Documentation:
+Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+Documentation-only PRs usually return `docs` alone.
+
+Tools:
+Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.
+Use `mcp_tooling` for MCP-specific behavior.
+Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+
+Other topics:
+Use `queueing` only when queue, queued execution, queue lifecycle, queue steering, scheduling behavior, or user-visible queued work is central.
+Use `notifications` when notification behavior itself is central.
+Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, or memory providers.
+Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or cron deadlock is central.
+Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+Use `agent_runtime` when runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership are central.
+Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, OAuth, or Docker. Keep a topic only if it is actually a central maintainer-owned subject.
+Iteration 11: New subsample score 3.5 is better than old score 3.2857142857142856. Continue to full eval and add to candidate pool.
+Iteration 11: Valset score for new program: 0.5487301587301587 (coverage 30 / 30)
+Iteration 11: Val aggregate for new program: 0.5487301587301587
+Iteration 11: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 0.25, 3: 1.0, 4: 1.0, 5: 0.5, 6: 0.5, 7: 0.5, 8: 1.0, 9: 0.5, 10: 0.3333333333333333, 11: 0.5, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.5, 18: 1.0, 19: 0.5, 20: 0.25, 21: 0.5, 22: 0.25, 23: 0.14285714285714285, 24: 0.5, 25: 0.0, 26: 0.5, 27: 0.2, 28: 0.5, 29: 0.5}
+Iteration 11: Objective aggregate scores for new program: {'weighted_score': 0.5487301587301587}
+Iteration 11: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.5, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 11: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 11: Valset pareto front aggregate score: 0.9833333333333333
+Iteration 11: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10}, 1: {10, 6}, 2: {0, 2}, 3: {0, 8, 2, 10}, 4: {9, 2, 10, 6}, 5: {2, 5, 6, 8, 9}, 6: {0, 1, 2, 3, 5, 6, 8, 9}, 7: {0, 2, 4, 6, 9}, 8: {0, 1, 2, 6, 9, 10}, 9: {0, 1, 2, 3, 4, 5, 6}, 10: {0, 1, 3, 4, 5, 8}, 11: {0, 8, 2, 4}, 12: {0, 1, 2, 4, 7}, 13: {1, 2, 6, 9, 10}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 15: {0, 9, 5, 6}, 16: {0, 8, 2, 3}, 17: {0, 1, 2, 3, 4, 5, 8}, 18: {0, 4, 5, 6, 8, 9, 10}, 19: {1, 3}, 20: {5}, 21: {0, 1, 2, 3, 6, 8}, 22: {8}, 23: {5}, 24: {1, 9, 6, 7}, 25: {0, 2, 3, 4, 5, 6}, 26: {0, 1, 2, 3, 5, 7}, 27: {2}, 28: {4}, 29: {9, 5}}
+Iteration 11: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 11: Best valset aggregate score so far: 0.7403968253968254
+Iteration 11: Best program as per aggregate score on valset: 2
+Iteration 11: Best score on valset: 0.7403968253968254
+Iteration 11: Linear pareto front program index: 2
+Iteration 11: New program candidate index: 10
+Iteration 12: Selected program 1 score: 0.585970695970696
+Iteration 12: Proposed new text for routing_policy: Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.
+
+Input format:
+- target: GitHub issue or PR URL
+- title: issue/PR title
+- body: optional; read only if the title is ambiguous
+- labels, paths, examples, tests, implementation filenames, and incidental keywords may appear, but should not drive classification unless they describe the user-visible subject
+
+Output only JSON, or call `final_json` exactly once if available:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents
+
+Classification method:
+1. Read the title first.
+2. Use the body only if the title is ambiguous.
+3. Choose one primary topic.
+4. Add only explicit central secondary topics.
+5. Use three topics only when the title or known domain contract clearly requires three maintainer buckets.
+6. Remove any topic added only because of an incidental keyword.
+
+Domain-specific routing rules:
+
+- `exec v2 contract follow-through` routes to all contract-owned areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+ Do this even when the title starts with `test(exec):`.
+ The PR type `test` does not make it tests-only.
+ The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.
+
+- `sandbox.mode`, sandbox modes, sandbox policy, sandbox-governed execution, or permission boundaries are `sandboxing`.
+
+- Do not classify `sandboxing` merely because a title says “sandbox” when the sandbox is just the environment where another feature runs.
+ Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)`
+ should be `acpx`, `codex`, and `skills_plugins`, not `sandboxing`.
+ The central subject is ACPX/Codex inheriting user-installed plugins.
+
+- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistence, or user-visible session behavior.
+ Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization`
+ should be `coding_agents`, `sandboxing`, and `agent_runtime`.
+ Here, `subagent initialization` is about coding-agent spawning/runtime startup, not user-visible sessions.
+
+- Use `sessions` when the issue is about session identity, lifecycle, takeover, routing, active chat ownership, persistent session state, or user-visible session conflicts.
+ Example: `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`
+ should include `sessions`, because `EmbeddedAttemptSessionTakeoverError` and “user is actively chatting” indicate an active session takeover conflict.
+ It should also include `cron_automation` for cron announce delivery and `reliability` because an unexpected error is triggered.
+
+- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.
+
+- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.
+
+- LM Studio is a strong signal for `local_models`.
+
+- Named external/local model provider catalogs, provider-specific model listings, and provider adapter behavior are usually `local_model_providers`.
+ Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error`
+ should be `local_model_providers` and `reliability`.
+ Do not classify this as `model_serving`; the central subject is a provider catalog gap and silent failure.
+
+- Use `reliability` when the title describes silent failure, crashes, unexpected errors, broken delivery, missing error reporting, hangs, or behavior that fails instead of producing a clear user-facing outcome.
+
+- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.
+ Example: `fix: LM Studio thinking blocks invisible with Responses API`
+ should be `model_serving` and `local_models`.
+ Do not stop at `local_models`; the Responses API part is central serving/protocol behavior.
+
+- Do not use `model_serving` merely because a title mentions a model name, provider name, provider catalog, or model availability. Use it only when serving protocol, endpoint compatibility, response structure, streaming, or OpenAI-compatible serving behavior is central.
+
+- Use `codex` when Codex itself, Codex-like behavior, Codex environment integration, or Codex-specific UX/runtime behavior is central.
+ In `ACPX Codex sandbox should inherit user-installed plugins`, `codex` is central.
+
+- Use `skills_plugins` when the issue is about plugins, installed plugins, inherited plugins, skills, plugin availability, or plugin runtime integration.
+
+Final suppression check:
+Remove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, sandbox, or contract. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it.
+Iteration 12: New subsample score 3.2857142857142856 is better than old score 1.95. Continue to full eval and add to candidate pool.
+Iteration 12: Valset score for new program: 0.4592735042735043 (coverage 30 / 30)
+Iteration 12: Val aggregate for new program: 0.4592735042735043
+Iteration 12: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.15384615384615385, 4: 0.16666666666666666, 5: 0.5, 6: 0.2857142857142857, 7: 1.0, 8: 0.2, 9: 0.5, 10: 1.0, 11: 1.0, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.25, 17: 1.0, 18: 0.25, 19: 0.25, 20: 0.2, 21: 0.5, 22: 0.25, 23: 0.15384615384615385, 24: 0.25, 25: 0.5, 26: 0.2857142857142857, 27: 0.14285714285714285, 28: 1.0, 29: 0.15384615384615385}
+Iteration 12: Objective aggregate scores for new program: {'weighted_score': 0.4592735042735043}
+Iteration 12: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.5, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 12: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 12: Valset pareto front aggregate score: 0.9833333333333333
+Iteration 12: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10}, 1: {10, 6}, 2: {0, 2}, 3: {0, 8, 2, 10}, 4: {9, 2, 10, 6}, 5: {2, 5, 6, 8, 9}, 6: {0, 1, 2, 3, 5, 6, 8, 9}, 7: {0, 2, 4, 6, 9, 11}, 8: {0, 1, 2, 6, 9, 10}, 9: {0, 1, 2, 3, 4, 5, 6}, 10: {0, 1, 3, 4, 5, 8, 11}, 11: {0, 2, 4, 8, 11}, 12: {0, 1, 2, 4, 7}, 13: {1, 2, 6, 9, 10, 11}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, 15: {0, 9, 5, 6}, 16: {0, 8, 2, 3}, 17: {0, 1, 2, 3, 4, 5, 8, 11}, 18: {0, 4, 5, 6, 8, 9, 10}, 19: {1, 3}, 20: {5}, 21: {0, 1, 2, 3, 6, 8}, 22: {8}, 23: {5}, 24: {1, 9, 6, 7}, 25: {0, 2, 3, 4, 5, 6}, 26: {0, 1, 2, 3, 5, 7}, 27: {2}, 28: {11, 4}, 29: {9, 5}}
+Iteration 12: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 12: Best valset aggregate score so far: 0.7403968253968254
+Iteration 12: Best program as per aggregate score on valset: 2
+Iteration 12: Best score on valset: 0.7403968253968254
+Iteration 12: Linear pareto front program index: 2
+Iteration 12: New program candidate index: 11
+Iteration 13: Selected program 2 score: 0.7403968253968254
+Iteration 13: Proposed new text for routing_policy: Classify GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore labels, file paths, examples, tests, implementation files, target URL path, and incidental broad impact unless they describe the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only, or use `final_json` if available.
+
+Critical reliability rule:
+Use `reliability` whenever the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.
+
+Do not suppress `reliability` just because the deadlock/hang occurs inside an internal path such as `subagent`, `main lane`, `worker`, `before_prompt_build`, or another execution hook. Internal location words may be ignored for `queueing`, but the liveness failure itself is still central.
+
+Memory + reliability rule:
+Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+
+If an active-memory, recall, embedding, vector-store, or memory-provider item also describes a deadlock, hang, wedge, timeout, stuck state, or other liveness failure, classify it as both `memory` and `reliability`.
+
+Canonical example:
+Title: `fix #79026: active-memory recall subagent can deadlock on the main lane inside before_prompt_build`
+Correct output:
+{"topics_of_interest":["memory","reliability"]}
+
+Reason:
+`active-memory recall` makes `memory` central. `can deadlock` makes `reliability` central. `subagent`, `main lane`, and `before_prompt_build` are internal context and do not remove the reliability topic or add `queueing`.
+
+Queueing suppression:
+Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central. Words like `lane`, `main lane`, `worker`, `subagent`, or internal execution paths do not imply `queueing`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of a related word appearing in the title. Keep a topic only if it is actually a central maintainer-owned subject. In particular, do not remove `reliability` when the title explicitly says `deadlock`, `hang`, `wedged`, `stuck`, `crash`, `race`, or `timeout`.
+Iteration 13: New subsample score 2.033333333333333 is not better than old score 3.5, skipping
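The "is better than" / "is not better than ... skipping" lines suggest a simple gate: a proposed prompt moves on to full validation only if its summed score on the reflected subsample beats the parent's. The sketch below replays that gate on the subsample scores logged for iterations 10 to 13. The function name and the strict `>` comparison are assumptions inferred from the log wording, not GEPA's actual implementation.

```python
def passes_subsample_gate(old_score: float, new_score: float) -> bool:
    """Hypothetical gate: accept only a strict improvement on the subsample."""
    return new_score > old_score


# (old subsample score, new subsample score) copied from the log lines above.
logged = {
    10: (1.3214285714285714, 4.0),
    11: (3.2857142857142856, 3.5),
    12: (1.95, 3.2857142857142856),
    13: (3.5, 2.033333333333333),
}

decisions = {it: passes_subsample_gate(old, new) for it, (old, new) in logged.items()}

for it, accepted in decisions.items():
    action = "full eval, add to candidate pool" if accepted else "skip"
    print(f"Iteration {it}: {action}")
```

Passing the gate does not imply a better validation score. The candidates from iterations 11 and 12 passed on their subsamples but scored 0.5487 and 0.4593 on the full valset, below their parents' 0.6161 and 0.5860.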
+Iteration 14: Selected program 6 score: 0.6160651629072682
+Iteration 14: Proposed new text for routing_policy: Refine classification with these additional routing rules:
+
+- Treat UI-looking wording as label spam unless the UI itself is the maintainer-owned subject. A title like “respect totalTokensFresh flag to avoid showing stale token counts” is about session state plus usage/token accounting, not `ui_tui`.
+ - Use `sessions` when freshness, lifecycle, state, or per-session values determine whether data is valid.
+ - Use `telemetry_usage` when token counts, usage counts, cost/usage display, or accounting/reporting values are central.
+ - Do not add `ui_tui` just because the bug is “shown” somewhere.
+
+- For memory embeddings backed by self-hosted OpenAI-compatible servers, classify the self-hosted inference path explicitly.
+ - “memory/embeddings” is `memory`.
+ - “openai-compatible provider for self-hosted servers”, including llama.cpp, Ollama, vLLM, TGI, or LocalAI, is `self_hosted_inference`.
+ - Do not use `local_model_providers` merely because provider names or model-server names appear. Use `local_model_providers` only for provider catalogs, provider metadata, provider-specific model support, or provider catalog behavior.
+
+- For local Ollama or local embedding failures involving proxy/NO_PROXY behavior, prefer local/self-hosted routing over broad security or memory labels.
+ - “Local Ollama embeddings fail when proxy is enabled” routes to `local_models` and `self_hosted_inference`.
+ - Do not add `memory` just because the word “embeddings” appears, unless the memory subsystem, memory provider, vector store, or active-memory recall is central.
+ - Do not add `security` just because the title mentions SSRF defenses, proxy, or NO_PROXY. Use `security` only when the security policy, access restriction, credential boundary, vulnerability, or allowed/blocked network behavior is the central user-visible subject.
+
+- Apply a final “label spam” suppression pass:
+ - Remove `ui_tui` if the title only implies data is displayed.
+ - Remove `local_model_providers` if the title is about using self-hosted inference servers rather than catalog/provider metadata.
+ - Remove `memory` if embeddings are only the model capability involved and the memory system itself is not central.
+ - Remove `security` if SSRF/proxy wording is implementation context for local model connectivity rather than the central policy/security concern.
+
+Canonical examples:
+- `fix: respect totalTokensFresh flag to avoid showing stale token counts` => `sessions`, `telemetry_usage`
+- `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`
+- `[Feature/Bug]: Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)` => `local_models`, `self_hosted_inference`
+Iteration 14: New subsample score 4.0 is better than old score 1.592857142857143. Continue to full eval and add to candidate pool.
+Iteration 14: Valset score for new program: 0.4111029711029711 (coverage 30 / 30)
+Iteration 14: Val aggregate for new program: 0.4111029711029711
+Iteration 14: Individual valset scores for new program: {0: 0.2857142857142857, 1: 0.2, 2: 0.25, 3: 0.2857142857142857, 4: 0.1111111111111111, 5: 0.25, 6: 0.2857142857142857, 7: 0.25, 8: 0.15384615384615385, 9: 1.0, 10: 0.2, 11: 0.2857142857142857, 12: 1.0, 13: 0.25, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.25, 18: 0.2857142857142857, 19: 0.2, 20: 1.0, 21: 0.5, 22: 0.25, 23: 1.0, 24: 0.2857142857142857, 25: 0.25, 26: 1.0, 27: 0.1, 28: 0.5, 29: 0.15384615384615385}
+Iteration 14: Objective aggregate scores for new program: {'weighted_score': 0.41110297110297106}
+Iteration 14: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 14: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 14: Valset pareto front aggregate score: 1.0
+Iteration 14: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10}, 1: {10, 6}, 2: {0, 2}, 3: {0, 8, 2, 10}, 4: {9, 2, 10, 6}, 5: {2, 5, 6, 8, 9}, 6: {0, 1, 2, 3, 5, 6, 8, 9}, 7: {0, 2, 4, 6, 9, 11}, 8: {0, 1, 2, 6, 9, 10}, 9: {0, 1, 2, 3, 4, 5, 6, 12}, 10: {0, 1, 3, 4, 5, 8, 11}, 11: {0, 2, 4, 8, 11}, 12: {0, 1, 2, 4, 7, 12}, 13: {1, 2, 6, 9, 10, 11}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, 15: {0, 9, 5, 6}, 16: {0, 8, 2, 3}, 17: {0, 1, 2, 3, 4, 5, 8, 11}, 18: {0, 4, 5, 6, 8, 9, 10}, 19: {1, 3}, 20: {12}, 21: {0, 1, 2, 3, 6, 8}, 22: {8}, 23: {12, 5}, 24: {1, 9, 6, 7}, 25: {0, 2, 3, 4, 5, 6}, 26: {0, 1, 2, 3, 5, 7, 12}, 27: {2}, 28: {11, 4}, 29: {9, 5}}
+Iteration 14: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 14: Best valset aggregate score so far: 0.7403968253968254
+Iteration 14: Best program as per aggregate score on valset: 2
+Iteration 14: Best score on valset: 0.7403968253968254
+Iteration 14: Linear pareto front program index: 2
+Iteration 14: New program candidate index: 12
+Iteration 15: Selected program 2 score: 0.7403968253968254
+Iteration 15: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Goal:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore target URL path, labels, examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Composite title rule:
+If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+
+Important examples:
+- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock`
+ => `exec_tools`, `browser_automation`, `cron_automation`
+ Do not add `ui_tui` just because browser diagnostics might be visible somewhere.
+ Do not add generic `reliability` when the deadlock is specifically a cron force-run/lifecycle bug owned by `cron_automation`.
+
+- `fix: LM Studio thinking blocks invisible with Responses API`
+ => `local_models`, `model_serving`
+ LM Studio is a local model provider/runtime signal.
+ Responses API behavior, response block handling, streaming, endpoint compatibility, or request/response protocol behavior is `model_serving`.
+ Do not add `ui_tui` merely because something is “invisible” unless webchat, TUI, UI views, or a UI reader is explicitly central.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Topic rules:
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
+- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- Do not add `sessions` merely because the title says “ACP sessions” or mentions session context. Treat that as label spam unless session identity, lifecycle, routing, state, or persistent process identity is itself the bug or feature.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, `acpx`
+- `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` => `acp`, `sessions`
+
+Reliability, queueing, and lanes:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.
+- Prefer a more specific owning topic when the title names a domain-specific lifecycle bug, such as `cron force-run deadlock` => `cron_automation`.
+- Words like `lane`, `main lane`, `worker`, `subagent`, `before_prompt_build`, or internal execution paths do not imply `queueing`.
+- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.
+- `self-heal lane wedges` => `reliability`, not `queueing`.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or user/session identity for auth is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex-specific runtime behavior.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions` when the embedded path or session identity is central.
+- `restore openai-codex OAuth on embedded path` => `auth_identity`, `sessions`, not `codex`.
+
+Codex:
+- Use `codex` when Codex is named centrally as the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` merely because the title contains `openai-codex`, `[codex]`, or a Codex-branded OAuth provider.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include `docs` alone.
+- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.
+
+Open-weight, local providers, local models, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, response block handling, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- If a title combines a local provider/runtime with Responses API or OpenAI-compatible protocol behavior, include both `local_models` and `model_serving`.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, or names a model unless serving/routing/protocol behavior is central.
+
+Notifications and chat integrations:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- A cron force-run deadlock is `cron_automation`; do not add generic `reliability` unless reliability/self-healing is separately named as a central concern.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, embedded session path, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`, not `notifications`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Do not add `ui_tui` merely because diagnostics are layered, output is invisible, or a result is user-visible. UI/TUI must be explicitly the owned surface.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- `layered browser diagnostics` is `browser_automation`.
+- Do not add `ui_tui` for browser diagnostics unless the title explicitly names UI/TUI/webchat display behavior.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, invisible, diagnostics, or browser. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 15: New subsample score 4.0 is better than old score 2.642857142857143. Continue to full eval and add to candidate pool.
+Iteration 15: Valset score for new program: 0.6716666666666666 (coverage 30 / 30)
+Iteration 15: Val aggregate for new program: 0.6716666666666666
+Iteration 15: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.2857142857142857, 4: 0.2857142857142857, 5: 0.5, 6: 1.0, 7: 0.5, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 0.2857142857142857, 13: 0.25, 14: 1.0, 15: 1.0, 16: 0.25, 17: 1.0, 18: 1.0, 19: 0.2, 20: 0.2, 21: 1.0, 22: 0.0, 23: 0.14285714285714285, 24: 0.5, 25: 1.0, 26: 1.0, 27: 1.0, 28: 0.5, 29: 0.25}
+Iteration 15: Objective aggregate scores for new program: {'weighted_score': 0.6716666666666666}
+Iteration 15: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 15: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 15: Valset pareto front aggregate score: 1.0
+Iteration 15: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10, 13}, 1: {10, 13, 6}, 2: {0, 2, 13}, 3: {0, 8, 2, 10}, 4: {9, 2, 10, 6}, 5: {2, 5, 6, 8, 9}, 6: {0, 1, 2, 3, 5, 6, 8, 9, 13}, 7: {0, 2, 4, 6, 9, 11}, 8: {0, 1, 2, 6, 9, 10, 13}, 9: {0, 1, 2, 3, 4, 5, 6, 12, 13}, 10: {0, 1, 3, 4, 5, 8, 11, 13}, 11: {0, 2, 4, 8, 11, 13}, 12: {0, 1, 2, 4, 7, 12}, 13: {1, 2, 6, 9, 10, 11}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13}, 15: {0, 5, 6, 9, 13}, 16: {0, 8, 2, 3}, 17: {0, 1, 2, 3, 4, 5, 8, 11, 13}, 18: {0, 4, 5, 6, 8, 9, 10, 13}, 19: {1, 3}, 20: {12}, 21: {0, 1, 2, 3, 6, 8, 13}, 22: {8}, 23: {12, 5}, 24: {1, 9, 6, 7}, 25: {0, 2, 3, 4, 5, 6, 13}, 26: {0, 1, 2, 3, 5, 7, 12, 13}, 27: {2, 13}, 28: {11, 4}, 29: {9, 5}}
+Iteration 15: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 15: Best valset aggregate score so far: 0.7403968253968254
+Iteration 15: Best program as per aggregate score on valset: 2
+Iteration 15: Best score on valset: 0.7403968253968254
+Iteration 15: Linear pareto front program index: 2
+Iteration 15: New program candidate index: 13
+Iteration 16: Selected program 1 score: 0.585970695970696
+Iteration 16: Proposed new text for routing_policy: Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.
+
+Output only JSON, or call `final_json` exactly once if that structured-output tool is available:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents
+
+Read the title first. Use the body only if the title is ambiguous. Ignore labels, paths, tests, implementation filenames, examples, and incidental keywords unless they are the user-visible subject.
+
+Choose one primary topic, then add only explicit central secondary topics. Use three topics only when the title or domain contract clearly requires three maintainer buckets.
+
+General routing rules:
+- Treat bracketed prefixes such as `[codex]`, PR types such as `test(...)`, and labels as routing noise unless the actual user-visible subject is that topic.
+- Do not add `codex` merely because the title starts with `[codex]`.
+- Do not add local model topics merely because the title contains `model` in a policy, conformance, config, or security context.
+- Do not add provider/model topics from label spam or incidental words.
+
+Specific learned mappings:
+- `exec v2 contract follow-through` routes to all contract-owned areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+ Do this even when the title starts with `test(exec):`; the PR type `test` does not make it tests-only.
+ The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.
+
+- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.
+
+- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistence, or user-visible session state.
+ Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization`
+ should be `coding_agents`, `sandboxing`, `agent_runtime`.
+ Here, `subagent initialization` is about coding-agent spawning/runtime startup, not user-visible sessions.
+
+- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.
+
+- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.
+
+- LM Studio is a strong signal for `local_models`.
+
+- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.
+ Example: `fix: LM Studio thinking blocks invisible with Responses API`
+ should be `model_serving` and `local_models`.
+
+- `Policy: add model, network, and MCP conformance checks` should be classified as:
+ `mcp_tooling`, `config`, `security`.
+ In this title, `model` is not `local_models` or `local_model_providers`; it is part of policy/conformance coverage. `network` and conformance/policy checks are security/config concerns.
+
+- `[codex] fix webchat full-message reader for truncated history` should be classified as:
+ `gateway`, `api_surface`, `ui_tui`.
+ Do not include `codex`; the prefix is label noise. `webchat` points to gateway/UI behavior, and `full-message reader` / `truncated history` is centrally about the API surface used to retrieve complete message content.
+
+Final suppression check:
+Remove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, contract, or bracketed prefix. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it.
+Iteration 16: New subsample score 4.0 is better than old score 1.25. Continue to full eval and add to candidate pool.
+Iteration 16: Valset score for new program: 0.530018315018315 (coverage 30 / 30)
+Iteration 16: Val aggregate for new program: 0.530018315018315
+Iteration 16: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 0.25, 3: 0.25, 4: 0.2857142857142857, 5: 0.5, 6: 1.0, 7: 0.25, 8: 1.0, 9: 0.5, 10: 1.0, 11: 0.25, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0, 18: 0.2857142857142857, 19: 1.0, 20: 0.2, 21: 0.5, 22: 0.25, 23: 0.15384615384615385, 24: 0.25, 25: 0.5, 26: 0.2857142857142857, 27: 0.0, 28: 0.25, 29: 0.15384615384615385}
+Iteration 16: Objective aggregate scores for new program: {'weighted_score': 0.530018315018315}
+Iteration 16: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 16: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 16: Valset pareto front aggregate score: 1.0
+Iteration 16: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10, 13, 14}, 1: {10, 13, 6, 14}, 2: {0, 2, 13}, 3: {0, 8, 2, 10}, 4: {9, 2, 10, 6}, 5: {2, 5, 6, 8, 9}, 6: {0, 1, 2, 3, 5, 6, 8, 9, 13, 14}, 7: {0, 2, 4, 6, 9, 11}, 8: {0, 1, 2, 6, 9, 10, 13, 14}, 9: {0, 1, 2, 3, 4, 5, 6, 12, 13}, 10: {0, 1, 3, 4, 5, 8, 11, 13, 14}, 11: {0, 2, 4, 8, 11, 13}, 12: {0, 1, 2, 4, 7, 12}, 13: {1, 2, 6, 9, 10, 11, 14}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}, 15: {0, 5, 6, 9, 13}, 16: {0, 2, 3, 8, 14}, 17: {0, 1, 2, 3, 4, 5, 8, 11, 13, 14}, 18: {0, 4, 5, 6, 8, 9, 10, 13}, 19: {1, 3, 14}, 20: {12}, 21: {0, 1, 2, 3, 6, 8, 13}, 22: {8}, 23: {12, 5}, 24: {1, 9, 6, 7}, 25: {0, 2, 3, 4, 5, 6, 13}, 26: {0, 1, 2, 3, 5, 7, 12, 13}, 27: {2, 13}, 28: {11, 4}, 29: {9, 5}}
+Iteration 16: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 16: Best valset aggregate score so far: 0.7403968253968254
+Iteration 16: Best program as per aggregate score on valset: 2
+Iteration 16: Best score on valset: 0.7403968253968254
+Iteration 16: Linear pareto front program index: 2
+Iteration 16: New program candidate index: 14
+Iteration 17: Selected program 9 score: 0.5863186813186814
+Iteration 17: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return JSON only:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not labels, generated output, prior assistant responses, feedback text, or incidental words.
+
+Process:
+1. Read the title first. Treat it as the primary signal.
+2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned concerns.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+
+Suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the actual subject, not a label prefix, path, symptom, implementation detail, test fixture, example, or incidental phrase.
+
+Important domain rules:
+
+Local models / self-hosted inference / providers / security:
+- Use `local_models` when local model behavior is central, including local embeddings and local model runtime behavior.
+- Use `self_hosted_inference` when the issue concerns a locally hosted inference stack such as Ollama-backed inference or embeddings.
+- For titles like `Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)`, classify as `local_models` and `self_hosted_inference`.
+- Do not add `local_model_providers` merely because Ollama, Gemini, vLLM, llama.cpp, or another provider name appears. Use `local_model_providers` only when provider-specific catalog metadata, provider integration, known context windows, named provider support, or provider/model-family support is the central subject.
+- Do not add `security` merely because the title mentions SSRF, proxy, NO_PROXY, defenses, network, or policy language. Add `security` only when the central issue is actually security policy, access restriction, credential boundary, vulnerability, or allowed/blocked security behavior.
+
+Reliability / auth / sessions:
+- Use `reliability` when the central bug is a wedge, deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, self-healing behavior, or robustness failure.
+- Use `auth_identity` when OAuth, login, sign-in, tokens, authentication, account identity, credential identity, or identity restoration is central.
+- Use `sessions` when session identity, lifecycle, routing, persistent process identity, embedded session path, session-specific state, or embedded path behavior is central.
+- If OAuth restoration is tied to an embedded path, include both `auth_identity` and `sessions`.
+- For titles like `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path`, classify as `reliability`, `auth_identity`, and `sessions`.
+- Do not add `codex` merely because the title says `openai-codex` or starts with `[codex]`. Use `codex` only when Codex itself is the central product/runtime/setup being changed.
+
+ACP / ACPX / sessions:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.
+- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.
+- Treat `sessions_yield` as label/path-like noise unless the title is truly about session identity, lifecycle, routing, persistent process identity, embedded session path, or session-specific state.
+
+Streaming usage:
+- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.
+- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.
+
+Gateway / sessions / API surface / UI:
+- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.
+- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.
+- Passing outbound session identity is a central `sessions` concern.
+- Use `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+- Do not add `api_surface` just because a title mentions `message_sending`, `surface`, send denial, or plumbing.
+- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`. If gateway-backed or gateway-owned, also include `gateway`.
+
+Browser automation / packaging:
+- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.
+- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.
+- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.
+- Do not add `auth_identity` just because Google sign-in fails when the actual central failure is Chrome profile attach or missing packaged executable.
+
+Docs:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs usually return `docs` alone.
+
+Tools / MCP / exec:
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.
+- Use `mcp_tooling` for MCP-specific behavior, MCP conformance, MCP policy, MCP tool behavior, protocol checks, or MCP integrations.
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, sessions_yield, showcase, tutorial, demo, sign-in, OAuth, proxy, SSRF, NO_PROXY, Ollama, or provider. Keep the topic only if it is actually a central maintainer-owned subject.
+Iteration 17: New subsample score 3.5 is better than old score 2.642857142857143. Continue to full eval and add to candidate pool.
+Iteration 17: Valset score for new program: 0.5569841269841269 (coverage 30 / 30)
+Iteration 17: Val aggregate for new program: 0.5569841269841269
+Iteration 17: Individual valset scores for new program: {0: 0.25, 1: 0.2, 2: 0.14285714285714285, 3: 0.25, 4: 1.0, 5: 0.25, 6: 1.0, 7: 1.0, 8: 1.0, 9: 0.5, 10: 0.3333333333333333, 11: 0.25, 12: 0.25, 13: 0.5, 14: 1.0, 15: 1.0, 16: 0.25, 17: 0.25, 18: 1.0, 19: 0.3333333333333333, 20: 0.2, 21: 0.5, 22: 0.25, 23: 1.0, 24: 1.0, 25: 0.5, 26: 1.0, 27: 0.5, 28: 0.5, 29: 0.5}
+Iteration 17: Objective aggregate scores for new program: {'weighted_score': 0.5569841269841269}
+Iteration 17: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 17: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 17: Valset pareto front aggregate score: 1.0
+Iteration 17: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10, 13, 14}, 1: {10, 13, 6, 14}, 2: {0, 2, 13}, 3: {0, 8, 2, 10}, 4: {2, 6, 9, 10, 15}, 5: {2, 5, 6, 8, 9}, 6: {0, 1, 2, 3, 5, 6, 8, 9, 13, 14, 15}, 7: {0, 2, 4, 6, 9, 11, 15}, 8: {0, 1, 2, 6, 9, 10, 13, 14, 15}, 9: {0, 1, 2, 3, 4, 5, 6, 12, 13}, 10: {0, 1, 3, 4, 5, 8, 11, 13, 14}, 11: {0, 2, 4, 8, 11, 13}, 12: {0, 1, 2, 4, 7, 12}, 13: {1, 2, 6, 9, 10, 11, 14}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}, 15: {0, 5, 6, 9, 13, 15}, 16: {0, 2, 3, 8, 14}, 17: {0, 1, 2, 3, 4, 5, 8, 11, 13, 14}, 18: {0, 4, 5, 6, 8, 9, 10, 13, 15}, 19: {1, 3, 14}, 20: {12}, 21: {0, 1, 2, 3, 6, 8, 13}, 22: {8}, 23: {12, 5, 15}, 24: {1, 6, 7, 9, 15}, 25: {0, 2, 3, 4, 5, 6, 13}, 26: {0, 1, 2, 3, 5, 7, 12, 13, 15}, 27: {2, 13}, 28: {11, 4}, 29: {9, 5}}
+Iteration 17: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 17: Best valset aggregate score so far: 0.7403968253968254
+Iteration 17: Best program as per aggregate score on valset: 2
+Iteration 17: Best score on valset: 0.7403968253968254
+Iteration 17: Linear pareto front program index: 2
+Iteration 17: New program candidate index: 15
+Iteration 18: Selected program 9 score: 0.5863186813186814
+Iteration 18: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return JSON only, with no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not a previous assistant response, generated output, label, feedback text, or gold topic list.
+
+Input format:
+You may receive a target URL, title, body, summary, generated output, gold topics, feedback, score, or error. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous. Ignore generated outputs, feedback, score, and error except when they explain how to improve the classification rule.
+
+Core process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics when the title explicitly names three central facets.
+6. Use 0 topics only when no allowed topic is central.
+7. Never invent topic ids.
+8. Always emit the required JSON object. Never omit the final JSON.
+
+Important topic rules:
+
+ACP / ACPX / sessions:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.
+- If a title says “per-binding”, “binding behavior”, or ACPX-related binding, that is usually `acpx`.
+- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.
+- Treat `sessions_yield` as label/path-like noise unless the title is truly about session identity, lifecycle, routing, persistent process identity, embedded session path, or session-specific state.
+- Example: `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield` => `acp`, `acpx`.
+
+Approvals / permission modes:
+- Use `approvals` when approval policy, permission modes, `permissionMode`, approval gates, allow/deny behavior, or user permission decisions are central.
+- If ACP or ACPX work is specifically about per-agent or per-binding `permissionMode`, include `approvals`.
+- Example: `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `acpx`, `approvals`.
+
+Skills / plugins:
+- Use `skills_plugins` when plugins, skills, Superpowers, plugin inheritance, installed plugins, plugin availability, or skill/plugin runtime integration is central.
+- If ACPX/Codex sandbox behavior is about inheriting or exposing user-installed plugins, include `skills_plugins`.
+- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` => `acpx`, `codex`, `skills_plugins`.
+
+Codex:
+- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, Codex sandbox behavior, or Codex-specific runtime behavior.
+- Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving.
+- Example: `Policy: add model, network, and MCP conformance checks` => `config`, `security`, `mcp_tooling`.
+
+Streaming usage:
+- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.
+- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.
+- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving`.
+
+Gateway / sessions / API surface:
+- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.
+- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.
+- Passing outbound session identity is a central `sessions` concern.
+- Do not add `api_surface` just because a title mentions `message_sending`, `surface`, send denial, or plumbing.
+- Add `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+
+Browser automation / packaging:
+- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.
+- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.
+- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.
+- Do not add `auth_identity` just because Google sign-in fails when the central failure is Chrome profile attach or missing packaged executable.
+
+API surface / UI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.
+
+Local model providers / open-weight models / model serving:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.
+- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.
+- Provider catalog missing-model issues are `local_model_providers`; add `reliability` when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.
+- Do not use `reliability` for every bug.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs usually return `docs` alone.
+
+Tools:
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.
+- Use `mcp_tooling` for MCP-specific behavior.
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+
+Suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, or incidental phrase.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, sessions_yield, showcase, tutorial, demo, sign-in, OAuth, permission, or sandbox. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Final output:
+Return exactly one JSON object:
+{"topics_of_interest":[...]}
+Iteration 18: New subsample score 4.0 is better than old score 2.0. Continue to full eval and add to candidate pool.
+Iteration 18: Valset score for new program: 0.6092857142857143 (coverage 30 / 30)
+Iteration 18: Val aggregate for new program: 0.6092857142857143
+Iteration 18: Individual valset scores for new program: {0: 0.25, 1: 1.0, 2: 1.0, 3: 0.25, 4: 1.0, 5: 1.0, 6: 1.0, 7: 0.25, 8: 1.0, 9: 0.5, 10: 0.2, 11: 1.0, 12: 0.25, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.5, 18: 1.0, 19: 0.2, 20: 0.2, 21: 0.5, 22: 0.5, 23: 0.14285714285714285, 24: 1.0, 25: 0.5, 26: 0.2857142857142857, 27: 0.5, 28: 0.5, 29: 1.0}
+Iteration 18: Objective aggregate scores for new program: {'weighted_score': 0.6092857142857142}
+Iteration 18: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 18: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 18: Valset pareto front aggregate score: 1.0
+Iteration 18: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10, 13, 14}, 1: {6, 10, 13, 14, 16}, 2: {0, 16, 2, 13}, 3: {0, 8, 2, 10}, 4: {2, 6, 9, 10, 15, 16}, 5: {2, 5, 6, 8, 9, 16}, 6: {0, 1, 2, 3, 5, 6, 8, 9, 13, 14, 15, 16}, 7: {0, 2, 4, 6, 9, 11, 15}, 8: {0, 1, 2, 6, 9, 10, 13, 14, 15, 16}, 9: {0, 1, 2, 3, 4, 5, 6, 12, 13}, 10: {0, 1, 3, 4, 5, 8, 11, 13, 14}, 11: {0, 2, 4, 8, 11, 13, 16}, 12: {0, 1, 2, 4, 7, 12}, 13: {1, 2, 6, 9, 10, 11, 14, 16}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}, 15: {0, 5, 6, 9, 13, 15}, 16: {0, 2, 3, 8, 14}, 17: {0, 1, 2, 3, 4, 5, 8, 11, 13, 14}, 18: {0, 4, 5, 6, 8, 9, 10, 13, 15, 16}, 19: {1, 3, 14}, 20: {12}, 21: {0, 1, 2, 3, 6, 8, 13}, 22: {8}, 23: {12, 5, 15}, 24: {1, 6, 7, 9, 15, 16}, 25: {0, 2, 3, 4, 5, 6, 13}, 26: {0, 1, 2, 3, 5, 7, 12, 13, 15}, 27: {2, 13}, 28: {11, 4}, 29: {16, 9, 5}}
+Iteration 18: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 18: Best valset aggregate score so far: 0.7403968253968254
+Iteration 18: Best program as per aggregate score on valset: 2
+Iteration 18: Best score on valset: 0.7403968253968254
+Iteration 18: Linear pareto front program index: 2
+Iteration 18: New program candidate index: 16
+Iteration 19: Selected program 16 score: 0.6092857142857143
+Iteration 19: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return JSON only, with no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not generated output, feedback, score, label text, or gold topic lists.
+
+Input handling:
+1. Read the title first. It is the primary signal.
+2. Use the first clear body or summary only if the title is ambiguous.
+3. Ignore generated outputs, score, error, and gold topics.
+4. Use feedback only as rule-learning context, not as the item to classify.
+5. Pick one primary topic.
+6. Add secondary topics only when they are explicit central maintainer-owned subjects.
+7. Use 3 topics only when the title explicitly names three central facets.
+8. Use 0 topics only when no allowed topic is central.
+9. Never invent topic ids.
+10. Always emit the required JSON object.
+
+General suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the actual subject, not a label prefix, path, symptom, implementation detail, test fixture, example, broad ownership area, or incidental phrase.
+
+ACP / ACPX / Agent runtime:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` only when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.
+- Do not infer `acpx` from `ACP` alone.
+- Use `agent_runtime` when runtime execution, backend runtime, node-backed runtime, agent process/runtime behavior, or runtime ownership is central.
+- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+- If a title says `per-binding`, `binding behavior`, or ACPX-related binding, that is usually `acpx`.
+- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.
+
+Approvals / permission modes:
+- Use `approvals` when approval policy, permission modes, `permissionMode`, approval gates, allow/deny behavior, or user permission decisions are central.
+- If ACP or ACPX work is specifically about per-agent or per-binding `permissionMode`, include `approvals`.
+
+Notifications / chat integrations / cron:
+- Use `notifications` when announces, alerts, heartbeat pushes, notification delivery, target channels, or notification routing are central.
+- Use `chat_integrations` when Slack, Discord, Teams, chat channels, chat pushes, or chat-specific identity/target-channel behavior is central.
+- Do not use `cron_automation` merely because a title mentions `cron`, `cron --announce`, scheduled heartbeat, or an announce path. Treat cron as label/path-like noise unless the core issue is scheduling, recurring jobs, cron execution, or automation timing itself.
+- Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes` => `notifications`, `chat_integrations`.
+
+Skills / plugins:
+- Use `skills_plugins` when plugins, skills, Superpowers, plugin inheritance, installed plugins, plugin availability, or skill/plugin runtime integration is central.
+- If ACPX/Codex sandbox behavior is about inheriting or exposing user-installed plugins, include `skills_plugins`.
+
+Codex:
+- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, Codex sandbox behavior, or Codex-specific runtime behavior.
+- Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving.
+
+Streaming usage:
+- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.
+- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are central.
+
+Gateway / sessions / API surface:
+- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.
+- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.
+- Passing outbound session identity is a central `sessions` concern.
+- Do not add `api_surface` just because a title mentions `message_sending`, `surface`, send denial, or plumbing.
+- Add `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+
+Browser automation / packaging:
+- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.
+- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.
+- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.
+- Do not add `auth_identity` just because Google sign-in fails when the central failure is Chrome profile attach or missing packaged executable.
+
+API surface / UI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.
+
+Local model providers / open-weight models / model serving:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.
+- Provider-specific open-weight model metadata usually needs both `local_model_providers` and `open_weight_models`.
+- Example: `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` => `open_weight_models`, `local_model_providers`.
+- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.
+- Provider catalog missing-model issues are `local_model_providers`; add `reliability` only when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.
+- Do not use `reliability` for every bug.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs usually return `docs` alone.
+
+Tools:
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.
+- Use `mcp_tooling` for MCP-specific behavior.
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, sessions_yield, showcase, tutorial, demo, sign-in, OAuth, permission, sandbox, runtime, provider, or Slack. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Final output:
+Return exactly one JSON object:
+{"topics_of_interest":[...]}
+Iteration 19: New subsample score 4.0 is better than old score 2.0. Continue to full eval and add to candidate pool.
+Iteration 19: Valset score for new program: 0.6176678876678877 (coverage 30 / 30)
+Iteration 19: Val aggregate for new program: 0.6176678876678877
+Iteration 19: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 0.0, 3: 0.25, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 0.5, 9: 0.5, 10: 0.3333333333333333, 11: 1.0, 12: 0.25, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 0.5, 18: 1.0, 19: 0.5, 20: 0.2, 21: 0.5, 22: 0.15384615384615385, 23: 0.14285714285714285, 24: 1.0, 25: 0.25, 26: 1.0, 27: 0.0, 28: 0.2, 29: 0.25}
+Iteration 19: Objective aggregate scores for new program: {'weighted_score': 0.6176678876678875}
+Iteration 19: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 19: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 19: Valset pareto front aggregate score: 1.0
+Iteration 19: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10, 13, 14, 17}, 1: {6, 10, 13, 14, 16, 17}, 2: {0, 16, 2, 13}, 3: {0, 8, 2, 10}, 4: {2, 6, 9, 10, 15, 16, 17}, 5: {2, 5, 6, 8, 9, 16, 17}, 6: {0, 1, 2, 3, 5, 6, 8, 9, 13, 14, 15, 16, 17}, 7: {0, 2, 4, 6, 9, 11, 15, 17}, 8: {0, 1, 2, 6, 9, 10, 13, 14, 15, 16}, 9: {0, 1, 2, 3, 4, 5, 6, 12, 13}, 10: {0, 1, 3, 4, 5, 8, 11, 13, 14}, 11: {0, 2, 4, 8, 11, 13, 16, 17}, 12: {0, 1, 2, 4, 7, 12}, 13: {1, 2, 6, 9, 10, 11, 14, 16, 17}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17}, 15: {0, 5, 6, 9, 13, 15, 17}, 16: {0, 2, 3, 8, 14, 17}, 17: {0, 1, 2, 3, 4, 5, 8, 11, 13, 14}, 18: {0, 4, 5, 6, 8, 9, 10, 13, 15, 16, 17}, 19: {1, 3, 14}, 20: {12}, 21: {0, 1, 2, 3, 6, 8, 13}, 22: {8}, 23: {12, 5, 15}, 24: {1, 6, 7, 9, 15, 16, 17}, 25: {0, 2, 3, 4, 5, 6, 13}, 26: {0, 1, 2, 3, 5, 7, 12, 13, 15, 17}, 27: {2, 13}, 28: {11, 4}, 29: {16, 9, 5}}
+Iteration 19: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 19: Best valset aggregate score so far: 0.7403968253968254
+Iteration 19: Best program as per aggregate score on valset: 2
+Iteration 19: Best score on valset: 0.7403968253968254
+Iteration 19: Linear pareto front program index: 2
+Iteration 19: New program candidate index: 17
+Iteration 20: Selected program 2 score: 0.7403968253968254
+Iteration 20: Proposed new text for routing_policy: Additional classification rules learned from prior errors:
+
+Allowed topic ids must include `hooks` when hook behavior is part of the maintainer taxonomy.
+
+Token counts and usage freshness:
+- Titles about stale token counts, token usage counts, total token freshness flags, or usage display correctness are `telemetry_usage` when the user-visible subject is usage/count accuracy.
+- If the token count freshness is tied to session state, session identity, or per-session accounting, also include `sessions`.
+- Do not classify these as `ui_tui` merely because stale counts are shown in a UI. UI display is incidental unless the title is centrally about UI/TUI behavior.
+- Example: `fix: respect totalTokensFresh flag to avoid showing stale token counts` => `sessions`, `telemetry_usage`.
+
+Security bundles:
+- For composite security titles, classify each explicitly named central vulnerable subsystem, but do not over-expand from generic security words.
+- `token verification` in a vulnerability-fix title usually stays under `security`; do not add `auth_identity` unless authentication, OAuth, login, account identity, credential identity, or identity propagation is itself the central user-visible subject.
+- `skill scanner` or plugin/skill scanning vulnerability fixes should include `skills_plugins`.
+- `hook priority`, hook ordering, hook execution, or hook lifecycle should include `hooks` when hooks are an allowed topic.
+- `SSRF`, HIGH/CRITICAL vulnerabilities, network exploit prevention, and vulnerability remediation are `security`.
+- Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `hooks`, `skills_plugins`, not `auth_identity`.
+
+Final suppression check additions:
+- Remove `ui_tui` if the UI is only where telemetry, tokens, or stale values appear.
+- Remove `auth_identity` if the only auth-like phrase is token verification inside a security vulnerability bundle.
+- Keep `hooks` when hook priority/order/behavior is explicitly named as one of the central fixes.
+Iteration 20: New subsample score 3.1538461538461537 is better than old score 2.45. Continue to full eval and add to candidate pool.
+Iteration 20: Valset score for new program: 0.4475383553644423 (coverage 30 / 30)
+Iteration 20: Val aggregate for new program: 0.4475383553644423
+Iteration 20: Individual valset scores for new program: {0: 0.15384615384615385, 1: 0.25, 2: 0.15384615384615385, 3: 0.16666666666666666, 4: 0.15384615384615385, 5: 0.25, 6: 0.2857142857142857, 7: 1.0, 8: 0.25, 9: 1.0, 10: 0.14285714285714285, 11: 0.2857142857142857, 12: 1.0, 13: 1.0, 14: 0.2857142857142857, 15: 0.5, 16: 0.15384615384615385, 17: 0.25, 18: 0.2857142857142857, 19: 0.2, 20: 1.0, 21: 1.0, 22: 0.2857142857142857, 23: 0.08695652173913043, 24: 1.0, 25: 0.25, 26: 0.2857142857142857, 27: 0.5, 28: 1.0, 29: 0.25}
+Iteration 20: Objective aggregate scores for new program: {'weighted_score': 0.4475383553644424}
+Iteration 20: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 20: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 20: Valset pareto front aggregate score: 1.0
+Iteration 20: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10, 13, 14, 17}, 1: {6, 10, 13, 14, 16, 17}, 2: {0, 16, 2, 13}, 3: {0, 8, 2, 10}, 4: {2, 6, 9, 10, 15, 16, 17}, 5: {2, 5, 6, 8, 9, 16, 17}, 6: {0, 1, 2, 3, 5, 6, 8, 9, 13, 14, 15, 16, 17}, 7: {0, 2, 4, 6, 9, 11, 15, 17, 18}, 8: {0, 1, 2, 6, 9, 10, 13, 14, 15, 16}, 9: {0, 1, 2, 3, 4, 5, 6, 12, 13, 18}, 10: {0, 1, 3, 4, 5, 8, 11, 13, 14}, 11: {0, 2, 4, 8, 11, 13, 16, 17}, 12: {0, 1, 2, 4, 7, 12, 18}, 13: {1, 2, 6, 9, 10, 11, 14, 16, 17, 18}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17}, 15: {0, 5, 6, 9, 13, 15, 17}, 16: {0, 2, 3, 8, 14, 17}, 17: {0, 1, 2, 3, 4, 5, 8, 11, 13, 14}, 18: {0, 4, 5, 6, 8, 9, 10, 13, 15, 16, 17}, 19: {1, 3, 14}, 20: {18, 12}, 21: {0, 1, 2, 3, 6, 8, 13, 18}, 22: {8}, 23: {12, 5, 15}, 24: {1, 6, 7, 9, 15, 16, 17, 18}, 25: {0, 2, 3, 4, 5, 6, 13}, 26: {0, 1, 2, 3, 5, 7, 12, 13, 15, 17}, 27: {2, 13}, 28: {18, 11, 4}, 29: {16, 9, 5}}
+Iteration 20: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 20: Best valset aggregate score so far: 0.7403968253968254
+Iteration 20: Best program as per aggregate score on valset: 2
+Iteration 20: Best score on valset: 0.7403968253968254
+Iteration 20: Linear pareto front program index: 2
+Iteration 20: New program candidate index: 18
+{
+ "best_idx": 2,
+ "best_prompt_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop20-continuation-20260614T021448Z/best.prompt.md",
+ "best_routing_policy_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop20-continuation-20260614T021448Z/best.routing_policy.md",
+ "best_score": 0.7403968253968254,
+ "config": {
+ "harness": {
+ "base_url": null,
+ "concurrency": 2,
+ "context_window": null,
+ "max_tokens": 1536,
+ "model": "gemma-12b-q4km-reason",
+ "state_dir": null,
+ "timeout_ms": 900000
+ },
+ "max_candidate_proposals": 20,
+ "max_metric_calls": 720,
+ "output_dir": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop20-continuation-20260614T021448Z",
+ "reflection_minibatch_size": 4,
+ "row_limit": 30,
+ "seed": 0,
+ "seed_routing_policy_chars": 12379,
+ "seed_routing_policy_sha256": "6389ec2152a1c1c9c8e7fd5c7a3b522f95eba44351fc274647d3fdbca5490b14"
+ },
+ "created_at": "2026-06-14T08:15:46.679057+00:00",
+ "num_candidates": 19,
+ "num_full_val_evals": 19,
+ "result_path": "/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop20-continuation-20260614T021448Z/gepa-result.json",
+ "total_metric_calls": 730
+}
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/run_log.json b/gepa-12b-row30-prop20-continuation-20260614T021448Z/run_log.json
new file mode 100644
index 0000000000000000000000000000000000000000..5e736f3c741d6355c4bf29209db28ec9893bd9c3
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/run_log.json
@@ -0,0 +1,1036 @@
+[
+ {
+ "i": 0,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 3,
+ 10,
+ 19,
+ 17
+ ],
+ "subsample_scores": [
+ 1.0,
+ 0.5,
+ 0.2,
+ 0.5
+ ],
+ "new_subsample_scores": [
+ 0.15384615384615385,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 1,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 1,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 5,
+ 0,
+ 7,
+ 27
+ ],
+ "subsample_scores": [
+ 0.25,
+ 1.0,
+ 1.0,
+ 0.125
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 2,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 2,
+ "selected_program_candidate": 1,
+ "subsample_ids": [
+ 25,
+ 14,
+ 21,
+ 26
+ ],
+ "subsample_scores": [
+ 0.15384615384615385,
+ 1.0,
+ 1.0,
+ 0.2857142857142857
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 3,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 3,
+ "selected_program_candidate": 2,
+ "subsample_ids": [
+ 22,
+ 2,
+ 20,
+ 4
+ ],
+ "subsample_scores": [
+ 0.3333333333333333,
+ 1.0,
+ 0.2,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 0.2857142857142857,
+ 0.25,
+ 0.25,
+ 0.25
+ ]
+ },
+ {
+ "i": 4,
+ "selected_program_candidate": 1,
+ "subsample_ids": [
+ 28,
+ 6,
+ 18,
+ 11
+ ],
+ "subsample_scores": [
+ 0.25,
+ 0.25,
+ 0.25,
+ 0.25
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 4,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 5,
+ "selected_program_candidate": 3,
+ "subsample_ids": [
+ 23,
+ 9,
+ 29,
+ 15
+ ],
+ "subsample_scores": [
+ 0.15384615384615385,
+ 0.5,
+ 0.25,
+ 0.5
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 5,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 6,
+ "selected_program_candidate": 2,
+ "subsample_ids": [
+ 16,
+ 8,
+ 1,
+ 13
+ ],
+ "subsample_scores": [
+ 1.0,
+ 0.25,
+ 0.2857142857142857,
+ 0.5
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 6,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 7,
+ "selected_program_candidate": 6,
+ "subsample_ids": [
+ 24,
+ 12,
+ 12,
+ 24
+ ],
+ "subsample_scores": [
+ 0.5,
+ 0.25,
+ 1.0,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 7,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 8,
+ "selected_program_candidate": 4,
+ "subsample_ids": [
+ 16,
+ 22,
+ 5,
+ 19
+ ],
+ "subsample_scores": [
+ 0.14285714285714285,
+ 0.25,
+ 0.5,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 0.25,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 8,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 9,
+ "selected_program_candidate": 6,
+ "subsample_ids": [
+ 29,
+ 4,
+ 6,
+ 24
+ ],
+ "subsample_scores": [
+ 0.2857142857142857,
+ 0.2857142857142857,
+ 0.25,
+ 0.5
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 9,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 10,
+ "selected_program_candidate": 6,
+ "subsample_ids": [
+ 18,
+ 14,
+ 9,
+ 3
+ ],
+ "subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 0.2857142857142857
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.5,
+ 1.0
+ ],
+ "new_program_idx": 10,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 11,
+ "selected_program_candidate": 1,
+ "subsample_ids": [
+ 28,
+ 13,
+ 11,
+ 26
+ ],
+ "subsample_scores": [
+ 0.5,
+ 0.2,
+ 0.25,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 0.2857142857142857
+ ],
+ "new_program_idx": 11,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 12,
+ "selected_program_candidate": 2,
+ "subsample_ids": [
+ 7,
+ 10,
+ 15,
+ 27
+ ],
+ "subsample_scores": [
+ 0.5,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 0.3333333333333333,
+ 0.5,
+ 0.2
+ ]
+ },
+ {
+ "i": 13,
+ "selected_program_candidate": 6,
+ "subsample_ids": [
+ 20,
+ 21,
+ 12,
+ 23
+ ],
+ "subsample_scores": [
+ 0.2,
+ 1.0,
+ 0.25,
+ 0.14285714285714285
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 12,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 14,
+ "selected_program_candidate": 2,
+ "subsample_ids": [
+ 2,
+ 0,
+ 17,
+ 25
+ ],
+ "subsample_scores": [
+ 0.14285714285714285,
+ 1.0,
+ 0.5,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 13,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 15,
+ "selected_program_candidate": 1,
+ "subsample_ids": [
+ 1,
+ 8,
+ 8,
+ 1
+ ],
+ "subsample_scores": [
+ 0.25,
+ 0.25,
+ 0.5,
+ 0.25
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 14,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 16,
+ "selected_program_candidate": 9,
+ "subsample_ids": [
+ 13,
+ 23,
+ 27,
+ 24
+ ],
+ "subsample_scores": [
+ 1.0,
+ 0.14285714285714285,
+ 0.5,
+ 1.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.5,
+ 1.0
+ ],
+ "new_program_idx": 15,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 17,
+ "selected_program_candidate": 9,
+ "subsample_ids": [
+ 8,
+ 5,
+ 11,
+ 1
+ ],
+ "subsample_scores": [
+ 1.0,
+ 0.5,
+ 0.5,
+ 0.0
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 16,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 18,
+ "selected_program_candidate": 16,
+ "subsample_ids": [
+ 16,
+ 4,
+ 15,
+ 0
+ ],
+ "subsample_scores": [
+ 0.25,
+ 1.0,
+ 0.5,
+ 0.25
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 17,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ },
+ {
+ "i": 19,
+ "selected_program_candidate": 2,
+ "subsample_ids": [
+ 21,
+ 3,
+ 20,
+ 22
+ ],
+ "subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.2,
+ 0.25
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 0.15384615384615385,
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 18,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20,
+ 21,
+ 22,
+ 23,
+ 24,
+ 25,
+ 26,
+ 27,
+ 28,
+ 29
+ ]
+ }
+]
\ No newline at end of file
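Each `run_log.json` entry pairs the parent's minibatch scores with the proposed prompt's scores on the same rows. A small sketch of how the entries can be summarized follows. It treats the presence of `new_program_idx` as the sign that a proposal was added to the candidate pool, which is an inference from the entries above (iterations `i` = 3 and `i` = 12 lack the field and score lower after the proposal), not a documented schema. The helper name `summarize` is hypothetical, and the two sample entries are copied from the log with `evaluated_val_indices` omitted for brevity.

```python
import json

# Two entries copied from run_log.json above.
SAMPLE = json.loads("""
[
  {"i": 3, "selected_program_candidate": 2,
   "subsample_ids": [22, 2, 20, 4],
   "subsample_scores": [0.3333333333333333, 1.0, 0.2, 1.0],
   "new_subsample_scores": [0.2857142857142857, 0.25, 0.25, 0.25]},
  {"i": 19, "selected_program_candidate": 2,
   "subsample_ids": [21, 3, 20, 22],
   "subsample_scores": [1.0, 1.0, 0.2, 0.25],
   "new_subsample_scores": [1.0, 0.15384615384615385, 1.0, 1.0],
   "new_program_idx": 18}
]
""")


def summarize(entries):
    """Return one row per proposal with summed minibatch scores."""
    rows = []
    for entry in entries:
        rows.append({
            # run_log.txt numbers iterations from 1, run_log.json from 0.
            "iteration": entry["i"] + 1,
            "parent": entry["selected_program_candidate"],
            "before": sum(entry["subsample_scores"]),
            "after": sum(entry["new_subsample_scores"]),
            # Assumed: the field is present only when the proposal was kept.
            "accepted": "new_program_idx" in entry,
            "new_program_idx": entry.get("new_program_idx"),
        })
    return rows


rows = summarize(SAMPLE)
for row in rows:
    print(row)
```

For the `i` = 19 entry the summed scores match the "New subsample score ... is better than old score 2.45" line that `run_log.txt` prints for Iteration 20, which is what suggests the text log reports minibatch sums rather than means.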
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/run_log.txt b/gepa-12b-row30-prop20-continuation-20260614T021448Z/run_log.txt
new file mode 100644
index 0000000000000000000000000000000000000000..4ecf7764d0906634b6e2e97d8a7b375b6fe9d7f6
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/run_log.txt
@@ -0,0 +1,2016 @@
+Iteration 0: Base program full valset score: 0.6877944862155388 over 30 / 30 examples
+Iteration 1: Selected program 0 score: 0.6877944862155388
+Iteration 1: Proposed new text for routing_policy: Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.
+
+Output only JSON, or call `final_json` exactly once if that structured-output tool is available:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids include:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents
+
+Read the title first. Use the body only if the title is ambiguous. Ignore labels, paths, examples, tests, implementation filenames, and incidental keywords unless they are the user-visible subject.
+
+Choose one primary topic, then add only explicit central secondary topics. Use three topics only when the title or domain contract clearly requires three maintainer buckets.
+
+Important routing rules learned from prior errors:
+
+- `exec v2 contract follow-through` routes to all contract-owned areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+ Do this even when the title starts with `test(exec):`; the PR type `test` does not make it a tests-only classification.
+ The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.
+
+- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.
+
+- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, or persistent session state.
+ Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization`
+ should be `coding_agents`, `sandboxing`, `agent_runtime`.
+ Here, `subagent initialization` is about coding-agent spawning/runtime startup, not user-visible sessions.
+
+- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.
+
+- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.
+
+- LM Studio is a strong signal for `local_models`.
+
+- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.
+ Example: `fix: LM Studio thinking blocks invisible with Responses API`
+ should be `model_serving` and `local_models`.
+ Do not stop at `local_models`; the Responses API part is central serving/protocol behavior.
+
+- Do not add `sessions` merely because a title contains `session`, `sessions_spawn`, channel IDs, binding IDs, or session-like implementation names. Add it only when session identity, lifecycle, routing, state, persistence, or session-specific user behavior is central.
+
+- Do not suppress a central contract topic just because the exact word is absent from the title if the named contract is known to cover that area. In particular, `exec v2 contract` implies sandboxing and approvals.
+
+Final suppression check:
+Remove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, or contract. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it.
+Iteration 1: New subsample score 3.1538461538461537 is better than old score 2.2. Continue to full eval and add to candidate pool.
+Iteration 1: Valset score for new program: 0.585970695970696 (coverage 30 / 30)
+Iteration 1: Val aggregate for new program: 0.585970695970696
+Iteration 1: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 0.25, 3: 0.15384615384615385, 4: 0.2857142857142857, 5: 0.0, 6: 1.0, 7: 0.25, 8: 1.0, 9: 1.0, 10: 1.0, 11: 0.25, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.25, 17: 1.0, 18: 0.2857142857142857, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.25, 23: 0.15384615384615385, 24: 1.0, 25: 0.5, 26: 1.0, 27: 0.25, 28: 0.5, 29: 0.25}
+Iteration 1: Objective aggregate scores for new program: {'weighted_score': 0.585970695970696}
+Iteration 1: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 0.2857142857142857, 5: 0.25, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.25, 23: 0.15384615384615385, 24: 1.0, 25: 1.0, 26: 1.0, 27: 0.25, 28: 0.5, 29: 0.25}
+Iteration 1: Objective pareto front scores: {'weighted_score': 0.6877944862155387}
+Iteration 1: Valset pareto front aggregate score: 0.7796520146520147
+Iteration 1: Updated valset pareto front programs: {0: {0, 1}, 1: {0, 1}, 2: {0}, 3: {0}, 4: {0, 1}, 5: {0}, 6: {0, 1}, 7: {0}, 8: {0, 1}, 9: {0, 1}, 10: {0, 1}, 11: {0}, 12: {0, 1}, 13: {1}, 14: {0, 1}, 15: {0}, 16: {0}, 17: {0, 1}, 18: {0}, 19: {1}, 20: {0, 1}, 21: {0, 1}, 22: {1}, 23: {1}, 24: {1}, 25: {0}, 26: {0, 1}, 27: {1}, 28: {1}, 29: {0, 1}}
+Iteration 1: Updated objective pareto front programs: {'weighted_score': {0}}
+Iteration 1: Best valset aggregate score so far: 0.6877944862155388
+Iteration 1: Best program as per aggregate score on valset: 0
+Iteration 1: Best score on valset: 0.6877944862155388
+Iteration 1: Linear pareto front program index: 0
+Iteration 1: New program candidate index: 1
+Iteration 2: Selected program 0 score: 0.6877944862155388
+Iteration 2: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
+- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- Do not add `sessions` merely because the title says "ACP sessions" or mentions session context. Treat that as label spam unless session identity, lifecycle, routing, state, or persistent process identity is itself the bug or feature.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.
+- `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` should be `acp` and `sessions`; the central bug is session identity/process sharing.
+
+Reliability, queueing, and lanes:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.
+- Words like `lane`, `main lane`, `worker`, `subagent`, `before_prompt_build`, or internal execution paths do not imply `queueing`.
+- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.
+- `self-heal lane wedges` is `reliability`, not `queueing`.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or user/session identity for auth is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex-specific runtime behavior.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions` when the embedded path or session identity is central.
+- `restore openai-codex OAuth on embedded path` should be `auth_identity` and `sessions`, not `codex`.
+
+Codex:
+- Use `codex` when Codex is named centrally as the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` merely because the title contains `openai-codex`, `[codex]`, or a Codex-branded OAuth provider. Confirm the subject is Codex behavior rather than auth, sessions, docs, or another domain.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include `docs` alone.
+- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add non-allowed or broad demo/showcase labels.
+- Do not add `tool_calling` just because docs mention "tool boundaries" unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- Do not add `model_serving` merely because a title says "model", "provider", "catalog", or names a model unless serving/routing/protocol behavior is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says "openai-compatible", "provider", llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Notifications and chat integrations:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, embedded session path, or session-specific behavior is central.
+- "Outbound session identity" is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map "model" in "model policy", "model conformance", or "model checks" to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.
+- Do not substitute broad infrastructure topics for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 2: New subsample score 4.0 is better than old score 2.375. Continue to full eval and add to candidate pool.
+Iteration 2: Found a better program on the valset with score 0.7403968253968254.
+Iteration 2: Valset score for new program: 0.7403968253968254 (coverage 30 / 30)
+Iteration 2: Val aggregate for new program: 0.7403968253968254
+Iteration 2: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 0.5, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0, 18: 0.25, 19: 0.3333333333333333, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.14285714285714285, 24: 0.25, 25: 1.0, 26: 1.0, 27: 1.0, 28: 0.0, 29: 0.2857142857142857}
+Iteration 2: Objective aggregate scores for new program: {'weighted_score': 0.7403968253968253}
+Iteration 2: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.15384615384615385, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 0.5, 29: 0.2857142857142857}
+Iteration 2: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 2: Valset pareto front aggregate score: 0.862985347985348
+Iteration 2: Updated valset pareto front programs: {0: {0, 1, 2}, 1: {0, 1, 2}, 2: {0, 2}, 3: {0, 2}, 4: {2}, 5: {2}, 6: {0, 1, 2}, 7: {0, 2}, 8: {0, 1, 2}, 9: {0, 1, 2}, 10: {0, 1}, 11: {0, 2}, 12: {0, 1, 2}, 13: {1, 2}, 14: {0, 1, 2}, 15: {0}, 16: {0, 2}, 17: {0, 1, 2}, 18: {0}, 19: {1}, 20: {0, 1, 2}, 21: {0, 1, 2}, 22: {2}, 23: {1}, 24: {1}, 25: {0, 2}, 26: {0, 1, 2}, 27: {2}, 28: {1}, 29: {2}}
+Iteration 2: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 2: Best valset aggregate score so far: 0.7403968253968254
+Iteration 2: Best program as per aggregate score on valset: 2
+Iteration 2: Best score on valset: 0.7403968253968254
+Iteration 2: Linear pareto front program index: 2
+Iteration 2: New program candidate index: 2
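The "Valset score" and "Val aggregate" values logged for Iteration 2 are consistent with a plain unweighted mean over the 30 per-example scores. The sketch below checks that against the logged numbers. The function name `val_aggregate` is hypothetical and is not taken from GEPA; the aggregation rule is an assumption inferred from the log.

```python
import math

# Per-example scores copied from the "Individual valset scores" line for Iteration 2.
iteration_2_scores = {
    0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0,
    8: 1.0, 9: 1.0, 10: 0.5, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 0.5,
    16: 1.0, 17: 1.0, 18: 0.25, 19: 0.3333333333333333, 20: 0.2, 21: 1.0,
    22: 0.5, 23: 0.14285714285714285, 24: 0.25, 25: 1.0, 26: 1.0, 27: 1.0,
    28: 0.0, 29: 0.2857142857142857,
}


def val_aggregate(scores):
    """Unweighted mean of per-example scores (assumed aggregation rule)."""
    return sum(scores.values()) / len(scores)


aggregate = val_aggregate(iteration_2_scores)
# Compare with the logged aggregate, allowing for float summation order.
matches_log = math.isclose(aggregate, 0.7403968253968254, rel_tol=1e-9)
print(aggregate, matches_log)
```

The same mean applied to the "New valset pareto front scores" line reproduces the logged pareto front aggregate, which suggests both figures use the same averaging.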
+Iteration 3: Selected program 1 score: 0.585970695970696
+Iteration 3: Proposed new text for routing_policy: Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.
+
+Output only JSON, or call `final_json` exactly once if that structured-output tool is available:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents
+
+Task:
+Read the title first. Use the body only if the title is ambiguous. Ignore labels, paths, examples, tests, implementation filenames, and incidental keywords unless they are the user-visible subject.
+
+Choose one primary topic, then add only explicit central secondary topics. Use three topics only when the title or domain contract clearly requires three maintainer buckets.
+
+Important routing rules:
+
+- Treat labels and repository metadata as noise unless the title/body makes them central.
+ Do not infer `config` or `docs` from labels, issue templates, checklists, examples, or catalog/documentation mentions unless the user-visible request is actually about configuration behavior or documentation.
+
+- Catalog/provider requests route to `local_model_providers` when the central subject is adding, updating, or integrating a provider/catalog entry.
+ Example: “Add MiMo-V2.5 to Xiaomi catalog” is `local_model_providers`.
+
+- Automatic model routing, multimodal routing, fallback routing, OpenAI-compatible endpoint behavior, response protocols, returned blocks, or streaming lifecycle route to `model_serving`.
+ Example: “automatic multimodal routing when DeepSeek V4-Pro is primary model” is `model_serving`.
+ A title combining provider catalog work and serving/routing behavior should include both `local_model_providers` and `model_serving`.
+
+- LM Studio is a strong signal for `local_models`.
+
+- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.
+ Example: `fix: LM Studio thinking blocks invisible with Responses API`
+ should be `model_serving` and `local_models`.
+
+- `ACP` issues route to `acp` when the user-visible subject is ACP binding, ACP channels, ACP protocol behavior, or ACP integration behavior.
+
+- Add `sessions` only when session identity, lifecycle, routing, persistence, session keys, session state, or session-specific user behavior is central.
+ Example: `ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process`
+ should be `acp` and `sessions`.
+
+- Do not add `coding_agents` merely because the title mentions Claude Code, Codex-like processes, persistent agent processes, or agent implementation details.
+ Use `coding_agents` only when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.
+ If Claude Code is only the process affected by an ACP session-key bug, do not add `coding_agents`.
+
+- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, or persistent session state.
+ Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization`
+ should be `coding_agents`, `sandboxing`, `agent_runtime`.
+
+- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.
+
+- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.
+
+- `exec v2 contract follow-through` routes to all contract-owned areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+ Do this even when the title starts with `test(exec):`; the PR type `test` does not make it a tests-only classification.
+ The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.
+
+- Do not suppress a central contract topic just because the exact word is absent from the title if the named contract is known to cover that area.
+ In particular, `exec v2 contract` implies `exec_tools`, `sandboxing`, and `approvals`.
+
+Final suppression check:
+Remove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, contract, docs, config, Claude Code, or catalog. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it.
+Iteration 3: New subsample score 4.0 is better than old score 2.4395604395604398. Continue to full eval and add to candidate pool.
+Iteration 3: Valset score for new program: 0.5383333333333333 (coverage 30 / 30)
+Iteration 3: Val aggregate for new program: 0.5383333333333333
+Iteration 3: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 0.14285714285714285, 3: 0.25, 4: 0.2857142857142857, 5: 0.25, 6: 1.0, 7: 0.25, 8: 0.5, 9: 1.0, 10: 1.0, 11: 0.25, 12: 0.25, 13: 0.25, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0, 18: 0.2857142857142857, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.2, 23: 0.25, 24: 0.2857142857142857, 25: 1.0, 26: 1.0, 27: 0.0, 28: 0.5, 29: 0.25}
+Iteration 3: Objective aggregate scores for new program: {'weighted_score': 0.5383333333333333}
+Iteration 3: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.25, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 0.5, 29: 0.2857142857142857}
+Iteration 3: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 3: Valset pareto front aggregate score: 0.8661904761904762
+Iteration 3: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {0, 1, 2, 3}, 2: {0, 2}, 3: {0, 2}, 4: {2}, 5: {2}, 6: {0, 1, 2, 3}, 7: {0, 2}, 8: {0, 1, 2}, 9: {0, 1, 2, 3}, 10: {0, 1, 3}, 11: {0, 2}, 12: {0, 1, 2}, 13: {1, 2}, 14: {0, 1, 2, 3}, 15: {0}, 16: {0, 2, 3}, 17: {0, 1, 2, 3}, 18: {0}, 19: {1, 3}, 20: {0, 1, 2, 3}, 21: {0, 1, 2, 3}, 22: {2}, 23: {3}, 24: {1}, 25: {0, 2, 3}, 26: {0, 1, 2, 3}, 27: {2}, 28: {1, 3}, 29: {2}}
+Iteration 3: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 3: Best valset aggregate score so far: 0.7403968253968254
+Iteration 3: Best program as per aggregate score on valset: 2
+Iteration 3: Best score on valset: 0.7403968253968254
+Iteration 3: Linear pareto front program index: 2
+Iteration 3: New program candidate index: 3
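The per-example pareto front lines for Iteration 3 can be reproduced from the Iteration 2 front and the new candidate's scores with a simple rule: a strictly higher score replaces the front entry, and a tie adds the candidate to the existing set. The sketch below applies that rule to four example ids taken from the log. The function name `update_pareto_front` is hypothetical, and the rule is an assumption inferred from the logged values, not a statement of GEPA's implementation.

```python
def update_pareto_front(front_scores, front_programs, new_scores, new_index):
    """Merge one candidate's per-example scores into the per-example front.

    Assumed rule: strictly better replaces the entry, a tie joins the set.
    """
    for example_id, score in new_scores.items():
        best = front_scores[example_id]
        if score > best:
            front_scores[example_id] = score
            front_programs[example_id] = {new_index}
        elif score == best:
            front_programs[example_id].add(new_index)
    return front_scores, front_programs


# Front state after Iteration 2 for example ids 8, 10, 23, and 28 (from the log).
front_scores = {8: 1.0, 10: 1.0, 23: 0.15384615384615385, 28: 0.5}
front_programs = {8: {0, 1, 2}, 10: {0, 1}, 23: {1}, 28: {1}}

# Candidate 3's scores on the same examples (Iteration 3 individual valset scores).
candidate_3_scores = {8: 0.5, 10: 1.0, 23: 0.25, 28: 0.5}

front_scores, front_programs = update_pareto_front(
    front_scores, front_programs, candidate_3_scores, new_index=3
)
print(front_scores)
print(front_programs)
```

On these four ids the result agrees with the "Updated valset pareto front programs" line for Iteration 3: example 8 is unchanged, examples 10 and 28 gain candidate 3 through ties, and example 23 is taken over by candidate 3.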
+Iteration 4: Selected program 2 score: 0.7403968253968254
+Iteration 4: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, hooks
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Read the title first. Use the first clear body summary only when the title is ambiguous. Ignore examples, tests, files changed, labels, URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Core process:
+1. Identify the main user-visible bug, feature, documentation change, policy change, contract, or vulnerability being changed.
+2. Pick one primary topic.
+3. Add secondary topics only when the title or first clear summary explicitly names another central maintainer-owned subject.
+4. Use 3 topics when a composite title lists three central facets.
+5. Use 0 topics when no allowed topic is central.
+6. Never invent topic ids. Never output ids outside the allowed list.
+
+Composite and vulnerability titles:
+- If a title lists several independent fixes or affected areas joined by `+`, `and`, commas, or semicolons, classify each central user-visible item.
+- A security/vulnerability PR can still need additional maintainer buckets when the vulnerable component is named centrally.
+- `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` should be `security`, `hooks`, and `skills_plugins`.
+- Use `security` for HIGH/CRITICAL vulnerabilities, SSRF, token verification vulnerabilities, network/access restrictions, credential boundaries, or allowed/blocked security behavior.
+- Do not add `auth_identity` merely because a security title mentions token verification. Use `auth_identity` only when authentication/login/OAuth/account identity behavior is itself the user-facing subject.
+
+Hooks:
+- Use `hooks` when hook behavior, hook priority, hook ordering, hook execution, hook registration, or hook policy is central.
+- Do not suppress `hooks` just because it appears inside a security title; if hook priority or hook behavior is named as an affected area, include `hooks`.
+
+Skills and plugins:
+- Use `skills_plugins` for skill scanners, skill/plugin discovery, user-installed plugins, plugin inheritance, Superpowers, plugin installation, plugin availability, or security issues in skill/plugin components.
+- `skill scanner` is a strong signal for `skills_plugins`.
+- Do not add `skills_plugins` merely because a Codex startup path mentions plugins unless plugin/skill behavior or availability is central.
+
+Telemetry, usage, and token counts:
+- Use `telemetry_usage` when token counts, usage accounting/reporting, cost display, diagnostic counts, traces, metric collection, or status reporting surfaces are themselves central.
+- Stale token counts are `telemetry_usage`.
+- `totalTokensFresh` is a session freshness/accounting signal; classify `respect totalTokensFresh flag to avoid showing stale token counts` as `sessions` and `telemetry_usage`.
+- Do not add `ui_tui` merely because the title says “showing” or “displaying” counts. Use `ui_tui` only when the UI/TUI/webchat/view/reader behavior is itself central.
+
+Sessions:
+- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific freshness/state is central.
+- Token-count freshness tied to session state should include `sessions`.
+- Do not add `sessions` merely because the title says “ACP sessions” unless session identity, lifecycle, routing, state, or persistent process identity is the bug or feature.
+
+ACP, ACPX, and approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
+- In ACP titles, `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.
+
+Reliability, queueing, and lanes:
+- Use `reliability` for deadlocks, hangs, crashes, races, stuck/wedged states, timeouts, self-healing, or robustness failures.
+- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.
+- Words like `lane`, `worker`, `subagent`, or internal execution paths do not imply `queueing`.
+
+Auth and identity:
+- Use `auth_identity` for authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth-related user/session identity.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex runtime behavior.
+- If OAuth/auth behavior is tied to an embedded/session path, include `sessions`.
+
+Codex:
+- Use `codex` when Codex is centrally the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` merely because a title contains `openai-codex`, `[codex]`, or a Codex-branded OAuth provider.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs usually include `docs` alone.
+- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, token, hook, or scanner. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 4: New subsample score 1.0357142857142856 is not better than old score 2.533333333333333, skipping
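The "is better than" and "is not better than ... skipping" messages in Iterations 2 through 4 read like a gate on the reflection subsample: a proposal goes on to full validation only when its subsample score beats the parent's. A minimal sketch of that gate follows. The function name `passes_subsample_gate` is hypothetical, and the strict inequality is an assumption, since no tie appears in this excerpt.

```python
def passes_subsample_gate(new_score, old_score):
    """Return True when the proposal should continue to full evaluation.

    Assumes a strict comparison; tie handling is not visible in the log.
    """
    return new_score > old_score


# (iteration, new subsample score, old subsample score) copied from the log.
logged_pairs = [
    (2, 4.0, 2.375),
    (3, 4.0, 2.4395604395604398),
    (4, 1.0357142857142856, 2.533333333333333),
]

decisions = {
    iteration: passes_subsample_gate(new, old)
    for iteration, new, old in logged_pairs
}
print(decisions)
```

Under this rule Iterations 2 and 3 proceed to full evaluation and Iteration 4 is skipped, which is what the log records.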
+Iteration 5: Selected program 1 score: 0.585970695970696
+Iteration 5: Proposed new text for routing_policy: Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.
+
+Output only JSON, or call `final_json` exactly once if that structured-output tool is available:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids are exactly:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents
+
+Do not output any topic outside this list.
+
+Classification method:
+- Read the title first.
+- Use the body only if the title is ambiguous.
+- Ignore labels, paths, tests, examples, implementation filenames, PR type prefixes, and incidental keywords unless they are the user-visible subject.
+- Choose one primary topic first, then add only explicit central secondary topics.
+- Use three topics only when the title or domain contract clearly requires three maintainer buckets.
+- Prefer the domain meaning of the title over generic keyword matching.
+
+Important routing rules:
+
+- `exec v2 contract follow-through` routes to all contract-owned areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+ This applies even if the title starts with `test(exec):`.
+ The PR type `test` does not make it a tests-only classification.
+
+- `sandbox.mode`, sandbox modes, sandbox policy, sandbox enforcement, sandbox permissions, or sandbox-governed execution are `sandboxing`.
+
+- Do not classify something as `sandboxing` merely because the title contains `sandbox`.
+ If the sandbox is just the environment where another feature runs, classify the feature instead.
+ Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)`
+ should be `acpx`, `codex`, `skills_plugins`, not `sandboxing`.
+ The central request is plugin inheritance for ACPX Codex, not sandbox policy.
+
+- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistent state, takeover, or user-visible session behavior.
+ Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization`
+ should be `coding_agents`, `sandboxing`, `agent_runtime`.
+
+- Use `sessions` when session identity, lifecycle, ownership, routing, takeover, persistence, or active-session state is central.
+ Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`
+ should include `sessions` because outbound session identity is central.
+
+- Use `gateway` when the central subject is gateway send behavior, guarded gateway send denial, gateway routing, or gateway delivery boundaries.
+ Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`
+ should be `gateway`, `sessions`, not `notifications`.
+
+- Do not add `notifications` merely because the title mentions message sending, outbound messages, announcement delivery, or notification-like implementation names.
+ Add `notifications` only when user-facing alerts, notification settings, notification channels, or notification delivery semantics are the central subject.
+
+- Do not add `chat_integrations` merely because the title says the user is chatting, actively chatting, or mentions a chat context.
+ Add `chat_integrations` only when Slack/Discord/ChatGPT/LinkedIn/chat adapter integration behavior is central.
+ Example: `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`
+ should be `cron_automation`, `sessions`, `reliability`, not `chat_integrations`.
+
+- Use `reliability` for central failures, crashes, errors, hardening against failures, guarded denial visibility, race conditions, takeover errors, silent breakage, or behavior that needs to be made robust.
+ Example: `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`
+ should include `reliability` because the visible problem is an error during delivery.
+
+- Use `cron_automation` when cron, scheduled jobs, scheduled announcements, recurring automation, or automated timed delivery is central.
+
+- Use `security` for hardening, supply-chain boundaries, trust boundaries, credentials, permissions, isolation, GitHub Actions supply-chain risk, or CI security posture.
+ Example: `ci: harden GitHub Actions supply-chain boundaries`
+ should be `security`, not deployment or packaging.
+
+- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.
+
+- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.
+
+- LM Studio is a strong signal for `local_models`.
+
+- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.
+ Example: `fix: LM Studio thinking blocks invisible with Responses API`
+ should be `model_serving` and `local_models`.
+
+- Use `codex` when Codex itself, Codex CLI, Codex sandbox/runtime behavior, or Codex-specific compatibility is central.
+
+- Use `skills_plugins` when user-installed plugins, skills, plugin inheritance, plugin availability, plugin loading, or plugin execution is central.
+
+Final suppression check:
+Before outputting, remove any topic added only because of an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, sandbox, chat, message, CI, or contract.
+Keep a topic only when it is a central maintainer-owned subject or a known domain contract explicitly requires it.
+Iteration 5: New subsample score 4.0 is better than old score 1.0. Continue to full eval and add to candidate pool.
+Iteration 5: Valset score for new program: 0.5166544566544566 (coverage 30 / 30)
+Iteration 5: Val aggregate for new program: 0.5166544566544566
+Iteration 5: Individual valset scores for new program: {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.15384615384615385, 4: 0.16666666666666666, 5: 0.5, 6: 0.2857142857142857, 7: 1.0, 8: 0.25, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 0.5, 14: 1.0, 15: 0.25, 16: 0.25, 17: 1.0, 18: 1.0, 19: 0.5, 20: 0.2, 21: 0.5, 22: 0.25, 23: 0.15384615384615385, 24: 0.25, 25: 1.0, 26: 0.2857142857142857, 27: 0.1, 28: 1.0, 29: 0.15384615384615385}
+Iteration 5: Objective aggregate scores for new program: {'weighted_score': 0.5166544566544566}
+Iteration 5: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.2, 21: 1.0, 22: 0.5, 23: 0.25, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 0.2857142857142857}
+Iteration 5: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 5: Valset pareto front aggregate score: 0.8828571428571428
+Iteration 5: Updated valset pareto front programs: {0: {0, 1, 2, 3}, 1: {0, 1, 2, 3, 4}, 2: {0, 2}, 3: {0, 2}, 4: {2}, 5: {2}, 6: {0, 1, 2, 3}, 7: {0, 2, 4}, 8: {0, 1, 2}, 9: {0, 1, 2, 3, 4}, 10: {0, 1, 3, 4}, 11: {0, 2, 4}, 12: {0, 1, 2, 4}, 13: {1, 2}, 14: {0, 1, 2, 3, 4}, 15: {0}, 16: {0, 2, 3}, 17: {0, 1, 2, 3, 4}, 18: {0, 4}, 19: {1, 3}, 20: {0, 1, 2, 3, 4}, 21: {0, 1, 2, 3}, 22: {2}, 23: {3}, 24: {1}, 25: {0, 2, 3, 4}, 26: {0, 1, 2, 3}, 27: {2}, 28: {4}, 29: {2}}
+Iteration 5: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 5: Best valset aggregate score so far: 0.7403968253968254
+Iteration 5: Best program as per aggregate score on valset: 2
+Iteration 5: Best score on valset: 0.7403968253968254
+Iteration 5: Linear pareto front program index: 2
+Iteration 5: New program candidate index: 4
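The prompt proposed in Iteration 5 requires a JSON object of the form `{"topics_of_interest":[...]}` whose ids all come from a fixed list. A model reply can be checked against that contract before scoring. The sketch below is one way to do so, assuming the reply arrives as a raw JSON string. The function name `parse_topics` is hypothetical, and this is not the scoring function used in these runs.

```python
import json

# Allowed topic ids copied from the prompt proposed in Iteration 5.
ALLOWED_TOPIC_IDS = frozenset("""
queueing docs notifications sessions gateway reliability memory
open_weight_models local_model_providers codex api_surface ui_tui
chat_integrations skills_plugins acp acpx approvals agent_runtime
model_serving local_models self_hosted_inference telemetry_usage
exec_tools sandboxing browser_automation cron_automation config
security mcp_tooling tool_calling coding_agents
""".split())


def parse_topics(raw):
    """Parse a reply and return its topic list, or raise ValueError."""
    payload = json.loads(raw)
    if not isinstance(payload, dict) or set(payload) != {"topics_of_interest"}:
        raise ValueError("expected an object with only 'topics_of_interest'")
    topics = payload["topics_of_interest"]
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("'topics_of_interest' must be a list of strings")
    unknown = [t for t in topics if t not in ALLOWED_TOPIC_IDS]
    if unknown:
        raise ValueError(f"topic ids outside the allowed list: {unknown}")
    return topics


print(parse_topics('{"topics_of_interest":["gateway","sessions"]}'))
```

Note that the allowed list differs between candidates in this log (for example, the Iteration 4 proposal lists `auth_identity` and `hooks` but not `coding_agents`), so a check like this would need the list that matches the candidate being evaluated.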
+Iteration 6: Selected program 3 score: 0.5383333333333333
+Iteration 6: Proposed new text for routing_policy: Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.
+
+Input format:
+- `target`: GitHub issue or PR URL.
+- `title`: the issue/PR title. This is the primary signal.
+- Optional body/metadata may be present. Use the body only when the title is ambiguous.
+- Ignore labels, paths, templates, checklists, examples, tests, implementation filenames, and repo metadata unless the user-visible request itself is about those things.
+
+Output only JSON, or call `final_json` exactly once if that structured-output tool is available:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents
+
+Core method:
+1. Read the title first.
+2. Identify the user-visible subject, not the implementation area.
+3. Choose one primary topic.
+4. Add only explicit central secondary topics.
+5. Use three topics only when the title or known domain contract clearly requires three maintainer buckets.
+6. Run a final suppression check: remove any topic added only because of incidental words, labels, PR type, paths, tests, examples, function names, or implementation details.
+
+Important routing rules:
+
+- Treat labels and repository metadata as noise.
+ Do not infer `gateway`, `security`, `config`, or `docs` from labels, templates, checklists, examples, paths, or catalog/documentation mentions unless the title/body makes that topic the actual user-visible request.
+
+- Local Ollama, local embeddings, local model runtime behavior, LM Studio, or other locally hosted model integrations usually include `local_models`.
+ If the problem concerns using a locally/self-hosted inference backend, endpoint, proxy, or runtime to serve inference, include `self_hosted_inference`.
+
+- A title like:
+ `[Feature/Bug]: Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)`
+ should be classified as `local_models` and `self_hosted_inference`.
+ Do not add `gateway` merely because the repo or label mentions gateway.
+ Do not add `security` merely because SSRF defenses are named; here the user-visible issue is local/self-hosted model inference failing with proxy/NO_PROXY behavior.
+
+- Documentation PRs can still have a central product topic.
+ If a title is `docs(<scope>): ...`, classify as both `docs` and that scoped topic when the documentation is about that behavior.
+ Example:
+ `docs(queue): clarify steer behavior with partial streaming and tool boundaries`
+ should be `queueing` and `docs`.
+
+- Catalog/provider requests route to `local_model_providers` when the central subject is adding, updating, integrating, or maintaining a provider/catalog entry.
+ This includes provider-specific metadata such as known context windows, model limits, model catalog entries, provider-backed model listings, or provider-specific open-weight model metadata.
+
+- Open-weight model metadata routes to `open_weight_models`.
+ If the title combines open-weight model metadata with a provider/catalog entry, include both `open_weight_models` and `local_model_providers`.
+ Example:
+ `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models`
+ should be `open_weight_models` and `local_model_providers`.
+
+- Automatic model routing, multimodal routing, fallback routing, OpenAI-compatible endpoint behavior, response protocol behavior, returned blocks, partial/streaming lifecycle, or serving compatibility route to `model_serving`.
+
+- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.
+ Example:
+ `fix: LM Studio thinking blocks invisible with Responses API`
+ should be `model_serving` and `local_models`.
+
+- LM Studio is a strong signal for `local_models`.
+
+- ACP issues route to `acp` when the user-visible subject is ACP binding, ACP channels, ACP protocol behavior, ACP integration behavior, or explicit ACP agent runs.
+
+- ACPX issues route to `acpx` when the user-visible subject is explicit agent-run orchestration, embedded backend visibility, stale final JSON state, final structured-output state for explicit runs, or ACPX-style run coordination.
+ If a title combines explicit ACP agent runs with ACP behavior, include both `acpx` and `acp`.
+ Example:
+ `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield`
+ should be `acpx` and `acp`.
+
+- Add `sessions` only when session identity, lifecycle, routing, persistence, session keys, session state, or session-specific user behavior is central.
+ Do not add `sessions` for incidental function names such as `sessions_yield` or `sessions_spawn`.
+
+- Example:
+ `ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process`
+ should be `acp` and `sessions`.
+
+- `sessions_spawn` is an internal name.
+ Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistence, or state.
+ Example:
+ `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization`
+ should be `coding_agents`, `sandboxing`, and `agent_runtime`.
+
+- Do not add `coding_agents` merely because the title mentions Claude Code, Codex-like processes, persistent agent processes, or agent implementation details.
+ Use `coding_agents` only when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.
+
+- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.
+
+- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.
+
+- `exec v2 contract follow-through` routes to all contract-owned areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+ Do this even when the title starts with `test(exec):`; the PR type `test` does not make it a tests-only classification.
+ The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.
+
+- Do not suppress a central contract topic just because the exact word is absent from the title if the named contract is known to cover that area.
+ In particular, `exec v2 contract` implies `exec_tools`, `sandboxing`, and `approvals`.
+
+Final suppression check:
+Remove any topic added only due to an incidental keyword such as test, path, label, template, checklist, session-like function name, model, provider, usage, gateway, plugin, tool, contract, docs, config, security, Claude Code, or catalog. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it.
+Iteration 6: New subsample score 4.0 is better than old score 1.4038461538461537. Continue to full eval and add to candidate pool.
+Iteration 6: Valset score for new program: 0.584047619047619 (coverage 30 / 30)
+Iteration 6: Val aggregate for new program: 0.584047619047619
+Iteration 6: Individual valset scores for new program: {0: 1.0, 1: 0.14285714285714285, 2: 0.0, 3: 0.5, 4: 0.2857142857142857, 5: 1.0, 6: 1.0, 7: 0.5, 8: 0.5, 9: 1.0, 10: 1.0, 11: 0.14285714285714285, 12: 0.25, 13: 0.0, 14: 1.0, 15: 1.0, 16: 0.25, 17: 1.0, 18: 1.0, 19: 0.0, 20: 0.5, 21: 0.5, 22: 0.0, 23: 1.0, 24: 0.25, 25: 1.0, 26: 1.0, 27: 0.2, 28: 0.5, 29: 1.0}
+Iteration 6: Objective aggregate scores for new program: {'weighted_score': 0.584047619047619}
+Iteration 6: New valset pareto front scores: {0: 1.0, 1: 0.25, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.5, 21: 1.0, 22: 0.5, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 6: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 6: Valset pareto front aggregate score: 0.9416666666666667
+Iteration 6: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5}, 1: {0, 1, 2, 3, 4}, 2: {0, 2}, 3: {0, 2}, 4: {2}, 5: {2, 5}, 6: {0, 1, 2, 3, 5}, 7: {0, 2, 4}, 8: {0, 1, 2}, 9: {0, 1, 2, 3, 4, 5}, 10: {0, 1, 3, 4, 5}, 11: {0, 2, 4}, 12: {0, 1, 2, 4}, 13: {1, 2}, 14: {0, 1, 2, 3, 4, 5}, 15: {0, 5}, 16: {0, 2, 3}, 17: {0, 1, 2, 3, 4, 5}, 18: {0, 4, 5}, 19: {1, 3}, 20: {5}, 21: {0, 1, 2, 3}, 22: {2}, 23: {5}, 24: {1}, 25: {0, 2, 3, 4, 5}, 26: {0, 1, 2, 3, 5}, 27: {2}, 28: {4}, 29: {5}}
+Iteration 6: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 6: Best valset aggregate score so far: 0.7403968253968254
+Iteration 6: Best program as per aggregate score on valset: 2
+Iteration 6: Best score on valset: 0.7403968253968254
+Iteration 6: Linear pareto front program index: 2
+Iteration 6: New program candidate index: 5
+Iteration 7: Selected program 2 score: 0.7403968253968254
+Iteration 7: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return JSON only:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input:
+You may receive a target URL, title, body, summary, generated output, gold topics, or feedback. Classify the GitHub item itself, not the previous assistant response. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous.
+
+Core process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+
+Suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, or incidental phrase.
+
+Specific routing rules:
+
+Codex:
+- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, or Codex-specific runtime behavior.
+- Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.
+- Example: `[codex] fix webchat full-message reader for truncated history` is not `codex`.
+
+Gateway, API surface, UI/TUI:
+- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.
+- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.
+- Example: `[codex] fix webchat full-message reader for truncated history` => `gateway`, `api_surface`, `ui_tui`.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving. Do not classify it as `local_model_providers`, `open_weight_models`, or `model_serving` unless provider catalogs, open-weight model metadata, or serving endpoints are actually central.
+- Example: `Policy: add model, network, and MCP conformance checks` => `config`, `security`, `mcp_tooling`.
+
+Local model providers, open-weight models, model serving:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.
+- Use `model_serving` only when serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, base URL behavior, endpoint compatibility, request routing, or model-server compatibility is central.
+- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.
+- Provider catalog missing-model issues are `local_model_providers`; add `reliability` when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.
+- Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error` => `local_model_providers`, `reliability`.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.
+- Do not use `reliability` for every bug. Use it when the failure mode itself is robustness/liveness/error-handling.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
+- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- Use `sessions` only when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.
+- Do not add `sessions` merely because the title says `ACP sessions`.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, `acpx`.
+- `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` => `acp`, `sessions`.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.
+- `restore openai-codex OAuth on embedded path` => `auth_identity`, `sessions`.
+
+Queueing:
+- Use `queueing` only when queue, queued execution, queue lifecycle, queue steering, scheduling behavior, or user-visible queued work is central.
+- Words like lane, main lane, worker, subagent, or internal execution paths do not imply `queueing`.
+- `self-heal lane wedges` => `reliability`, not `queueing`.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs usually return `docs` alone.
+- Add a second topic only when the documented area is explicitly central, such as docs for queueing.
+
+Tools:
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.
+- Use `mcp_tooling` for MCP-specific behavior.
+- MCP tool-call parameter coercion can be both `mcp_tooling` and `tool_calling`.
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract enforcement may include `exec_tools`, `sandboxing`, and `approvals` when all are named central contract areas.
+
+Notifications and chat:
+- Use `notifications` when notification behavior itself is central, such as announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, or notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Other topics:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, or memory providers. Active-memory recall deadlocks are usually `memory` plus `reliability`.
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or cron deadlock is central.
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is central.
+- Use `agent_runtime` when runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership are central.
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item, up to the smallest complete set.
+- Do not substitute a broad infrastructure topic for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 7: New subsample score 4.0 is better than old score 2.0357142857142856. Continue to full eval and add to candidate pool.
+Iteration 7: Valset score for new program: 0.6160651629072682 (coverage 30 / 30)
+Iteration 7: Val aggregate for new program: 0.6160651629072682
+Iteration 7: Individual valset scores for new program: {0: 0.5, 1: 1.0, 2: 0.10526315789473684, 3: 0.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 0.2, 11: 0.25, 12: 0.5, 13: 1.0, 14: 1.0, 15: 1.0, 16: 0.25, 17: 0.5, 18: 1.0, 19: 0.14285714285714285, 20: 0.14285714285714285, 21: 1.0, 22: 0.5, 23: 0.10526315789473684, 24: 1.0, 25: 1.0, 26: 0.2857142857142857, 27: 0.25, 28: 0.5, 29: 0.25}
+Iteration 7: Objective aggregate scores for new program: {'weighted_score': 0.6160651629072681}
+Iteration 7: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.5, 21: 1.0, 22: 0.5, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 7: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 7: Valset pareto front aggregate score: 0.9666666666666667
+Iteration 7: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5}, 1: {6}, 2: {0, 2}, 3: {0, 2}, 4: {2, 6}, 5: {2, 5, 6}, 6: {0, 1, 2, 3, 5, 6}, 7: {0, 2, 4, 6}, 8: {0, 1, 2, 6}, 9: {0, 1, 2, 3, 4, 5, 6}, 10: {0, 1, 3, 4, 5}, 11: {0, 2, 4}, 12: {0, 1, 2, 4}, 13: {1, 2, 6}, 14: {0, 1, 2, 3, 4, 5, 6}, 15: {0, 5, 6}, 16: {0, 2, 3}, 17: {0, 1, 2, 3, 4, 5}, 18: {0, 4, 5, 6}, 19: {1, 3}, 20: {5}, 21: {0, 1, 2, 3, 6}, 22: {2, 6}, 23: {5}, 24: {1, 6}, 25: {0, 2, 3, 4, 5, 6}, 26: {0, 1, 2, 3, 5}, 27: {2}, 28: {4}, 29: {5}}
+Iteration 7: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 7: Best valset aggregate score so far: 0.7403968253968254
+Iteration 7: Best program as per aggregate score on valset: 2
+Iteration 7: Best score on valset: 0.7403968253968254
+Iteration 7: Linear pareto front program index: 2
+Iteration 7: New program candidate index: 6
+Iteration 8: Selected program 6 score: 0.6160651629072682
+Iteration 8: Proposed new text for routing_policy: Update the classifier rules with these corrections.
+
+The task is to classify a GitHub issue or pull request into the smallest complete set of allowed topic ids. The input may include a target URL, title, body, summary, generated output, gold topics, or feedback. Always classify the GitHub item itself, not the previous assistant response or feedback text.
+
+Return JSON only:
+{"topics_of_interest":["topic_id"]}
+
+Add `packaging_deployment` to the allowed topic ids.
+
+Packaging / deployment:
+- Use `packaging_deployment` when the central issue involves install/distribution packaging, desktop app setup, bundled runtime availability, PATH/runtime resolution, executable discovery, npm/npx/node availability, spawned process availability, deployment artifacts, or app startup environment problems.
+- If a browser automation feature fails because the desktop package cannot find or launch a required tool such as `npx`, include both the user-visible feature topic and `packaging_deployment`.
+- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.
+- In that example, Chrome sessions, Google sign-in, and profile attach are browser automation concerns; `spawn npx ENOENT` is a packaging/deployment/runtime availability concern.
+
+Self-hosted inference vs local model providers:
+- Use `self_hosted_inference` when the central subject is supporting self-hosted inference servers, OpenAI-compatible self-hosted endpoints, or server families such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference backends.
+- Use `memory` as well when the self-hosted inference support is specifically for memory, embeddings, vector search, or embedding providers.
+- Do not use `local_model_providers` merely because the title names Ollama, vLLM, llama.cpp, TGI, LocalAI, OpenAI-compatible, provider, or model. Treat those as label spam unless the actual subject is provider catalog metadata, named provider/model-family support, provider-backed model context windows, or missing provider catalog entries.
+- Example: `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`, not `local_model_providers`.
+
+General strategy:
+- Read the title first and identify the main user-visible bug, feature, contract change, or documentation change.
+- Pick one primary topic.
+- Add secondary topics only when they are explicitly central maintainer-owned concerns.
+- If a title lists multiple independent central concerns, include each one, but keep the set minimal.
+- Suppress topics that appear only as labels, implementation details, examples, fixture names, broad ownership areas, or incidental words.
+Iteration 8: New subsample score 4.0 is better than old score 2.75. Continue to full eval and add to candidate pool.
+Iteration 8: Valset score for new program: 0.4290659340659341 (coverage 30 / 30)
+Iteration 8: Val aggregate for new program: 0.4290659340659341
+Iteration 8: Individual valset scores for new program: {0: 0.5, 1: 0.5, 2: 0.25, 3: 0.25, 4: 0.15384615384615385, 5: 0.5, 6: 0.25, 7: 0.5, 8: 0.5, 9: 0.5, 10: 0.2, 11: 0.2857142857142857, 12: 1.0, 13: 0.5, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.25, 18: 0.2857142857142857, 19: 0.2, 20: 0.25, 21: 0.5, 22: 0.15384615384615385, 23: 0.14285714285714285, 24: 1.0, 25: 0.5, 26: 1.0, 27: 0.2, 28: 0.5, 29: 0.25}
+Iteration 8: Objective aggregate scores for new program: {'weighted_score': 0.429065934065934}
+Iteration 8: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.5, 21: 1.0, 22: 0.5, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 8: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 8: Valset pareto front aggregate score: 0.9666666666666667
+Iteration 8: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5}, 1: {6}, 2: {0, 2}, 3: {0, 2}, 4: {2, 6}, 5: {2, 5, 6}, 6: {0, 1, 2, 3, 5, 6}, 7: {0, 2, 4, 6}, 8: {0, 1, 2, 6}, 9: {0, 1, 2, 3, 4, 5, 6}, 10: {0, 1, 3, 4, 5}, 11: {0, 2, 4}, 12: {0, 1, 2, 4, 7}, 13: {1, 2, 6}, 14: {0, 1, 2, 3, 4, 5, 6, 7}, 15: {0, 5, 6}, 16: {0, 2, 3}, 17: {0, 1, 2, 3, 4, 5}, 18: {0, 4, 5, 6}, 19: {1, 3}, 20: {5}, 21: {0, 1, 2, 3, 6}, 22: {2, 6}, 23: {5}, 24: {1, 6, 7}, 25: {0, 2, 3, 4, 5, 6}, 26: {0, 1, 2, 3, 5, 7}, 27: {2}, 28: {4}, 29: {5}}
+Iteration 8: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 8: Best valset aggregate score so far: 0.7403968253968254
+Iteration 8: Best program as per aggregate score on valset: 2
+Iteration 8: Best score on valset: 0.7403968253968254
+Iteration 8: Linear pareto front program index: 2
+Iteration 8: New program candidate index: 7
+Iteration 9: Selected program 4 score: 0.5166544566544566
+Iteration 9: Proposed new text for routing_policy: Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.
+
+Output only JSON, or call `final_json` exactly once if that structured-output tool is available:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids are exactly:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents, hooks
+
+Classification method:
+- Read the title first.
+- Use the body only if the title is ambiguous.
+- Ignore labels, paths, tests, examples, implementation filenames, PR type prefixes, and incidental keywords unless they are the user-visible subject.
+- Choose one primary topic first, then add only explicit central secondary topics.
+- Use three topics only when the title or domain contract clearly requires three maintainer buckets.
+- Prefer the domain meaning of the title over generic keyword matching.
+- Final answer must contain only allowed topic ids.
+
+Important routing rules:
+
+- `exec v2 contract follow-through` routes to all contract-owned areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+ This applies even if the title starts with `test(exec):`.
+ The PR type `test` does not make it a tests-only classification.
+
+- `sandbox.mode`, sandbox modes, sandbox policy, sandbox enforcement, sandbox permissions, or sandbox-governed execution are `sandboxing`.
+
+- Do not classify something as `sandboxing` merely because the title contains `sandbox`.
+ If the sandbox is just the environment where another feature runs, classify the feature instead.
+ Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)`
+ should be `acpx`, `codex`, `skills_plugins`, not `sandboxing`.
+
+- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistent state, takeover, or user-visible session behavior.
+ Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization`
+ should be `coding_agents`, `sandboxing`, `agent_runtime`.
+
+- Use `sessions` when session identity, lifecycle, ownership, routing, takeover, persistence, or active-session state is central.
+ Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`
+ should include `sessions` because outbound session identity is central.
+
+- Use `gateway` when the central subject is gateway send behavior, guarded gateway send denial, gateway routing, or gateway delivery boundaries.
+ Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial`
+ should be `gateway`, `sessions`, not `notifications`.
+
+- Use `notifications` when user-facing alerts, notification settings, notification channels, notification delivery semantics, announcement pushes, heartbeat pushes, target-channel pushes, or notification identity/metadata on pushed messages are central.
+
+- Do not add `notifications` merely because the title mentions message sending, outbound messages, or notification-like implementation names.
+ Add it only when the delivered alert/push/announcement/notification is the user-visible subject.
+
+- Use `chat_integrations` when Slack, Discord, ChatGPT, LinkedIn, chat adapters, or target-channel chat delivery behavior is central.
+ A title about Slack pushes or target-channel Slack delivery should include `chat_integrations`.
+
+- Do not add `chat_integrations` merely because the title says the user is chatting, actively chatting, or mentions a chat context.
+ Add it only when the integration/channel behavior itself is central.
+
+- Treat cron-related words as incidental when cron is only the delivery path for an announcement or push.
+ Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes`
+ should be `notifications`, `chat_integrations`, not `cron_automation` or `reliability`.
+
+- Use `cron_automation` when cron, scheduled jobs, scheduled announcements, recurring automation, or automated timed delivery is the central product surface, not merely the path where another feature failed.
+
+- Use `reliability` for central failures, crashes, errors, hardening against failures, guarded denial visibility, race conditions, takeover errors, silent breakage, or behavior that needs to be made robust.
+ Do not add `reliability` only because the title says `Bug`, `fix`, `dropped`, or mentions a previously fixed path. If the title is mainly about a feature's metadata/identity/channel behavior, classify that feature instead.
+
+- Use `security` for hardening, supply-chain boundaries, trust boundaries, credentials, permissions, isolation, SSRF, token verification, high/critical vulnerabilities, GitHub Actions supply-chain risk, or CI security posture.
+ Example: `ci: harden GitHub Actions supply-chain boundaries`
+ should be `security`.
+
+- Use `hooks` when hook behavior, hook priority, hook execution order, hook registration, or hook security is a central subject.
+ Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification`
+ should include `security`, `hooks`, and `skills_plugins`.
+
+- Use `skills_plugins` when user-installed plugins, skills, skill scanners, plugin inheritance, plugin availability, plugin loading, or plugin execution is central.
+
+- Do not add `local_model_providers` from unrelated scanner/provider/security wording unless local model provider behavior is explicitly central.
+
+- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.
+
+- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.
+
+- LM Studio is a strong signal for `local_models`.
+
+- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.
+ Example: `fix: LM Studio thinking blocks invisible with Responses API`
+ should be `model_serving` and `local_models`.
+
+- Use `codex` when Codex itself, Codex CLI, Codex sandbox/runtime behavior, or Codex-specific compatibility is central.
+
+- Use `acp` when ACP is explicitly central.
+
+- Use `acpx` when ACPX is explicitly central, or when the title describes ACP session behavior involving per-binding/per-agent configuration that belongs to the ACPX integration surface.
+ Example: `[Feature]: Per-binding and per-agent permissionMode for ACP sessions`
+ should be `acp`, `approvals`, `acpx`.
+
+- Use `approvals` when permission mode, approval mode, permission gating, or approval policy is central.
+
+Final suppression check:
+Before outputting, remove any topic added only because of an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, sandbox, chat, message, CI, cron, bug, fix, or contract.
+Keep a topic only when it is a central maintainer-owned subject or a known domain contract explicitly requires it.
+Iteration 9: New subsample score 3.25 is better than old score 1.8928571428571428. Continue to full eval and add to candidate pool.
+Iteration 9: Valset score for new program: 0.5842551892551893 (coverage 30 / 30)
+Iteration 9: Val aggregate for new program: 0.5842551892551893
+Iteration 9: Individual valset scores for new program: {0: 1.0, 1: 0.25, 2: 0.25, 3: 1.0, 4: 0.16666666666666666, 5: 1.0, 6: 1.0, 7: 0.25, 8: 0.25, 9: 0.5, 10: 1.0, 11: 1.0, 12: 0.2857142857142857, 13: 0.25, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0, 18: 1.0, 19: 0.5, 20: 0.2, 21: 1.0, 22: 1.0, 23: 0.15384615384615385, 24: 0.5, 25: 0.5, 26: 0.2857142857142857, 27: 0.2, 28: 0.2, 29: 0.2857142857142857}
+Iteration 9: Objective aggregate scores for new program: {'weighted_score': 0.5842551892551892}
+Iteration 9: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.5, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 9: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 9: Valset pareto front aggregate score: 0.9833333333333333
+Iteration 9: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8}, 1: {6}, 2: {0, 2}, 3: {0, 8, 2}, 4: {2, 6}, 5: {8, 2, 5, 6}, 6: {0, 1, 2, 3, 5, 6, 8}, 7: {0, 2, 4, 6}, 8: {0, 1, 2, 6}, 9: {0, 1, 2, 3, 4, 5, 6}, 10: {0, 1, 3, 4, 5, 8}, 11: {0, 8, 2, 4}, 12: {0, 1, 2, 4, 7}, 13: {1, 2, 6}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8}, 15: {0, 5, 6}, 16: {0, 8, 2, 3}, 17: {0, 1, 2, 3, 4, 5, 8}, 18: {0, 4, 5, 6, 8}, 19: {1, 3}, 20: {5}, 21: {0, 1, 2, 3, 6, 8}, 22: {8}, 23: {5}, 24: {1, 6, 7}, 25: {0, 2, 3, 4, 5, 6}, 26: {0, 1, 2, 3, 5, 7}, 27: {2}, 28: {4}, 29: {5}}
+Iteration 9: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 9: Best valset aggregate score so far: 0.7403968253968254
+Iteration 9: Best program as per aggregate score on valset: 2
+Iteration 9: Best score on valset: 0.7403968253968254
+Iteration 9: Linear pareto front program index: 2
+Iteration 9: New program candidate index: 8
+Iteration 10: Selected program 6 score: 0.6160651629072682
+Iteration 10: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return JSON only:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not a previous assistant response, generated output, label, or feedback text.
+
+Input format:
+You may receive a target URL, title, body, summary, generated output, gold topics, or feedback. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous.
+
+Core process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+
+Suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, or incidental phrase.
+
+Important feedback-derived rules:
+
+ACP / ACPX / sessions:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.
+- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.
+- Treat `sessions_yield` as label/path-like noise unless the title is truly about session identity, lifecycle, routing, persistent process identity, embedded session path, or session-specific state.
+- Example: `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield` => `acp`, `acpx`, not `sessions`.
+
+Streaming usage:
+- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.
+- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.
+- In streaming/model-server titles, usage chunks are usually protocol fields, not telemetry/accounting.
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.
+- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving` only.
+
+Gateway / sessions / API surface:
+- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.
+- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.
+- Passing outbound session identity is a central `sessions` concern.
+- Do not add `api_surface` just because a title mentions `message_sending`, `surface`, send denial, or plumbing. Add `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+- Example: `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`, not `api_surface`.
+
+Browser automation / packaging:
+- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.
+- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.
+- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.
+- Do not add `auth_identity` just because Google sign-in fails when the actual central failure is Chrome profile attach or missing packaged executable.
+- Example: `[Bug]: Desktop cannot use existing Chrome sessions: EasyClaw Google sign-in fails, and user profile attach fails with spawn npx ENOENT` => `browser_automation`, `packaging_deployment`.
+
+Codex:
+- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, or Codex-specific runtime behavior.
+- Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.
+
+Gateway, API surface, UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving.
+
+Local model providers, open-weight models, model serving:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.
+- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.
+- Provider catalog missing-model issues are `local_model_providers`; add `reliability` when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.
+- Do not use `reliability` for every bug.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs usually return `docs` alone.
+
+Tools:
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.
+- Use `mcp_tooling` for MCP-specific behavior.
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, sessions_yield, showcase, tutorial, demo, sign-in, or OAuth. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 10: New subsample score 4.0 is better than old score 1.3214285714285714. Continue to full eval and add to candidate pool.
+Iteration 10: Valset score for new program: 0.5863186813186814 (coverage 30 / 30)
+Iteration 10: Val aggregate for new program: 0.5863186813186814
+Iteration 10: Individual valset scores for new program: {0: 0.5, 1: 0.5, 2: 0.14285714285714285, 3: 0.15384615384615385, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 0.5, 10: 0.3333333333333333, 11: 0.3333333333333333, 12: 0.25, 13: 1.0, 14: 1.0, 15: 1.0, 16: 0.25, 17: 0.5, 18: 1.0, 19: 0.2, 20: 0.2, 21: 0.5, 22: 0.3333333333333333, 23: 0.14285714285714285, 24: 1.0, 25: 0.5, 26: 0.25, 27: 0.5, 28: 0.5, 29: 1.0}
+Iteration 10: Objective aggregate scores for new program: {'weighted_score': 0.5863186813186813}
+Iteration 10: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.5, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 10: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 10: Valset pareto front aggregate score: 0.9833333333333333
+Iteration 10: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8}, 1: {6}, 2: {0, 2}, 3: {0, 8, 2}, 4: {9, 2, 6}, 5: {2, 5, 6, 8, 9}, 6: {0, 1, 2, 3, 5, 6, 8, 9}, 7: {0, 2, 4, 6, 9}, 8: {0, 1, 2, 6, 9}, 9: {0, 1, 2, 3, 4, 5, 6}, 10: {0, 1, 3, 4, 5, 8}, 11: {0, 8, 2, 4}, 12: {0, 1, 2, 4, 7}, 13: {1, 2, 6, 9}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, 15: {0, 9, 5, 6}, 16: {0, 8, 2, 3}, 17: {0, 1, 2, 3, 4, 5, 8}, 18: {0, 4, 5, 6, 8, 9}, 19: {1, 3}, 20: {5}, 21: {0, 1, 2, 3, 6, 8}, 22: {8}, 23: {5}, 24: {1, 9, 6, 7}, 25: {0, 2, 3, 4, 5, 6}, 26: {0, 1, 2, 3, 5, 7}, 27: {2}, 28: {4}, 29: {9, 5}}
+Iteration 10: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 10: Best valset aggregate score so far: 0.7403968253968254
+Iteration 10: Best program as per aggregate score on valset: 2
+Iteration 10: Best score on valset: 0.7403968253968254
+Iteration 10: Linear pareto front program index: 2
+Iteration 10: New program candidate index: 9
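As a sanity check on the Iteration 10 lines above, the logged `Valset score` is consistent with the arithmetic mean of the 30 individual valset scores, and the `Valset pareto front aggregate score` is consistent with the mean of the per-example front scores. The sketch below copies the numbers from the log; the helper name `aggregate` is illustrative and is not part of GEPA.

```python
import math

# Per-example valset scores copied from the Iteration 10 log line.
individual_scores = {
    0: 0.5, 1: 0.5, 2: 0.14285714285714285, 3: 0.15384615384615385,
    4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 0.5,
    10: 0.3333333333333333, 11: 0.3333333333333333, 12: 0.25,
    13: 1.0, 14: 1.0, 15: 1.0, 16: 0.25, 17: 0.5, 18: 1.0,
    19: 0.2, 20: 0.2, 21: 0.5, 22: 0.3333333333333333,
    23: 0.14285714285714285, 24: 1.0, 25: 0.5, 26: 0.25,
    27: 0.5, 28: 0.5, 29: 1.0,
}

# Per-example Pareto-front scores from the same iteration:
# every example is 1.0 except example 20, which is 0.5.
pareto_front_scores = {i: 1.0 for i in range(30)}
pareto_front_scores[20] = 0.5


def aggregate(scores):
    """Arithmetic mean over per-example scores (hypothetical helper)."""
    return math.fsum(scores.values()) / len(scores)


valset_score = aggregate(individual_scores)
front_aggregate = aggregate(pareto_front_scores)
print(f"valset score:    {valset_score:.6f}")
print(f"front aggregate: {front_aggregate:.6f}")
```

The front aggregate (about 0.983) is an upper bound built from the best candidate per example, so it is not comparable to the best single-program aggregate of 0.7404 reported for program 2.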
+Iteration 11: Selected program 6 score: 0.6160651629072682
+Iteration 11: Proposed new text for routing_policy: Classify each GitHub issue or pull request into the smallest complete set of allowed topic ids.
+
+Return JSON only:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Core task:
+Classify the GitHub item itself, not any previous assistant response, feedback text, labels, or generated output. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicitly central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary clearly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+
+Suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, environment name, release-note fragment, or incidental phrase.
+
+Composite titles:
+If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify only the central user-visible items needed for maintainer routing. Do not add every topic hinted by every noun. Treat weak fragments in grab-bag titles as label spam unless they clearly describe a central behavior change.
+
+Important learned example:
+`fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth`
+=> `codex`, `chat_integrations`
+Do not add `auth_identity` here. Although `OAuth` appears, it is not central enough for routing; treat it as label spam unless the item is specifically about OAuth/auth/login/token behavior. Also do not add `skills_plugins` merely because `plugins` appears in a Codex startup phrase.
+
+Topic rules:
+
+Codex:
+Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, or Codex-specific runtime behavior.
+Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.
+Codex startup and Docker Codex setup are `codex`.
+
+Auth and identity:
+Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is the central subject.
+Do not add `auth_identity` merely because `OAuth` appears in a composite title, environment phrase, or product/setup fragment.
+Add `auth_identity` only when the item is specifically about auth behavior, restoration, account identity, credentials, login, or tokens.
+
+Chat integrations:
+Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+WhatsApp history is `chat_integrations`.
+
+Skills/plugins:
+Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is central.
+Do not add it merely because `plugins` appears as part of Codex startup wording.
+
+Gateway/API/UI:
+Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.
+Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.
+Webchat full-message reader behavior is both `api_surface` and `ui_tui`; include `gateway` when gateway-backed or gateway-owned.
+
+Reliability:
+Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.
+Do not use `reliability` for every bug.
+
+Policy/config/security/MCP:
+Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+Use `security` for network policy, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+In policy/conformance titles, `model` usually means a policy area, not model providers or serving.
+
+Local models/providers/serving:
+Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.
+Use `model_serving` only when serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, base URL behavior, endpoint compatibility, request routing, or model-server compatibility is central.
+Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.
+
+ACP/ACPX/sessions/approvals:
+Use `acp` when ACP is named centrally.
+Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
+Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+Use `sessions` only when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.
+Do not add `sessions` merely because the title says `ACP sessions`.
+
+Documentation:
+Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+Documentation-only PRs usually return `docs` alone.
+
+Tools:
+Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.
+Use `mcp_tooling` for MCP-specific behavior.
+Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+
+Other topics:
+Use `queueing` only when queue, queued execution, queue lifecycle, queue steering, scheduling behavior, or user-visible queued work is central.
+Use `notifications` when notification behavior itself is central.
+Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, or memory providers.
+Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or cron deadlock is central.
+Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+Use `agent_runtime` when runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership are central.
+Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, OAuth, or Docker. Keep a topic only if it is actually a central maintainer-owned subject.
+Iteration 11: New subsample score 3.5 is better than old score 3.2857142857142856. Continue to full eval and add to candidate pool.
+Iteration 11: Valset score for new program: 0.5487301587301587 (coverage 30 / 30)
+Iteration 11: Val aggregate for new program: 0.5487301587301587
+Iteration 11: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 0.25, 3: 1.0, 4: 1.0, 5: 0.5, 6: 0.5, 7: 0.5, 8: 1.0, 9: 0.5, 10: 0.3333333333333333, 11: 0.5, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.5, 18: 1.0, 19: 0.5, 20: 0.25, 21: 0.5, 22: 0.25, 23: 0.14285714285714285, 24: 0.5, 25: 0.0, 26: 0.5, 27: 0.2, 28: 0.5, 29: 0.5}
+Iteration 11: Objective aggregate scores for new program: {'weighted_score': 0.5487301587301587}
+Iteration 11: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.5, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 11: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 11: Valset pareto front aggregate score: 0.9833333333333333
+Iteration 11: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10}, 1: {10, 6}, 2: {0, 2}, 3: {0, 8, 2, 10}, 4: {9, 2, 10, 6}, 5: {2, 5, 6, 8, 9}, 6: {0, 1, 2, 3, 5, 6, 8, 9}, 7: {0, 2, 4, 6, 9}, 8: {0, 1, 2, 6, 9, 10}, 9: {0, 1, 2, 3, 4, 5, 6}, 10: {0, 1, 3, 4, 5, 8}, 11: {0, 8, 2, 4}, 12: {0, 1, 2, 4, 7}, 13: {1, 2, 6, 9, 10}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 15: {0, 9, 5, 6}, 16: {0, 8, 2, 3}, 17: {0, 1, 2, 3, 4, 5, 8}, 18: {0, 4, 5, 6, 8, 9, 10}, 19: {1, 3}, 20: {5}, 21: {0, 1, 2, 3, 6, 8}, 22: {8}, 23: {5}, 24: {1, 9, 6, 7}, 25: {0, 2, 3, 4, 5, 6}, 26: {0, 1, 2, 3, 5, 7}, 27: {2}, 28: {4}, 29: {9, 5}}
+Iteration 11: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 11: Best valset aggregate score so far: 0.7403968253968254
+Iteration 11: Best program as per aggregate score on valset: 2
+Iteration 11: Best score on valset: 0.7403968253968254
+Iteration 11: Linear pareto front program index: 2
+Iteration 11: New program candidate index: 10
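The output contract stated in the Iteration 11 prompt (JSON only, a single `topics_of_interest` list, no invented topic ids) can be checked mechanically. The following is a minimal sketch under stated assumptions: `parse_topics` is a hypothetical helper, the cap of three topics and the rejection of duplicates are readings of the prompt rather than documented scorer behavior, and the allowed list is the one printed in Iteration 11 (other iterations print a slightly different list).

```python
import json

# Allowed ids copied from the Iteration 11 proposed prompt.
ALLOWED_TOPICS = frozenset({
    "queueing", "docs", "notifications", "sessions", "gateway",
    "reliability", "memory", "open_weight_models", "local_model_providers",
    "codex", "api_surface", "ui_tui", "chat_integrations", "skills_plugins",
    "acp", "acpx", "approvals", "agent_runtime", "model_serving",
    "local_models", "self_hosted_inference", "telemetry_usage", "exec_tools",
    "sandboxing", "browser_automation", "cron_automation", "config",
    "security", "mcp_tooling", "tool_calling", "auth_identity",
})


def parse_topics(raw, max_topics=3):
    """Validate a classifier reply and return its topic list.

    Hypothetical helper: max_topics=3 and the duplicate check are
    assumptions, not rules confirmed by the log.
    """
    data = json.loads(raw)  # raises ValueError on non-JSON output
    if not isinstance(data, dict) or set(data) != {"topics_of_interest"}:
        raise ValueError("expected exactly one key: topics_of_interest")
    topics = data["topics_of_interest"]
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("topics_of_interest must be a list of strings")
    unknown = [t for t in topics if t not in ALLOWED_TOPICS]
    if unknown:
        raise ValueError(f"invented topic ids: {unknown}")
    if len(set(topics)) != len(topics):
        raise ValueError("duplicate topic ids")
    if len(topics) > max_topics:
        raise ValueError("too many topics")
    return topics


# The learned example from the prompt above.
learned = parse_topics('{"topics_of_interest":["codex","chat_integrations"]}')
print(learned)
```

An empty list is accepted, which matches the prompt's "Use 0 topics when no allowed topic is central."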
+Iteration 12: Selected program 1 score: 0.585970695970696
+Iteration 12: Proposed new text for routing_policy: Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.
+
+Input format:
+- target: GitHub issue or PR URL
+- title: issue/PR title
+- body: optional; read only if the title is ambiguous
+- labels, paths, examples, tests, implementation filenames, and incidental keywords may appear, but should not drive classification unless they describe the user-visible subject
+
+Output only JSON, or call `final_json` exactly once if available:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents
+
+Classification method:
+1. Read the title first.
+2. Use the body only if the title is ambiguous.
+3. Choose one primary topic.
+4. Add only explicit central secondary topics.
+5. Use three topics only when the title or known domain contract clearly requires three maintainer buckets.
+6. Remove any topic added only because of an incidental keyword.
+
+Domain-specific routing rules:
+
+- `exec v2 contract follow-through` routes to all contract-owned areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+ Do this even when the title starts with `test(exec):`.
+ The PR type `test` does not make it tests-only.
+ The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.
+
+- `sandbox.mode`, sandbox modes, sandbox policy, sandbox-governed execution, or permission boundaries are `sandboxing`.
+
+- Do not classify `sandboxing` merely because a title says "sandbox" when the sandbox is just the environment where another feature runs.
+ Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)`
+ should be `acpx`, `codex`, and `skills_plugins`, not `sandboxing`.
+ The central subject is ACPX/Codex inheriting user-installed plugins.
+
+- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistence, or user-visible session behavior.
+ Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization`
+ should be `coding_agents`, `sandboxing`, and `agent_runtime`.
+ Here, `subagent initialization` is about coding-agent spawning/runtime startup, not user-visible sessions.
+
+- Use `sessions` when the issue is about session identity, lifecycle, takeover, routing, active chat ownership, persistent session state, or user-visible session conflicts.
+ Example: `cron announce delivery triggers EmbeddedAttemptSessionTakeoverError when user is actively chatting`
+ should include `sessions`, because `EmbeddedAttemptSessionTakeoverError` and "user is actively chatting" indicate an active session takeover conflict.
+ It should also include `cron_automation` for cron announce delivery and `reliability` because an unexpected error is triggered.
+
+- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.
+
+- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.
+
+- LM Studio is a strong signal for `local_models`.
+
+- Named external/local model provider catalogs, provider-specific model listings, and provider adapter behavior are usually `local_model_providers`.
+ Example: `google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error`
+ should be `local_model_providers` and `reliability`.
+ Do not classify this as `model_serving`; the central subject is a provider catalog gap and silent failure.
+
+- Use `reliability` when the title describes silent failure, crashes, unexpected errors, broken delivery, missing error reporting, hangs, or behavior that fails instead of producing a clear user-facing outcome.
+
+- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.
+ Example: `fix: LM Studio thinking blocks invisible with Responses API`
+ should be `model_serving` and `local_models`.
+ Do not stop at `local_models`; the Responses API part is central serving/protocol behavior.
+
+- Do not use `model_serving` merely because a title mentions a model name, provider name, provider catalog, or model availability. Use it only when serving protocol, endpoint compatibility, response structure, streaming, or OpenAI-compatible serving behavior is central.
+
+- Use `codex` when Codex itself, Codex-like behavior, Codex environment integration, or Codex-specific UX/runtime behavior is central.
+ In `ACPX Codex sandbox should inherit user-installed plugins`, `codex` is central.
+
+- Use `skills_plugins` when the issue is about plugins, installed plugins, inherited plugins, skills, plugin availability, or plugin runtime integration.
+
+Final suppression check:
+Remove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, sandbox, or contract. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it.
+Iteration 12: New subsample score 3.2857142857142856 is better than old score 1.95. Continue to full eval and add to candidate pool.
+Iteration 12: Valset score for new program: 0.4592735042735043 (coverage 30 / 30)
+Iteration 12: Val aggregate for new program: 0.4592735042735043
+Iteration 12: Individual valset scores for new program: {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.15384615384615385, 4: 0.16666666666666666, 5: 0.5, 6: 0.2857142857142857, 7: 1.0, 8: 0.2, 9: 0.5, 10: 1.0, 11: 1.0, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.25, 17: 1.0, 18: 0.25, 19: 0.25, 20: 0.2, 21: 0.5, 22: 0.25, 23: 0.15384615384615385, 24: 0.25, 25: 0.5, 26: 0.2857142857142857, 27: 0.14285714285714285, 28: 1.0, 29: 0.15384615384615385}
+Iteration 12: Objective aggregate scores for new program: {'weighted_score': 0.4592735042735043}
+Iteration 12: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 0.5, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 12: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 12: Valset pareto front aggregate score: 0.9833333333333333
+Iteration 12: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10}, 1: {10, 6}, 2: {0, 2}, 3: {0, 8, 2, 10}, 4: {9, 2, 10, 6}, 5: {2, 5, 6, 8, 9}, 6: {0, 1, 2, 3, 5, 6, 8, 9}, 7: {0, 2, 4, 6, 9, 11}, 8: {0, 1, 2, 6, 9, 10}, 9: {0, 1, 2, 3, 4, 5, 6}, 10: {0, 1, 3, 4, 5, 8, 11}, 11: {0, 2, 4, 8, 11}, 12: {0, 1, 2, 4, 7}, 13: {1, 2, 6, 9, 10, 11}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, 15: {0, 9, 5, 6}, 16: {0, 8, 2, 3}, 17: {0, 1, 2, 3, 4, 5, 8, 11}, 18: {0, 4, 5, 6, 8, 9, 10}, 19: {1, 3}, 20: {5}, 21: {0, 1, 2, 3, 6, 8}, 22: {8}, 23: {5}, 24: {1, 9, 6, 7}, 25: {0, 2, 3, 4, 5, 6}, 26: {0, 1, 2, 3, 5, 7}, 27: {2}, 28: {11, 4}, 29: {9, 5}}
+Iteration 12: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 12: Best valset aggregate score so far: 0.7403968253968254
+Iteration 12: Best program as per aggregate score on valset: 2
+Iteration 12: Best score on valset: 0.7403968253968254
+Iteration 12: Linear pareto front program index: 2
+Iteration 12: New program candidate index: 11
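The `Updated valset pareto front programs` lines suggest a simple per-example rule: a new candidate joins an example's front set when it ties the best score seen so far, and would replace the set if it beat that score. The sketch below reproduces the Iteration 12 changes for a few examples. It is inferred from the log output only; `update_front` is a hypothetical name, and tie detection by exact float equality is an assumption.

```python
def update_front(front_scores, front_programs, new_scores, new_idx):
    """Per-example Pareto update inferred from the log (hypothetical helper)."""
    for example, score in new_scores.items():
        best = front_scores.get(example, float("-inf"))
        if score > best:
            # Strictly better: the new candidate becomes the only front member.
            front_scores[example] = score
            front_programs[example] = {new_idx}
        elif score == best:
            # Tie with the current best: the new candidate joins the front set.
            front_programs.setdefault(example, set()).add(new_idx)


# State before Iteration 12 for four examples (from the Iteration 11 lines).
front_scores = {0: 1.0, 7: 1.0, 20: 0.5, 28: 1.0}
front_programs = {
    0: {0, 1, 2, 3, 5, 8, 10},
    7: {0, 2, 4, 6, 9},
    20: {5},
    28: {4},
}

# Candidate 11's scores on the same examples (from the Iteration 12 lines).
candidate_11 = {0: 0.5, 7: 1.0, 20: 0.2, 28: 1.0}

update_front(front_scores, front_programs, candidate_11, new_idx=11)
print(front_programs)
```

Candidate 11 is added for examples 7 and 28, where it scores 1.0, and leaves examples 0 and 20 unchanged, which matches the logged front for those examples.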
+Iteration 13: Selected program 2 score: 0.7403968253968254
+Iteration 13: Proposed new text for routing_policy: Classify GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore labels, file paths, examples, tests, implementation files, target URL path, and incidental broad impact unless they describe the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only, or use `final_json` if available.
+
+Critical reliability rule:
+Use `reliability` whenever the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.
+
+Do not suppress `reliability` just because the deadlock/hang occurs inside an internal path such as `subagent`, `main lane`, `worker`, `before_prompt_build`, or another execution hook. Internal location words may be ignored for `queueing`, but the liveness failure itself is still central.
+
+Memory + reliability rule:
+Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+
+If an active-memory, recall, embedding, vector-store, or memory-provider item also describes a deadlock, hang, wedge, timeout, stuck state, or other liveness failure, classify it as both `memory` and `reliability`.
+
+Canonical example:
+Title: `fix #79026: active-memory recall subagent can deadlock on the main lane inside before_prompt_build`
+Correct output:
+{"topics_of_interest":["memory","reliability"]}
+
+Reason:
+`active-memory recall` makes `memory` central. `can deadlock` makes `reliability` central. `subagent`, `main lane`, and `before_prompt_build` are internal context and do not remove the reliability topic or add `queueing`.
+
+Queueing suppression:
+Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central. Words like `lane`, `main lane`, `worker`, `subagent`, or internal execution paths do not imply `queueing`.
+
+Final suppression check:
+Before outputting, remove any topic added only because of a related word appearing in the title. Keep a topic only if it is actually a central maintainer-owned subject. In particular, do not remove `reliability` when the title explicitly says `deadlock`, `hang`, `wedged`, `stuck`, `crash`, `race`, or `timeout`.
+Iteration 13: New subsample score 2.033333333333333 is not better than old score 3.5, skipping
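The "is better than ... Continue to full eval" and "is not better than ... skipping" messages in Iterations 10 through 13 are consistent with a gate that compares the proposal's subsample score against the parent's score on the same subsample. A minimal sketch follows; the function name is hypothetical, and the strict `>` comparison is an assumption read off the log wording, not confirmed GEPA behavior.

```python
def accept_proposal(new_subsample_score, old_subsample_score):
    """Return True when a proposal should go on to full valset evaluation.

    Assumption: strict improvement is required, per the log wording.
    """
    return new_subsample_score > old_subsample_score


# (iteration, new subsample score, old subsample score) copied from the log.
logged = [
    (10, 4.0, 1.3214285714285714),
    (11, 3.5, 3.2857142857142856),
    (12, 3.2857142857142856, 1.95),
    (13, 2.033333333333333, 3.5),
]

decisions = {it: accept_proposal(new, old) for it, new, old in logged}
for it, accepted in decisions.items():
    print(f"Iteration {it}: {'full eval' if accepted else 'skipped'}")
```

Because the subsample scores exceed 1.0 while per-example scores top out at 1.0, they appear to be sums over the sampled examples rather than means. The gate only decides whether a full evaluation happens: Iterations 11 and 12 passed it, yet their valset scores (0.5487 and 0.4593) stayed below the best aggregate of 0.7404.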
+Iteration 14: Selected program 6 score: 0.6160651629072682
+Iteration 14: Proposed new text for routing_policy: Refine classification with these additional routing rules:
+
+- Treat UI-looking wording as label spam unless the UI itself is the maintainer-owned subject. A title like "respect totalTokensFresh flag to avoid showing stale token counts" is about session state plus usage/token accounting, not `ui_tui`.
+ - Use `sessions` when freshness, lifecycle, state, or per-session values determine whether data is valid.
+ - Use `telemetry_usage` when token counts, usage counts, cost/usage display, or accounting/reporting values are central.
+ - Do not add `ui_tui` just because the bug is "shown" somewhere.
+
+- For memory embeddings backed by self-hosted OpenAI-compatible servers, classify the self-hosted inference path explicitly.
+ - "memory/embeddings" is `memory`.
+ - "openai-compatible provider for self-hosted servers", including llama.cpp, Ollama, vLLM, TGI, or LocalAI, is `self_hosted_inference`.
+ - Do not use `local_model_providers` merely because provider names or model-server names appear. Use `local_model_providers` only for provider catalogs, provider metadata, provider-specific model support, or provider catalog behavior.
+
+- For local Ollama or local embedding failures involving proxy/NO_PROXY behavior, prefer local/self-hosted routing over broad security or memory labels.
+ - "Local Ollama embeddings fail when proxy is enabled" routes to `local_models` and `self_hosted_inference`.
+ - Do not add `memory` just because the word "embeddings" appears, unless the memory subsystem, memory provider, vector store, or active-memory recall is central.
+ - Do not add `security` just because the title mentions SSRF defenses, proxy, or NO_PROXY. Use `security` only when the security policy, access restriction, credential boundary, vulnerability, or allowed/blocked network behavior is the central user-visible subject.
+
+- Apply a final "label spam" suppression pass:
+ - Remove `ui_tui` if the title only implies data is displayed.
+ - Remove `local_model_providers` if the title is about using self-hosted inference servers rather than catalog/provider metadata.
+ - Remove `memory` if embeddings are only the model capability involved and the memory system itself is not central.
+ - Remove `security` if SSRF/proxy wording is implementation context for local model connectivity rather than the central policy/security concern.
+
+Canonical examples:
+- `fix: respect totalTokensFresh flag to avoid showing stale token counts` => `sessions`, `telemetry_usage`
+- `feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)` => `memory`, `self_hosted_inference`
+- `[Feature/Bug]: Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)` => `local_models`, `self_hosted_inference`
+Iteration 14: New subsample score 4.0 is better than old score 1.592857142857143. Continue to full eval and add to candidate pool.
+Iteration 14: Valset score for new program: 0.4111029711029711 (coverage 30 / 30)
+Iteration 14: Val aggregate for new program: 0.4111029711029711
+Iteration 14: Individual valset scores for new program: {0: 0.2857142857142857, 1: 0.2, 2: 0.25, 3: 0.2857142857142857, 4: 0.1111111111111111, 5: 0.25, 6: 0.2857142857142857, 7: 0.25, 8: 0.15384615384615385, 9: 1.0, 10: 0.2, 11: 0.2857142857142857, 12: 1.0, 13: 0.25, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.25, 18: 0.2857142857142857, 19: 0.2, 20: 1.0, 21: 0.5, 22: 0.25, 23: 1.0, 24: 0.2857142857142857, 25: 0.25, 26: 1.0, 27: 0.1, 28: 0.5, 29: 0.15384615384615385}
+Iteration 14: Objective aggregate scores for new program: {'weighted_score': 0.41110297110297106}
+Iteration 14: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 14: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 14: Valset pareto front aggregate score: 1.0
+Iteration 14: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10}, 1: {10, 6}, 2: {0, 2}, 3: {0, 8, 2, 10}, 4: {9, 2, 10, 6}, 5: {2, 5, 6, 8, 9}, 6: {0, 1, 2, 3, 5, 6, 8, 9}, 7: {0, 2, 4, 6, 9, 11}, 8: {0, 1, 2, 6, 9, 10}, 9: {0, 1, 2, 3, 4, 5, 6, 12}, 10: {0, 1, 3, 4, 5, 8, 11}, 11: {0, 2, 4, 8, 11}, 12: {0, 1, 2, 4, 7, 12}, 13: {1, 2, 6, 9, 10, 11}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, 15: {0, 9, 5, 6}, 16: {0, 8, 2, 3}, 17: {0, 1, 2, 3, 4, 5, 8, 11}, 18: {0, 4, 5, 6, 8, 9, 10}, 19: {1, 3}, 20: {12}, 21: {0, 1, 2, 3, 6, 8}, 22: {8}, 23: {12, 5}, 24: {1, 9, 6, 7}, 25: {0, 2, 3, 4, 5, 6}, 26: {0, 1, 2, 3, 5, 7, 12}, 27: {2}, 28: {11, 4}, 29: {9, 5}}
+Iteration 14: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 14: Best valset aggregate score so far: 0.7403968253968254
+Iteration 14: Best program as per aggregate score on valset: 2
+Iteration 14: Best score on valset: 0.7403968253968254
+Iteration 14: Linear pareto front program index: 2
+Iteration 14: New program candidate index: 12
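The "Val aggregate" logged for iteration 14 appears to be the unweighted mean of the 30 individual valset scores. The sketch below checks that reading against the logged numbers. The variable names are illustrative, and the mean interpretation is an inference from the values, not something the log states.

```python
from statistics import fmean

# Individual valset scores for program 12, copied from the iteration 14 log line.
individual_scores = {
    0: 0.2857142857142857, 1: 0.2, 2: 0.25, 3: 0.2857142857142857,
    4: 0.1111111111111111, 5: 0.25, 6: 0.2857142857142857, 7: 0.25,
    8: 0.15384615384615385, 9: 1.0, 10: 0.2, 11: 0.2857142857142857,
    12: 1.0, 13: 0.25, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.25,
    18: 0.2857142857142857, 19: 0.2, 20: 1.0, 21: 0.5, 22: 0.25,
    23: 1.0, 24: 0.2857142857142857, 25: 0.25, 26: 1.0, 27: 0.1,
    28: 0.5, 29: 0.15384615384615385,
}

logged_aggregate = 0.4111029711029711  # "Val aggregate for new program"

# Unweighted mean over all validation examples.
computed_aggregate = fmean(individual_scores.values())
matches_log = abs(computed_aggregate - logged_aggregate) < 1e-9
print(len(individual_scores), computed_aggregate, matches_log)
```

The same check can be repeated for other iterations by pasting in their "Individual valset scores" dictionaries.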
+Iteration 15: Selected program 2 score: 0.7403968253968254
+Iteration 15: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Goal:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore target URL path, labels, examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Composite title rule:
+If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+
+Important examples:
+- `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock`
+ => `exec_tools`, `browser_automation`, `cron_automation`
+ Do not add `ui_tui` just because browser diagnostics might be visible somewhere.
+ Do not add generic `reliability` when the deadlock is specifically a cron force-run/lifecycle bug owned by `cron_automation`.
+
+- `fix: LM Studio thinking blocks invisible with Responses API`
+ => `local_models`, `model_serving`
+ LM Studio is a local model provider/runtime signal.
+ Responses API behavior, response block handling, streaming, endpoint compatibility, or request/response protocol behavior is `model_serving`.
+ Do not add `ui_tui` merely because something is "invisible" unless webchat, TUI, UI views, or a UI reader is explicitly central.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+Topic rules:
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
+- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- Do not add `sessions` merely because the title says "ACP sessions" or mentions session context. Treat that as label spam unless session identity, lifecycle, routing, state, or persistent process identity is itself the bug or feature.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `approvals`, `acpx`
+- `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` => `acp`, `sessions`
+
+Reliability, queueing, and lanes:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.
+- Prefer a more specific owning topic when the title names a domain-specific lifecycle bug, such as `cron force-run deadlock` => `cron_automation`.
+- Words like `lane`, `main lane`, `worker`, `subagent`, `before_prompt_build`, or internal execution paths do not imply `queueing`.
+- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.
+- `self-heal lane wedges` => `reliability`, not `queueing`.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or user/session identity for auth is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex-specific runtime behavior.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions` when the embedded path or session identity is central.
+- `restore openai-codex OAuth on embedded path` => `auth_identity`, `sessions`, not `codex`.
+
+Codex:
+- Use `codex` when Codex is named centrally as the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` merely because the title contains `openai-codex`, `[codex]`, or a Codex-branded OAuth provider.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include `docs` alone.
+- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add `tool_calling` just because docs mention "tool boundaries" unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` => `mcp_tooling`, `tool_calling`.
+
+Open-weight, local providers, local models, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, response block handling, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- If a title combines a local provider/runtime with Responses API or OpenAI-compatible protocol behavior, include both `local_models` and `model_serving`.
+- Do not add `model_serving` merely because a title says "model", "provider", "catalog", or names a model unless serving/routing/protocol behavior is central.
+
+Notifications and chat integrations:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- A cron force-run deadlock is `cron_automation`; do not add generic `reliability` unless reliability/self-healing is separately named as a central concern.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, embedded session path, or session-specific behavior is central.
+- "Outbound session identity" is `sessions`.
+- `Pass outbound session identity into message_sending and surface guarded gateway send denial` => `gateway`, `sessions`, not `notifications`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Do not add `ui_tui` merely because diagnostics are layered, output is invisible, or a result is user-visible. UI/TUI must be explicitly the owned surface.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- `layered browser diagnostics` is `browser_automation`.
+- Do not add `ui_tui` for browser diagnostics unless the title explicitly names UI/TUI/webchat display behavior.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map "model" in "model policy", "model conformance", or "model checks" to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, demo, invisible, diagnostics, or browser. Keep it only if that topic is actually a central maintainer-owned subject.
+Iteration 15: New subsample score 4.0 is better than old score 2.642857142857143. Continue to full eval and add to candidate pool.
+Iteration 15: Valset score for new program: 0.6716666666666666 (coverage 30 / 30)
+Iteration 15: Val aggregate for new program: 0.6716666666666666
+Iteration 15: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.2857142857142857, 4: 0.2857142857142857, 5: 0.5, 6: 1.0, 7: 0.5, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 0.2857142857142857, 13: 0.25, 14: 1.0, 15: 1.0, 16: 0.25, 17: 1.0, 18: 1.0, 19: 0.2, 20: 0.2, 21: 1.0, 22: 0.0, 23: 0.14285714285714285, 24: 0.5, 25: 1.0, 26: 1.0, 27: 1.0, 28: 0.5, 29: 0.25}
+Iteration 15: Objective aggregate scores for new program: {'weighted_score': 0.6716666666666666}
+Iteration 15: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 15: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 15: Valset pareto front aggregate score: 1.0
+Iteration 15: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10, 13}, 1: {10, 13, 6}, 2: {0, 2, 13}, 3: {0, 8, 2, 10}, 4: {9, 2, 10, 6}, 5: {2, 5, 6, 8, 9}, 6: {0, 1, 2, 3, 5, 6, 8, 9, 13}, 7: {0, 2, 4, 6, 9, 11}, 8: {0, 1, 2, 6, 9, 10, 13}, 9: {0, 1, 2, 3, 4, 5, 6, 12, 13}, 10: {0, 1, 3, 4, 5, 8, 11, 13}, 11: {0, 2, 4, 8, 11, 13}, 12: {0, 1, 2, 4, 7, 12}, 13: {1, 2, 6, 9, 10, 11}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13}, 15: {0, 5, 6, 9, 13}, 16: {0, 8, 2, 3}, 17: {0, 1, 2, 3, 4, 5, 8, 11, 13}, 18: {0, 4, 5, 6, 8, 9, 10, 13}, 19: {1, 3}, 20: {12}, 21: {0, 1, 2, 3, 6, 8, 13}, 22: {8}, 23: {12, 5}, 24: {1, 9, 6, 7}, 25: {0, 2, 3, 4, 5, 6, 13}, 26: {0, 1, 2, 3, 5, 7, 12, 13}, 27: {2, 13}, 28: {11, 4}, 29: {9, 5}}
+Iteration 15: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 15: Best valset aggregate score so far: 0.7403968253968254
+Iteration 15: Best program as per aggregate score on valset: 2
+Iteration 15: Best score on valset: 0.7403968253968254
+Iteration 15: Linear pareto front program index: 2
+Iteration 15: New program candidate index: 13
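The "Updated valset pareto front programs" line is consistent with a per-example front: for each validation example, the set of programs that tie for the best score seen so far. Below is a minimal sketch of that update, using three examples from iterations 14 and 15. The function `update_front` is hypothetical, and the tie-joins, strict-improvement-replaces rule is inferred from the logged sets rather than taken from GEPA's source.

```python
def update_front(front_scores, front_programs, new_scores, new_index):
    """Merge one new program's per-example scores into the per-example front."""
    for example_id, score in new_scores.items():
        best = front_scores.get(example_id, float("-inf"))
        if score > best:
            # Strictly better: the new program becomes the sole front member.
            front_scores[example_id] = score
            front_programs[example_id] = {new_index}
        elif score == best:
            # Tie with the current best: the new program joins the front.
            front_programs.setdefault(example_id, set()).add(new_index)
    return front_programs


# State after iteration 14 for examples 20, 22, and 27 (front score 1.0 each).
front_scores = {20: 1.0, 22: 1.0, 27: 1.0}
front_programs = {20: {12}, 22: {8}, 27: {2}}

# Program 13's scores on those examples, from the iteration 15 log line.
program_13_scores = {20: 0.2, 22: 0.0, 27: 1.0}

updated = update_front(front_scores, front_programs, program_13_scores, 13)
print(updated)
```

Only example 27 gains program 13, which agrees with the iteration 15 front where example 27 lists `{2, 13}` while examples 20 and 22 are unchanged.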
+Iteration 16: Selected program 1 score: 0.585970695970696
+Iteration 16: Proposed new text for routing_policy: Classify each GitHub issue or PR into the smallest complete set of maintainer topic ids.
+
+Output only JSON, or call `final_json` exactly once if that structured-output tool is available:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, coding_agents
+
+Read the title first. Use the body only if the title is ambiguous. Ignore labels, paths, tests, implementation filenames, examples, and incidental keywords unless they are the user-visible subject.
+
+Choose one primary topic, then add only explicit central secondary topics. Use three topics only when the title or domain contract clearly requires three maintainer buckets.
+
+General routing rules:
+- Treat bracketed prefixes such as `[codex]`, PR types such as `test(...)`, and labels as routing noise unless the actual user-visible subject is that topic.
+- Do not add `codex` merely because the title starts with `[codex]`.
+- Do not add local model topics merely because the title contains `model` in a policy, conformance, config, or security context.
+- Do not add provider/model topics from label spam or incidental words.
+
+Specific learned mappings:
+- `exec v2 contract follow-through` routes to all contract-owned areas:
+ `exec_tools`, `sandboxing`, and `approvals`.
+ Do this even when the title starts with `test(exec):`; the PR type `test` does not make it tests-only.
+ The exec v2 contract includes exec behavior, sandbox behavior, and approval behavior.
+
+- `sandbox.mode`, sandbox modes, sandbox policy, or sandbox-governed execution are `sandboxing`.
+
+- `sessions_spawn` is an internal name. Do not classify it as `sessions` unless the title is actually about session identity, lifecycle, routing, persistence, or user-visible session state.
+ Example: `Bug: sandbox.mode: "non-main" silently breaks sessions_spawn subagent initialization`
+ should be `coding_agents`, `sandboxing`, `agent_runtime`.
+ Here, `subagent initialization` is about coding-agent spawning/runtime startup, not user-visible sessions.
+
+- Use `coding_agents` when the central subject is subagents, coding agents, agent spawning, agent initialization, Claude Code/Codex-like agent process startup, or agent coordination behavior.
+
+- Use `agent_runtime` when the central subject is runtime startup, runtime ownership, process initialization, node-backed runtimes, agent execution runtimes, or a runtime failing to initialize.
+
+- LM Studio is a strong signal for `local_models`.
+
+- `Responses API` is a strong signal for `model_serving` when the bug is about response protocol behavior, endpoint compatibility, returned blocks, streaming lifecycle, or OpenAI-compatible serving behavior.
+ Example: `fix: LM Studio thinking blocks invisible with Responses API`
+ should be `model_serving` and `local_models`.
+
+- `Policy: add model, network, and MCP conformance checks` should be classified as:
+ `mcp_tooling`, `config`, `security`.
+ In this title, `model` is not `local_models` or `local_model_providers`; it is part of policy/conformance coverage. `network` and conformance/policy checks are security/config concerns.
+
+- `[codex] fix webchat full-message reader for truncated history` should be classified as:
+ `gateway`, `api_surface`, `ui_tui`.
+ Do not include `codex`; the prefix is label noise. `webchat` points to gateway/UI behavior, and `full-message reader` / `truncated history` is centrally about the API surface used to retrieve complete message content.
+
+Final suppression check:
+Remove any topic added only due to an incidental keyword such as test, path, label, session-like function name, model, provider, usage, gateway, plugin, tool, contract, or bracketed prefix. Keep the topic only when it is a central maintainer-owned subject or a known domain contract requires it.
+Iteration 16: New subsample score 4.0 is better than old score 1.25. Continue to full eval and add to candidate pool.
+Iteration 16: Valset score for new program: 0.530018315018315 (coverage 30 / 30)
+Iteration 16: Val aggregate for new program: 0.530018315018315
+Iteration 16: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 0.25, 3: 0.25, 4: 0.2857142857142857, 5: 0.5, 6: 1.0, 7: 0.25, 8: 1.0, 9: 0.5, 10: 1.0, 11: 0.25, 12: 0.2857142857142857, 13: 1.0, 14: 1.0, 15: 0.5, 16: 1.0, 17: 1.0, 18: 0.2857142857142857, 19: 1.0, 20: 0.2, 21: 0.5, 22: 0.25, 23: 0.15384615384615385, 24: 0.25, 25: 0.5, 26: 0.2857142857142857, 27: 0.0, 28: 0.25, 29: 0.15384615384615385}
+Iteration 16: Objective aggregate scores for new program: {'weighted_score': 0.530018315018315}
+Iteration 16: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 16: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 16: Valset pareto front aggregate score: 1.0
+Iteration 16: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10, 13, 14}, 1: {10, 13, 6, 14}, 2: {0, 2, 13}, 3: {0, 8, 2, 10}, 4: {9, 2, 10, 6}, 5: {2, 5, 6, 8, 9}, 6: {0, 1, 2, 3, 5, 6, 8, 9, 13, 14}, 7: {0, 2, 4, 6, 9, 11}, 8: {0, 1, 2, 6, 9, 10, 13, 14}, 9: {0, 1, 2, 3, 4, 5, 6, 12, 13}, 10: {0, 1, 3, 4, 5, 8, 11, 13, 14}, 11: {0, 2, 4, 8, 11, 13}, 12: {0, 1, 2, 4, 7, 12}, 13: {1, 2, 6, 9, 10, 11, 14}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}, 15: {0, 5, 6, 9, 13}, 16: {0, 2, 3, 8, 14}, 17: {0, 1, 2, 3, 4, 5, 8, 11, 13, 14}, 18: {0, 4, 5, 6, 8, 9, 10, 13}, 19: {1, 3, 14}, 20: {12}, 21: {0, 1, 2, 3, 6, 8, 13}, 22: {8}, 23: {12, 5}, 24: {1, 9, 6, 7}, 25: {0, 2, 3, 4, 5, 6, 13}, 26: {0, 1, 2, 3, 5, 7, 12, 13}, 27: {2, 13}, 28: {11, 4}, 29: {9, 5}}
+Iteration 16: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 16: Best valset aggregate score so far: 0.7403968253968254
+Iteration 16: Best program as per aggregate score on valset: 2
+Iteration 16: Best score on valset: 0.7403968253968254
+Iteration 16: Linear pareto front program index: 2
+Iteration 16: New program candidate index: 14
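Every proposed prompt in this log asks for a single JSON object of the form `{"topics_of_interest": [...]}` restricted to an allowed list, and the allowed list differs slightly between proposals. The sketch below shows one way a harness could check a model response against the iteration 16 list. The function `parse_topics` and its error handling are assumptions for illustration, not part of the logged pipeline.

```python
import json

# Allowed topic ids copied from the iteration 16 proposal.
ALLOWED_TOPICS = frozenset(
    """
    queueing, docs, notifications, sessions, gateway, reliability, memory,
    open_weight_models, local_model_providers, codex, api_surface, ui_tui,
    chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
    model_serving, local_models, self_hosted_inference, telemetry_usage,
    exec_tools, sandboxing, browser_automation, cron_automation, config,
    security, mcp_tooling, tool_calling, coding_agents
    """.replace(",", " ").split()
)


def parse_topics(raw: str, allowed=ALLOWED_TOPICS) -> list:
    """Parse a model response and reject anything outside the expected shape."""
    payload = json.loads(raw)
    if not isinstance(payload, dict) or set(payload) != {"topics_of_interest"}:
        raise ValueError("expected exactly one key: topics_of_interest")
    topics = payload["topics_of_interest"]
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("topics_of_interest must be a list of strings")
    unknown = [t for t in topics if t not in allowed]
    if unknown:
        raise ValueError(f"topic ids outside the allowed list: {unknown}")
    return topics


example = parse_topics(
    '{"topics_of_interest":["coding_agents","sandboxing","agent_runtime"]}'
)
print(example)
```

Passing the allowed set as a parameter keeps the check usable across proposals whose lists differ, such as those that include `auth_identity` instead of `coding_agents`.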
+Iteration 17: Selected program 9 score: 0.5863186813186814
+Iteration 17: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return JSON only:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not labels, generated output, prior assistant responses, feedback text, or incidental words.
+
+Process:
+1. Read the title first. Treat it as the primary signal.
+2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned concerns.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids.
+8. Output JSON only.
+
+Suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the actual subject, not a label prefix, path, symptom, implementation detail, test fixture, example, or incidental phrase.
+
+Important domain rules:
+
+Local models / self-hosted inference / providers / security:
+- Use `local_models` when local model behavior is central, including local embeddings and local model runtime behavior.
+- Use `self_hosted_inference` when the issue concerns a locally hosted inference stack such as Ollama-backed inference or embeddings.
+- For titles like `Local Ollama embeddings fail when proxy is enabled (SSRF defenses ignore NO_PROXY)`, classify as `local_models` and `self_hosted_inference`.
+- Do not add `local_model_providers` merely because Ollama, Gemini, vLLM, llama.cpp, or another provider name appears. Use `local_model_providers` only when provider-specific catalog metadata, provider integration, known context windows, named provider support, or provider/model-family support is the central subject.
+- Do not add `security` merely because the title mentions SSRF, proxy, NO_PROXY, defenses, network, or policy language. Add `security` only when the central issue is actually security policy, access restriction, credential boundary, vulnerability, or allowed/blocked security behavior.
+
+Reliability / auth / sessions:
+- Use `reliability` when the central bug is a wedge, deadlock, hang, crash, race, liveness issue, stuck state, timeout, silent failure, self-healing behavior, or robustness failure.
+- Use `auth_identity` when OAuth, login, sign-in, tokens, authentication, account identity, credential identity, or identity restoration is central.
+- Use `sessions` when session identity, lifecycle, routing, persistent process identity, embedded session path, session-specific state, or embedded path behavior is central.
+- If OAuth restoration is tied to an embedded path, include both `auth_identity` and `sessions`.
+- For titles like `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path`, classify as `reliability`, `auth_identity`, and `sessions`.
+- Do not add `codex` merely because the title says `openai-codex` or starts with `[codex]`. Use `codex` only when Codex itself is the central product/runtime/setup being changed.
+
+ACP / ACPX / sessions:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.
+- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.
+- Treat `sessions_yield` as label/path-like noise unless the title is truly about session identity, lifecycle, routing, persistent process identity, embedded session path, or session-specific state.
+
+Streaming usage:
+- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.
+- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.
+
+Gateway / sessions / API surface / UI:
+- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.
+- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.
+- Passing outbound session identity is a central `sessions` concern.
+- Use `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+- Do not add `api_surface` just because a title mentions `message_sending`, `surface`, send denial, or plumbing.
+- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`. If gateway-backed or gateway-owned, also include `gateway`.
+
+Browser automation / packaging:
+- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.
+- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.
+- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.
+- Do not add `auth_identity` just because Google sign-in fails when the actual central failure is Chrome profile attach or missing packaged executable.
+
+Docs:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs usually return `docs` alone.
+
+Tools / MCP / exec:
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.
+- Use `mcp_tooling` for MCP-specific behavior, MCP conformance, MCP policy, MCP tool behavior, protocol checks, or MCP integrations.
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, sessions_yield, showcase, tutorial, demo, sign-in, OAuth, proxy, SSRF, NO_PROXY, Ollama, or provider. Keep the topic only if it is actually a central maintainer-owned subject.
+Iteration 17: New subsample score 3.5 is better than old score 2.642857142857143. Continue to full eval and add to candidate pool.
+Iteration 17: Valset score for new program: 0.5569841269841269 (coverage 30 / 30)
+Iteration 17: Val aggregate for new program: 0.5569841269841269
+Iteration 17: Individual valset scores for new program: {0: 0.25, 1: 0.2, 2: 0.14285714285714285, 3: 0.25, 4: 1.0, 5: 0.25, 6: 1.0, 7: 1.0, 8: 1.0, 9: 0.5, 10: 0.3333333333333333, 11: 0.25, 12: 0.25, 13: 0.5, 14: 1.0, 15: 1.0, 16: 0.25, 17: 0.25, 18: 1.0, 19: 0.3333333333333333, 20: 0.2, 21: 0.5, 22: 0.25, 23: 1.0, 24: 1.0, 25: 0.5, 26: 1.0, 27: 0.5, 28: 0.5, 29: 0.5}
+Iteration 17: Objective aggregate scores for new program: {'weighted_score': 0.5569841269841269}
+Iteration 17: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 17: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 17: Valset pareto front aggregate score: 1.0
+Iteration 17: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10, 13, 14}, 1: {10, 13, 6, 14}, 2: {0, 2, 13}, 3: {0, 8, 2, 10}, 4: {2, 6, 9, 10, 15}, 5: {2, 5, 6, 8, 9}, 6: {0, 1, 2, 3, 5, 6, 8, 9, 13, 14, 15}, 7: {0, 2, 4, 6, 9, 11, 15}, 8: {0, 1, 2, 6, 9, 10, 13, 14, 15}, 9: {0, 1, 2, 3, 4, 5, 6, 12, 13}, 10: {0, 1, 3, 4, 5, 8, 11, 13, 14}, 11: {0, 2, 4, 8, 11, 13}, 12: {0, 1, 2, 4, 7, 12}, 13: {1, 2, 6, 9, 10, 11, 14}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}, 15: {0, 5, 6, 9, 13, 15}, 16: {0, 2, 3, 8, 14}, 17: {0, 1, 2, 3, 4, 5, 8, 11, 13, 14}, 18: {0, 4, 5, 6, 8, 9, 10, 13, 15}, 19: {1, 3, 14}, 20: {12}, 21: {0, 1, 2, 3, 6, 8, 13}, 22: {8}, 23: {12, 5, 15}, 24: {1, 6, 7, 9, 15}, 25: {0, 2, 3, 4, 5, 6, 13}, 26: {0, 1, 2, 3, 5, 7, 12, 13, 15}, 27: {2, 13}, 28: {11, 4}, 29: {9, 5}}
+Iteration 17: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 17: Best valset aggregate score so far: 0.7403968253968254
+Iteration 17: Best program as per aggregate score on valset: 2
+Iteration 17: Best score on valset: 0.7403968253968254
+Iteration 17: Linear pareto front program index: 2
+Iteration 17: New program candidate index: 15
+Iteration 18: Selected program 9 score: 0.5863186813186814
+Iteration 18: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return JSON only, with no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not a previous assistant response, generated output, label, feedback text, or gold topic list.
+
+Input format:
+You may receive a target URL, title, body, summary, generated output, gold topics, feedback, score, or error. The title is the primary signal. Use the first clear body or summary only when the title is ambiguous. Ignore generated outputs, feedback, score, and error except when they explain how to improve the classification rule.
+
+Core process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, docs change, policy change, or contract change.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics when the title explicitly names three central facets.
+6. Use 0 topics only when no allowed topic is central.
+7. Never invent topic ids.
+8. Always emit the required JSON object. Never omit the final JSON.
+
+Important topic rules:
+
+ACP / ACPX / sessions:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.
+- If a title says “per-binding”, “binding behavior”, or ACPX-related binding, that is usually `acpx`.
+- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.
+- Treat `sessions_yield` as label/path-like noise unless the title is truly about session identity, lifecycle, routing, persistent process identity, embedded session path, or session-specific state.
+- Example: `Explicit ACP agent runs: embedded backend visibility failure and stale final JSON state after sessions_yield` => `acp`, `acpx`.
+
+Approvals / permission modes:
+- Use `approvals` when approval policy, permission modes, `permissionMode`, approval gates, allow/deny behavior, or user permission decisions are central.
+- If ACP or ACPX work is specifically about per-agent or per-binding `permissionMode`, include `approvals`.
+- Example: `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` => `acp`, `acpx`, `approvals`.
+
+Skills / plugins:
+- Use `skills_plugins` when plugins, skills, Superpowers, plugin inheritance, installed plugins, plugin availability, or skill/plugin runtime integration is central.
+- If ACPX/Codex sandbox behavior is about inheriting or exposing user-installed plugins, include `skills_plugins`.
+- Example: `[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)` => `acpx`, `codex`, `skills_plugins`.
+
+Codex:
+- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, Codex sandbox behavior, or Codex-specific runtime behavior.
+- Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving.
+- Example: `Policy: add model, network, and MCP conformance checks` => `config`, `security`, `mcp_tooling`.
+
+Streaming usage:
+- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.
+- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves central.
+- Example: `OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)` => `model_serving`.
+
+Gateway / sessions / API surface:
+- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.
+- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.
+- Passing outbound session identity is a central `sessions` concern.
+- Do not add `api_surface` just because a title mentions `message_sending`, `surface`, send denial, or plumbing.
+- Add `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+
+Browser automation / packaging:
+- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.
+- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.
+- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.
+- Do not add `auth_identity` just because Google sign-in fails when the central failure is Chrome profile attach or missing packaged executable.
+
+API surface / UI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.
+
+Local model providers / open-weight models / model serving:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.
+- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.
+- Provider catalog missing-model issues are `local_model_providers`; add `reliability` when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.
+- Do not use `reliability` for every bug.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs usually return `docs` alone.
+
+Tools:
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.
+- Use `mcp_tooling` for MCP-specific behavior.
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+
+Suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the actual subject, not a path, symptom, implementation detail, label prefix, test fixture, broad ownership area, example, or incidental phrase.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, sessions_yield, showcase, tutorial, demo, sign-in, OAuth, permission, or sandbox. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Final output:
+Return exactly one JSON object:
+{"topics_of_interest":[...]}
+Iteration 18: New subsample score 4.0 is better than old score 2.0. Continue to full eval and add to candidate pool.
+Iteration 18: Valset score for new program: 0.6092857142857143 (coverage 30 / 30)
+Iteration 18: Val aggregate for new program: 0.6092857142857143
+Iteration 18: Individual valset scores for new program: {0: 0.25, 1: 1.0, 2: 1.0, 3: 0.25, 4: 1.0, 5: 1.0, 6: 1.0, 7: 0.25, 8: 1.0, 9: 0.5, 10: 0.2, 11: 1.0, 12: 0.25, 13: 1.0, 14: 1.0, 15: 0.5, 16: 0.25, 17: 0.5, 18: 1.0, 19: 0.2, 20: 0.2, 21: 0.5, 22: 0.5, 23: 0.14285714285714285, 24: 1.0, 25: 0.5, 26: 0.2857142857142857, 27: 0.5, 28: 0.5, 29: 1.0}
+Iteration 18: Objective aggregate scores for new program: {'weighted_score': 0.6092857142857142}
+Iteration 18: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 18: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 18: Valset pareto front aggregate score: 1.0
+Iteration 18: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10, 13, 14}, 1: {6, 10, 13, 14, 16}, 2: {0, 16, 2, 13}, 3: {0, 8, 2, 10}, 4: {2, 6, 9, 10, 15, 16}, 5: {2, 5, 6, 8, 9, 16}, 6: {0, 1, 2, 3, 5, 6, 8, 9, 13, 14, 15, 16}, 7: {0, 2, 4, 6, 9, 11, 15}, 8: {0, 1, 2, 6, 9, 10, 13, 14, 15, 16}, 9: {0, 1, 2, 3, 4, 5, 6, 12, 13}, 10: {0, 1, 3, 4, 5, 8, 11, 13, 14}, 11: {0, 2, 4, 8, 11, 13, 16}, 12: {0, 1, 2, 4, 7, 12}, 13: {1, 2, 6, 9, 10, 11, 14, 16}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}, 15: {0, 5, 6, 9, 13, 15}, 16: {0, 2, 3, 8, 14}, 17: {0, 1, 2, 3, 4, 5, 8, 11, 13, 14}, 18: {0, 4, 5, 6, 8, 9, 10, 13, 15, 16}, 19: {1, 3, 14}, 20: {12}, 21: {0, 1, 2, 3, 6, 8, 13}, 22: {8}, 23: {12, 5, 15}, 24: {1, 6, 7, 9, 15, 16}, 25: {0, 2, 3, 4, 5, 6, 13}, 26: {0, 1, 2, 3, 5, 7, 12, 13, 15}, 27: {2, 13}, 28: {11, 4}, 29: {16, 9, 5}}
+Iteration 18: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 18: Best valset aggregate score so far: 0.7403968253968254
+Iteration 18: Best program as per aggregate score on valset: 2
+Iteration 18: Best score on valset: 0.7403968253968254
+Iteration 18: Linear pareto front program index: 2
+Iteration 18: New program candidate index: 16
+Iteration 19: Selected program 16 score: 0.6092857142857143
+Iteration 19: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+Return JSON only, with no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity, packaging_deployment
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern. Classify the GitHub item itself, not generated output, feedback, score, label text, or gold topic lists.
+
+Input handling:
+1. Read the title first. It is the primary signal.
+2. Use the first clear body or summary only if the title is ambiguous.
+3. Ignore generated outputs, score, error, and gold topics.
+4. Use feedback only as rule-learning context, not as the item to classify.
+5. Pick one primary topic.
+6. Add secondary topics only when they are explicit central maintainer-owned subjects.
+7. Use 3 topics only when the title explicitly names three central facets.
+8. Use 0 topics only when no allowed topic is central.
+9. Never invent topic ids.
+10. Always emit the required JSON object.
+
+General suppression rule:
+Do not add a topic just because a related word appears. Confirm the word is the actual subject, not a label prefix, path, symptom, implementation detail, test fixture, example, broad ownership area, or incidental phrase.
+
+ACP / ACPX / Agent runtime:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` only when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior, explicit ACP agent runs, embedded backend visibility, or ACP agent-run binding behavior.
+- Do not infer `acpx` from `ACP` alone.
+- Use `agent_runtime` when runtime execution, backend runtime, node-backed runtime, agent process/runtime behavior, or runtime ownership is central.
+- Example: `ACP: add gateway-owned node-backed runtime` => `acp`, `gateway`, `agent_runtime`.
+- If a title says `per-binding`, `binding behavior`, or ACPX-related binding, that is usually `acpx`.
+- Do not add `sessions` merely because the title mentions `sessions_yield`, `after sessions_yield`, stale final JSON after a yield, or `ACP sessions`.
+
+Approvals / permission modes:
+- Use `approvals` when approval policy, permission modes, `permissionMode`, approval gates, allow/deny behavior, or user permission decisions are central.
+- If ACP or ACPX work is specifically about per-agent or per-binding `permissionMode`, include `approvals`.
+
+Notifications / chat integrations / cron:
+- Use `notifications` when announces, alerts, heartbeat pushes, notification delivery, target channels, or notification routing are central.
+- Use `chat_integrations` when Slack, Discord, Teams, chat channels, chat pushes, or chat-specific identity/target-channel behavior is central.
+- Do not use `cron_automation` merely because a title mentions `cron`, `cron --announce`, scheduled heartbeat, or an announce path. Treat cron as label/path-like noise unless the core issue is scheduling, recurring jobs, cron execution, or automation timing itself.
+- Example: `Per-agent identity overlay dropped on cron --announce and heartbeat target-channel Slack pushes` => `notifications`, `chat_integrations`.
+
+Skills / plugins:
+- Use `skills_plugins` when plugins, skills, Superpowers, plugin inheritance, installed plugins, plugin availability, or skill/plugin runtime integration is central.
+- If ACPX/Codex sandbox behavior is about inheriting or exposing user-installed plugins, include `skills_plugins`.
+
+Codex:
+- Use `codex` only when Codex itself is the central product/runtime/setup being changed, such as Codex startup, Docker Codex setup, Codex sandbox behavior, or Codex-specific runtime behavior.
+- Do not add `codex` merely because the title starts with `[codex]`, contains `openai-codex`, or has Codex branding.
+
+Policy, config, security, MCP:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- In policy/conformance titles, the word `model` usually means a policy area, not model providers or serving.
+
+Streaming usage:
+- Use `model_serving` when OpenAI-compatible streaming, llama.cpp/vLLM/Ollama server behavior, Responses API compatibility, stream lifecycle, final chunks, base URL behavior, endpoint compatibility, or model-server request/response protocol behavior is central.
+- Do not add `telemetry_usage` merely because the title says `usage`, `usage chunk`, `zero usage`, or `final usage chunk` in an OpenAI-compatible streaming context.
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are central.
+
+Gateway / sessions / API surface:
+- Use `gateway` when gateway-owned behavior, gateway routing, gateway-backed readers, guarded gateway behavior, gateway send denial, or gateway ownership is central.
+- Use `sessions` when session identity, lifecycle, routing, state, persistent process identity, embedded session path, or session-specific behavior is central.
+- Passing outbound session identity is a central `sessions` concern.
+- Do not add `api_surface` just because a title mentions `message_sending`, `surface`, send denial, or plumbing.
+- Add `api_surface` only when the central subject is a public API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+
+Browser automation / packaging:
+- Use `browser_automation` for Chrome/session attach, browser diagnostics, browser automation layers, browser runtime behavior, browser tooling, profile attach, and existing browser session reuse.
+- Use `packaging_deployment` for desktop app packaging, install/startup/runtime packaging, bundled tool availability, missing executables, PATH issues, `spawn npx ENOENT`, npm/npx availability, launch failures caused by deployment layout, and packaged runtime dependency problems.
+- If a desktop browser automation flow fails because a packaged app cannot spawn `npx` or attach to a user profile, include both `browser_automation` and `packaging_deployment`.
+- Do not add `auth_identity` just because Google sign-in fails when the central failure is Chrome profile attach or missing packaged executable.
+
+API surface / UI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed integration interface, request/response shape, compatibility surface, or public contract.
+- Use `ui_tui` for webchat, TUI, terminal UI, UI views, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If the webchat reader is gateway-backed, gateway-owned, or about gateway history/reader behavior, include `gateway`.
+
+Local model providers / open-weight models / model serving:
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `open_weight_models` when open-weight models, model metadata, context windows, model catalogs, or open-weight compatibility are central.
+- Provider-specific open-weight model metadata usually needs both `local_model_providers` and `open_weight_models`.
+- Example: `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` => `open_weight_models`, `local_model_providers`.
+- Do not add `model_serving` merely because a title says model, provider, catalog, Gemini, Ollama, vLLM, llama.cpp, or OpenAI-compatible.
+- Provider catalog missing-model issues are `local_model_providers`; add `reliability` only when the failure mode is silent failure, crash, hang, deadlock, stuck state, timeout, or missing clear error.
+
+Reliability:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, silent failure, self-healing behavior, robustness failure, or failure to produce a clear error.
+- Do not use `reliability` for every bug.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or auth/session identity is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions`.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs usually return `docs` alone.
+
+Tools:
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, schemas, parameter coercion, or runtime behavior.
+- Use `mcp_tooling` for MCP-specific behavior.
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, sessions_yield, showcase, tutorial, demo, sign-in, OAuth, permission, sandbox, runtime, provider, or Slack. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Final output:
+Return exactly one JSON object:
+{"topics_of_interest":[...]}
+Iteration 19: New subsample score 4.0 is better than old score 2.0. Continue to full eval and add to candidate pool.
+Iteration 19: Valset score for new program: 0.6176678876678877 (coverage 30 / 30)
+Iteration 19: Val aggregate for new program: 0.6176678876678877
+Iteration 19: Individual valset scores for new program: {0: 1.0, 1: 1.0, 2: 0.0, 3: 0.25, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 0.5, 9: 0.5, 10: 0.3333333333333333, 11: 1.0, 12: 0.25, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 0.5, 18: 1.0, 19: 0.5, 20: 0.2, 21: 0.5, 22: 0.15384615384615385, 23: 0.14285714285714285, 24: 1.0, 25: 0.25, 26: 1.0, 27: 0.0, 28: 0.2, 29: 0.25}
+Iteration 19: Objective aggregate scores for new program: {'weighted_score': 0.6176678876678875}
+Iteration 19: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 19: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 19: Valset pareto front aggregate score: 1.0
+Iteration 19: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10, 13, 14, 17}, 1: {6, 10, 13, 14, 16, 17}, 2: {0, 16, 2, 13}, 3: {0, 8, 2, 10}, 4: {2, 6, 9, 10, 15, 16, 17}, 5: {2, 5, 6, 8, 9, 16, 17}, 6: {0, 1, 2, 3, 5, 6, 8, 9, 13, 14, 15, 16, 17}, 7: {0, 2, 4, 6, 9, 11, 15, 17}, 8: {0, 1, 2, 6, 9, 10, 13, 14, 15, 16}, 9: {0, 1, 2, 3, 4, 5, 6, 12, 13}, 10: {0, 1, 3, 4, 5, 8, 11, 13, 14}, 11: {0, 2, 4, 8, 11, 13, 16, 17}, 12: {0, 1, 2, 4, 7, 12}, 13: {1, 2, 6, 9, 10, 11, 14, 16, 17}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17}, 15: {0, 5, 6, 9, 13, 15, 17}, 16: {0, 2, 3, 8, 14, 17}, 17: {0, 1, 2, 3, 4, 5, 8, 11, 13, 14}, 18: {0, 4, 5, 6, 8, 9, 10, 13, 15, 16, 17}, 19: {1, 3, 14}, 20: {12}, 21: {0, 1, 2, 3, 6, 8, 13}, 22: {8}, 23: {12, 5, 15}, 24: {1, 6, 7, 9, 15, 16, 17}, 25: {0, 2, 3, 4, 5, 6, 13}, 26: {0, 1, 2, 3, 5, 7, 12, 13, 15, 17}, 27: {2, 13}, 28: {11, 4}, 29: {16, 9, 5}}
+Iteration 19: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 19: Best valset aggregate score so far: 0.7403968253968254
+Iteration 19: Best program as per aggregate score on valset: 2
+Iteration 19: Best score on valset: 0.7403968253968254
+Iteration 19: Linear pareto front program index: 2
+Iteration 19: New program candidate index: 17
+Iteration 20: Selected program 2 score: 0.7403968253968254
+Iteration 20: Proposed new text for routing_policy: Additional classification rules learned from prior errors:
+
+Allowed topic ids must include `hooks` when hook behavior is part of the maintainer taxonomy.
+
+Token counts and usage freshness:
+- Titles about stale token counts, token usage counts, total token freshness flags, or usage display correctness are `telemetry_usage` when the user-visible subject is usage/count accuracy.
+- If the token count freshness is tied to session state, session identity, or per-session accounting, also include `sessions`.
+- Do not classify these as `ui_tui` merely because stale counts are shown in a UI. UI display is incidental unless the title is centrally about UI/TUI behavior.
+- Example: `fix: respect totalTokensFresh flag to avoid showing stale token counts` => `sessions`, `telemetry_usage`.
+
+Security bundles:
+- For composite security titles, classify each explicitly named central vulnerable subsystem, but do not over-expand from generic security words.
+- `token verification` in a vulnerability-fix title usually stays under `security`; do not add `auth_identity` unless authentication, OAuth, login, account identity, credential identity, or identity propagation is itself the central user-visible subject.
+- `skill scanner` or plugin/skill scanning vulnerability fixes should include `skills_plugins`.
+- `hook priority`, hook ordering, hook execution, or hook lifecycle should include `hooks` when hooks are an allowed topic.
+- `SSRF`, HIGH/CRITICAL vulnerabilities, network exploit prevention, and vulnerability remediation are `security`.
+- Example: `security: fix HIGH/CRITICAL vulns in skill scanner, SSRF, hook priority, and token verification` => `security`, `hooks`, `skills_plugins`, not `auth_identity`.
+
+Final suppression check additions:
+- Remove `ui_tui` if the UI is only where telemetry, tokens, or stale values appear.
+- Remove `auth_identity` if the only auth-like phrase is token verification inside a security vulnerability bundle.
+- Keep `hooks` when hook priority/order/behavior is explicitly named as one of the central fixes.
+Iteration 20: New subsample score 3.1538461538461537 is better than old score 2.45. Continue to full eval and add to candidate pool.
+Iteration 20: Valset score for new program: 0.4475383553644423 (coverage 30 / 30)
+Iteration 20: Val aggregate for new program: 0.4475383553644423
+Iteration 20: Individual valset scores for new program: {0: 0.15384615384615385, 1: 0.25, 2: 0.15384615384615385, 3: 0.16666666666666666, 4: 0.15384615384615385, 5: 0.25, 6: 0.2857142857142857, 7: 1.0, 8: 0.25, 9: 1.0, 10: 0.14285714285714285, 11: 0.2857142857142857, 12: 1.0, 13: 1.0, 14: 0.2857142857142857, 15: 0.5, 16: 0.15384615384615385, 17: 0.25, 18: 0.2857142857142857, 19: 0.2, 20: 1.0, 21: 1.0, 22: 0.2857142857142857, 23: 0.08695652173913043, 24: 1.0, 25: 0.25, 26: 0.2857142857142857, 27: 0.5, 28: 1.0, 29: 0.25}
+Iteration 20: Objective aggregate scores for new program: {'weighted_score': 0.4475383553644424}
+Iteration 20: New valset pareto front scores: {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0}
+Iteration 20: Objective pareto front scores: {'weighted_score': 0.7403968253968253}
+Iteration 20: Valset pareto front aggregate score: 1.0
+Iteration 20: Updated valset pareto front programs: {0: {0, 1, 2, 3, 5, 8, 10, 13, 14, 17}, 1: {6, 10, 13, 14, 16, 17}, 2: {0, 16, 2, 13}, 3: {0, 8, 2, 10}, 4: {2, 6, 9, 10, 15, 16, 17}, 5: {2, 5, 6, 8, 9, 16, 17}, 6: {0, 1, 2, 3, 5, 6, 8, 9, 13, 14, 15, 16, 17}, 7: {0, 2, 4, 6, 9, 11, 15, 17, 18}, 8: {0, 1, 2, 6, 9, 10, 13, 14, 15, 16}, 9: {0, 1, 2, 3, 4, 5, 6, 12, 13, 18}, 10: {0, 1, 3, 4, 5, 8, 11, 13, 14}, 11: {0, 2, 4, 8, 11, 13, 16, 17}, 12: {0, 1, 2, 4, 7, 12, 18}, 13: {1, 2, 6, 9, 10, 11, 14, 16, 17, 18}, 14: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17}, 15: {0, 5, 6, 9, 13, 15, 17}, 16: {0, 2, 3, 8, 14, 17}, 17: {0, 1, 2, 3, 4, 5, 8, 11, 13, 14}, 18: {0, 4, 5, 6, 8, 9, 10, 13, 15, 16, 17}, 19: {1, 3, 14}, 20: {18, 12}, 21: {0, 1, 2, 3, 6, 8, 13, 18}, 22: {8}, 23: {12, 5, 15}, 24: {1, 6, 7, 9, 15, 16, 17, 18}, 25: {0, 2, 3, 4, 5, 6, 13}, 26: {0, 1, 2, 3, 5, 7, 12, 13, 15, 17}, 27: {2, 13}, 28: {18, 11, 4}, 29: {16, 9, 5}}
+Iteration 20: Updated objective pareto front programs: {'weighted_score': {2}}
+Iteration 20: Best valset aggregate score so far: 0.7403968253968254
+Iteration 20: Best program as per aggregate score on valset: 2
+Iteration 20: Best score on valset: 0.7403968253968254
+Iteration 20: Linear pareto front program index: 2
+Iteration 20: New program candidate index: 18
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/run_log_stderr.txt b/gepa-12b-row30-prop20-continuation-20260614T021448Z/run_log_stderr.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-row30-prop20-continuation-20260614T021448Z/score_report.html b/gepa-12b-row30-prop20-continuation-20260614T021448Z/score_report.html
new file mode 100644
index 0000000000000000000000000000000000000000..8fc1fedd673cd1a44b88eb6f567c0b5e2415d5b5
--- /dev/null
+++ b/gepa-12b-row30-prop20-continuation-20260614T021448Z/score_report.html
@@ -0,0 +1,290 @@
+
+
+
+
+
+gepa-12b-row30-prop20-continuation-20260614T021448Z GEPA score report
+
+
+
+
+
Run directory: /home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop20-continuation-20260614T081715Z
+
Base score
n/a
+
Proposal attempts
0
+
Accepted full evals
0
+
Rejected proposals
0
+
Candidates
n/a
+
Best candidate
n/a
+
Metric calls
n/a
+
Validation Score Over Iterations
No iteration scores found yet.
+
Proposal Subsample Delta
No proposal deltas found yet.
+
Final Candidate Scores
Final GEPA result is not available yet.
+
Proposal Events
No proposal events found yet.
+
+
+
diff --git a/gepa-12b-six-20260612T190217Z/best-reeval-12b-train6.json b/gepa-12b-six-20260612T190217Z/best-reeval-12b-train6.json
new file mode 100644
index 0000000000000000000000000000000000000000..72de222d1da7a8a4a54872115325d5d98243b25a
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/best-reeval-12b-train6.json
@@ -0,0 +1,172 @@
+{
+ "candidate": "gepa-12b-six-best",
+ "concurrency": 2,
+ "harness": "localpager-agent",
+ "mean_score": 0.5833333333333334,
+ "routing_policy_path": "prompt-optimizer/out/gepa-12b-six-20260612T190217Z/best.routing_policy.md",
+ "routing_policy_sha256": "f4b161bb9bbaf366f1d4f1841243d73544bbd3c553ca6be5eb2818e757007187",
+ "row_reports": [
+ {
+ "error": null,
+ "false_negatives": [
+ "agent_runtime"
+ ],
+ "false_positives": [],
+ "gold_topics": [
+ "acp",
+ "gateway",
+ "agent_runtime"
+ ],
+ "id": "openclaw-openclaw-48940",
+ "loss": 1.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "acp",
+ "gateway"
+ ],
+ "score": 0.5,
+ "target": "https://github.com/openclaw/openclaw/pull/48940",
+ "title": "ACP: add gateway-owned node-backed runtime",
+ "true_positives": [
+ "acp",
+ "gateway"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "mcp_tooling",
+ "config",
+ "security"
+ ],
+ "id": "openclaw-openclaw-80783",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "mcp_tooling",
+ "security",
+ "config"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/pull/80783",
+ "title": "Policy: add model, network, and MCP conformance checks",
+ "true_positives": [
+ "mcp_tooling",
+ "security",
+ "config"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [
+ "browser_automation"
+ ],
+ "false_positives": [
+ "ui_tui"
+ ],
+ "gold_topics": [
+ "exec_tools",
+ "browser_automation",
+ "cron_automation"
+ ],
+ "id": "openclaw-openclaw-42027",
+ "loss": 3.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "exec_tools",
+ "cron_automation",
+ "ui_tui"
+ ],
+ "score": 0.25,
+ "target": "https://github.com/openclaw/openclaw/pull/42027",
+ "title": "fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock",
+ "true_positives": [
+ "exec_tools",
+ "cron_automation"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [
+ "chat_integrations"
+ ],
+ "false_positives": [
+ "gateway"
+ ],
+ "gold_topics": [
+ "codex",
+ "chat_integrations"
+ ],
+ "id": "openclaw-openclaw-77748",
+ "loss": 3.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "codex",
+ "gateway"
+ ],
+ "score": 0.25,
+ "target": "https://github.com/openclaw/openclaw/pull/77748",
+ "title": "fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth",
+ "true_positives": [
+ "codex"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "model_serving"
+ ],
+ "id": "openclaw-openclaw-79897",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "model_serving"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/issues/79897",
+ "title": "OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)",
+ "true_positives": [
+ "model_serving"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [
+ "acpx"
+ ],
+ "false_positives": [],
+ "gold_topics": [
+ "acp",
+ "approvals",
+ "acpx"
+ ],
+ "id": "openclaw-openclaw-40332",
+ "loss": 1.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "acp",
+ "approvals"
+ ],
+ "score": 0.5,
+ "target": "https://github.com/openclaw/openclaw/issues/40332",
+ "title": "[Feature]: Per-binding and per-agent permissionMode for ACP sessions",
+ "true_positives": [
+ "acp",
+ "approvals"
+ ]
+ }
+ ],
+ "rows": 6,
+ "scores": [
+ 0.5,
+ 1.0,
+ 0.25,
+ 0.25,
+ 1.0,
+ 0.5
+ ]
+}
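Reviewer note on best-reeval-12b-train6.json: the six row reports above are consistent with a simple relation between errors, `loss`, and `score`. The sketch below is an inference from these six rows only, not the localpager scoring function itself. The weights (`FN_WEIGHT`, `FP_WEIGHT`) and the `1 / (1 + loss)` form are assumptions that happen to reproduce the reported values; every row here has `over_label_count` 0, so any over-labeling penalty is not modeled. The helper names are hypothetical.

```python
from statistics import mean

# Assumed weights, inferred only from the six rows in this report.
FN_WEIGHT = 1.0  # each missed gold topic
FP_WEIGHT = 2.0  # each predicted topic that is not in the gold set


def row_loss(gold, predicted):
    """Weighted error count for one row (over-label penalty not modeled)."""
    gold_set, pred_set = set(gold), set(predicted)
    false_negatives = gold_set - pred_set
    false_positives = pred_set - gold_set
    return FN_WEIGHT * len(false_negatives) + FP_WEIGHT * len(false_positives)


def row_score(gold, predicted):
    """Assumed mapping from loss to score: 1 / (1 + loss)."""
    return 1.0 / (1.0 + row_loss(gold, predicted))


# (gold_topics, predicted_topics) copied from the row reports, in order.
ROWS = [
    (["acp", "gateway", "agent_runtime"], ["acp", "gateway"]),
    (["mcp_tooling", "config", "security"], ["mcp_tooling", "security", "config"]),
    (["exec_tools", "browser_automation", "cron_automation"],
     ["exec_tools", "cron_automation", "ui_tui"]),
    (["codex", "chat_integrations"], ["codex", "gateway"]),
    (["model_serving"], ["model_serving"]),
    (["acp", "approvals", "acpx"], ["acp", "approvals"]),
]

losses = [row_loss(gold, pred) for gold, pred in ROWS]
scores = [row_score(gold, pred) for gold, pred in ROWS]
mean_score = mean(scores)

print(losses)
print(scores)
print(mean_score)
```

Under these assumptions the per-row losses and scores match the `loss` and `score` fields in the report, and the mean matches `mean_score`. Rows with a nonzero `over_label_count` would be needed to check how over-labeling enters the loss.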
diff --git a/gepa-12b-six-20260612T190217Z/best.prompt.md b/gepa-12b-six-20260612T190217Z/best.prompt.md
new file mode 100644
index 0000000000000000000000000000000000000000..5a6c8fc356573a36b99e70c54b797e67fee82827
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/best.prompt.md
@@ -0,0 +1,100 @@
+# OpenClaw Routing Classifier
+
+Classify one OpenClaw GitHub issue or pull request for maintainer notification
+routing, not code search. Return only the final structured JSON required by the
+schema. No prose, markdown, analysis, or extra fields.
+
+Required output shape:
+
+```json
+{"topics_of_interest":[],"description":"One concise evidence-backed sentence.","caveats":[]}
+```
+
+## Inner Monologue
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+## Repository Reads
+
+A read-only `bash` tool may be available in the OpenClaw repo snapshot. Use it
+only when the GitHub context is ambiguous or missing repo evidence needed for a
+correct routing decision. Prefer short commands such as `pwd`, `ls`, `find`,
+`rg`, `grep`, `sed -n`, `cat`, `head`, `tail`, `wc -l`,
+`git show --name-only`, `git ls-files`, or `git grep`.
+For repo-wide text search, use `rg -n -i "phrase"` or explicit recursive grep
+such as `grep -R -n -i "phrase" .`. For file discovery, use
+`rg --files -g "*.ts"` or `git ls-files src`.
+Do not call `bash` when the provided GitHub context is enough.
+
+## Allowed Topics
+
+```json
+__ALLOWED_TOPICS_JSON__
+```
+
+Topic definitions and cue words:
+
+__TOPIC_DESCRIPTIONS__
+
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+This is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.
+
+Process:
+
+1. Read the title first.
+2. Identify the main user-visible problem, feature, or policy change.
+3. Pick one primary topic.
+4. Read only the first clear body summary if needed to disambiguate.
+5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.
+7. Return only exact allowed topic ids.
+
+Do not over-label from keywords.
+
+Important domain rules:
+
+- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+- Example: “OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)” is only `model_serving`. The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.
+- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.
+
+Policy/config rules:
+
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Do not map the word “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Example: “Policy: add model, network, and MCP conformance checks” should be `mcp_tooling`, `config`, and `security`, not `model_serving`.
+
+Cardinality guidance:
+
+- Use 0 topics when no allowed topic is central.
+- Use 1 topic for a single-focus item.
+- Use 2 topics for normal cross-topic items.
+- Use 3 topics only when the title or first clear summary explicitly has three central facets.
+- Use 4+ topics only for explicit multi-system coordination.
+
+Final suppression checks before output:
+
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, or “chunk”, verify that the topic is actually the subject, not just context.
+- Prefer the narrower central topic over a broad fallback.
+- Never invent topic ids.
+- Output only the final JSON with the selected topic ids.## Target
+
+`__TARGET__`
+
+## GitHub Context
+
+__GITHUB_CONTEXT__
+
+Use this context as source of truth. If important sections are missing,
+unavailable, selected, or truncated, classify from what is available and mention
+material limits in `caveats`.
+
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
diff --git a/gepa-12b-six-20260612T190217Z/best.routing_policy.md b/gepa-12b-six-20260612T190217Z/best.routing_policy.md
new file mode 100644
index 0000000000000000000000000000000000000000..d232f43adcd336bae95a929d5a8fbdf0ad1fcda1
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/best.routing_policy.md
@@ -0,0 +1,45 @@
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+This is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.
+
+Process:
+
+1. Read the title first.
+2. Identify the main user-visible problem, feature, or policy change.
+3. Pick one primary topic.
+4. Read only the first clear body summary if needed to disambiguate.
+5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.
+7. Return only exact allowed topic ids.
+
+Do not over-label from keywords.
+
+Important domain rules:
+
+- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+- Example: “OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)” is only `model_serving`. The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.
+- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.
+
+Policy/config rules:
+
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Do not map the word “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Example: “Policy: add model, network, and MCP conformance checks” should be `mcp_tooling`, `config`, and `security`, not `model_serving`.
+
+Cardinality guidance:
+
+- Use 0 topics when no allowed topic is central.
+- Use 1 topic for a single-focus item.
+- Use 2 topics for normal cross-topic items.
+- Use 3 topics only when the title or first clear summary explicitly has three central facets.
+- Use 4+ topics only for explicit multi-system coordination.
+
+Final suppression checks before output:
+
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, or “chunk”, verify that the topic is actually the subject, not just context.
+- Prefer the narrower central topic over a broad fallback.
+- Never invent topic ids.
+- Output only the final JSON with the selected topic ids.
\ No newline at end of file
diff --git a/gepa-12b-six-20260612T190217Z/candidate_tree.html b/gepa-12b-six-20260612T190217Z/candidate_tree.html
new file mode 100644
index 0000000000000000000000000000000000000000..e296251761b7e057815360b33ecbda5686fcf838
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/candidate_tree.html
@@ -0,0 +1,173 @@
+
+
+
+
+
+GEPA Candidate Tree
+
+
+
+
+
GEPA Candidate Tree
+
+
Best
+
Pareto Front
+
Other
+
+
+
Loading graph…
+
+
+
+
+
diff --git a/gepa-12b-six-20260612T190217Z/candidates.json b/gepa-12b-six-20260612T190217Z/candidates.json
new file mode 100644
index 0000000000000000000000000000000000000000..1a7f528eb6e44db11f3a110ff87d2d64c3b6e53f
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/candidates.json
@@ -0,0 +1,8 @@
+[
+ {
+ "routing_policy": "## Goal\n\nChoose the smallest topic set that would route this item to the right\nmaintainer interest bucket without hiding a central second concern.\n\nThis is a fuzzy multi-label task. Avoid both failure modes:\n\n- Over-labeling: adding topics for implementation details, tests, examples, or\n side effects.\n- Under-labeling: dropping a topic that is explicitly central to the title or\n first clear problem statement.\n\n## Decision Process\n\nDo these checks internally, then call `final_json`.\n\n1. Read the title first. Identify the main user-visible problem or feature.\n2. Pick the single best primary topic.\n3. Scan the title and first clear body summary for explicit central facets:\n protocol/runtime, provider/model serving, session/state, execution/tooling,\n channel/UI, memory, security/auth, config, reliability, docs/tests.\n4. Add a secondary topic only when removing it would misroute a maintainer who\n follows that bucket.\n5. Delete topics that are only file locations, validation work, examples,\n dependencies, likely consequences, or broad words from comments.\n6. Do a final cardinality check:\n - 0 topics when no allowed topic is central.\n - 1 topic for a single-focus item.\n - 2 topics for normal cross-topic items.\n - 3 topics when the title/body explicitly has three central facets.\n - 4+ topics only for explicit multi-system coordination, not broad impact.\n\nDo not optimize for one label. 
Optimize for the smallest complete set.\n\n## Runtime Discipline\n\nThis is a routing task, not a full PR review.\n\n- Do not audit every allowed topic.\n- Do not summarize the whole issue, comments, or diff.\n- Do not inspect repository files when the title and GitHub context are enough.\n- Stop once the primary topic and any explicit central secondary facets are\n identified.\n- Call `final_json` immediately after the final cardinality check.\n\n## Evidence Priority\n\n- Title is strongest.\n- The first clear problem statement or summary in the body is next.\n- Labels, changed files, comments, and diff can confirm or disambiguate, but\n they must not broaden the topic set by themselves.\n- Tests and docs files are weak evidence unless tests or docs are the subject.\n- If context is truncated, classify from the visible central evidence and add a\n caveat only when that materially limits confidence.\n\n## Central Facet Recovery\n\nBefore final output, check whether any of these central facets were dropped:\n\n- A named protocol or protocol adapter can be central: ACP, ACPX, MCP, gateway\n protocol, OpenAI-compatible streaming, Responses API, chat completions.\n- Agent runner, backend, subagent, harness, heartbeat, lifecycle, or orchestration\n behavior can be central as `agent_runtime` or `coding_agents`.\n- Session identity, binding, persistence, transcript, resume, reset, or\n parent/child state can be central as `sessions`.\n- Execution control, shell commands, tool allow/deny, durable exec, or tool\n invocation can be central as `exec_tools`, `approvals`, `sandboxing`, or\n `tool_calling` depending on the exact behavior.\n- Local/self-hosted model backends, local hardware, OpenAI-compatible local\n serving, embeddings providers, and model endpoint behavior can be central as\n `local_models`, `local_model_providers`, `model_serving`, or\n `self_hosted_inference`.\n- Named chat surfaces, browser/web UI, memory, security/auth, config, queueing,\n notifications, 
and telemetry can be central when the title or summary makes\n them the subject.\n\nIf a facet is central but maps to a narrower allowed topic, use the narrower\ntopic instead of a broad fallback.\n\n## Enum Discipline\n\n- Output only exact allowed topic ids.\n- Never invent shorthand such as `cli`, `tts`, `openrouter`, `status`,\n `thread`, `provider`, `tool`, or `test`.\n- If the closest word in the title is not allowed, map it to the nearest\n allowed topic or omit it.\n\n## Positive Cues\n\n- Counts, usage, cost, tokens, metrics, diagnostics, traces, and status\n reporting route to `telemetry_usage`.\n- Subagents, coding-agent runs, harness behavior, approvals, sandboxing,\n compaction, or agent orchestration route to `coding_agents`.\n- Agent runner, embedded backend, lifecycle, heartbeat, attempt, orchestration,\n or runtime state route to `agent_runtime` when the agent runtime itself is\n central.\n- LM Studio, Ollama, llama.cpp, GGUF, local hardware, local model compatibility,\n local fallback, and local context behavior route to `local_models`.\n- OpenAI-compatible serving, base URL normalization for model endpoints,\n streaming, usage chunks, vLLM/TGI/LocalAI serving, endpoint lifecycle, and\n request routing route to `model_serving`.\n- Self-hosted embeddings, memory providers, speech services, private inference,\n and locally operated inference services route to `self_hosted_inference`.\n- Named Discord, Telegram, Slack, Zulip, Feishu, webchat, or similar surfaces\n route to `chat_integrations`; generic notify policy/delivery gates route to\n `notifications`.\n- Chat UI display/status/footer behavior routes to `ui_tui` only when the\n user-facing interface is central.\n- Test-only contract follow-through routes to `tests_ci` only when the testing\n surface itself is the subject; otherwise route by the runtime contract being\n tested.\n\n## Hard Local Model Filters\n\nBefore final output:\n\n- Delete `local_model_providers` unless the item explicitly 
centers a local,\n self-hosted, or user-declared OpenAI-compatible backend: LM Studio, Ollama,\n vLLM, TGI, LocalAI, llama.cpp, Atomic Chat, localhost/LAN, or private\n inference.\n- Never use `local_model_providers` for hosted provider catalogs/manifests,\n hosted model availability, hosted auth, usage/billing UI, TTS/speech/image\n behavior, or hosted APIs. Hosted catalog updates are `model_releases` and\n sometimes `config`; hosted usage/billing/status work is `telemetry_usage` or\n `ui_tui`.\n- Use `local_models` only for concrete local/offline model execution or\n compatibility; not model IDs, catalogs, lists, static entries, provider\n manifests, or hosted provider availability. Keep it for local-model lean\n filtering and local-model runtime crashes.\n\n## Over-Label Guardrails\n\n- `api_surface`: external API, CLI, or HTTP contracts only. Not internal\n payloads/options/functions, status text, UI events, or ordinary command\n behavior.\n- `reliability`: operational failures such as timeout, crash, leak, retry,\n stuck state, data loss, cleanup, or recovery. Not a generic bug tag.\n- `sessions`: session lifecycle/state/storage/identity only. Not every item\n mentioning session context or files.\n- `local_model_providers`: provider setup/routing/auth/discovery/compatibility\n only. Not every local endpoint issue.\n- `config`: configuration behavior itself, not any feature with an option.\n- `docs` and `tests_ci`: only when docs or test tooling is the subject.\n- `security`: security boundary, credentials, auth hardening, SSRF, token\n handling, or vulnerability only. Not every risky-looking behavior.\n- `gateway`: gateway daemon, gateway API, startup, restart, or gateway-owned\n routing only. 
Not every item that passes through the gateway.\n\n## Tie-Breakers\n\n- Count/usage/token/cost/metric/trace/diagnostic/status/footer-count features\n are `telemetry_usage`, even if shown in UI or session status.\n- Base URL normalization, endpoint lifecycle/selection, streaming, request\n routing, OpenAI-compatible serving, vLLM/TGI/LocalAI behavior, and model\n endpoint compatibility are `model_serving`.\n- TTS, shell/exec, command, tool invocation, allowlist, and execution-control\n behavior are `exec_tools` when the feature controls execution or spoken/tool\n output.\n- Thread/session isolation, per-session binding, fallback recovery state, and\n lifecycle state are `sessions` when those boundaries are central.\n- Structured tool result display, stdout rendering for tool results, pre-tool\n text preservation, and tool-call transcript/content handling are\n `tool_calling` when tool-call semantics are central.\n- Delivery fallback, outbound recovery, lost final/pre-tool text, duplicate\n cleanup, and lifecycle recovery are `reliability` when recovery correctness is\n central.\n\n## False-Positive Suppression\n\n- Do not use `local_model_providers` for base URL normalization,\n OpenRouter/OpenAI-compatible endpoint fixes, endpoint lifecycle, streaming,\n usage chunks, or vLLM/TGI/LocalAI serving. Use `model_serving` unless provider\n setup/auth/discovery/routing is central.\n- Do not use `notifications` for named Discord/Telegram/Slack/Zulip/Feishu\n behavior, ACP final/pre-tool text preservation, delivery fallback recovery, or\n outbound recovery correctness. 
Use `chat_integrations` for named chat surfaces\n and `reliability` for recovery/loss/fallback.\n- Do not use `tool_calling` for TTS tags/options, browser screenshot/vision,\n generic tool output, or config-like options.\n- Do not use `api_surface` for parse helpers, CLI edge-case tests, token\n parsing, status/footer display, internal command behavior, or local model\n compatibility.\n- Do not use `config` merely because a feature adds an option. Route by what\n the option controls.\n\n"
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nThis is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.\n\nProcess:\n\n1. Read the title first.\n2. Identify the main user-visible problem, feature, or policy change.\n3. Pick one primary topic.\n4. Read only the first clear body summary if needed to disambiguate.\n5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.\n7. Return only exact allowed topic ids.\n\nDo not over-label from keywords.\n\nImportant domain rules:\n\n- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n- Example: \u201cOpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)\u201d is only `model_serving`. 
The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.\n- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.\n\nPolicy/config rules:\n\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map the word \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d should be `mcp_tooling`, `config`, and `security`, not `model_serving`.\n\nCardinality guidance:\n\n- Use 0 topics when no allowed topic is central.\n- Use 1 topic for a single-focus item.\n- Use 2 topics for normal cross-topic items.\n- Use 3 topics only when the title or first clear summary explicitly has three central facets.\n- Use 4+ topics only for explicit multi-system coordination.\n\nFinal suppression checks before output:\n\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, or \u201cchunk\u201d, verify that the topic is actually the subject, not just context.\n- Prefer the narrower central topic over a broad fallback.\n- Never invent topic ids.\n- Output only the final JSON with the selected topic ids."
+ }
+]
\ No newline at end of file
diff --git a/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json b/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..383ed1200c2bcb6db19acefaa2717a0ec1cf3458
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'gateway', 'sessions'), description='This PR implements a gateway-owned node-backed ACP runtime with durable state management for sessions and execution results.', caveats=('The PR includes some \"test-only\" fixes for specific chat integrations, but these are not the primary focus.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json b/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..05259a4c7d76a73689de792adb7f2aab1b03dcaf
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('mcp_tooling', 'model_serving', 'security'), description='This PR adds conformance checks for model providers, network SSRF protection, and MCP servers as part of a new policy enforcement layer.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json b/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..9bc8823696a0b5a0f85fa19c2de8d63ae47f7a9e
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('exec_tools', 'gateway', 'ui_tui'), description='This pull request fixes shell path resolution for exec tools, implements improved diagnostic reporting for the browser status command, and resolves a concurrency deadlock in the cron system by introducing a dedicated manual lane.', caveats=(\"The PR addresses three distinct functional areas: bash tool execution, gateway-level cron orchestration, and the CLI's browser status display.\",), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json b/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..8d3980b602524a9289b95023308336cd1d156db4
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('gateway', 'chat_integrations', 'codex'), description='The PR addresses gateway startup logic for provider-based plugins (Codex), adds history support to the WhatsApp integration, and updates Docker infrastructure for Codex authentication.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json b/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..381ca3903ee97850d2ac2c22867e8edec1f62729
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('local_models', 'model_serving'), description='An issue with missing usage chunks in OpenAI-compatible streams from llama.cpp backends, affecting both local model integration and streaming logic.', caveats=('The issue specifically concerns a streaming protocol failure when using llama.cpp as an OpenAI-compatible backend, impacting usage tracking.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json b/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..e71ef89e8710f791c1a3ad7b625526af8662a0e2
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'approvals'), description=\"This feature request addresses per-binding and per-agent runtime scoping for 'permissionMode' within ACP sessions to allow different access levels for different agents.\", caveats=(\"The issue focuses specifically on the acpx plugin's permission logic for ACP sessions.\",), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-six-20260612T190217Z/gepa-result.json b/gepa-12b-six-20260612T190217Z/gepa-result.json
new file mode 100644
index 0000000000000000000000000000000000000000..f11b96838847daa3bdf1e218c599db858990a433
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/gepa-result.json
@@ -0,0 +1,89 @@
+{
+ "_str_candidate_key": null,
+ "best_idx": 1,
+ "best_outputs_valset": null,
+ "candidates": [
+ {
+ "routing_policy": "## Goal\n\nChoose the smallest topic set that would route this item to the right\nmaintainer interest bucket without hiding a central second concern.\n\nThis is a fuzzy multi-label task. Avoid both failure modes:\n\n- Over-labeling: adding topics for implementation details, tests, examples, or\n side effects.\n- Under-labeling: dropping a topic that is explicitly central to the title or\n first clear problem statement.\n\n## Decision Process\n\nDo these checks internally, then call `final_json`.\n\n1. Read the title first. Identify the main user-visible problem or feature.\n2. Pick the single best primary topic.\n3. Scan the title and first clear body summary for explicit central facets:\n protocol/runtime, provider/model serving, session/state, execution/tooling,\n channel/UI, memory, security/auth, config, reliability, docs/tests.\n4. Add a secondary topic only when removing it would misroute a maintainer who\n follows that bucket.\n5. Delete topics that are only file locations, validation work, examples,\n dependencies, likely consequences, or broad words from comments.\n6. Do a final cardinality check:\n - 0 topics when no allowed topic is central.\n - 1 topic for a single-focus item.\n - 2 topics for normal cross-topic items.\n - 3 topics when the title/body explicitly has three central facets.\n - 4+ topics only for explicit multi-system coordination, not broad impact.\n\nDo not optimize for one label. 
Optimize for the smallest complete set.\n\n## Runtime Discipline\n\nThis is a routing task, not a full PR review.\n\n- Do not audit every allowed topic.\n- Do not summarize the whole issue, comments, or diff.\n- Do not inspect repository files when the title and GitHub context are enough.\n- Stop once the primary topic and any explicit central secondary facets are\n identified.\n- Call `final_json` immediately after the final cardinality check.\n\n## Evidence Priority\n\n- Title is strongest.\n- The first clear problem statement or summary in the body is next.\n- Labels, changed files, comments, and diff can confirm or disambiguate, but\n they must not broaden the topic set by themselves.\n- Tests and docs files are weak evidence unless tests or docs are the subject.\n- If context is truncated, classify from the visible central evidence and add a\n caveat only when that materially limits confidence.\n\n## Central Facet Recovery\n\nBefore final output, check whether any of these central facets were dropped:\n\n- A named protocol or protocol adapter can be central: ACP, ACPX, MCP, gateway\n protocol, OpenAI-compatible streaming, Responses API, chat completions.\n- Agent runner, backend, subagent, harness, heartbeat, lifecycle, or orchestration\n behavior can be central as `agent_runtime` or `coding_agents`.\n- Session identity, binding, persistence, transcript, resume, reset, or\n parent/child state can be central as `sessions`.\n- Execution control, shell commands, tool allow/deny, durable exec, or tool\n invocation can be central as `exec_tools`, `approvals`, `sandboxing`, or\n `tool_calling` depending on the exact behavior.\n- Local/self-hosted model backends, local hardware, OpenAI-compatible local\n serving, embeddings providers, and model endpoint behavior can be central as\n `local_models`, `local_model_providers`, `model_serving`, or\n `self_hosted_inference`.\n- Named chat surfaces, browser/web UI, memory, security/auth, config, queueing,\n notifications, 
and telemetry can be central when the title or summary makes\n them the subject.\n\nIf a facet is central but maps to a narrower allowed topic, use the narrower\ntopic instead of a broad fallback.\n\n## Enum Discipline\n\n- Output only exact allowed topic ids.\n- Never invent shorthand such as `cli`, `tts`, `openrouter`, `status`,\n `thread`, `provider`, `tool`, or `test`.\n- If the closest word in the title is not allowed, map it to the nearest\n allowed topic or omit it.\n\n## Positive Cues\n\n- Counts, usage, cost, tokens, metrics, diagnostics, traces, and status\n reporting route to `telemetry_usage`.\n- Subagents, coding-agent runs, harness behavior, approvals, sandboxing,\n compaction, or agent orchestration route to `coding_agents`.\n- Agent runner, embedded backend, lifecycle, heartbeat, attempt, orchestration,\n or runtime state route to `agent_runtime` when the agent runtime itself is\n central.\n- LM Studio, Ollama, llama.cpp, GGUF, local hardware, local model compatibility,\n local fallback, and local context behavior route to `local_models`.\n- OpenAI-compatible serving, base URL normalization for model endpoints,\n streaming, usage chunks, vLLM/TGI/LocalAI serving, endpoint lifecycle, and\n request routing route to `model_serving`.\n- Self-hosted embeddings, memory providers, speech services, private inference,\n and locally operated inference services route to `self_hosted_inference`.\n- Named Discord, Telegram, Slack, Zulip, Feishu, webchat, or similar surfaces\n route to `chat_integrations`; generic notify policy/delivery gates route to\n `notifications`.\n- Chat UI display/status/footer behavior routes to `ui_tui` only when the\n user-facing interface is central.\n- Test-only contract follow-through routes to `tests_ci` only when the testing\n surface itself is the subject; otherwise route by the runtime contract being\n tested.\n\n## Hard Local Model Filters\n\nBefore final output:\n\n- Delete `local_model_providers` unless the item explicitly 
centers a local,\n self-hosted, or user-declared OpenAI-compatible backend: LM Studio, Ollama,\n vLLM, TGI, LocalAI, llama.cpp, Atomic Chat, localhost/LAN, or private\n inference.\n- Never use `local_model_providers` for hosted provider catalogs/manifests,\n hosted model availability, hosted auth, usage/billing UI, TTS/speech/image\n behavior, or hosted APIs. Hosted catalog updates are `model_releases` and\n sometimes `config`; hosted usage/billing/status work is `telemetry_usage` or\n `ui_tui`.\n- Use `local_models` only for concrete local/offline model execution or\n compatibility; not model IDs, catalogs, lists, static entries, provider\n manifests, or hosted provider availability. Keep it for local-model lean\n filtering and local-model runtime crashes.\n\n## Over-Label Guardrails\n\n- `api_surface`: external API, CLI, or HTTP contracts only. Not internal\n payloads/options/functions, status text, UI events, or ordinary command\n behavior.\n- `reliability`: operational failures such as timeout, crash, leak, retry,\n stuck state, data loss, cleanup, or recovery. Not a generic bug tag.\n- `sessions`: session lifecycle/state/storage/identity only. Not every item\n mentioning session context or files.\n- `local_model_providers`: provider setup/routing/auth/discovery/compatibility\n only. Not every local endpoint issue.\n- `config`: configuration behavior itself, not any feature with an option.\n- `docs` and `tests_ci`: only when docs or test tooling is the subject.\n- `security`: security boundary, credentials, auth hardening, SSRF, token\n handling, or vulnerability only. Not every risky-looking behavior.\n- `gateway`: gateway daemon, gateway API, startup, restart, or gateway-owned\n routing only. 
Not every item that passes through the gateway.\n\n## Tie-Breakers\n\n- Count/usage/token/cost/metric/trace/diagnostic/status/footer-count features\n are `telemetry_usage`, even if shown in UI or session status.\n- Base URL normalization, endpoint lifecycle/selection, streaming, request\n routing, OpenAI-compatible serving, vLLM/TGI/LocalAI behavior, and model\n endpoint compatibility are `model_serving`.\n- TTS, shell/exec, command, tool invocation, allowlist, and execution-control\n behavior are `exec_tools` when the feature controls execution or spoken/tool\n output.\n- Thread/session isolation, per-session binding, fallback recovery state, and\n lifecycle state are `sessions` when those boundaries are central.\n- Structured tool result display, stdout rendering for tool results, pre-tool\n text preservation, and tool-call transcript/content handling are\n `tool_calling` when tool-call semantics are central.\n- Delivery fallback, outbound recovery, lost final/pre-tool text, duplicate\n cleanup, and lifecycle recovery are `reliability` when recovery correctness is\n central.\n\n## False-Positive Suppression\n\n- Do not use `local_model_providers` for base URL normalization,\n OpenRouter/OpenAI-compatible endpoint fixes, endpoint lifecycle, streaming,\n usage chunks, or vLLM/TGI/LocalAI serving. Use `model_serving` unless provider\n setup/auth/discovery/routing is central.\n- Do not use `notifications` for named Discord/Telegram/Slack/Zulip/Feishu\n behavior, ACP final/pre-tool text preservation, delivery fallback recovery, or\n outbound recovery correctness. 
Use `chat_integrations` for named chat surfaces\n and `reliability` for recovery/loss/fallback.\n- Do not use `tool_calling` for TTS tags/options, browser screenshot/vision,\n generic tool output, or config-like options.\n- Do not use `api_surface` for parse helpers, CLI edge-case tests, token\n parsing, status/footer display, internal command behavior, or local model\n compatibility.\n- Do not use `config` merely because a feature adds an option. Route by what\n the option controls.\n\n"
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nThis is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.\n\nProcess:\n\n1. Read the title first.\n2. Identify the main user-visible problem, feature, or policy change.\n3. Pick one primary topic.\n4. Read only the first clear body summary if needed to disambiguate.\n5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.\n7. Return only exact allowed topic ids.\n\nDo not over-label from keywords.\n\nImportant domain rules:\n\n- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n- Example: \u201cOpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)\u201d is only `model_serving`. 
The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.\n- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.\n\nPolicy/config rules:\n\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map the word \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d should be `mcp_tooling`, `config`, and `security`, not `model_serving`.\n\nCardinality guidance:\n\n- Use 0 topics when no allowed topic is central.\n- Use 1 topic for a single-focus item.\n- Use 2 topics for normal cross-topic items.\n- Use 3 topics only when the title or first clear summary explicitly has three central facets.\n- Use 4+ topics only for explicit multi-system coordination.\n\nFinal suppression checks before output:\n\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, or \u201cchunk\u201d, verify that the topic is actually the subject, not just context.\n- Prefer the narrower central topic over a broad fallback.\n- Never invent topic ids.\n- Output only the final JSON with the selected topic ids."
+ }
+ ],
+ "discovery_eval_counts": [
+ 0,
+ 10
+ ],
+ "num_full_val_evals": 2,
+ "objective_pareto_front": {
+ "weighted_score": 0.5555555555555555
+ },
+ "parents": [
+ [
+ null
+ ],
+ [
+ 0
+ ]
+ ],
+ "per_objective_best_candidates": {
+ "weighted_score": [
+ 1
+ ]
+ },
+ "per_val_instance_best_candidates": {
+ "0": [
+ 1
+ ],
+ "1": [
+ 1
+ ],
+ "2": [
+ 1
+ ],
+ "3": [
+ 0
+ ],
+ "4": [
+ 1
+ ],
+ "5": [
+ 0,
+ 1
+ ]
+ },
+ "run_dir": "prompt-optimizer/out/gepa-12b-six-20260612T190217Z",
+ "seed": 0,
+ "total_metric_calls": 16,
+ "val_aggregate_scores": [
+ 0.2857142857142857,
+ 0.5555555555555556
+ ],
+ "val_aggregate_subscores": [
+ {
+ "weighted_score": 0.2857142857142857
+ },
+ {
+ "weighted_score": 0.5555555555555555
+ }
+ ],
+ "val_subscores": [
+ {
+ "0": 0.25,
+ "1": 0.25,
+ "2": 0.14285714285714285,
+ "3": 0.2857142857142857,
+ "4": 0.2857142857142857,
+ "5": 0.5
+ },
+ {
+ "0": 0.3333333333333333,
+ "1": 1.0,
+ "2": 0.25,
+ "3": 0.25,
+ "4": 1.0,
+ "5": 0.5
+ }
+ ],
+ "validation_schema_version": 2
+}
\ No newline at end of file
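The aggregate values in `gepa-result.json` appear to be plain unweighted means of the per-instance `val_subscores`. The following is a minimal sketch under that assumption, not GEPA's own code; the helper name `mean_score` and the variable names are hypothetical. It recomputes both `val_aggregate_scores` entries, the best candidate index, and the per-instance Pareto front from the numbers recorded above.

```python
from statistics import fmean

# Per-instance validation scores copied from "val_subscores" in gepa-result.json.
# Index 0 is the seed candidate, index 1 is the reflected candidate.
val_subscores = [
    {0: 0.25, 1: 0.25, 2: 0.14285714285714285,
     3: 0.2857142857142857, 4: 0.2857142857142857, 5: 0.5},
    {0: 0.3333333333333333, 1: 1.0, 2: 0.25, 3: 0.25, 4: 1.0, 5: 0.5},
]


def mean_score(scores):
    """Hypothetical helper: unweighted mean over validation instances (assumed)."""
    return fmean(scores.values())


# One aggregate per candidate, then the index of the highest aggregate.
aggregates = [mean_score(s) for s in val_subscores]
best_idx = max(range(len(aggregates)), key=aggregates.__getitem__)

# Per-instance Pareto front: the best score any candidate reached on each instance,
# plus the set of candidates that reach it.
front = {i: max(s[i] for s in val_subscores) for i in val_subscores[0]}
front_programs = {
    i: {c for c, s in enumerate(val_subscores) if s[i] == front[i]}
    for i in front
}
front_aggregate = fmean(front.values())

print([round(a, 4) for a in aggregates], best_idx, round(front_aggregate, 4))
```

Under this assumption the front aggregate can exceed the best single candidate's aggregate, because instance 3 is still won by candidate 0.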
diff --git a/gepa-12b-six-20260612T190217Z/gepa_state.bin b/gepa-12b-six-20260612T190217Z/gepa_state.bin
new file mode 100644
index 0000000000000000000000000000000000000000..e9f3fab8845a4626d02445bd176376319ba4bf30
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/gepa_state.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3dc3cccf60cb252e8f757aa223b4c33e58a160fe275ca7fe51a7f7a193acef39
+size 13637
diff --git a/gepa-12b-six-20260612T190217Z/run_log.json b/gepa-12b-six-20260612T190217Z/run_log.json
new file mode 100644
index 0000000000000000000000000000000000000000..403f6b2852dff3fdd2f7e40c39cfe1d85a4cad6e
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/run_log.json
@@ -0,0 +1,27 @@
+[
+ {
+ "i": 0,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 4,
+ 1
+ ],
+ "subsample_scores": [
+ 0.2857142857142857,
+ 0.25
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0
+ ],
+ "new_program_idx": 1,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5
+ ]
+ }
+]
\ No newline at end of file
diff --git a/gepa-12b-six-20260612T190217Z/run_log.txt b/gepa-12b-six-20260612T190217Z/run_log.txt
new file mode 100644
index 0000000000000000000000000000000000000000..c637acdd59a98c0b370b8662bdb06ac73105acc2
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/run_log.txt
@@ -0,0 +1,63 @@
+Iteration 0: Base program full valset score: 0.2857142857142857 over 6 / 6 examples
+Iteration 1: Selected program 0 score: 0.2857142857142857
+Iteration 1: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+This is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.
+
+Process:
+
+1. Read the title first.
+2. Identify the main user-visible problem, feature, or policy change.
+3. Pick one primary topic.
+4. Read only the first clear body summary if needed to disambiguate.
+5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.
+7. Return only exact allowed topic ids.
+
+Do not over-label from keywords.
+
+Important domain rules:
+
+- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+- Example: “OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)” is only `model_serving`. The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.
+- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.
+
+Policy/config rules:
+
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Do not map the word “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Example: “Policy: add model, network, and MCP conformance checks” should be `mcp_tooling`, `config`, and `security`, not `model_serving`.
+
+Cardinality guidance:
+
+- Use 0 topics when no allowed topic is central.
+- Use 1 topic for a single-focus item.
+- Use 2 topics for normal cross-topic items.
+- Use 3 topics only when the title or first clear summary explicitly has three central facets.
+- Use 4+ topics only for explicit multi-system coordination.
+
+Final suppression checks before output:
+
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, or “chunk”, verify that the topic is actually the subject, not just context.
+- Prefer the narrower central topic over a broad fallback.
+- Never invent topic ids.
+- Output only the final JSON with the selected topic ids.
+Iteration 1: New subsample score 2.0 is better than old score 0.5357142857142857. Continue to full eval and add to candidate pool.
+Iteration 1: Found a better program on the valset with score 0.5555555555555556.
+Iteration 1: Valset score for new program: 0.5555555555555556 (coverage 6 / 6)
+Iteration 1: Val aggregate for new program: 0.5555555555555556
+Iteration 1: Individual valset scores for new program: {0: 0.3333333333333333, 1: 1.0, 2: 0.25, 3: 0.25, 4: 1.0, 5: 0.5}
+Iteration 1: Objective aggregate scores for new program: {'weighted_score': 0.5555555555555555}
+Iteration 1: New valset pareto front scores: {0: 0.3333333333333333, 1: 1.0, 2: 0.25, 3: 0.2857142857142857, 4: 1.0, 5: 0.5}
+Iteration 1: Objective pareto front scores: {'weighted_score': 0.5555555555555555}
+Iteration 1: Valset pareto front aggregate score: 0.5615079365079365
+Iteration 1: Updated valset pareto front programs: {0: {1}, 1: {1}, 2: {1}, 3: {0}, 4: {1}, 5: {0, 1}}
+Iteration 1: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 1: Best valset aggregate score so far: 0.5555555555555556
+Iteration 1: Best program as per aggregate score on valset: 1
+Iteration 1: Best score on valset: 0.5555555555555556
+Iteration 1: Linear pareto front program index: 1
+Iteration 1: New program candidate index: 1
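The line "New subsample score 2.0 is better than old score 0.5357142857142857" in `run_log.txt` reports sums over the reflection minibatch, not means. The sketch below, using only the values recorded in `run_log.json`, shows how the sums and the per-example means relate. The acceptance rule (`new_sum > old_sum`) is an assumption inferred from the log wording, not a confirmed description of GEPA's logic.

```python
from statistics import fmean

# Iteration record copied from run_log.json for the six-row run.
entry = {
    "subsample_ids": [4, 1],
    "subsample_scores": [0.2857142857142857, 0.25],
    "new_subsample_scores": [1.0, 1.0],
}

# Sums, as printed in run_log.txt.
old_sum = sum(entry["subsample_scores"])
new_sum = sum(entry["new_subsample_scores"])

# Means, which are comparable across runs with different minibatch sizes.
old_mean = fmean(entry["subsample_scores"])
new_mean = fmean(entry["new_subsample_scores"])
delta = new_mean - old_mean

# Assumed acceptance rule: the proposal goes on to full validation only if it
# improves the minibatch total.
accepted = new_sum > old_sum

print(round(old_sum, 4), new_sum, round(old_mean, 4), round(delta, 4), accepted)
```

A perfect minibatch score does not imply a perfect validation score: the same candidate averages 0.5556 on the full six-row validation set.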
diff --git a/gepa-12b-six-20260612T190217Z/run_log_stderr.txt b/gepa-12b-six-20260612T190217Z/run_log_stderr.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-six-20260612T190217Z/summary.json b/gepa-12b-six-20260612T190217Z/summary.json
new file mode 100644
index 0000000000000000000000000000000000000000..581bbf38e552961976835e4ea4cee0def9cf7f1a
--- /dev/null
+++ b/gepa-12b-six-20260612T190217Z/summary.json
@@ -0,0 +1,27 @@
+{
+ "best_idx": 1,
+ "best_prompt_path": "prompt-optimizer/out/gepa-12b-six-20260612T190217Z/best.prompt.md",
+ "best_routing_policy_path": "prompt-optimizer/out/gepa-12b-six-20260612T190217Z/best.routing_policy.md",
+ "best_score": 0.5555555555555556,
+ "config": {
+ "harness": {
+ "base_url": null,
+ "concurrency": 2,
+ "context_window": null,
+ "max_tokens": 1536,
+ "model": "gemma-12b-q4km-reason",
+ "state_dir": null,
+ "timeout_ms": 300000
+ },
+ "max_metric_calls": 8,
+ "output_dir": "prompt-optimizer/out/gepa-12b-six-20260612T190217Z",
+ "reflection_minibatch_size": 2,
+ "row_limit": 6,
+ "seed": 0
+ },
+ "created_at": "2026-06-12T19:12:11.598916+00:00",
+ "num_candidates": 2,
+ "num_full_val_evals": 2,
+ "result_path": "prompt-optimizer/out/gepa-12b-six-20260612T190217Z/gepa-result.json",
+ "total_metric_calls": 16
+}
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/best.prompt.md b/gepa-12b-twelve-from-six-20260612T192148Z/best.prompt.md
new file mode 100644
index 0000000000000000000000000000000000000000..5a6c8fc356573a36b99e70c54b797e67fee82827
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/best.prompt.md
@@ -0,0 +1,100 @@
+# OpenClaw Routing Classifier
+
+Classify one OpenClaw GitHub issue or pull request for maintainer notification
+routing, not code search. Return only the final structured JSON required by the
+schema. No prose, markdown, analysis, or extra fields.
+
+Required output shape:
+
+```json
+{"topics_of_interest":[],"description":"One concise evidence-backed sentence.","caveats":[]}
+```
+
+## Inner Monologue
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+## Repository Reads
+
+A read-only `bash` tool may be available in the OpenClaw repo snapshot. Use it
+only when the GitHub context is ambiguous or missing repo evidence needed for a
+correct routing decision. Prefer short commands such as `pwd`, `ls`, `find`,
+`rg`, `grep`, `sed -n`, `cat`, `head`, `tail`, `wc -l`,
+`git show --name-only`, `git ls-files`, or `git grep`.
+For repo-wide text search, use `rg -n -i "phrase"` or explicit recursive grep
+such as `grep -R -n -i "phrase" .`. For file discovery, use
+`rg --files -g "*.ts"` or `git ls-files src`.
+Do not call `bash` when the provided GitHub context is enough.
+
+## Allowed Topics
+
+```json
+__ALLOWED_TOPICS_JSON__
+```
+
+Topic definitions and cue words:
+
+__TOPIC_DESCRIPTIONS__
+
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+This is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.
+
+Process:
+
+1. Read the title first.
+2. Identify the main user-visible problem, feature, or policy change.
+3. Pick one primary topic.
+4. Read only the first clear body summary if needed to disambiguate.
+5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.
+7. Return only exact allowed topic ids.
+
+Do not over-label from keywords.
+
+Important domain rules:
+
+- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+- Example: “OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)” is only `model_serving`. The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.
+- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.
+
+Policy/config rules:
+
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Do not map the word “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Example: “Policy: add model, network, and MCP conformance checks” should be `mcp_tooling`, `config`, and `security`, not `model_serving`.
+
+Cardinality guidance:
+
+- Use 0 topics when no allowed topic is central.
+- Use 1 topic for a single-focus item.
+- Use 2 topics for normal cross-topic items.
+- Use 3 topics only when the title or first clear summary explicitly has three central facets.
+- Use 4+ topics only for explicit multi-system coordination.
+
+Final suppression checks before output:
+
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, or “chunk”, verify that the topic is actually the subject, not just context.
+- Prefer the narrower central topic over a broad fallback.
+- Never invent topic ids.
+- Output only the final JSON with the selected topic ids.## Target
+
+`__TARGET__`
+
+## GitHub Context
+
+__GITHUB_CONTEXT__
+
+Use this context as source of truth. If important sections are missing,
+unavailable, selected, or truncated, classify from what is available and mention
+material limits in `caveats`.
+
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/best.routing_policy.md b/gepa-12b-twelve-from-six-20260612T192148Z/best.routing_policy.md
new file mode 100644
index 0000000000000000000000000000000000000000..d232f43adcd336bae95a929d5a8fbdf0ad1fcda1
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/best.routing_policy.md
@@ -0,0 +1,45 @@
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+This is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.
+
+Process:
+
+1. Read the title first.
+2. Identify the main user-visible problem, feature, or policy change.
+3. Pick one primary topic.
+4. Read only the first clear body summary if needed to disambiguate.
+5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.
+7. Return only exact allowed topic ids.
+
+Do not over-label from keywords.
+
+Important domain rules:
+
+- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+- Example: “OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)” is only `model_serving`. The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.
+- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.
+
+Policy/config rules:
+
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Do not map the word “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Example: “Policy: add model, network, and MCP conformance checks” should be `mcp_tooling`, `config`, and `security`, not `model_serving`.
+
+Cardinality guidance:
+
+- Use 0 topics when no allowed topic is central.
+- Use 1 topic for a single-focus item.
+- Use 2 topics for normal cross-topic items.
+- Use 3 topics only when the title or first clear summary explicitly has three central facets.
+- Use 4+ topics only for explicit multi-system coordination.
+
+Final suppression checks before output:
+
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, or “chunk”, verify that the topic is actually the subject, not just context.
+- Prefer the narrower central topic over a broad fallback.
+- Never invent topic ids.
+- Output only the final JSON with the selected topic ids.
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/candidates.json b/gepa-12b-twelve-from-six-20260612T192148Z/candidates.json
new file mode 100644
index 0000000000000000000000000000000000000000..6970b4689d366dad63bb29a23f192d6323f70eb2
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/candidates.json
@@ -0,0 +1,5 @@
+[
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nThis is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.\n\nProcess:\n\n1. Read the title first.\n2. Identify the main user-visible problem, feature, or policy change.\n3. Pick one primary topic.\n4. Read only the first clear body summary if needed to disambiguate.\n5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.\n7. Return only exact allowed topic ids.\n\nDo not over-label from keywords.\n\nImportant domain rules:\n\n- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n- Example: \u201cOpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)\u201d is only `model_serving`. The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.\n- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.\n\nPolicy/config rules:\n\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map the word \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d should be `mcp_tooling`, `config`, and `security`, not `model_serving`.\n\nCardinality guidance:\n\n- Use 0 topics when no allowed topic is central.\n- Use 1 topic for a single-focus item.\n- Use 2 topics for normal cross-topic items.\n- Use 3 topics only when the title or first clear summary explicitly has three central facets.\n- Use 4+ topics only for explicit multi-system coordination.\n\nFinal suppression checks before output:\n\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, or \u201cchunk\u201d, verify that the topic is actually the subject, not just context.\n- Prefer the narrower central topic over a broad fallback.\n- Never invent topic ids.\n- Output only the final JSON with the selected topic ids."
+ }
+]
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..ec42ccb8d112af4a0c9dfcd7d0c386ee18039f93
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'gateway'), description='This PR implements a durable, gateway-owned ACP runtime to improve session recovery and reliability by moving state management from nodes to the gateway.', caveats=('The PR involves a significant architectural shift in how the gateway handles ACP state and worker communication, making both acp and gateway essential for routing.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..37068c4327f7249bd6d4499877f1867d6eb92fb0
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('mcp_tooling', 'security', 'local_model_providers'), description='This PR introduces conformance checks for model providers, network security (SSRF), and MCP servers within the policy system.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..2da21303baa1629c49d2d294b5e338ebf0ce724b
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('exec_tools', 'security'), description=\"This PR implements contract improvements and security enforcement for the 'exec' system, focusing on command allowlisting and safe binary verification.\", caveats=('The PR specifically addresses shell execution contracts and security hardening for safe binaries.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..7e6cead1523c4f08f363164e356e88ebbb2ec752
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acpx', 'skills_plugins', 'sandboxing'), description='This feature request addresses plugin inheritance within the ACPX (Codex) environment, impacting both skill/plugin integration and sandbox isolation policies.', caveats=('The issue involves a security-sensitive change to execution boundaries as noted in the reviewer comments.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..a18deb98538cbd451a1a463394ab99be8379d3ae
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('exec_tools', 'gateway', 'ui_tui'), description='This PR fixes critical bugs in shell execution paths, gateway concurrency logic for cron jobs, and improves UI/CLI diagnostics for browser status.', caveats=('The PR addresses three distinct issues: tool execution path resolution, gateway-level cron lane management, and enhanced diagnostic reporting for browser tools.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..0019ddc436783fdcf3317129037922fcf5a9c063
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=(), description='', caveats=(), error='classifier exit 2: prompt: /home/bob/.local/state/localpager/classifier/prompts/20260612T192252Z-3488012.md\\nschema: /home/bob/.local/state/localpager/classifier/schemas/20260612T192252Z-3488012.json\\nsession: /home/bob/.local/state/localpager/classifier/sessions/20260612T192252Z-3488012\\nlocalpager-agent: final_json was not called; no structured output was captured')"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..7f8dfd0139a325794a4e70f79953822f153e76b6
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('model_serving',), description='The issue describes a failure to process final usage chunks in OpenAI-compatible streams from llama.cpp backends, which falls under model serving protocol handling.', caveats=('The report specifically notes that usage data is lost because of how the system handles the SSE stream lifecycle for llama.cpp backends.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..13cddabf76e96600751232db9a3d5c7c8f000655
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'approvals'), description='The request implements per-binding and per-agent scoped permission modes for ACP sessions, directly affecting both the ACP protocol integration and the core approval/permission logic.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..d96d92a97e81535c7b664583d0263557d4bbf551
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('gateway', 'hooks'), description=\"This PR enhances the gateway's message sending path by including session identity in hook contexts to support guard-based decisions.\", caveats=('The issue specifically addresses the integration between gateway logic and hook execution for guarded messages.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..6c48c955a43a77cd4e8c9b9e2e4d483b4aa79e71
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('memory',), description='This pull request fixes a deadlock in the active_memory recall subagent by isolating it to its own lane.', caveats=('The issue specifically targets the active-memory extension and its internal subagent routing/deadlock resolution.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..8e825879c0e4d53d3880b23fcf7925d629d6028b
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('api_surface', 'ui_tui'), description='This PR implements a new gateway RPC to fetch full message content for the UI sidebar, impacting both the API surface and the web interface.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..6072720d34e46ddd212e25fea148ec0476c7dc20
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('docs', 'queueing'), description=\"The pull request updates documentation regarding 'steer' behavior within the queue system, specifically how it interacts with streaming modes and tool boundaries.\", caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/gepa-result.json b/gepa-12b-twelve-from-six-20260612T192148Z/gepa-result.json
new file mode 100644
index 0000000000000000000000000000000000000000..db564565f4e42c75d239195087f26db5225877fc
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/gepa-result.json
@@ -0,0 +1,93 @@
+{
+ "_str_candidate_key": null,
+ "best_idx": 0,
+ "best_outputs_valset": null,
+ "candidates": [
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nThis is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.\n\nProcess:\n\n1. Read the title first.\n2. Identify the main user-visible problem, feature, or policy change.\n3. Pick one primary topic.\n4. Read only the first clear body summary if needed to disambiguate.\n5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.\n7. Return only exact allowed topic ids.\n\nDo not over-label from keywords.\n\nImportant domain rules:\n\n- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n- Example: \u201cOpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)\u201d is only `model_serving`. The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.\n- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.\n\nPolicy/config rules:\n\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map the word \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d should be `mcp_tooling`, `config`, and `security`, not `model_serving`.\n\nCardinality guidance:\n\n- Use 0 topics when no allowed topic is central.\n- Use 1 topic for a single-focus item.\n- Use 2 topics for normal cross-topic items.\n- Use 3 topics only when the title or first clear summary explicitly has three central facets.\n- Use 4+ topics only for explicit multi-system coordination.\n\nFinal suppression checks before output:\n\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, or \u201cchunk\u201d, verify that the topic is actually the subject, not just context.\n- Prefer the narrower central topic over a broad fallback.\n- Never invent topic ids.\n- Output only the final JSON with the selected topic ids."
+ }
+ ],
+ "discovery_eval_counts": [
+ 0
+ ],
+ "num_full_val_evals": 1,
+ "objective_pareto_front": {
+ "weighted_score": 0.4244047619047619
+ },
+ "parents": [
+ [
+ null
+ ]
+ ],
+ "per_objective_best_candidates": {
+ "weighted_score": [
+ 0
+ ]
+ },
+ "per_val_instance_best_candidates": {
+ "0": [
+ 0
+ ],
+ "1": [
+ 0
+ ],
+ "2": [
+ 0
+ ],
+ "3": [
+ 0
+ ],
+ "4": [
+ 0
+ ],
+ "5": [
+ 0
+ ],
+ "6": [
+ 0
+ ],
+ "7": [
+ 0
+ ],
+ "8": [
+ 0
+ ],
+ "9": [
+ 0
+ ],
+ "10": [
+ 0
+ ],
+ "11": [
+ 0
+ ]
+ },
+ "run_dir": "prompt-optimizer/out/gepa-12b-twelve-from-six-20260612T192148Z",
+ "seed": 0,
+ "total_metric_calls": 12,
+ "val_aggregate_scores": [
+ 0.4244047619047619
+ ],
+ "val_aggregate_subscores": [
+ {
+ "weighted_score": 0.4244047619047619
+ }
+ ],
+ "val_subscores": [
+ {
+ "0": 0.5,
+ "1": 0.25,
+ "2": 0.14285714285714285,
+ "3": 0.0,
+ "4": 1.0,
+ "5": 0.5,
+ "6": 0.25,
+ "7": 0.5,
+ "8": 0.5,
+ "9": 1.0,
+ "10": 0.2,
+ "11": 0.25
+ }
+ ],
+ "validation_schema_version": 2
+}
\ No newline at end of file
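Reviewer note: the `weighted_score` aggregate in the `gepa-result.json` above appears to be the plain mean of the twelve per-instance `val_subscores`. The sketch below checks that arithmetic with the values copied from the file. The helper name `aggregate` is hypothetical, and the unweighted-mean reading is an inference from the numbers, not something confirmed against the GEPA source.

```python
from statistics import fmean

# Per-instance validation subscores copied from gepa-result.json above.
val_subscores = {
    "0": 0.5, "1": 0.25, "2": 0.14285714285714285, "3": 0.0,
    "4": 1.0, "5": 0.5, "6": 0.25, "7": 0.5,
    "8": 0.5, "9": 1.0, "10": 0.2, "11": 0.25,
}

# Aggregate reported in val_aggregate_scores for candidate 0.
reported_aggregate = 0.4244047619047619


def aggregate(subscores: dict[str, float]) -> float:
    """Assumed aggregation: unweighted mean over validation instances."""
    return fmean(subscores.values())


if __name__ == "__main__":
    computed = aggregate(val_subscores)
    print(f"computed={computed:.16f} reported={reported_aggregate:.16f}")
```

Instance 3 contributes 0.0 because its recorded output carries the error `final_json was not called`, so one harness failure lowers the twelve-row mean by up to 1/12.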
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/gepa_state.bin b/gepa-12b-twelve-from-six-20260612T192148Z/gepa_state.bin
new file mode 100644
index 0000000000000000000000000000000000000000..7758755d1ce54af80c5ba8234f202f7b9644cd67
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/gepa_state.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:24ba88e8307ab4cf6ff14e4e38b93e9a59f7ee327d257c651d024225c19f38b5
+size 4365
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/run_log.txt b/gepa-12b-twelve-from-six-20260612T192148Z/run_log.txt
new file mode 100644
index 0000000000000000000000000000000000000000..37e26bb58f86522265770612f09a14b9c272ae94
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/run_log.txt
@@ -0,0 +1 @@
+Iteration 0: Base program full valset score: 0.4244047619047619 over 12 / 12 examples
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/run_log_stderr.txt b/gepa-12b-twelve-from-six-20260612T192148Z/run_log_stderr.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-twelve-from-six-20260612T192148Z/summary.json b/gepa-12b-twelve-from-six-20260612T192148Z/summary.json
new file mode 100644
index 0000000000000000000000000000000000000000..e2eef596e9c145ff8f5c05524a095dfe0d92c57b
--- /dev/null
+++ b/gepa-12b-twelve-from-six-20260612T192148Z/summary.json
@@ -0,0 +1,29 @@
+{
+ "best_idx": 0,
+ "best_prompt_path": "prompt-optimizer/out/gepa-12b-twelve-from-six-20260612T192148Z/best.prompt.md",
+ "best_routing_policy_path": "prompt-optimizer/out/gepa-12b-twelve-from-six-20260612T192148Z/best.routing_policy.md",
+ "best_score": 0.4244047619047619,
+ "config": {
+ "harness": {
+ "base_url": null,
+ "concurrency": 2,
+ "context_window": null,
+ "max_tokens": 1536,
+ "model": "gemma-12b-q4km-reason",
+ "state_dir": null,
+ "timeout_ms": 300000
+ },
+ "max_metric_calls": 12,
+ "output_dir": "prompt-optimizer/out/gepa-12b-twelve-from-six-20260612T192148Z",
+ "reflection_minibatch_size": 3,
+ "row_limit": 12,
+ "seed": 0,
+ "seed_routing_policy_chars": 3224,
+ "seed_routing_policy_sha256": "f4b161bb9bbaf366f1d4f1841243d73544bbd3c553ca6be5eb2818e757007187"
+ },
+ "created_at": "2026-06-12T19:27:57.574595+00:00",
+ "num_candidates": 1,
+ "num_full_val_evals": 1,
+ "result_path": "prompt-optimizer/out/gepa-12b-twelve-from-six-20260612T192148Z/gepa-result.json",
+ "total_metric_calls": 12
+}
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/best-reeval-12b-train12.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/best-reeval-12b-train12.json
new file mode 100644
index 0000000000000000000000000000000000000000..651add3a5c430bbbe05aac5ffddb3c70c5f4849e
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/best-reeval-12b-train12.json
@@ -0,0 +1,324 @@
+{
+ "candidate": "gepa-12b-twelve-best",
+ "concurrency": 2,
+ "harness": "localpager-agent",
+ "mean_score": 0.5157967032967034,
+ "offset": 0,
+ "routing_policy_path": "prompt-optimizer/out/gepa-12b-twelve-from-six-iter-20260612T192815Z/best.routing_policy.md",
+ "routing_policy_sha256": "6ab4227828618436d7f81662b5cc4993fb5b30557e3e56616801dbec6d2da34a",
+ "row_reports": [
+ {
+ "error": null,
+ "false_negatives": [
+ "agent_runtime"
+ ],
+ "false_positives": [
+ "acpx"
+ ],
+ "gold_topics": [
+ "acp",
+ "gateway",
+ "agent_runtime"
+ ],
+ "id": "openclaw-openclaw-48940",
+ "loss": 3.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "acp",
+ "acpx",
+ "gateway"
+ ],
+ "score": 0.25,
+ "target": "https://github.com/openclaw/openclaw/pull/48940",
+ "title": "ACP: add gateway-owned node-backed runtime",
+ "true_positives": [
+ "acp",
+ "gateway"
+ ]
+ },
+ {
+ "error": "classifier exit 2: prompt: /home/bob/.local/state/localpager/classifier/prompts/20260612T195003Z-3514498.md\nschema: /home/bob/.local/state/localpager/classifier/schemas/20260612T195003Z-3514498.json\nsession: /home/bob/.local/state/localpager/classifier/sessions/20260612T195003Z-3514498\nlocalpager-agent: final_json was not called; no structured output was captured",
+ "gold_topics": [
+ "mcp_tooling",
+ "config",
+ "security"
+ ],
+ "id": "openclaw-openclaw-80783",
+ "predicted_topics": [],
+ "score": 0.0,
+ "target": "https://github.com/openclaw/openclaw/pull/80783",
+ "title": "Policy: add model, network, and MCP conformance checks"
+ },
+ {
+ "error": null,
+ "false_negatives": [
+ "browser_automation"
+ ],
+ "false_positives": [
+ "ui_tui",
+ "gateway"
+ ],
+ "gold_topics": [
+ "exec_tools",
+ "browser_automation",
+ "cron_automation"
+ ],
+ "id": "openclaw-openclaw-42027",
+ "loss": 5.5,
+ "over_label_count": 1,
+ "predicted_topics": [
+ "exec_tools",
+ "ui_tui",
+ "cron_automation",
+ "gateway"
+ ],
+ "score": 0.15384615384615385,
+ "target": "https://github.com/openclaw/openclaw/pull/42027",
+ "title": "fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock",
+ "true_positives": [
+ "exec_tools",
+ "cron_automation"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [
+ "chat_integrations"
+ ],
+ "false_positives": [
+ "gateway"
+ ],
+ "gold_topics": [
+ "codex",
+ "chat_integrations"
+ ],
+ "id": "openclaw-openclaw-77748",
+ "loss": 3.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "codex",
+ "gateway"
+ ],
+ "score": 0.25,
+ "target": "https://github.com/openclaw/openclaw/pull/77748",
+ "title": "fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth",
+ "true_positives": [
+ "codex"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "model_serving"
+ ],
+ "id": "openclaw-openclaw-79897",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "model_serving"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/issues/79897",
+ "title": "OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)",
+ "true_positives": [
+ "model_serving"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "acp",
+ "approvals",
+ "acpx"
+ ],
+ "id": "openclaw-openclaw-40332",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "acp",
+ "approvals",
+ "acpx"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/issues/40332",
+ "title": "[Feature]: Per-binding and per-agent permissionMode for ACP sessions",
+ "true_positives": [
+ "acp",
+ "approvals",
+ "acpx"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [
+ "approvals"
+ ],
+ "gold_topics": [
+ "gateway",
+ "sessions"
+ ],
+ "id": "openclaw-openclaw-63007",
+ "loss": 2.5,
+ "over_label_count": 1,
+ "predicted_topics": [
+ "gateway",
+ "sessions",
+ "approvals"
+ ],
+ "score": 0.2857142857142857,
+ "target": "https://github.com/openclaw/openclaw/pull/63007",
+ "title": "Pass outbound session identity into message_sending and surface guarded gateway send denial",
+ "true_positives": [
+ "gateway",
+ "sessions"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [
+ "reliability"
+ ],
+ "false_positives": [],
+ "gold_topics": [
+ "memory",
+ "reliability"
+ ],
+ "id": "openclaw-openclaw-80255",
+ "loss": 1.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "memory"
+ ],
+ "score": 0.5,
+ "target": "https://github.com/openclaw/openclaw/pull/80255",
+ "title": "fix #79026: active-memory recall subagent can deadlock on the main lane inside before_prompt_build",
+ "true_positives": [
+ "memory"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [
+ "gateway"
+ ],
+ "false_positives": [],
+ "gold_topics": [
+ "gateway",
+ "api_surface",
+ "ui_tui"
+ ],
+ "id": "openclaw-openclaw-84670",
+ "loss": 1.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "api_surface",
+ "ui_tui"
+ ],
+ "score": 0.5,
+ "target": "https://github.com/openclaw/openclaw/pull/84670",
+ "title": "[codex] fix webchat full-message reader for truncated history",
+ "true_positives": [
+ "api_surface",
+ "ui_tui"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "queueing",
+ "docs"
+ ],
+ "id": "openclaw-openclaw-46552",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "docs",
+ "queueing"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/pull/46552",
+ "title": "docs(queue): clarify steer behavior with partial streaming and tool boundaries",
+ "true_positives": [
+ "docs",
+ "queueing"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "exec_tools",
+ "sandboxing",
+ "approvals"
+ ],
+ "id": "openclaw-openclaw-62428",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "exec_tools",
+ "approvals",
+ "sandboxing"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/pull/62428",
+ "title": "test(exec): land exec v2 contract follow-through",
+ "true_positives": [
+ "exec_tools",
+ "approvals",
+ "sandboxing"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [
+ "codex"
+ ],
+ "false_positives": [
+ "sandboxing"
+ ],
+ "gold_topics": [
+ "acpx",
+ "codex",
+ "skills_plugins"
+ ],
+ "id": "openclaw-openclaw-82507",
+ "loss": 3.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "acpx",
+ "sandboxing",
+ "skills_plugins"
+ ],
+ "score": 0.25,
+ "target": "https://github.com/openclaw/openclaw/issues/82507",
+ "title": "[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)",
+ "true_positives": [
+ "acpx",
+ "skills_plugins"
+ ]
+ }
+ ],
+ "rows": 12,
+ "scores": [
+ 0.25,
+ 0.0,
+ 0.15384615384615385,
+ 0.25,
+ 1.0,
+ 1.0,
+ 0.2857142857142857,
+ 0.5,
+ 0.5,
+ 1.0,
+ 1.0,
+ 0.25
+ ]
+}
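Reviewer note: the `loss` and `score` fields in the row reports above are consistent with a simple weighted-error formula. The sketch below is a reconstruction inferred only from those rows, not the localpager scoring source. The weights (2.0 per false positive, 1.0 per false negative, 0.5 per label beyond the gold count), the `score = 1 / (1 + loss)` mapping, and the names `score_row` and `RowScore` are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Assumed weights, inferred from the row reports above (not from source code).
FP_WEIGHT = 2.0          # each predicted topic that is not in gold
FN_WEIGHT = 1.0          # each gold topic that was not predicted
OVER_LABEL_WEIGHT = 0.5  # each predicted label beyond the gold label count


@dataclass(frozen=True)
class RowScore:
    loss: Optional[float]  # None when the classifier errored out
    over_label_count: int
    score: float


def score_row(gold: Sequence[str], predicted: Sequence[str],
              error: Optional[str] = None) -> RowScore:
    """Score one row; errored rows get 0.0, matching the reports above."""
    if error is not None:
        return RowScore(loss=None, over_label_count=0, score=0.0)
    gold_set, pred_set = set(gold), set(predicted)
    false_positives = len(pred_set - gold_set)
    false_negatives = len(gold_set - pred_set)
    over_label_count = max(0, len(pred_set) - len(gold_set))
    loss = (FP_WEIGHT * false_positives
            + FN_WEIGHT * false_negatives
            + OVER_LABEL_WEIGHT * over_label_count)
    return RowScore(loss=loss, over_label_count=over_label_count,
                    score=1.0 / (1.0 + loss))


if __name__ == "__main__":
    # Row openclaw-openclaw-42027 from the report above.
    row = score_row(
        gold=["exec_tools", "browser_automation", "cron_automation"],
        predicted=["exec_tools", "ui_tui", "cron_automation", "gateway"],
    )
    print(row)
```

Under these assumed weights a single false positive costs as much as two false negatives, which matches the report's statement that false positives are the more costly error.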
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/best.prompt.md b/gepa-12b-twelve-from-six-iter-20260612T192815Z/best.prompt.md
new file mode 100644
index 0000000000000000000000000000000000000000..323a7f474fdeb4e0c4a85667da199138399e71a8
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/best.prompt.md
@@ -0,0 +1,116 @@
+# OpenClaw Routing Classifier
+
+Classify one OpenClaw GitHub issue or pull request for maintainer notification
+routing, not code search. Return only the final structured JSON required by the
+schema. No prose, markdown, analysis, or extra fields.
+
+Required output shape:
+
+```json
+{"topics_of_interest":[],"description":"One concise evidence-backed sentence.","caveats":[]}
+```
+
+## Inner Monologue
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+## Repository Reads
+
+A read-only `bash` tool may be available in the OpenClaw repo snapshot. Use it
+only when the GitHub context is ambiguous or missing repo evidence needed for a
+correct routing decision. Prefer short commands such as `pwd`, `ls`, `find`,
+`rg`, `grep`, `sed -n`, `cat`, `head`, `tail`, `wc -l`,
+`git show --name-only`, `git ls-files`, or `git grep`.
+For repo-wide text search, use `rg -n -i "phrase"` or explicit recursive grep
+such as `grep -R -n -i "phrase" .`. For file discovery, use
+`rg --files -g "*.ts"` or `git ls-files src`.
+Do not call `bash` when the provided GitHub context is enough.
+
+## Allowed Topics
+
+```json
+__ALLOWED_TOPICS_JSON__
+```
+
+Topic definitions and cue words:
+
+__TOPIC_DESCRIPTIONS__
+
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+This is a fuzzy multi-label routing task. Choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second or third concern.
+
+Process:
+
+1. Read the title first.
+2. Identify the main user-visible problem, feature, contract, policy, or behavior change.
+3. Pick one primary topic.
+4. Read only the first clear body summary if needed to disambiguate.
+5. Add secondary topics only when they are explicitly central and removing them would route the item away from a maintainer who must see it.
+6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.
+7. Return only exact allowed topic ids as JSON.
+
+Do not over-label from keywords.
+
+Important domain rules:
+
+- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.
+
+Exec / sandbox / approval rules:
+
+- Exec command tools, shell execution behavior, exec protocol contracts, exec v2, and exec tool test contracts are `exec_tools`.
+- Sandbox modes, sandbox policy, filesystem/process isolation, sandbox enforcement, or exec behavior under sandbox constraints are `sandboxing`.
+- Approval prompts, permission gates, escalation decisions, permissionMode behavior, or user consent flows are `approvals`.
+- Do not replace `sandboxing` or `approvals` with `security` just because the behavior is security-adjacent.
+- Use `security` only when the item is centrally about security policy, vulnerabilities, secrets, access control, network boundaries, or allowed/blocked behavior as a security concern.
+- A title such as “test(exec): land exec v2 contract follow-through” is not merely a test-only item. It centrally concerns the exec v2 contract and should include the central contract facets, for example `exec_tools`, `sandboxing`, and `approvals` when those are part of the contract.
+
+ACP / ACPX rules:
+
+- ACP protocol/session behavior is `acp`.
+- ACPX session orchestration, per-agent behavior, bindings, agent/session integration, or ACP extension-layer behavior is `acpx`.
+- PermissionMode or permission policy for ACP sessions is also `approvals`.
+- A title such as “[Feature]: Per-binding and per-agent permissionMode for ACP sessions” should include `acp`, `approvals`, and `acpx`: ACP is the protocol/session surface, approvals is the permissionMode concern, and ACPX is central because per-binding/per-agent session behavior routes to ACPX maintainers.
+
+Policy/config rules:
+
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Do not map the word “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Example: “Policy: add model, network, and MCP conformance checks” should be `mcp_tooling`, `config`, and `security`, not `model_serving`.
+
+Cardinality guidance:
+
+- Use 0 topics when no allowed topic is central.
+- Use 1 topic for a single-focus item.
+- Use 2 topics for normal cross-topic items.
+- Use 3 topics when the title or first clear summary explicitly has three central facets, such as exec + sandboxing + approvals or ACP + approvals + ACPX.
+- Use 4+ topics only for explicit multi-system coordination.
+
+Final suppression checks before output:
+
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “security”, “permission”, or “chunk”, verify that the topic is actually the subject, not just context.
+- Do not use broad fallback topics when a narrower central topic exists.
+- Do not use `security` as a generic substitute for `sandboxing` or `approvals`.
+- Never invent topic ids.
+- Output only the final JSON with the selected topic ids.## Target
+
+`__TARGET__`
+
+## GitHub Context
+
+__GITHUB_CONTEXT__
+
+Use this context as source of truth. If important sections are missing,
+unavailable, selected, or truncated, classify from what is available and mention
+material limits in `caveats`.
+
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/best.routing_policy.md b/gepa-12b-twelve-from-six-iter-20260612T192815Z/best.routing_policy.md
new file mode 100644
index 0000000000000000000000000000000000000000..aebaace32734802220b65a663b38f2b942ba582c
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/best.routing_policy.md
@@ -0,0 +1,61 @@
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+This is a fuzzy multi-label routing task. Choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second or third concern.
+
+Process:
+
+1. Read the title first.
+2. Identify the main user-visible problem, feature, contract, policy, or behavior change.
+3. Pick one primary topic.
+4. Read only the first clear body summary if needed to disambiguate.
+5. Add secondary topics only when they are explicitly central and removing them would route the item away from a maintainer who must see it.
+6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.
+7. Return only exact allowed topic ids as JSON.
+
+Do not over-label from keywords.
+
+Important domain rules:
+
+- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.
+
+Exec / sandbox / approval rules:
+
+- Exec command tools, shell execution behavior, exec protocol contracts, exec v2, and exec tool test contracts are `exec_tools`.
+- Sandbox modes, sandbox policy, filesystem/process isolation, sandbox enforcement, or exec behavior under sandbox constraints are `sandboxing`.
+- Approval prompts, permission gates, escalation decisions, permissionMode behavior, or user consent flows are `approvals`.
+- Do not replace `sandboxing` or `approvals` with `security` just because the behavior is security-adjacent.
+- Use `security` only when the item is centrally about security policy, vulnerabilities, secrets, access control, network boundaries, or allowed/blocked behavior as a security concern.
+- A title such as “test(exec): land exec v2 contract follow-through” is not merely a test-only item. It centrally concerns the exec v2 contract and should include the central contract facets, for example `exec_tools`, `sandboxing`, and `approvals` when those are part of the contract.
+
+ACP / ACPX rules:
+
+- ACP protocol/session behavior is `acp`.
+- ACPX session orchestration, per-agent behavior, bindings, agent/session integration, or ACP extension-layer behavior is `acpx`.
+- PermissionMode or permission policy for ACP sessions is also `approvals`.
+- A title such as “[Feature]: Per-binding and per-agent permissionMode for ACP sessions” should include `acp`, `approvals`, and `acpx`: ACP is the protocol/session surface, approvals is the permissionMode concern, and ACPX is central because per-binding/per-agent session behavior routes to ACPX maintainers.
+
+Policy/config rules:
+
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Do not map the word “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Example: “Policy: add model, network, and MCP conformance checks” should be `mcp_tooling`, `config`, and `security`, not `model_serving`.
+
+Cardinality guidance:
+
+- Use 0 topics when no allowed topic is central.
+- Use 1 topic for a single-focus item.
+- Use 2 topics for normal cross-topic items.
+- Use 3 topics when the title or first clear summary explicitly has three central facets, such as exec + sandboxing + approvals or ACP + approvals + ACPX.
+- Use 4+ topics only for explicit multi-system coordination.
+
+Final suppression checks before output:
+
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “security”, “permission”, or “chunk”, verify that the topic is actually the subject, not just context.
+- Do not use broad fallback topics when a narrower central topic exists.
+- Do not use `security` as a generic substitute for `sandboxing` or `approvals`.
+- Never invent topic ids.
+- Output only the final JSON with the selected topic ids.
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/candidate_tree.html b/gepa-12b-twelve-from-six-iter-20260612T192815Z/candidate_tree.html
new file mode 100644
index 0000000000000000000000000000000000000000..e7e47b83646df874e75b85d6417f165c497ab9b2
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/candidate_tree.html
@@ -0,0 +1,173 @@
+
+
+
+
+
+GEPA Candidate Tree
+
+
+
+
+
+GEPA Candidate Tree
+
+
+Best
+
+Pareto Front
+
+Other
+
+
+
+Loading graph…
+
+
+
+
+
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/candidates.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/candidates.json
new file mode 100644
index 0000000000000000000000000000000000000000..1b7b690552abfe54190df7dfb336a2041fd58044
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/candidates.json
@@ -0,0 +1,8 @@
+[
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nThis is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.\n\nProcess:\n\n1. Read the title first.\n2. Identify the main user-visible problem, feature, or policy change.\n3. Pick one primary topic.\n4. Read only the first clear body summary if needed to disambiguate.\n5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.\n7. Return only exact allowed topic ids.\n\nDo not over-label from keywords.\n\nImportant domain rules:\n\n- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n- Example: \u201cOpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)\u201d is only `model_serving`. 
The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.\n- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.\n\nPolicy/config rules:\n\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map the word \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d should be `mcp_tooling`, `config`, and `security`, not `model_serving`.\n\nCardinality guidance:\n\n- Use 0 topics when no allowed topic is central.\n- Use 1 topic for a single-focus item.\n- Use 2 topics for normal cross-topic items.\n- Use 3 topics only when the title or first clear summary explicitly has three central facets.\n- Use 4+ topics only for explicit multi-system coordination.\n\nFinal suppression checks before output:\n\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, or \u201cchunk\u201d, verify that the topic is actually the subject, not just context.\n- Prefer the narrower central topic over a broad fallback.\n- Never invent topic ids.\n- Output only the final JSON with the selected topic ids."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nThis is a fuzzy multi-label routing task. Choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second or third concern.\n\nProcess:\n\n1. Read the title first.\n2. Identify the main user-visible problem, feature, contract, policy, or behavior change.\n3. Pick one primary topic.\n4. Read only the first clear body summary if needed to disambiguate.\n5. Add secondary topics only when they are explicitly central and removing them would route the item away from a maintainer who must see it.\n6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.\n7. Return only exact allowed topic ids as JSON.\n\nDo not over-label from keywords.\n\nImportant domain rules:\n\n- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.\n\nExec / sandbox / approval rules:\n\n- Exec command tools, shell execution behavior, exec protocol contracts, exec v2, and exec tool test contracts are `exec_tools`.\n- Sandbox modes, sandbox policy, filesystem/process isolation, sandbox enforcement, or exec behavior under sandbox constraints are `sandboxing`.\n- Approval prompts, permission gates, escalation decisions, permissionMode behavior, or user consent flows are `approvals`.\n- Do not replace `sandboxing` or `approvals` with `security` just because the behavior is 
security-adjacent.\n- Use `security` only when the item is centrally about security policy, vulnerabilities, secrets, access control, network boundaries, or allowed/blocked behavior as a security concern.\n- A title such as \u201ctest(exec): land exec v2 contract follow-through\u201d is not merely a test-only item. It centrally concerns the exec v2 contract and should include the central contract facets, for example `exec_tools`, `sandboxing`, and `approvals` when those are part of the contract.\n\nACP / ACPX rules:\n\n- ACP protocol/session behavior is `acp`.\n- ACPX session orchestration, per-agent behavior, bindings, agent/session integration, or ACP extension-layer behavior is `acpx`.\n- PermissionMode or permission policy for ACP sessions is also `approvals`.\n- A title such as \u201c[Feature]: Per-binding and per-agent permissionMode for ACP sessions\u201d should include `acp`, `approvals`, and `acpx`: ACP is the protocol/session surface, approvals is the permissionMode concern, and ACPX is central because per-binding/per-agent session behavior routes to ACPX maintainers.\n\nPolicy/config rules:\n\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map the word \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d should be `mcp_tooling`, `config`, and `security`, not 
`model_serving`.\n\nCardinality guidance:\n\n- Use 0 topics when no allowed topic is central.\n- Use 1 topic for a single-focus item.\n- Use 2 topics for normal cross-topic items.\n- Use 3 topics when the title or first clear summary explicitly has three central facets, such as exec + sandboxing + approvals or ACP + approvals + ACPX.\n- Use 4+ topics only for explicit multi-system coordination.\n\nFinal suppression checks before output:\n\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201csecurity\u201d, \u201cpermission\u201d, or \u201cchunk\u201d, verify that the topic is actually the subject, not just context.\n- Do not use broad fallback topics when a narrower central topic exists.\n- Do not use `security` as a generic substitute for `sandboxing` or `approvals`.\n- Never invent topic ids.\n- Output only the final JSON with the selected topic ids."
+ }
+]
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..2fbf0a05acc2f53d0f086e24c51e2eac9a759365
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'gateway'), description=\"This PR implements a durable, gateway-owned backend for the ACP protocol, involving significant changes to both the ACP system and the Gateway's management of sessions and node execution.\", caveats=('The PR includes some minor test-only fixes for external integrations (Feishu/Google), but these are not the core focus.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..7548c256373801cc098dbf96ac1c8dbf00659f82
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('mcp_tooling', 'security'), description='The PR adds conformance checks for MCP servers and network SSRF settings within a policy plugin.', caveats=('The \"model\" portion of the policy update is related to provider selection but does not meet the specific criteria for model_serving or local_models based on the provided rules.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..ae42f3213007956c45e334f048dc4de3c7ae5aab
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_10/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('exec_tools', 'security'), description=\"This PR implements contract hardening and security audits for the 'exec' tool, specifically targeting safe-bin policies and execution path validation.\", caveats=('None.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..845917d1d5e7dcbb2c9c24b1776b76a0abad1749
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_11/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('codex', 'acpx', 'sandboxing'), description='The issue involves the integration of user-installed plugins (skills) into the specialized Codex/ACPX sandbox environment, involving core system interactions between codex, acpx, and sandboxing policy.', caveats=('The issue specifically addresses a feature request for plugin inheritance within the Codex and ACPX sandboxed environments.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..b97391d76e81d59e26d3b38be51002307c5ddc5b
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_2/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('exec_tools', 'cron_automation', 'ui_tui'), description='The pull request resolves issues related to shell execution path fallback, enhanced diagnostic reporting for browser status, and a deadlock fix for cron force-runs.', caveats=('The PR addresses three distinct technical areas as explicitly listed in the title and summary: exec tool paths, browser status diagnostics (UI/TUI), and cron execution logic.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..fb4c156039830afe531740f6919e9bb188874c13
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_3/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('codex', 'gateway'), description='Fixes gateway startup plugin resolution to ensure Codex-related components are registered correctly, alongside Docker and WhatsApp integration updates.', caveats=(\"The inclusion of 'gateway' is due to the modification of the core gateway startup logic for plugin registration.\",), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..3a4068ff9fb959b5c9db5dda19dd9daa758a4779
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_4/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('model_serving',), description='The issue concerns an OpenAI-compatible streaming protocol bug where final usage chunks from llama.cpp backends are lost due to premature stream termination logic.', caveats=('The instruction specifically states that \"usage\" issues caused by stream lifecycle or OpenAI-compatible streaming behavior are categorized under model_serving, not telemetry_usage.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..d04503ea3abedf1269cf76ae18171e3694b4ed0c
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_5/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'approvals'), description='This issue proposes granular permission scopes (permissionMode) for different agents/bindings within the ACP protocol infrastructure.', caveats=(\"The issue specifically concerns the 'acpx' plugin and 'permissionMode' within ACP sessions, making both acp and approvals central.\",), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..607af4593df08f2f15f791115b936fb62fd83dc4
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_6/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('gateway', 'sessions'), description=\"This pull request fixes a bug where session identity was lost in the gateway's message sending path, and improves how guarded delivery results are reported back to the caller.\", caveats=('The PR specifically targets the gateway-specific path for outbound messages to ensure session identity is preserved through the hook system.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..5c37df7d16e681e3e931dbc7a31bfdc4d4556688
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_7/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('memory',), description='The PR fixes a deadlock issue specifically within the active-memory extension by isolating the recall subagent onto its own lane.', caveats=(), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..ea68462b3469b9457f2f61f9c6574691c2340059
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_8/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('gateway', 'ui_tui'), description='This PR implements a new RPC to fetch full chat messages for the web interface, impacting both the backend gateway protocol and the frontend user interface.', caveats=('The title includes a [codex] tag, but the primary technical changes involve gateway protocols and UI components.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..57b9d6e7929b8ee94a23427c0afd226c131faa31
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/generated_best_outputs_valset/task_9/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('docs', 'queueing'), description=\"This PR clarifies how 'steer' behavior interacts with partial/block streaming in the context of the system's internal queuing mechanism.\", caveats=('The PR focuses on documentation clarifying complex interactions between steering and streaming within the queue system.',), error=None)"
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/gepa-result.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/gepa-result.json
new file mode 100644
index 0000000000000000000000000000000000000000..45ef138a7e4db1f3026dde90111e6ff12b3e5308
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/gepa-result.json
@@ -0,0 +1,123 @@
+{
+ "_str_candidate_key": null,
+ "best_idx": 1,
+ "best_outputs_valset": null,
+ "candidates": [
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nThis is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.\n\nProcess:\n\n1. Read the title first.\n2. Identify the main user-visible problem, feature, or policy change.\n3. Pick one primary topic.\n4. Read only the first clear body summary if needed to disambiguate.\n5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.\n6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.\n7. Return only exact allowed topic ids.\n\nDo not over-label from keywords.\n\nImportant domain rules:\n\n- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n- Example: \u201cOpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)\u201d is only `model_serving`. The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.\n- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.\n\nPolicy/config rules:\n\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map the word \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d should be `mcp_tooling`, `config`, and `security`, not `model_serving`.\n\nCardinality guidance:\n\n- Use 0 topics when no allowed topic is central.\n- Use 1 topic for a single-focus item.\n- Use 2 topics for normal cross-topic items.\n- Use 3 topics only when the title or first clear summary explicitly has three central facets.\n- Use 4+ topics only for explicit multi-system coordination.\n\nFinal suppression checks before output:\n\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, or \u201cchunk\u201d, verify that the topic is actually the subject, not just context.\n- Prefer the narrower central topic over a broad fallback.\n- Never invent topic ids.\n- Output only the final JSON with the selected topic ids."
+ },
+ {
+ "routing_policy": "You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.\n\nThis is a fuzzy multi-label routing task. Choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second or third concern.\n\nProcess:\n\n1. Read the title first.\n2. Identify the main user-visible problem, feature, contract, policy, or behavior change.\n3. Pick one primary topic.\n4. Read only the first clear body summary if needed to disambiguate.\n5. Add secondary topics only when they are explicitly central and removing them would route the item away from a maintainer who must see it.\n6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.\n7. Return only exact allowed topic ids as JSON.\n\nDo not over-label from keywords.\n\nImportant domain rules:\n\n- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.\n- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.\n- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.\n\nExec / sandbox / approval rules:\n\n- Exec command tools, shell execution behavior, exec protocol contracts, exec v2, and exec tool test contracts are `exec_tools`.\n- Sandbox modes, sandbox policy, filesystem/process isolation, sandbox enforcement, or exec behavior under sandbox constraints are `sandboxing`.\n- Approval prompts, permission gates, escalation decisions, permissionMode behavior, or user consent flows are `approvals`.\n- Do not replace `sandboxing` or `approvals` with `security` just because the behavior is security-adjacent.\n- Use `security` only when the item is centrally about security policy, vulnerabilities, secrets, access control, network boundaries, or allowed/blocked behavior as a security concern.\n- A title such as \u201ctest(exec): land exec v2 contract follow-through\u201d is not merely a test-only item. It centrally concerns the exec v2 contract and should include the central contract facets, for example `exec_tools`, `sandboxing`, and `approvals` when those are part of the contract.\n\nACP / ACPX rules:\n\n- ACP protocol/session behavior is `acp`.\n- ACPX session orchestration, per-agent behavior, bindings, agent/session integration, or ACP extension-layer behavior is `acpx`.\n- PermissionMode or permission policy for ACP sessions is also `approvals`.\n- A title such as \u201c[Feature]: Per-binding and per-agent permissionMode for ACP sessions\u201d should include `acp`, `approvals`, and `acpx`: ACP is the protocol/session surface, approvals is the permissionMode concern, and ACPX is central because per-binding/per-agent session behavior routes to ACPX maintainers.\n\nPolicy/config rules:\n\n- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.\n- Do not map the word \u201cmodel\u201d in \u201cmodel policy\u201d, \u201cmodel conformance\u201d, or \u201cmodel checks\u201d to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.\n- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.\n- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.\n- Example: \u201cPolicy: add model, network, and MCP conformance checks\u201d should be `mcp_tooling`, `config`, and `security`, not `model_serving`.\n\nCardinality guidance:\n\n- Use 0 topics when no allowed topic is central.\n- Use 1 topic for a single-focus item.\n- Use 2 topics for normal cross-topic items.\n- Use 3 topics when the title or first clear summary explicitly has three central facets, such as exec + sandboxing + approvals or ACP + approvals + ACPX.\n- Use 4+ topics only for explicit multi-system coordination.\n\nFinal suppression checks before output:\n\n- If a topic was added only because of a word like \u201cusage\u201d, \u201cmodel\u201d, \u201cnetwork\u201d, \u201ctest\u201d, \u201cpolicy\u201d, \u201cstatus\u201d, \u201csecurity\u201d, \u201cpermission\u201d, or \u201cchunk\u201d, verify that the topic is actually the subject, not just context.\n- Do not use broad fallback topics when a narrower central topic exists.\n- Do not use `security` as a generic substitute for `sandboxing` or `approvals`.\n- Never invent topic ids.\n- Output only the final JSON with the selected topic ids."
+ }
+ ],
+ "discovery_eval_counts": [
+ 0,
+ 20
+ ],
+ "num_full_val_evals": 2,
+ "objective_pareto_front": {
+ "weighted_score": 0.6101190476190476
+ },
+ "parents": [
+ [
+ null
+ ],
+ [
+ 0
+ ]
+ ],
+ "per_objective_best_candidates": {
+ "weighted_score": [
+ 1
+ ]
+ },
+ "per_val_instance_best_candidates": {
+ "0": [
+ 0
+ ],
+ "1": [
+ 1
+ ],
+ "2": [
+ 0,
+ 1
+ ],
+ "3": [
+ 1
+ ],
+ "4": [
+ 0,
+ 1
+ ],
+ "5": [
+ 1
+ ],
+ "6": [
+ 0
+ ],
+ "7": [
+ 0,
+ 1
+ ],
+ "8": [
+ 0,
+ 1
+ ],
+ "9": [
+ 0,
+ 1
+ ],
+ "10": [
+ 1
+ ],
+ "11": [
+ 1
+ ]
+ },
+ "run_dir": "prompt-optimizer/out/gepa-12b-twelve-from-six-iter-20260612T192815Z",
+ "seed": 0,
+ "total_metric_calls": 32,
+ "val_aggregate_scores": [
+ 0.5375,
+ 0.6101190476190476
+ ],
+ "val_aggregate_subscores": [
+ {
+ "weighted_score": 0.5375
+ },
+ {
+ "weighted_score": 0.6101190476190476
+ }
+ ],
+ "val_subscores": [
+ {
+ "0": 0.5,
+ "1": 0.5,
+ "2": 0.25,
+ "3": 0.25,
+ "4": 1.0,
+ "5": 0.5,
+ "6": 1.0,
+ "7": 0.5,
+ "8": 0.5,
+ "9": 1.0,
+ "10": 0.2,
+ "11": 0.25
+ },
+ {
+ "0": 0.25,
+ "1": 1.0,
+ "2": 0.25,
+ "3": 0.2857142857142857,
+ "4": 1.0,
+ "5": 1.0,
+ "6": 0.25,
+ "7": 0.5,
+ "8": 0.5,
+ "9": 1.0,
+ "10": 1.0,
+ "11": 0.2857142857142857
+ }
+ ],
+ "validation_schema_version": 2
+}
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/gepa_state.bin b/gepa-12b-twelve-from-six-iter-20260612T192815Z/gepa_state.bin
new file mode 100644
index 0000000000000000000000000000000000000000..358b84cfbf88bfdb322d8dea4fa67a4cd21d25de
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/gepa_state.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:af513c56410f81867bc8b4a31e5eb97593ffcda4a7e77b91a4992fae2e351e09
+size 9600
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/run_log.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/run_log.json
new file mode 100644
index 0000000000000000000000000000000000000000..8cc84ad0da7341af0a3cb79c4d3b3144669b582a
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/run_log.json
@@ -0,0 +1,39 @@
+[
+ {
+ "i": 0,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 1,
+ 10,
+ 9,
+ 5
+ ],
+ "subsample_scores": [
+ 1.0,
+ 0.2,
+ 1.0,
+ 0.5
+ ],
+ "new_subsample_scores": [
+ 1.0,
+ 1.0,
+ 0.5,
+ 1.0
+ ],
+ "new_program_idx": 1,
+ "evaluated_val_indices": [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11
+ ]
+ }
+]
\ No newline at end of file
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/run_log.txt b/gepa-12b-twelve-from-six-iter-20260612T192815Z/run_log.txt
new file mode 100644
index 0000000000000000000000000000000000000000..54f50f6d4d7aba5ee07c87b8657270d7b1d89de8
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/run_log.txt
@@ -0,0 +1,79 @@
+Iteration 0: Base program full valset score: 0.5375 over 12 / 12 examples
+Iteration 1: Selected program 0 score: 0.5375
+Iteration 1: Proposed new text for routing_policy: You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+This is a fuzzy multi-label routing task. Choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second or third concern.
+
+Process:
+
+1. Read the title first.
+2. Identify the main user-visible problem, feature, contract, policy, or behavior change.
+3. Pick one primary topic.
+4. Read only the first clear body summary if needed to disambiguate.
+5. Add secondary topics only when they are explicitly central and removing them would route the item away from a maintainer who must see it.
+6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.
+7. Return only exact allowed topic ids as JSON.
+
+Do not over-label from keywords.
+
+Important domain rules:
+
+- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.
+
+Exec / sandbox / approval rules:
+
+- Exec command tools, shell execution behavior, exec protocol contracts, exec v2, and exec tool test contracts are `exec_tools`.
+- Sandbox modes, sandbox policy, filesystem/process isolation, sandbox enforcement, or exec behavior under sandbox constraints are `sandboxing`.
+- Approval prompts, permission gates, escalation decisions, permissionMode behavior, or user consent flows are `approvals`.
+- Do not replace `sandboxing` or `approvals` with `security` just because the behavior is security-adjacent.
+- Use `security` only when the item is centrally about security policy, vulnerabilities, secrets, access control, network boundaries, or allowed/blocked behavior as a security concern.
+- A title such as “test(exec): land exec v2 contract follow-through” is not merely a test-only item. It centrally concerns the exec v2 contract and should include the central contract facets, for example `exec_tools`, `sandboxing`, and `approvals` when those are part of the contract.
+
+ACP / ACPX rules:
+
+- ACP protocol/session behavior is `acp`.
+- ACPX session orchestration, per-agent behavior, bindings, agent/session integration, or ACP extension-layer behavior is `acpx`.
+- PermissionMode or permission policy for ACP sessions is also `approvals`.
+- A title such as “[Feature]: Per-binding and per-agent permissionMode for ACP sessions” should include `acp`, `approvals`, and `acpx`: ACP is the protocol/session surface, approvals is the permissionMode concern, and ACPX is central because per-binding/per-agent session behavior routes to ACPX maintainers.
+
+Policy/config rules:
+
+- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+- Do not map the word “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+- Example: “Policy: add model, network, and MCP conformance checks” should be `mcp_tooling`, `config`, and `security`, not `model_serving`.
+
+Cardinality guidance:
+
+- Use 0 topics when no allowed topic is central.
+- Use 1 topic for a single-focus item.
+- Use 2 topics for normal cross-topic items.
+- Use 3 topics when the title or first clear summary explicitly has three central facets, such as exec + sandboxing + approvals or ACP + approvals + ACPX.
+- Use 4+ topics only for explicit multi-system coordination.
+
+Final suppression checks before output:
+
+- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, “security”, “permission”, or “chunk”, verify that the topic is actually the subject, not just context.
+- Do not use broad fallback topics when a narrower central topic exists.
+- Do not use `security` as a generic substitute for `sandboxing` or `approvals`.
+- Never invent topic ids.
+- Output only the final JSON with the selected topic ids.
+Iteration 1: New subsample score 3.5 is better than old score 2.7. Continue to full eval and add to candidate pool.
+Iteration 1: Found a better program on the valset with score 0.6101190476190476.
+Iteration 1: Valset score for new program: 0.6101190476190476 (coverage 12 / 12)
+Iteration 1: Val aggregate for new program: 0.6101190476190476
+Iteration 1: Individual valset scores for new program: {0: 0.25, 1: 1.0, 2: 0.25, 3: 0.2857142857142857, 4: 1.0, 5: 1.0, 6: 0.25, 7: 0.5, 8: 0.5, 9: 1.0, 10: 1.0, 11: 0.2857142857142857}
+Iteration 1: Objective aggregate scores for new program: {'weighted_score': 0.6101190476190476}
+Iteration 1: New valset pareto front scores: {0: 0.5, 1: 1.0, 2: 0.25, 3: 0.2857142857142857, 4: 1.0, 5: 1.0, 6: 1.0, 7: 0.5, 8: 0.5, 9: 1.0, 10: 1.0, 11: 0.2857142857142857}
+Iteration 1: Objective pareto front scores: {'weighted_score': 0.6101190476190476}
+Iteration 1: Valset pareto front aggregate score: 0.6934523809523809
+Iteration 1: Updated valset pareto front programs: {0: {0}, 1: {1}, 2: {0, 1}, 3: {1}, 4: {0, 1}, 5: {1}, 6: {0}, 7: {0, 1}, 8: {0, 1}, 9: {0, 1}, 10: {1}, 11: {1}}
+Iteration 1: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 1: Best valset aggregate score so far: 0.6101190476190476
+Iteration 1: Best program as per aggregate score on valset: 1
+Iteration 1: Best score on valset: 0.6101190476190476
+Iteration 1: Linear pareto front program index: 1
+Iteration 1: New program candidate index: 1
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/run_log_stderr.txt b/gepa-12b-twelve-from-six-iter-20260612T192815Z/run_log_stderr.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-12b-twelve-from-six-iter-20260612T192815Z/summary.json b/gepa-12b-twelve-from-six-iter-20260612T192815Z/summary.json
new file mode 100644
index 0000000000000000000000000000000000000000..41cca224094445020da85476df51f7810c1e3ea8
--- /dev/null
+++ b/gepa-12b-twelve-from-six-iter-20260612T192815Z/summary.json
@@ -0,0 +1,29 @@
+{
+ "best_idx": 1,
+ "best_prompt_path": "prompt-optimizer/out/gepa-12b-twelve-from-six-iter-20260612T192815Z/best.prompt.md",
+ "best_routing_policy_path": "prompt-optimizer/out/gepa-12b-twelve-from-six-iter-20260612T192815Z/best.routing_policy.md",
+ "best_score": 0.6101190476190476,
+ "config": {
+ "harness": {
+ "base_url": null,
+ "concurrency": 2,
+ "context_window": null,
+ "max_tokens": 1536,
+ "model": "gemma-12b-q4km-reason",
+ "state_dir": null,
+ "timeout_ms": 300000
+ },
+ "max_metric_calls": 30,
+ "output_dir": "prompt-optimizer/out/gepa-12b-twelve-from-six-iter-20260612T192815Z",
+ "reflection_minibatch_size": 4,
+ "row_limit": 12,
+ "seed": 0,
+ "seed_routing_policy_chars": 3224,
+ "seed_routing_policy_sha256": "f4b161bb9bbaf366f1d4f1841243d73544bbd3c553ca6be5eb2818e757007187"
+ },
+ "created_at": "2026-06-12T19:43:20.318280+00:00",
+ "num_candidates": 2,
+ "num_full_val_evals": 2,
+ "result_path": "prompt-optimizer/out/gepa-12b-twelve-from-six-iter-20260612T192815Z/gepa-result.json",
+ "total_metric_calls": 32
+}
\ No newline at end of file
diff --git a/gepa-e4b-smoke-20260612T184732Z/run_log.txt b/gepa-e4b-smoke-20260612T184732Z/run_log.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-e4b-smoke-20260612T184732Z/run_log_stderr.txt b/gepa-e4b-smoke-20260612T184732Z/run_log_stderr.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-e4b-smoke-20260612T184748Z/best.prompt.md b/gepa-e4b-smoke-20260612T184748Z/best.prompt.md
new file mode 100644
index 0000000000000000000000000000000000000000..3f119fc592c24043f2f2f0375ac244e6cf70bcb4
--- /dev/null
+++ b/gepa-e4b-smoke-20260612T184748Z/best.prompt.md
@@ -0,0 +1,124 @@
+# OpenClaw Routing Classifier
+
+Classify one OpenClaw GitHub issue or pull request for maintainer notification
+routing, not code search. Return only the final structured JSON required by the
+schema. No prose, markdown, analysis, or extra fields.
+
+Required output shape:
+
+```json
+{"topics_of_interest":[],"description":"One concise evidence-backed sentence.","caveats":[]}
+```
+
+## Inner Monologue
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+## Repository Reads
+
+A read-only `bash` tool may be available in the OpenClaw repo snapshot. Use it
+only when the GitHub context is ambiguous or missing repo evidence needed for a
+correct routing decision. Prefer short commands such as `pwd`, `ls`, `find`,
+`rg`, `grep`, `sed -n`, `cat`, `head`, `tail`, `wc -l`,
+`git show --name-only`, `git ls-files`, or `git grep`.
+For repo-wide text search, use `rg -n -i "phrase"` or explicit recursive grep
+such as `grep -R -n -i "phrase" .`. For file discovery, use
+`rg --files -g "*.ts"` or `git ls-files src`.
+Do not call `bash` when the provided GitHub context is enough.
+
+## Allowed Topics
+
+```json
+__ALLOWED_TOPICS_JSON__
+```
+
+Topic definitions and cue words:
+
+__TOPIC_DESCRIPTIONS__
+
+Classify a GitHub issue or pull request into the smallest complete set of allowed topic ids.
+
+Input format may include:
+- `target`: GitHub issue or PR URL
+- `title`: strongest evidence
+- optional body, summary, labels, files, comments, or diff context
+- training examples may also include `gold_topics`, prior generated outputs, and feedback; use those only to learn corrections, not as fields for live classification
+
+Output:
+- Call `final_json` with `topics_of_interest`
+- Use only exact allowed topic ids
+- Do not invent aliases such as `cli`, `provider`, `tool`, `thread`, `status`, `openrouter`, or `tts`
+
+Decision process:
+1. Read the title first.
+2. Identify the main user-visible feature, behavior, or failure.
+3. Pick the single best primary topic.
+4. Read only the first clear body summary if needed to disambiguate.
+5. Add a secondary topic only when it is explicitly central enough that omitting it would route the item to the wrong maintainer bucket.
+6. Remove labels based only on implementation details, tests, examples, file paths, side effects, broad impact, or speculative consequences.
+7. Stop once the smallest complete set is clear.
+
+Cardinality:
+- 0 topics if no allowed topic is central.
+- 1 topic for a single-focus item.
+- 2 topics for ordinary cross-topic items.
+- 3 topics only when the title or first clear problem statement explicitly names three central facets.
+- 4+ topics only for explicit multi-system coordination.
+
+High-priority routing cues:
+- Named protocol or adapter: `acp`, `acpx`, `mcp`, gateway protocol, OpenAI-compatible streaming, Responses API, chat completions.
+- Gateway daemon, gateway API, startup, restart, gateway-owned routing, or gateway-owned runtime: `gateway`.
+- Agent runner, backend, embedded runtime, lifecycle, heartbeat, attempt orchestration, node-backed runtime, or runtime state: `agent_runtime`.
+- Subagents, coding-agent runs, harness behavior, approvals, sandboxing, compaction, or agent orchestration: `coding_agents`.
+- Session identity, binding, persistence, transcript, resume, reset, parent/child state: `sessions`.
+- Shell commands, execution control, durable exec, allow/deny controls, spoken/tool output execution: `exec_tools`.
+- Tool invocation semantics, structured tool results, stdout rendering for tool results, pre-tool text preservation, tool-call transcript/content handling: `tool_calling`.
+- Counts, usage, costs, tokens, metrics, diagnostics, traces, status reporting, footer counts: `telemetry_usage`.
+- Base URL normalization, endpoint lifecycle/selection, streaming, request routing, OpenAI-compatible serving, vLLM/TGI/LocalAI endpoint behavior: `model_serving`.
+- LM Studio, Ollama, llama.cpp, GGUF, local hardware, local context, offline/local model compatibility or crashes: `local_models`.
+- Local/self-hosted provider setup, routing, auth, discovery, compatibility: `local_model_providers`.
+- Self-hosted embeddings, memory providers, speech services, private inference services: `self_hosted_inference`.
+- Named Discord, Telegram, Slack, Zulip, Feishu, webchat surfaces: `chat_integrations`.
+- Generic notify policy or delivery gates: `notifications`.
+- Chat UI display/status/footer behavior when the user-facing interface itself is central: `ui_tui`.
+- External API, CLI, or HTTP contracts: `api_surface`.
+- Operational timeout, crash, leak, retry, stuck state, data loss, cleanup, or recovery correctness: `reliability`.
+- Configuration behavior itself: `config`.
+- Docs or testing infrastructure as the subject: `docs` or `tests_ci`.
+
+Important learned correction:
+- For a title like `ACP: add gateway-owned node-backed runtime`, the correct topics are `acp`, `gateway`, and `agent_runtime`.
+- `ACP` is a central protocol topic.
+- `gateway-owned` is central `gateway`, not incidental.
+- `node-backed runtime` is central `agent_runtime`.
+- Do not add `sessions` unless session identity/lifecycle/storage is explicitly central.
+- Do not add `reliability` merely because runtime work could affect robustness; reliability requires an explicit failure, recovery, timeout, crash, leak, data loss, retry, or stuck-state concern.
+
+Hard filters:
+- Delete `local_model_providers` unless the item explicitly centers a local, self-hosted, user-declared OpenAI-compatible backend, localhost/LAN endpoint, or private inference setup.
+- Do not use `local_model_providers` for hosted provider catalogs, hosted model availability, hosted auth, usage/billing UI, TTS/speech/image behavior, or generic hosted APIs.
+- Use `local_models` only for concrete local/offline model execution or compatibility, not static model ids, catalogs, provider manifests, or hosted provider availability.
+- Do not use `api_surface` for internal helpers, status text, UI events, ordinary command behavior, token parsing tests, or local compatibility.
+- Do not use `config` merely because an option exists; classify by what the option controls.
+- Do not use `tests_ci` unless the testing surface itself is the subject.
+- Do not use `docs` unless documentation itself is the subject.
+- Do not use `gateway` merely because traffic passes through the gateway; use it only when gateway ownership, daemon/API behavior, startup/restart, or routing is central.
+
+Optimize for the smallest complete set, not for a single label and not for broad recall.## Target
+
+`__TARGET__`
+
+## GitHub Context
+
+__GITHUB_CONTEXT__
+
+Use this context as source of truth. If important sections are missing,
+unavailable, selected, or truncated, classify from what is available and mention
+material limits in `caveats`.
+
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
+
+You MUST keep your inner monologue, your thought process, your Chain of Thought restricted to 2 short paragraphs maximum. Do not deliberate topic by topic; weigh only the strongest candidates, then call final_json. It is ABSOLUTELY IMPERATIVE that you DO NOT EXCEED 50 WORDS and reply as soon as possible.
diff --git a/gepa-e4b-smoke-20260612T184748Z/best.routing_policy.md b/gepa-e4b-smoke-20260612T184748Z/best.routing_policy.md
new file mode 100644
index 0000000000000000000000000000000000000000..cf555d63522349a20d3c279a00e93cd68a2b0b86
--- /dev/null
+++ b/gepa-e4b-smoke-20260612T184748Z/best.routing_policy.md
@@ -0,0 +1,69 @@
+Classify a GitHub issue or pull request into the smallest complete set of allowed topic ids.
+
+Input format may include:
+- `target`: GitHub issue or PR URL
+- `title`: strongest evidence
+- optional body, summary, labels, files, comments, or diff context
+- training examples may also include `gold_topics`, prior generated outputs, and feedback; use those only to learn corrections, not as fields for live classification
+
+Output:
+- Call `final_json` with `topics_of_interest`
+- Use only exact allowed topic ids
+- Do not invent aliases such as `cli`, `provider`, `tool`, `thread`, `status`, `openrouter`, or `tts`
+
+Decision process:
+1. Read the title first.
+2. Identify the main user-visible feature, behavior, or failure.
+3. Pick the single best primary topic.
+4. Read only the first clear body summary if needed to disambiguate.
+5. Add a secondary topic only when it is explicitly central enough that omitting it would route the item to the wrong maintainer bucket.
+6. Remove labels based only on implementation details, tests, examples, file paths, side effects, broad impact, or speculative consequences.
+7. Stop once the smallest complete set is clear.
+
+Cardinality:
+- 0 topics if no allowed topic is central.
+- 1 topic for a single-focus item.
+- 2 topics for ordinary cross-topic items.
+- 3 topics only when the title or first clear problem statement explicitly names three central facets.
+- 4+ topics only for explicit multi-system coordination.
+
+High-priority routing cues:
+- Named protocol or adapter: `acp`, `acpx`, `mcp`, gateway protocol, OpenAI-compatible streaming, Responses API, chat completions.
+- Gateway daemon, gateway API, startup, restart, gateway-owned routing, or gateway-owned runtime: `gateway`.
+- Agent runner, backend, embedded runtime, lifecycle, heartbeat, attempt orchestration, node-backed runtime, or runtime state: `agent_runtime`.
+- Subagents, coding-agent runs, harness behavior, approvals, sandboxing, compaction, or agent orchestration: `coding_agents`.
+- Session identity, binding, persistence, transcript, resume, reset, parent/child state: `sessions`.
+- Shell commands, execution control, durable exec, allow/deny controls, spoken/tool output execution: `exec_tools`.
+- Tool invocation semantics, structured tool results, stdout rendering for tool results, pre-tool text preservation, tool-call transcript/content handling: `tool_calling`.
+- Counts, usage, costs, tokens, metrics, diagnostics, traces, status reporting, footer counts: `telemetry_usage`.
+- Base URL normalization, endpoint lifecycle/selection, streaming, request routing, OpenAI-compatible serving, vLLM/TGI/LocalAI endpoint behavior: `model_serving`.
+- LM Studio, Ollama, llama.cpp, GGUF, local hardware, local context, offline/local model compatibility or crashes: `local_models`.
+- Local/self-hosted provider setup, routing, auth, discovery, compatibility: `local_model_providers`.
+- Self-hosted embeddings, memory providers, speech services, private inference services: `self_hosted_inference`.
+- Named Discord, Telegram, Slack, Zulip, Feishu, webchat surfaces: `chat_integrations`.
+- Generic notify policy or delivery gates: `notifications`.
+- Chat UI display/status/footer behavior when the user-facing interface itself is central: `ui_tui`.
+- External API, CLI, or HTTP contracts: `api_surface`.
+- Operational timeout, crash, leak, retry, stuck state, data loss, cleanup, or recovery correctness: `reliability`.
+- Configuration behavior itself: `config`.
+- Docs or testing infrastructure as the subject: `docs` or `tests_ci`.
+
+Important learned correction:
+- For a title like `ACP: add gateway-owned node-backed runtime`, the correct topics are `acp`, `gateway`, and `agent_runtime`.
+- `ACP` is a central protocol topic.
+- `gateway-owned` is central `gateway`, not incidental.
+- `node-backed runtime` is central `agent_runtime`.
+- Do not add `sessions` unless session identity/lifecycle/storage is explicitly central.
+- Do not add `reliability` merely because runtime work could affect robustness; reliability requires an explicit failure, recovery, timeout, crash, leak, data loss, retry, or stuck-state concern.
+
+Hard filters:
+- Delete `local_model_providers` unless the item explicitly centers a local, self-hosted, user-declared OpenAI-compatible backend, localhost/LAN endpoint, or private inference setup.
+- Do not use `local_model_providers` for hosted provider catalogs, hosted model availability, hosted auth, usage/billing UI, TTS/speech/image behavior, or generic hosted APIs.
+- Use `local_models` only for concrete local/offline model execution or compatibility, not static model ids, catalogs, provider manifests, or hosted provider availability.
+- Do not use `api_surface` for internal helpers, status text, UI events, ordinary command behavior, token parsing tests, or local compatibility.
+- Do not use `config` merely because an option exists; classify by what the option controls.
+- Do not use `tests_ci` unless the testing surface itself is the subject.
+- Do not use `docs` unless documentation itself is the subject.
+- Do not use `gateway` merely because traffic passes through the gateway; use it only when gateway ownership, daemon/API behavior, startup/restart, or routing is central.
+
+Optimize for the smallest complete set, not for a single label and not for broad recall.
\ No newline at end of file
diff --git a/gepa-e4b-smoke-20260612T184748Z/candidate_tree.html b/gepa-e4b-smoke-20260612T184748Z/candidate_tree.html
new file mode 100644
index 0000000000000000000000000000000000000000..812f8f842134d512d7d99303746a0f772610f8be
--- /dev/null
+++ b/gepa-e4b-smoke-20260612T184748Z/candidate_tree.html
@@ -0,0 +1,173 @@
+
+GEPA Candidate Tree
+
+Legend: Best | Pareto Front | Other
+
+[Interactive candidate graph; loads when the page is viewed]
+
diff --git a/gepa-e4b-smoke-20260612T184748Z/candidates.json b/gepa-e4b-smoke-20260612T184748Z/candidates.json
new file mode 100644
index 0000000000000000000000000000000000000000..d3457c21fd7eba70d8f31ee7ec87374db4b489a4
--- /dev/null
+++ b/gepa-e4b-smoke-20260612T184748Z/candidates.json
@@ -0,0 +1,8 @@
+[
+ {
+ "routing_policy": "## Goal\n\nChoose the smallest topic set that would route this item to the right\nmaintainer interest bucket without hiding a central second concern.\n\nThis is a fuzzy multi-label task. Avoid both failure modes:\n\n- Over-labeling: adding topics for implementation details, tests, examples, or\n side effects.\n- Under-labeling: dropping a topic that is explicitly central to the title or\n first clear problem statement.\n\n## Decision Process\n\nDo these checks internally, then call `final_json`.\n\n1. Read the title first. Identify the main user-visible problem or feature.\n2. Pick the single best primary topic.\n3. Scan the title and first clear body summary for explicit central facets:\n protocol/runtime, provider/model serving, session/state, execution/tooling,\n channel/UI, memory, security/auth, config, reliability, docs/tests.\n4. Add a secondary topic only when removing it would misroute a maintainer who\n follows that bucket.\n5. Delete topics that are only file locations, validation work, examples,\n dependencies, likely consequences, or broad words from comments.\n6. Do a final cardinality check:\n - 0 topics when no allowed topic is central.\n - 1 topic for a single-focus item.\n - 2 topics for normal cross-topic items.\n - 3 topics when the title/body explicitly has three central facets.\n - 4+ topics only for explicit multi-system coordination, not broad impact.\n\nDo not optimize for one label. 
Optimize for the smallest complete set.\n\n## Runtime Discipline\n\nThis is a routing task, not a full PR review.\n\n- Do not audit every allowed topic.\n- Do not summarize the whole issue, comments, or diff.\n- Do not inspect repository files when the title and GitHub context are enough.\n- Stop once the primary topic and any explicit central secondary facets are\n identified.\n- Call `final_json` immediately after the final cardinality check.\n\n## Evidence Priority\n\n- Title is strongest.\n- The first clear problem statement or summary in the body is next.\n- Labels, changed files, comments, and diff can confirm or disambiguate, but\n they must not broaden the topic set by themselves.\n- Tests and docs files are weak evidence unless tests or docs are the subject.\n- If context is truncated, classify from the visible central evidence and add a\n caveat only when that materially limits confidence.\n\n## Central Facet Recovery\n\nBefore final output, check whether any of these central facets were dropped:\n\n- A named protocol or protocol adapter can be central: ACP, ACPX, MCP, gateway\n protocol, OpenAI-compatible streaming, Responses API, chat completions.\n- Agent runner, backend, subagent, harness, heartbeat, lifecycle, or orchestration\n behavior can be central as `agent_runtime` or `coding_agents`.\n- Session identity, binding, persistence, transcript, resume, reset, or\n parent/child state can be central as `sessions`.\n- Execution control, shell commands, tool allow/deny, durable exec, or tool\n invocation can be central as `exec_tools`, `approvals`, `sandboxing`, or\n `tool_calling` depending on the exact behavior.\n- Local/self-hosted model backends, local hardware, OpenAI-compatible local\n serving, embeddings providers, and model endpoint behavior can be central as\n `local_models`, `local_model_providers`, `model_serving`, or\n `self_hosted_inference`.\n- Named chat surfaces, browser/web UI, memory, security/auth, config, queueing,\n notifications, 
and telemetry can be central when the title or summary makes\n them the subject.\n\nIf a facet is central but maps to a narrower allowed topic, use the narrower\ntopic instead of a broad fallback.\n\n## Enum Discipline\n\n- Output only exact allowed topic ids.\n- Never invent shorthand such as `cli`, `tts`, `openrouter`, `status`,\n `thread`, `provider`, `tool`, or `test`.\n- If the closest word in the title is not allowed, map it to the nearest\n allowed topic or omit it.\n\n## Positive Cues\n\n- Counts, usage, cost, tokens, metrics, diagnostics, traces, and status\n reporting route to `telemetry_usage`.\n- Subagents, coding-agent runs, harness behavior, approvals, sandboxing,\n compaction, or agent orchestration route to `coding_agents`.\n- Agent runner, embedded backend, lifecycle, heartbeat, attempt, orchestration,\n or runtime state route to `agent_runtime` when the agent runtime itself is\n central.\n- LM Studio, Ollama, llama.cpp, GGUF, local hardware, local model compatibility,\n local fallback, and local context behavior route to `local_models`.\n- OpenAI-compatible serving, base URL normalization for model endpoints,\n streaming, usage chunks, vLLM/TGI/LocalAI serving, endpoint lifecycle, and\n request routing route to `model_serving`.\n- Self-hosted embeddings, memory providers, speech services, private inference,\n and locally operated inference services route to `self_hosted_inference`.\n- Named Discord, Telegram, Slack, Zulip, Feishu, webchat, or similar surfaces\n route to `chat_integrations`; generic notify policy/delivery gates route to\n `notifications`.\n- Chat UI display/status/footer behavior routes to `ui_tui` only when the\n user-facing interface is central.\n- Test-only contract follow-through routes to `tests_ci` only when the testing\n surface itself is the subject; otherwise route by the runtime contract being\n tested.\n\n## Hard Local Model Filters\n\nBefore final output:\n\n- Delete `local_model_providers` unless the item explicitly 
centers a local,\n self-hosted, or user-declared OpenAI-compatible backend: LM Studio, Ollama,\n vLLM, TGI, LocalAI, llama.cpp, Atomic Chat, localhost/LAN, or private\n inference.\n- Never use `local_model_providers` for hosted provider catalogs/manifests,\n hosted model availability, hosted auth, usage/billing UI, TTS/speech/image\n behavior, or hosted APIs. Hosted catalog updates are `model_releases` and\n sometimes `config`; hosted usage/billing/status work is `telemetry_usage` or\n `ui_tui`.\n- Use `local_models` only for concrete local/offline model execution or\n compatibility; not model IDs, catalogs, lists, static entries, provider\n manifests, or hosted provider availability. Keep it for local-model lean\n filtering and local-model runtime crashes.\n\n## Over-Label Guardrails\n\n- `api_surface`: external API, CLI, or HTTP contracts only. Not internal\n payloads/options/functions, status text, UI events, or ordinary command\n behavior.\n- `reliability`: operational failures such as timeout, crash, leak, retry,\n stuck state, data loss, cleanup, or recovery. Not a generic bug tag.\n- `sessions`: session lifecycle/state/storage/identity only. Not every item\n mentioning session context or files.\n- `local_model_providers`: provider setup/routing/auth/discovery/compatibility\n only. Not every local endpoint issue.\n- `config`: configuration behavior itself, not any feature with an option.\n- `docs` and `tests_ci`: only when docs or test tooling is the subject.\n- `security`: security boundary, credentials, auth hardening, SSRF, token\n handling, or vulnerability only. Not every risky-looking behavior.\n- `gateway`: gateway daemon, gateway API, startup, restart, or gateway-owned\n routing only. 
Not every item that passes through the gateway.\n\n## Tie-Breakers\n\n- Count/usage/token/cost/metric/trace/diagnostic/status/footer-count features\n are `telemetry_usage`, even if shown in UI or session status.\n- Base URL normalization, endpoint lifecycle/selection, streaming, request\n routing, OpenAI-compatible serving, vLLM/TGI/LocalAI behavior, and model\n endpoint compatibility are `model_serving`.\n- TTS, shell/exec, command, tool invocation, allowlist, and execution-control\n behavior are `exec_tools` when the feature controls execution or spoken/tool\n output.\n- Thread/session isolation, per-session binding, fallback recovery state, and\n lifecycle state are `sessions` when those boundaries are central.\n- Structured tool result display, stdout rendering for tool results, pre-tool\n text preservation, and tool-call transcript/content handling are\n `tool_calling` when tool-call semantics are central.\n- Delivery fallback, outbound recovery, lost final/pre-tool text, duplicate\n cleanup, and lifecycle recovery are `reliability` when recovery correctness is\n central.\n\n## False-Positive Suppression\n\n- Do not use `local_model_providers` for base URL normalization,\n OpenRouter/OpenAI-compatible endpoint fixes, endpoint lifecycle, streaming,\n usage chunks, or vLLM/TGI/LocalAI serving. Use `model_serving` unless provider\n setup/auth/discovery/routing is central.\n- Do not use `notifications` for named Discord/Telegram/Slack/Zulip/Feishu\n behavior, ACP final/pre-tool text preservation, delivery fallback recovery, or\n outbound recovery correctness. 
Use `chat_integrations` for named chat surfaces\n and `reliability` for recovery/loss/fallback.\n- Do not use `tool_calling` for TTS tags/options, browser screenshot/vision,\n generic tool output, or config-like options.\n- Do not use `api_surface` for parse helpers, CLI edge-case tests, token\n parsing, status/footer display, internal command behavior, or local model\n compatibility.\n- Do not use `config` merely because a feature adds an option. Route by what\n the option controls.\n\n"
+ },
+ {
+ "routing_policy": "Classify a GitHub issue or pull request into the smallest complete set of allowed topic ids.\n\nInput format may include:\n- `target`: GitHub issue or PR URL\n- `title`: strongest evidence\n- optional body, summary, labels, files, comments, or diff context\n- training examples may also include `gold_topics`, prior generated outputs, and feedback; use those only to learn corrections, not as fields for live classification\n\nOutput:\n- Call `final_json` with `topics_of_interest`\n- Use only exact allowed topic ids\n- Do not invent aliases such as `cli`, `provider`, `tool`, `thread`, `status`, `openrouter`, or `tts`\n\nDecision process:\n1. Read the title first.\n2. Identify the main user-visible feature, behavior, or failure.\n3. Pick the single best primary topic.\n4. Read only the first clear body summary if needed to disambiguate.\n5. Add a secondary topic only when it is explicitly central enough that omitting it would route the item to the wrong maintainer bucket.\n6. Remove labels based only on implementation details, tests, examples, file paths, side effects, broad impact, or speculative consequences.\n7. 
Stop once the smallest complete set is clear.\n\nCardinality:\n- 0 topics if no allowed topic is central.\n- 1 topic for a single-focus item.\n- 2 topics for ordinary cross-topic items.\n- 3 topics only when the title or first clear problem statement explicitly names three central facets.\n- 4+ topics only for explicit multi-system coordination.\n\nHigh-priority routing cues:\n- Named protocol or adapter: `acp`, `acpx`, `mcp`, gateway protocol, OpenAI-compatible streaming, Responses API, chat completions.\n- Gateway daemon, gateway API, startup, restart, gateway-owned routing, or gateway-owned runtime: `gateway`.\n- Agent runner, backend, embedded runtime, lifecycle, heartbeat, attempt orchestration, node-backed runtime, or runtime state: `agent_runtime`.\n- Subagents, coding-agent runs, harness behavior, approvals, sandboxing, compaction, or agent orchestration: `coding_agents`.\n- Session identity, binding, persistence, transcript, resume, reset, parent/child state: `sessions`.\n- Shell commands, execution control, durable exec, allow/deny controls, spoken/tool output execution: `exec_tools`.\n- Tool invocation semantics, structured tool results, stdout rendering for tool results, pre-tool text preservation, tool-call transcript/content handling: `tool_calling`.\n- Counts, usage, costs, tokens, metrics, diagnostics, traces, status reporting, footer counts: `telemetry_usage`.\n- Base URL normalization, endpoint lifecycle/selection, streaming, request routing, OpenAI-compatible serving, vLLM/TGI/LocalAI endpoint behavior: `model_serving`.\n- LM Studio, Ollama, llama.cpp, GGUF, local hardware, local context, offline/local model compatibility or crashes: `local_models`.\n- Local/self-hosted provider setup, routing, auth, discovery, compatibility: `local_model_providers`.\n- Self-hosted embeddings, memory providers, speech services, private inference services: `self_hosted_inference`.\n- Named Discord, Telegram, Slack, Zulip, Feishu, webchat surfaces: 
`chat_integrations`.\n- Generic notify policy or delivery gates: `notifications`.\n- Chat UI display/status/footer behavior when the user-facing interface itself is central: `ui_tui`.\n- External API, CLI, or HTTP contracts: `api_surface`.\n- Operational timeout, crash, leak, retry, stuck state, data loss, cleanup, or recovery correctness: `reliability`.\n- Configuration behavior itself: `config`.\n- Docs or testing infrastructure as the subject: `docs` or `tests_ci`.\n\nImportant learned correction:\n- For a title like `ACP: add gateway-owned node-backed runtime`, the correct topics are `acp`, `gateway`, and `agent_runtime`.\n- `ACP` is a central protocol topic.\n- `gateway-owned` is central `gateway`, not incidental.\n- `node-backed runtime` is central `agent_runtime`.\n- Do not add `sessions` unless session identity/lifecycle/storage is explicitly central.\n- Do not add `reliability` merely because runtime work could affect robustness; reliability requires an explicit failure, recovery, timeout, crash, leak, data loss, retry, or stuck-state concern.\n\nHard filters:\n- Delete `local_model_providers` unless the item explicitly centers a local, self-hosted, user-declared OpenAI-compatible backend, localhost/LAN endpoint, or private inference setup.\n- Do not use `local_model_providers` for hosted provider catalogs, hosted model availability, hosted auth, usage/billing UI, TTS/speech/image behavior, or generic hosted APIs.\n- Use `local_models` only for concrete local/offline model execution or compatibility, not static model ids, catalogs, provider manifests, or hosted provider availability.\n- Do not use `api_surface` for internal helpers, status text, UI events, ordinary command behavior, token parsing tests, or local compatibility.\n- Do not use `config` merely because an option exists; classify by what the option controls.\n- Do not use `tests_ci` unless the testing surface itself is the subject.\n- Do not use `docs` unless documentation itself is the 
subject.\n- Do not use `gateway` merely because traffic passes through the gateway; use it only when gateway ownership, daemon/API behavior, startup/restart, or routing is central.\n\nOptimize for the smallest complete set, not for a single label and not for broad recall."
+ }
+]
\ No newline at end of file
diff --git a/gepa-e4b-smoke-20260612T184748Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json b/gepa-e4b-smoke-20260612T184748Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..7c401f46b23d55765e3d4d72988d624f3af17f27
--- /dev/null
+++ b/gepa-e4b-smoke-20260612T184748Z/generated_best_outputs_valset/task_0/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('acp', 'gateway', 'reliability'), description='This PR introduces a durable node-backed ACP runtime where the gateway manages state for recovery and replay after worker failures.', caveats=('The issue heavily focuses on documentation/protocol design aspects related to ACP runtime, but the core change is architectural routing.',), error=None)"
\ No newline at end of file
diff --git a/gepa-e4b-smoke-20260612T184748Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json b/gepa-e4b-smoke-20260612T184748Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
new file mode 100644
index 0000000000000000000000000000000000000000..beffa765cd2fba07399d66386522f7c7cfa2e381
--- /dev/null
+++ b/gepa-e4b-smoke-20260612T184748Z/generated_best_outputs_valset/task_1/iter_0_prog_0.json
@@ -0,0 +1 @@
+"ClassifierOutput(topics_of_interest=('config', 'local_model_providers', 'mcp_tooling'), description='This PR extends the Policy plugin to include configuration conformance checks for model providers, network SSRF posture, and managed MCP servers.', caveats=('The PR implements a read-only conformance layer over configurations, not runtime enforcement.',), error=None)"
\ No newline at end of file
diff --git a/gepa-e4b-smoke-20260612T184748Z/gepa-result.json b/gepa-e4b-smoke-20260612T184748Z/gepa-result.json
new file mode 100644
index 0000000000000000000000000000000000000000..dabea35218a6ed6536327dfbdc442ca2bd100eec
--- /dev/null
+++ b/gepa-e4b-smoke-20260612T184748Z/gepa-result.json
@@ -0,0 +1,69 @@
+{
+ "_str_candidate_key": null,
+ "best_idx": 1,
+ "best_outputs_valset": null,
+ "candidates": [
+ {
+ "routing_policy": "## Goal\n\nChoose the smallest topic set that would route this item to the right\nmaintainer interest bucket without hiding a central second concern.\n\nThis is a fuzzy multi-label task. Avoid both failure modes:\n\n- Over-labeling: adding topics for implementation details, tests, examples, or\n side effects.\n- Under-labeling: dropping a topic that is explicitly central to the title or\n first clear problem statement.\n\n## Decision Process\n\nDo these checks internally, then call `final_json`.\n\n1. Read the title first. Identify the main user-visible problem or feature.\n2. Pick the single best primary topic.\n3. Scan the title and first clear body summary for explicit central facets:\n protocol/runtime, provider/model serving, session/state, execution/tooling,\n channel/UI, memory, security/auth, config, reliability, docs/tests.\n4. Add a secondary topic only when removing it would misroute a maintainer who\n follows that bucket.\n5. Delete topics that are only file locations, validation work, examples,\n dependencies, likely consequences, or broad words from comments.\n6. Do a final cardinality check:\n - 0 topics when no allowed topic is central.\n - 1 topic for a single-focus item.\n - 2 topics for normal cross-topic items.\n - 3 topics when the title/body explicitly has three central facets.\n - 4+ topics only for explicit multi-system coordination, not broad impact.\n\nDo not optimize for one label. 
Optimize for the smallest complete set.\n\n## Runtime Discipline\n\nThis is a routing task, not a full PR review.\n\n- Do not audit every allowed topic.\n- Do not summarize the whole issue, comments, or diff.\n- Do not inspect repository files when the title and GitHub context are enough.\n- Stop once the primary topic and any explicit central secondary facets are\n identified.\n- Call `final_json` immediately after the final cardinality check.\n\n## Evidence Priority\n\n- Title is strongest.\n- The first clear problem statement or summary in the body is next.\n- Labels, changed files, comments, and diff can confirm or disambiguate, but\n they must not broaden the topic set by themselves.\n- Tests and docs files are weak evidence unless tests or docs are the subject.\n- If context is truncated, classify from the visible central evidence and add a\n caveat only when that materially limits confidence.\n\n## Central Facet Recovery\n\nBefore final output, check whether any of these central facets were dropped:\n\n- A named protocol or protocol adapter can be central: ACP, ACPX, MCP, gateway\n protocol, OpenAI-compatible streaming, Responses API, chat completions.\n- Agent runner, backend, subagent, harness, heartbeat, lifecycle, or orchestration\n behavior can be central as `agent_runtime` or `coding_agents`.\n- Session identity, binding, persistence, transcript, resume, reset, or\n parent/child state can be central as `sessions`.\n- Execution control, shell commands, tool allow/deny, durable exec, or tool\n invocation can be central as `exec_tools`, `approvals`, `sandboxing`, or\n `tool_calling` depending on the exact behavior.\n- Local/self-hosted model backends, local hardware, OpenAI-compatible local\n serving, embeddings providers, and model endpoint behavior can be central as\n `local_models`, `local_model_providers`, `model_serving`, or\n `self_hosted_inference`.\n- Named chat surfaces, browser/web UI, memory, security/auth, config, queueing,\n notifications, 
and telemetry can be central when the title or summary makes\n them the subject.\n\nIf a facet is central but maps to a narrower allowed topic, use the narrower\ntopic instead of a broad fallback.\n\n## Enum Discipline\n\n- Output only exact allowed topic ids.\n- Never invent shorthand such as `cli`, `tts`, `openrouter`, `status`,\n `thread`, `provider`, `tool`, or `test`.\n- If the closest word in the title is not allowed, map it to the nearest\n allowed topic or omit it.\n\n## Positive Cues\n\n- Counts, usage, cost, tokens, metrics, diagnostics, traces, and status\n reporting route to `telemetry_usage`.\n- Subagents, coding-agent runs, harness behavior, approvals, sandboxing,\n compaction, or agent orchestration route to `coding_agents`.\n- Agent runner, embedded backend, lifecycle, heartbeat, attempt, orchestration,\n or runtime state route to `agent_runtime` when the agent runtime itself is\n central.\n- LM Studio, Ollama, llama.cpp, GGUF, local hardware, local model compatibility,\n local fallback, and local context behavior route to `local_models`.\n- OpenAI-compatible serving, base URL normalization for model endpoints,\n streaming, usage chunks, vLLM/TGI/LocalAI serving, endpoint lifecycle, and\n request routing route to `model_serving`.\n- Self-hosted embeddings, memory providers, speech services, private inference,\n and locally operated inference services route to `self_hosted_inference`.\n- Named Discord, Telegram, Slack, Zulip, Feishu, webchat, or similar surfaces\n route to `chat_integrations`; generic notify policy/delivery gates route to\n `notifications`.\n- Chat UI display/status/footer behavior routes to `ui_tui` only when the\n user-facing interface is central.\n- Test-only contract follow-through routes to `tests_ci` only when the testing\n surface itself is the subject; otherwise route by the runtime contract being\n tested.\n\n## Hard Local Model Filters\n\nBefore final output:\n\n- Delete `local_model_providers` unless the item explicitly 
centers a local,\n self-hosted, or user-declared OpenAI-compatible backend: LM Studio, Ollama,\n vLLM, TGI, LocalAI, llama.cpp, Atomic Chat, localhost/LAN, or private\n inference.\n- Never use `local_model_providers` for hosted provider catalogs/manifests,\n hosted model availability, hosted auth, usage/billing UI, TTS/speech/image\n behavior, or hosted APIs. Hosted catalog updates are `model_releases` and\n sometimes `config`; hosted usage/billing/status work is `telemetry_usage` or\n `ui_tui`.\n- Use `local_models` only for concrete local/offline model execution or\n compatibility; not model IDs, catalogs, lists, static entries, provider\n manifests, or hosted provider availability. Keep it for local-model lean\n filtering and local-model runtime crashes.\n\n## Over-Label Guardrails\n\n- `api_surface`: external API, CLI, or HTTP contracts only. Not internal\n payloads/options/functions, status text, UI events, or ordinary command\n behavior.\n- `reliability`: operational failures such as timeout, crash, leak, retry,\n stuck state, data loss, cleanup, or recovery. Not a generic bug tag.\n- `sessions`: session lifecycle/state/storage/identity only. Not every item\n mentioning session context or files.\n- `local_model_providers`: provider setup/routing/auth/discovery/compatibility\n only. Not every local endpoint issue.\n- `config`: configuration behavior itself, not any feature with an option.\n- `docs` and `tests_ci`: only when docs or test tooling is the subject.\n- `security`: security boundary, credentials, auth hardening, SSRF, token\n handling, or vulnerability only. Not every risky-looking behavior.\n- `gateway`: gateway daemon, gateway API, startup, restart, or gateway-owned\n routing only. 
Not every item that passes through the gateway.\n\n## Tie-Breakers\n\n- Count/usage/token/cost/metric/trace/diagnostic/status/footer-count features\n are `telemetry_usage`, even if shown in UI or session status.\n- Base URL normalization, endpoint lifecycle/selection, streaming, request\n routing, OpenAI-compatible serving, vLLM/TGI/LocalAI behavior, and model\n endpoint compatibility are `model_serving`.\n- TTS, shell/exec, command, tool invocation, allowlist, and execution-control\n behavior are `exec_tools` when the feature controls execution or spoken/tool\n output.\n- Thread/session isolation, per-session binding, fallback recovery state, and\n lifecycle state are `sessions` when those boundaries are central.\n- Structured tool result display, stdout rendering for tool results, pre-tool\n text preservation, and tool-call transcript/content handling are\n `tool_calling` when tool-call semantics are central.\n- Delivery fallback, outbound recovery, lost final/pre-tool text, duplicate\n cleanup, and lifecycle recovery are `reliability` when recovery correctness is\n central.\n\n## False-Positive Suppression\n\n- Do not use `local_model_providers` for base URL normalization,\n OpenRouter/OpenAI-compatible endpoint fixes, endpoint lifecycle, streaming,\n usage chunks, or vLLM/TGI/LocalAI serving. Use `model_serving` unless provider\n setup/auth/discovery/routing is central.\n- Do not use `notifications` for named Discord/Telegram/Slack/Zulip/Feishu\n behavior, ACP final/pre-tool text preservation, delivery fallback recovery, or\n outbound recovery correctness. 
Use `chat_integrations` for named chat surfaces\n and `reliability` for recovery/loss/fallback.\n- Do not use `tool_calling` for TTS tags/options, browser screenshot/vision,\n generic tool output, or config-like options.\n- Do not use `api_surface` for parse helpers, CLI edge-case tests, token\n parsing, status/footer display, internal command behavior, or local model\n compatibility.\n- Do not use `config` merely because a feature adds an option. Route by what\n the option controls.\n\n"
+ },
+ {
+ "routing_policy": "Classify a GitHub issue or pull request into the smallest complete set of allowed topic ids.\n\nInput format may include:\n- `target`: GitHub issue or PR URL\n- `title`: strongest evidence\n- optional body, summary, labels, files, comments, or diff context\n- training examples may also include `gold_topics`, prior generated outputs, and feedback; use those only to learn corrections, not as fields for live classification\n\nOutput:\n- Call `final_json` with `topics_of_interest`\n- Use only exact allowed topic ids\n- Do not invent aliases such as `cli`, `provider`, `tool`, `thread`, `status`, `openrouter`, or `tts`\n\nDecision process:\n1. Read the title first.\n2. Identify the main user-visible feature, behavior, or failure.\n3. Pick the single best primary topic.\n4. Read only the first clear body summary if needed to disambiguate.\n5. Add a secondary topic only when it is explicitly central enough that omitting it would route the item to the wrong maintainer bucket.\n6. Remove labels based only on implementation details, tests, examples, file paths, side effects, broad impact, or speculative consequences.\n7. 
Stop once the smallest complete set is clear.\n\nCardinality:\n- 0 topics if no allowed topic is central.\n- 1 topic for a single-focus item.\n- 2 topics for ordinary cross-topic items.\n- 3 topics only when the title or first clear problem statement explicitly names three central facets.\n- 4+ topics only for explicit multi-system coordination.\n\nHigh-priority routing cues:\n- Named protocol or adapter: `acp`, `acpx`, `mcp`, gateway protocol, OpenAI-compatible streaming, Responses API, chat completions.\n- Gateway daemon, gateway API, startup, restart, gateway-owned routing, or gateway-owned runtime: `gateway`.\n- Agent runner, backend, embedded runtime, lifecycle, heartbeat, attempt orchestration, node-backed runtime, or runtime state: `agent_runtime`.\n- Subagents, coding-agent runs, harness behavior, approvals, sandboxing, compaction, or agent orchestration: `coding_agents`.\n- Session identity, binding, persistence, transcript, resume, reset, parent/child state: `sessions`.\n- Shell commands, execution control, durable exec, allow/deny controls, spoken/tool output execution: `exec_tools`.\n- Tool invocation semantics, structured tool results, stdout rendering for tool results, pre-tool text preservation, tool-call transcript/content handling: `tool_calling`.\n- Counts, usage, costs, tokens, metrics, diagnostics, traces, status reporting, footer counts: `telemetry_usage`.\n- Base URL normalization, endpoint lifecycle/selection, streaming, request routing, OpenAI-compatible serving, vLLM/TGI/LocalAI endpoint behavior: `model_serving`.\n- LM Studio, Ollama, llama.cpp, GGUF, local hardware, local context, offline/local model compatibility or crashes: `local_models`.\n- Local/self-hosted provider setup, routing, auth, discovery, compatibility: `local_model_providers`.\n- Self-hosted embeddings, memory providers, speech services, private inference services: `self_hosted_inference`.\n- Named Discord, Telegram, Slack, Zulip, Feishu, webchat surfaces: 
`chat_integrations`.\n- Generic notify policy or delivery gates: `notifications`.\n- Chat UI display/status/footer behavior when the user-facing interface itself is central: `ui_tui`.\n- External API, CLI, or HTTP contracts: `api_surface`.\n- Operational timeout, crash, leak, retry, stuck state, data loss, cleanup, or recovery correctness: `reliability`.\n- Configuration behavior itself: `config`.\n- Docs or testing infrastructure as the subject: `docs` or `tests_ci`.\n\nImportant learned correction:\n- For a title like `ACP: add gateway-owned node-backed runtime`, the correct topics are `acp`, `gateway`, and `agent_runtime`.\n- `ACP` is a central protocol topic.\n- `gateway-owned` is central `gateway`, not incidental.\n- `node-backed runtime` is central `agent_runtime`.\n- Do not add `sessions` unless session identity/lifecycle/storage is explicitly central.\n- Do not add `reliability` merely because runtime work could affect robustness; reliability requires an explicit failure, recovery, timeout, crash, leak, data loss, retry, or stuck-state concern.\n\nHard filters:\n- Delete `local_model_providers` unless the item explicitly centers a local, self-hosted, user-declared OpenAI-compatible backend, localhost/LAN endpoint, or private inference setup.\n- Do not use `local_model_providers` for hosted provider catalogs, hosted model availability, hosted auth, usage/billing UI, TTS/speech/image behavior, or generic hosted APIs.\n- Use `local_models` only for concrete local/offline model execution or compatibility, not static model ids, catalogs, provider manifests, or hosted provider availability.\n- Do not use `api_surface` for internal helpers, status text, UI events, ordinary command behavior, token parsing tests, or local compatibility.\n- Do not use `config` merely because an option exists; classify by what the option controls.\n- Do not use `tests_ci` unless the testing surface itself is the subject.\n- Do not use `docs` unless documentation itself is the 
subject.\n- Do not use `gateway` merely because traffic passes through the gateway; use it only when gateway ownership, daemon/API behavior, startup/restart, or routing is central.\n\nOptimize for the smallest complete set, not for a single label and not for broad recall."
+ }
+ ],
+ "discovery_eval_counts": [
+ 0,
+ 4
+ ],
+ "num_full_val_evals": 2,
+ "objective_pareto_front": {
+ "weighted_score": 0.625
+ },
+ "parents": [
+ [
+ null
+ ],
+ [
+ 0
+ ]
+ ],
+ "per_objective_best_candidates": {
+ "weighted_score": [
+ 1
+ ]
+ },
+ "per_val_instance_best_candidates": {
+ "0": [
+ 1
+ ],
+ "1": [
+ 0,
+ 1
+ ]
+ },
+ "run_dir": "prompt-optimizer/out/gepa-e4b-smoke-20260612T184748Z",
+ "seed": 0,
+ "total_metric_calls": 6,
+ "val_aggregate_scores": [
+ 0.25,
+ 0.625
+ ],
+ "val_aggregate_subscores": [
+ {
+ "weighted_score": 0.25
+ },
+ {
+ "weighted_score": 0.625
+ }
+ ],
+ "val_subscores": [
+ {
+ "0": 0.25,
+ "1": 0.25
+ },
+ {
+ "0": 1.0,
+ "1": 0.25
+ }
+ ],
+ "validation_schema_version": 2
+}
\ No newline at end of file
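The aggregate and per-instance fields in this gepa-result.json can be re-derived from `val_subscores` alone. The following is a minimal sketch, assuming the aggregate is an unweighted mean over validation instances and that ties are kept in the per-instance best list; the helper names `aggregate` and `per_instance_best` are hypothetical and not part of GEPA.

```python
# Values copied from "val_subscores" in the gepa-result.json above.
val_subscores = [
    {"0": 0.25, "1": 0.25},  # candidate 0 (seed prompt)
    {"0": 1.0, "1": 0.25},   # candidate 1 (reflected prompt)
]


def aggregate(scores):
    """Unweighted mean over validation instances (assumed aggregation)."""
    return sum(scores.values()) / len(scores)


def per_instance_best(subscores):
    """For each validation instance, list every candidate tied for the top score."""
    best = {}
    for instance_id in subscores[0]:
        top = max(candidate[instance_id] for candidate in subscores)
        best[instance_id] = [
            idx
            for idx, candidate in enumerate(subscores)
            if candidate[instance_id] == top
        ]
    return best


val_aggregate_scores = [aggregate(s) for s in val_subscores]
instance_best = per_instance_best(val_subscores)

print(val_aggregate_scores)  # [0.25, 0.625]
print(instance_best)         # {'0': [1], '1': [0, 1]}
```

Under these assumptions the output matches `val_aggregate_scores` and `per_val_instance_best_candidates` in the file: candidate 1 wins outright on instance 0 and ties with candidate 0 on instance 1.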
diff --git a/gepa-e4b-smoke-20260612T184748Z/gepa_state.bin b/gepa-e4b-smoke-20260612T184748Z/gepa_state.bin
new file mode 100644
index 0000000000000000000000000000000000000000..56a4258e0934a6554661b090e6b152eeb2f29a9b
--- /dev/null
+++ b/gepa-e4b-smoke-20260612T184748Z/gepa_state.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2809aa9f0aeecab28043772208ad826206f7d07ce514a79436dfb2bfc7ee0be4
+size 15472
diff --git a/gepa-e4b-smoke-20260612T184748Z/run_log.json b/gepa-e4b-smoke-20260612T184748Z/run_log.json
new file mode 100644
index 0000000000000000000000000000000000000000..4cc52d48c34f9d160f01d69382c90cb2b02c18ea
--- /dev/null
+++ b/gepa-e4b-smoke-20260612T184748Z/run_log.json
@@ -0,0 +1,20 @@
+[
+ {
+ "i": 0,
+ "selected_program_candidate": 0,
+ "subsample_ids": [
+ 0
+ ],
+ "subsample_scores": [
+ 0.14285714285714285
+ ],
+ "new_subsample_scores": [
+ 1.0
+ ],
+ "new_program_idx": 1,
+ "evaluated_val_indices": [
+ 0,
+ 1
+ ]
+ }
+]
\ No newline at end of file
diff --git a/gepa-e4b-smoke-20260612T184748Z/run_log.txt b/gepa-e4b-smoke-20260612T184748Z/run_log.txt
new file mode 100644
index 0000000000000000000000000000000000000000..530cea066d212ed8efc462a5197fd631b95ce75f
--- /dev/null
+++ b/gepa-e4b-smoke-20260612T184748Z/run_log.txt
@@ -0,0 +1,87 @@
+Iteration 0: Base program full valset score: 0.25 over 2 / 2 examples
+Iteration 1: Selected program 0 score: 0.25
+Iteration 1: Proposed new text for routing_policy: Classify a GitHub issue or pull request into the smallest complete set of allowed topic ids.
+
+Input format may include:
+- `target`: GitHub issue or PR URL
+- `title`: strongest evidence
+- optional body, summary, labels, files, comments, or diff context
+- training examples may also include `gold_topics`, prior generated outputs, and feedback; use those only to learn corrections, not as fields for live classification
+
+Output:
+- Call `final_json` with `topics_of_interest`
+- Use only exact allowed topic ids
+- Do not invent aliases such as `cli`, `provider`, `tool`, `thread`, `status`, `openrouter`, or `tts`
+
+Decision process:
+1. Read the title first.
+2. Identify the main user-visible feature, behavior, or failure.
+3. Pick the single best primary topic.
+4. Read only the first clear body summary if needed to disambiguate.
+5. Add a secondary topic only when it is explicitly central enough that omitting it would route the item to the wrong maintainer bucket.
+6. Remove labels based only on implementation details, tests, examples, file paths, side effects, broad impact, or speculative consequences.
+7. Stop once the smallest complete set is clear.
+
+Cardinality:
+- 0 topics if no allowed topic is central.
+- 1 topic for a single-focus item.
+- 2 topics for ordinary cross-topic items.
+- 3 topics only when the title or first clear problem statement explicitly names three central facets.
+- 4+ topics only for explicit multi-system coordination.
+
+High-priority routing cues:
+- Named protocol or adapter: `acp`, `acpx`, `mcp`, gateway protocol, OpenAI-compatible streaming, Responses API, chat completions.
+- Gateway daemon, gateway API, startup, restart, gateway-owned routing, or gateway-owned runtime: `gateway`.
+- Agent runner, backend, embedded runtime, lifecycle, heartbeat, attempt orchestration, node-backed runtime, or runtime state: `agent_runtime`.
+- Subagents, coding-agent runs, harness behavior, approvals, sandboxing, compaction, or agent orchestration: `coding_agents`.
+- Session identity, binding, persistence, transcript, resume, reset, parent/child state: `sessions`.
+- Shell commands, execution control, durable exec, allow/deny controls, spoken/tool output execution: `exec_tools`.
+- Tool invocation semantics, structured tool results, stdout rendering for tool results, pre-tool text preservation, tool-call transcript/content handling: `tool_calling`.
+- Counts, usage, costs, tokens, metrics, diagnostics, traces, status reporting, footer counts: `telemetry_usage`.
+- Base URL normalization, endpoint lifecycle/selection, streaming, request routing, OpenAI-compatible serving, vLLM/TGI/LocalAI endpoint behavior: `model_serving`.
+- LM Studio, Ollama, llama.cpp, GGUF, local hardware, local context, offline/local model compatibility or crashes: `local_models`.
+- Local/self-hosted provider setup, routing, auth, discovery, compatibility: `local_model_providers`.
+- Self-hosted embeddings, memory providers, speech services, private inference services: `self_hosted_inference`.
+- Named Discord, Telegram, Slack, Zulip, Feishu, webchat surfaces: `chat_integrations`.
+- Generic notify policy or delivery gates: `notifications`.
+- Chat UI display/status/footer behavior when the user-facing interface itself is central: `ui_tui`.
+- External API, CLI, or HTTP contracts: `api_surface`.
+- Operational timeout, crash, leak, retry, stuck state, data loss, cleanup, or recovery correctness: `reliability`.
+- Configuration behavior itself: `config`.
+- Docs or testing infrastructure as the subject: `docs` or `tests_ci`.
+
+Important learned correction:
+- For a title like `ACP: add gateway-owned node-backed runtime`, the correct topics are `acp`, `gateway`, and `agent_runtime`.
+- `ACP` is a central protocol topic.
+- `gateway-owned` is central `gateway`, not incidental.
+- `node-backed runtime` is central `agent_runtime`.
+- Do not add `sessions` unless session identity/lifecycle/storage is explicitly central.
+- Do not add `reliability` merely because runtime work could affect robustness; reliability requires an explicit failure, recovery, timeout, crash, leak, data loss, retry, or stuck-state concern.
+
+Hard filters:
+- Delete `local_model_providers` unless the item explicitly centers a local, self-hosted, user-declared OpenAI-compatible backend, localhost/LAN endpoint, or private inference setup.
+- Do not use `local_model_providers` for hosted provider catalogs, hosted model availability, hosted auth, usage/billing UI, TTS/speech/image behavior, or generic hosted APIs.
+- Use `local_models` only for concrete local/offline model execution or compatibility, not static model ids, catalogs, provider manifests, or hosted provider availability.
+- Do not use `api_surface` for internal helpers, status text, UI events, ordinary command behavior, token parsing tests, or local compatibility.
+- Do not use `config` merely because an option exists; classify by what the option controls.
+- Do not use `tests_ci` unless the testing surface itself is the subject.
+- Do not use `docs` unless documentation itself is the subject.
+- Do not use `gateway` merely because traffic passes through the gateway; use it only when gateway ownership, daemon/API behavior, startup/restart, or routing is central.
+
+Optimize for the smallest complete set, not for a single label and not for broad recall.
+Iteration 1: New subsample score 1.0 is better than old score 0.14285714285714285. Continue to full eval and add to candidate pool.
+Iteration 1: Found a better program on the valset with score 0.625.
+Iteration 1: Valset score for new program: 0.625 (coverage 2 / 2)
+Iteration 1: Val aggregate for new program: 0.625
+Iteration 1: Individual valset scores for new program: {0: 1.0, 1: 0.25}
+Iteration 1: Objective aggregate scores for new program: {'weighted_score': 0.625}
+Iteration 1: New valset pareto front scores: {0: 1.0, 1: 0.25}
+Iteration 1: Objective pareto front scores: {'weighted_score': 0.625}
+Iteration 1: Valset pareto front aggregate score: 0.625
+Iteration 1: Updated valset pareto front programs: {0: {1}, 1: {0, 1}}
+Iteration 1: Updated objective pareto front programs: {'weighted_score': {1}}
+Iteration 1: Best valset aggregate score so far: 0.625
+Iteration 1: Best program as per aggregate score on valset: 1
+Iteration 1: Best score on valset: 0.625
+Iteration 1: Linear pareto front program index: 1
+Iteration 1: New program candidate index: 1
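The log above suggests a two-stage gate: the proposal is first compared with its parent on the reflection minibatch, and only then evaluated on the full validation set. A rough sketch of that gate follows, assuming a strict comparison of summed minibatch scores; `passes_subsample_gate` and `best_candidate` are hypothetical names, and the real GEPA acceptance logic may differ.

```python
def passes_subsample_gate(old_scores, new_scores):
    """True when the proposal strictly beats its parent on the minibatch.

    Assumption: scores are compared by sum; with a minibatch of size 1 this
    is the same as comparing the single scores reported in the log.
    """
    return sum(new_scores) > sum(old_scores)


def best_candidate(val_aggregate_scores):
    """Index of the candidate with the highest full-validation aggregate."""
    return max(range(len(val_aggregate_scores)), key=lambda i: val_aggregate_scores[i])


# Values copied from run_log.json and run_log.txt for this run.
old_subsample = [0.14285714285714285]
new_subsample = [1.0]
promoted = passes_subsample_gate(old_subsample, new_subsample)

# Full-validation aggregates for candidate 0 and candidate 1.
aggregates = [0.25, 0.625]
best_idx = best_candidate(aggregates)

print(promoted)  # True
print(best_idx)  # 1
```

The gate only shows that the proposal fixed the one row it was reflected on. The full-validation aggregate (0.25 to 0.625 over two rows) is the number that decides `best_idx`.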
diff --git a/gepa-e4b-smoke-20260612T184748Z/run_log_stderr.txt b/gepa-e4b-smoke-20260612T184748Z/run_log_stderr.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/gepa-e4b-smoke-20260612T184748Z/summary.json b/gepa-e4b-smoke-20260612T184748Z/summary.json
new file mode 100644
index 0000000000000000000000000000000000000000..4c20e3d4eda2bf85514872211ed12691d20caaba
--- /dev/null
+++ b/gepa-e4b-smoke-20260612T184748Z/summary.json
@@ -0,0 +1,27 @@
+{
+ "best_idx": 1,
+ "best_prompt_path": "prompt-optimizer/out/gepa-e4b-smoke-20260612T184748Z/best.prompt.md",
+ "best_routing_policy_path": "prompt-optimizer/out/gepa-e4b-smoke-20260612T184748Z/best.routing_policy.md",
+ "best_score": 0.625,
+ "config": {
+ "harness": {
+ "base_url": null,
+ "concurrency": 1,
+ "context_window": null,
+ "max_tokens": 1536,
+ "model": "gemma-e4b-reason-test",
+ "state_dir": null,
+ "timeout_ms": 180000
+ },
+ "max_metric_calls": 4,
+ "output_dir": "prompt-optimizer/out/gepa-e4b-smoke-20260612T184748Z",
+ "reflection_minibatch_size": 1,
+ "row_limit": 2,
+ "seed": 0
+ },
+ "created_at": "2026-06-12T18:49:29.728241+00:00",
+ "num_candidates": 2,
+ "num_full_val_evals": 2,
+ "result_path": "prompt-optimizer/out/gepa-e4b-smoke-20260612T184748Z/gepa-result.json",
+ "total_metric_calls": 6
+}
\ No newline at end of file
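The summary reports `max_metric_calls: 4` but `total_metric_calls: 6`. One accounting that is consistent with these numbers, and with `discovery_eval_counts: [0, 4]` in gepa-result.json, is sketched below. The breakdown is an assumption inferred from the artifacts, not taken from GEPA's source.

```python
# Config values copied from summary.json.
row_limit = 2                  # validation rows per full evaluation
reflection_minibatch_size = 1  # rows per reflection minibatch

# Assumed order of metric calls for a single accepted proposal.
calls = [
    ("seed prompt, full validation", row_limit),
    ("parent prompt, reflection minibatch", reflection_minibatch_size),
    ("proposed prompt, reflection minibatch", reflection_minibatch_size),
]
calls_at_discovery = sum(n for _, n in calls)

# After the proposal passes the minibatch gate it is scored on the full set.
calls.append(("proposed prompt, full validation", row_limit))
total_metric_calls = sum(n for _, n in calls)

print(calls_at_discovery)  # 4
print(total_metric_calls)  # 6
```

If this reading is right, the budget of 4 was reached exactly when candidate 1 was discovered, and the final full-validation pass for the accepted candidate accounts for the two extra calls.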
diff --git a/index.html b/index.html
index b0c4b3666032a737f3903db53e6a8a9272483e28..0a0cb4b1c9a4b6c3ba322d61c31d593c94fcc666 100644
--- a/index.html
+++ b/index.html
@@ -1,19 +1,107 @@
-
-
-
-
- My static Space
-
-
-
-
-
Welcome to your static Space!
-
You can modify this app directly by editing index.html in the Files and versions tab.
The promoted v3 candidate clears the strict gates and keeps mean predicted labels at the v9.1 baseline. The main caveat is scientific, not mechanical: the manual repair used mistakes from this same 60-row set, so a fresh holdout is still needed before treating it as deployment evidence.
+
+
+
diff --git a/latest-final-cardinality-report.txt b/latest-final-cardinality-report.txt
new file mode 100644
index 0000000000000000000000000000000000000000..0a1dcc62d0374fcca238e0735f9bedceec884dab
--- /dev/null
+++ b/latest-final-cardinality-report.txt
@@ -0,0 +1 @@
+final-cardinality-report.html
diff --git a/latest-gepa-multi-run.txt b/latest-gepa-multi-run.txt
new file mode 100644
index 0000000000000000000000000000000000000000..50bebc4712b8df7e7ea5d878a23f98312b80c2d0
--- /dev/null
+++ b/latest-gepa-multi-run.txt
@@ -0,0 +1 @@
+prompt-optimizer/out/gepa-12b-proper-from-best-20260613T055906Z
diff --git a/latest-gepa-proper-continuation.txt b/latest-gepa-proper-continuation.txt
new file mode 100644
index 0000000000000000000000000000000000000000..adceb57259cb714ce30a5c5520a78ee702e04fc5
--- /dev/null
+++ b/latest-gepa-proper-continuation.txt
@@ -0,0 +1 @@
+/home/bob/repos/localpager/prompt-optimizer/out/gepa-12b-row30-prop20-continuation-20260614T021448Z
diff --git a/latest-gepa-proper-run.txt b/latest-gepa-proper-run.txt
new file mode 100644
index 0000000000000000000000000000000000000000..50bebc4712b8df7e7ea5d878a23f98312b80c2d0
--- /dev/null
+++ b/latest-gepa-proper-run.txt
@@ -0,0 +1 @@
+prompt-optimizer/out/gepa-12b-proper-from-best-20260613T055906Z
diff --git a/latest-manual-final-json-guard.txt b/latest-manual-final-json-guard.txt
new file mode 100644
index 0000000000000000000000000000000000000000..42cbb86f0dff0e8d51f5e09596c0cdd47b29c92e
--- /dev/null
+++ b/latest-manual-final-json-guard.txt
@@ -0,0 +1 @@
+prompt-optimizer/out/manual-final-json-guard-20260614T085437Z
diff --git a/latest-manual-hardcase-repair.txt b/latest-manual-hardcase-repair.txt
new file mode 100644
index 0000000000000000000000000000000000000000..6aa1ba55810ca5d607f99d49df5965ab3361f96b
--- /dev/null
+++ b/latest-manual-hardcase-repair.txt
@@ -0,0 +1 @@
+prompt-optimizer/out/manual-hardcase-repair-20260614T092140Z
diff --git a/latest-targeted-max4096.txt b/latest-targeted-max4096.txt
new file mode 100644
index 0000000000000000000000000000000000000000..ee895a34ad3f0983bedb8254ebadbbcc6dddd308
--- /dev/null
+++ b/latest-targeted-max4096.txt
@@ -0,0 +1 @@
+prompt-optimizer/out/targeted-max4096-20260614T090654Z
diff --git a/latest-template-tail-final-json-guard-v9seed.txt b/latest-template-tail-final-json-guard-v9seed.txt
new file mode 100644
index 0000000000000000000000000000000000000000..cd98632575ded8b2ab3ef6709161e571aa57733b
--- /dev/null
+++ b/latest-template-tail-final-json-guard-v9seed.txt
@@ -0,0 +1 @@
+prompt-optimizer/out/template-tail-final-json-guard-v9seed-20260614T091209Z
diff --git a/latest-template-tail-final-json-guard.txt b/latest-template-tail-final-json-guard.txt
new file mode 100644
index 0000000000000000000000000000000000000000..433621b904a5a561575dcc21b4fbac5a11c6395b
--- /dev/null
+++ b/latest-template-tail-final-json-guard.txt
@@ -0,0 +1 @@
+prompt-optimizer/out/template-tail-final-json-guard-20260614T091121Z
diff --git a/latest-validation-12b-row30-prop16-best.txt b/latest-validation-12b-row30-prop16-best.txt
new file mode 100644
index 0000000000000000000000000000000000000000..36fa37c541feb17d776828c3a186bee5c83cdb03
--- /dev/null
+++ b/latest-validation-12b-row30-prop16-best.txt
@@ -0,0 +1 @@
+prompt-optimizer/out/validation-12b-row30-prop16-best-20260614T081931Z
diff --git a/latest-validation-manual-hardcase-repair-max4096.txt b/latest-validation-manual-hardcase-repair-max4096.txt
new file mode 100644
index 0000000000000000000000000000000000000000..cb7307a4a1a3eeb489821a3c32e37d733be0eea3
--- /dev/null
+++ b/latest-validation-manual-hardcase-repair-max4096.txt
@@ -0,0 +1 @@
+prompt-optimizer/out/validation-manual-hardcase-repair-max4096-20260614T092855Z
diff --git a/latest-validation-manual-hardcase-repair-v2-max4096.txt b/latest-validation-manual-hardcase-repair-v2-max4096.txt
new file mode 100644
index 0000000000000000000000000000000000000000..9e1bce8212df9ce785b702def6ecb9cd16c2af9d
--- /dev/null
+++ b/latest-validation-manual-hardcase-repair-v2-max4096.txt
@@ -0,0 +1 @@
+prompt-optimizer/out/validation-manual-hardcase-repair-v2-max4096-20260614T100359Z
diff --git a/latest-validation-manual-hardcase-repair-v3-cardinality-max4096.txt b/latest-validation-manual-hardcase-repair-v3-cardinality-max4096.txt
new file mode 100644
index 0000000000000000000000000000000000000000..09398ca473b4ae7e9424f37ac5407b5c8eaf57b0
--- /dev/null
+++ b/latest-validation-manual-hardcase-repair-v3-cardinality-max4096.txt
@@ -0,0 +1 @@
+prompt-optimizer/out/validation-manual-hardcase-repair-v3-cardinality-max4096-20260614T103814Z
diff --git a/latest-validation-proper-best.txt b/latest-validation-proper-best.txt
new file mode 100644
index 0000000000000000000000000000000000000000..d1ee6eef52bd5b92fa016c7f2b21a96372edbeae
--- /dev/null
+++ b/latest-validation-proper-best.txt
@@ -0,0 +1 @@
+prompt-optimizer/out/validation-12b-proper-best-20260613T111155Z
diff --git a/manual-final-json-guard-20260614T085437Z/offset11-limit4.json b/manual-final-json-guard-20260614T085437Z/offset11-limit4.json
new file mode 100644
index 0000000000000000000000000000000000000000..59ea1ba206a6605030d75db2badfeab706b320b6
--- /dev/null
+++ b/manual-final-json-guard-20260614T085437Z/offset11-limit4.json
@@ -0,0 +1,99 @@
+{
+ "candidate": "manual-final-json-guard",
+ "concurrency": 2,
+ "harness": "localpager-agent",
+ "mean_score": 0.75,
+ "offset": 11,
+ "routing_policy_path": "prompt-optimizer/out/manual-final-json-guard-20260614T085437Z/routing_policy.md",
+ "routing_policy_sha256": "1af864eb9ce439ec0498ba5a23eab3b8f637631ac2959c107408ec812f820f63",
+ "row_reports": [
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "acpx",
+ "codex",
+ "skills_plugins"
+ ],
+ "id": "openclaw-openclaw-82507",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "acpx",
+ "codex",
+ "skills_plugins"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/issues/82507",
+ "title": "[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)",
+ "true_positives": [
+ "acpx",
+ "codex",
+ "skills_plugins"
+ ]
+ },
+ {
+ "error": "classifier exit 2: prompt: /home/bob/.local/state/localpager/classifier/prompts/20260614T085458Z-592875.md\nschema: /home/bob/.local/state/localpager/classifier/schemas/20260614T085458Z-592875.json\nsession: /home/bob/.local/state/localpager/classifier/sessions/20260614T085458Z-592875\nlocalpager-agent: final_json was not called; no structured output was captured",
+ "gold_topics": [
+ "self_hosted_inference",
+ "memory"
+ ],
+ "id": "openclaw-openclaw-80479",
+ "predicted_topics": [],
+ "score": 0.0,
+ "target": "https://github.com/openclaw/openclaw/pull/80479",
+ "title": "feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)"
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "local_model_providers",
+ "reliability"
+ ],
+ "id": "openclaw-openclaw-90146",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "local_model_providers",
+ "reliability"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/issues/90146",
+ "title": "google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error",
+ "true_positives": [
+ "local_model_providers",
+ "reliability"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "docs"
+ ],
+ "id": "openclaw-openclaw-51849",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "docs"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/pull/51849",
+ "title": "Docs: add freeCodeCamp OpenClaw full tutorial to showcase",
+ "true_positives": [
+ "docs"
+ ]
+ }
+ ],
+ "rows": 4,
+ "scores": [
+ 1.0,
+ 0.0,
+ 1.0,
+ 1.0
+ ]
+}
diff --git a/manual-final-json-guard-20260614T085437Z/offset11-limit4.stderr.log b/manual-final-json-guard-20260614T085437Z/offset11-limit4.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/manual-final-json-guard-20260614T085437Z/offset11-limit4.summary.json b/manual-final-json-guard-20260614T085437Z/offset11-limit4.summary.json
new file mode 100644
index 0000000000000000000000000000000000000000..c2428004b98e23a4e2f827a71a9c6ad3d7daf9b9
--- /dev/null
+++ b/manual-final-json-guard-20260614T085437Z/offset11-limit4.summary.json
@@ -0,0 +1,76 @@
+{
+ "candidate": "manual-final-json-guard",
+ "exact_matches": 3,
+ "false_negatives": 2,
+ "false_positives": 0,
+ "mean_gold_labels": 2.0,
+ "mean_predicted_labels": 1.5,
+ "mean_score": 0.75,
+ "micro_f1": 0.8571428571428571,
+ "over_label_events": 0,
+ "over_label_total": 0,
+ "per_topic": {
+ "acpx": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "codex": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "docs": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "local_model_providers": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "memory": {
+ "false_negatives": 1,
+ "false_positives": 0,
+ "precision": 0.0,
+ "recall": 0.0,
+ "true_positives": 0
+ },
+ "reliability": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "self_hosted_inference": {
+ "false_negatives": 1,
+ "false_positives": 0,
+ "precision": 0.0,
+ "recall": 0.0,
+ "true_positives": 0
+ },
+ "skills_plugins": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ }
+ },
+ "precision": 1.0,
+ "recall": 0.75,
+ "rows": 4,
+ "source_path": "prompt-optimizer/out/manual-final-json-guard-20260614T085437Z/offset11-limit4.json",
+ "structural_failures": 1,
+ "true_positives": 6
+}
diff --git a/manual-final-json-guard-20260614T085437Z/offset46-limit4.json b/manual-final-json-guard-20260614T085437Z/offset46-limit4.json
new file mode 100644
index 0000000000000000000000000000000000000000..a2dfad881bd14ae464ad5d56918da69c2eab1d37
--- /dev/null
+++ b/manual-final-json-guard-20260614T085437Z/offset46-limit4.json
@@ -0,0 +1,120 @@
+{
+ "candidate": "manual-final-json-guard",
+ "concurrency": 2,
+ "harness": "localpager-agent",
+ "mean_score": 0.3005952380952381,
+ "offset": 46,
+ "routing_policy_path": "prompt-optimizer/out/manual-final-json-guard-20260614T085437Z/routing_policy.md",
+ "routing_policy_sha256": "1af864eb9ce439ec0498ba5a23eab3b8f637631ac2959c107408ec812f820f63",
+ "row_reports": [
+ {
+ "error": null,
+ "false_negatives": [
+ "sessions",
+ "coding_agents"
+ ],
+ "false_positives": [],
+ "gold_topics": [
+ "queueing",
+ "sessions",
+ "coding_agents"
+ ],
+ "id": "openclaw-openclaw-10467",
+ "loss": 2.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "queueing"
+ ],
+ "score": 0.3333333333333333,
+ "target": "https://github.com/openclaw/openclaw/issues/10467",
+ "title": "[Feature Request]: Multi-lane concurrency support for sub-agents via sessions_spawn",
+ "true_positives": [
+ "queueing"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [
+ "tool_calling"
+ ],
+ "false_positives": [
+ "chat_integrations"
+ ],
+ "gold_topics": [
+ "tool_calling",
+ "security"
+ ],
+ "id": "openclaw-openclaw-43246",
+ "loss": 3.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "security",
+ "chat_integrations"
+ ],
+ "score": 0.25,
+ "target": "https://github.com/openclaw/openclaw/pull/43246",
+ "title": "fix(message): deny same-provider cross-context sends by default [AI-assisted]",
+ "true_positives": [
+ "security"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [
+ "gateway"
+ ],
+ "gold_topics": [
+ "sessions",
+ "reliability"
+ ],
+ "id": "openclaw-openclaw-59878",
+ "loss": 2.5,
+ "over_label_count": 1,
+ "predicted_topics": [
+ "sessions",
+ "gateway",
+ "reliability"
+ ],
+ "score": 0.2857142857142857,
+ "target": "https://github.com/openclaw/openclaw/issues/59878",
+ "title": "Session lane stuck in 'running' after run dies \u2014 sessions.abort + gateway restart fail to clear stale state",
+ "true_positives": [
+ "sessions",
+ "reliability"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [
+ "security",
+ "config"
+ ],
+ "false_positives": [],
+ "gold_topics": [
+ "model_serving",
+ "security",
+ "config"
+ ],
+ "id": "openclaw-openclaw-51667",
+ "loss": 2.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "model_serving"
+ ],
+ "score": 0.3333333333333333,
+ "target": "https://github.com/openclaw/openclaw/issues/51667",
+ "title": "Feature: Native Audio Input for Omni-Modal Models (skip STT transcription)",
+ "true_positives": [
+ "model_serving"
+ ]
+ }
+ ],
+ "rows": 4,
+ "scores": [
+ 0.3333333333333333,
+ 0.25,
+ 0.2857142857142857,
+ 0.3333333333333333
+ ]
+}
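The `loss` and `score` fields in the four row reports above are consistent with a weighted error count mapped through `1 / (1 + loss)`. The sketch below reproduces them under that assumption. The weights (1.0 per false negative, 2.0 per false positive, 0.5 per over-label) are inferred from these rows only and are hypothetical; the actual localpager scoring function is not shown in this diff.

```python
# Hypothetical weights inferred from the row reports; not taken from localpager source.
FN_WEIGHT = 1.0
FP_WEIGHT = 2.0
OVER_LABEL_WEIGHT = 0.5


def row_report(gold_topics, predicted_topics):
    """Recompute loss, over_label_count, and score for one classified row."""
    gold, predicted = set(gold_topics), set(predicted_topics)
    false_negatives = gold - predicted
    false_positives = predicted - gold
    # Over-labeling: predicting more labels than the gold set contains.
    over_label_count = max(0, len(predicted) - len(gold))
    loss = (
        FN_WEIGHT * len(false_negatives)
        + FP_WEIGHT * len(false_positives)
        + OVER_LABEL_WEIGHT * over_label_count
    )
    return {
        "loss": loss,
        "over_label_count": over_label_count,
        "score": 1.0 / (1.0 + loss),
    }


# (gold_topics, predicted_topics) copied from offset46-limit4.json.
rows = [
    (["queueing", "sessions", "coding_agents"], ["queueing"]),
    (["tool_calling", "security"], ["security", "chat_integrations"]),
    (["sessions", "reliability"], ["sessions", "gateway", "reliability"]),
    (["model_serving", "security", "config"], ["model_serving"]),
]
reports = [row_report(gold, predicted) for gold, predicted in rows]
mean_score = sum(r["score"] for r in reports) / len(reports)

for report in reports:
    print(report)
print(mean_score)
```

With these weights a single false positive (loss 2.0, plus 0.5 if it also exceeds the gold cardinality) costs more than a single false negative (loss 1.0), which fits the stated intent that false positives are the more costly error.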
diff --git a/manual-final-json-guard-20260614T085437Z/offset46-limit4.stderr.log b/manual-final-json-guard-20260614T085437Z/offset46-limit4.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/manual-final-json-guard-20260614T085437Z/offset46-limit4.summary.json b/manual-final-json-guard-20260614T085437Z/offset46-limit4.summary.json
new file mode 100644
index 0000000000000000000000000000000000000000..bc5c83246c3af1cbd3aac8eff8283c75c1de8f36
--- /dev/null
+++ b/manual-final-json-guard-20260614T085437Z/offset46-limit4.summary.json
@@ -0,0 +1,90 @@
+{
+ "candidate": "manual-final-json-guard",
+ "exact_matches": 0,
+ "false_negatives": 5,
+ "false_positives": 2,
+ "mean_gold_labels": 2.5,
+ "mean_predicted_labels": 1.75,
+ "mean_score": 0.3005952380952381,
+ "micro_f1": 0.588235294117647,
+ "over_label_events": 1,
+ "over_label_total": 1,
+ "per_topic": {
+ "chat_integrations": {
+ "false_negatives": 0,
+ "false_positives": 1,
+ "precision": 0.0,
+ "recall": 0.0,
+ "true_positives": 0
+ },
+ "coding_agents": {
+ "false_negatives": 1,
+ "false_positives": 0,
+ "precision": 0.0,
+ "recall": 0.0,
+ "true_positives": 0
+ },
+ "config": {
+ "false_negatives": 1,
+ "false_positives": 0,
+ "precision": 0.0,
+ "recall": 0.0,
+ "true_positives": 0
+ },
+ "gateway": {
+ "false_negatives": 0,
+ "false_positives": 1,
+ "precision": 0.0,
+ "recall": 0.0,
+ "true_positives": 0
+ },
+ "model_serving": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "queueing": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "reliability": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "security": {
+ "false_negatives": 1,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 0.5,
+ "true_positives": 1
+ },
+ "sessions": {
+ "false_negatives": 1,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 0.5,
+ "true_positives": 1
+ },
+ "tool_calling": {
+ "false_negatives": 1,
+ "false_positives": 0,
+ "precision": 0.0,
+ "recall": 0.0,
+ "true_positives": 0
+ }
+ },
+ "precision": 0.7142857142857143,
+ "recall": 0.5,
+ "rows": 4,
+ "source_path": "prompt-optimizer/out/manual-final-json-guard-20260614T085437Z/offset46-limit4.json",
+ "structural_failures": 0,
+ "true_positives": 5
+}
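The top-level `precision`, `recall`, and `micro_f1` fields in the two summary files above follow from the pooled true-positive, false-positive, and false-negative counts. A minimal sketch, assuming the usual micro-averaged definitions and a zero-division convention of 0.0 (which matches per-topic entries such as `memory`, where precision is reported as 0.0 with no predictions); `micro_metrics` is a hypothetical helper name.

```python
def micro_metrics(true_positives, false_positives, false_negatives):
    """Micro-averaged precision, recall, and F1 from pooled label counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "micro_f1": f1}


# Counts copied from offset11-limit4.summary.json and offset46-limit4.summary.json.
offset11 = micro_metrics(true_positives=6, false_positives=0, false_negatives=2)
offset46 = micro_metrics(true_positives=5, false_positives=2, false_negatives=5)

print(offset11)
print(offset46)
```

In the offset11 summary, both false negatives come from the one row where `final_json` was never called, so a structural failure lowers recall even though every successfully classified row was an exact match.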
diff --git a/manual-final-json-guard-20260614T085437Z/routing_policy.md b/manual-final-json-guard-20260614T085437Z/routing_policy.md
new file mode 100644
index 0000000000000000000000000000000000000000..9382b51d224f7ee9d53ebb64b8b4e71ad646a52a
--- /dev/null
+++ b/manual-final-json-guard-20260614T085437Z/routing_policy.md
@@ -0,0 +1,153 @@
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
+- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- Do not add `sessions` merely because the title says “ACP sessions” or mentions session context. Treat that as label spam unless session identity, lifecycle, routing, state, or persistent process identity is itself the bug or feature.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.
+- `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` should be `acp` and `sessions`; the central bug is session identity/process sharing.
+
+Reliability, queueing, and lanes:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.
+- Words like `lane`, `main lane`, `worker`, `subagent`, `before_prompt_build`, or internal execution paths do not imply `queueing`.
+- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.
+- `self-heal lane wedges` is `reliability`, not `queueing`.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or user/session identity for auth is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex-specific runtime behavior.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions` when the embedded path or session identity is central.
+- `restore openai-codex OAuth on embedded path` should be `auth_identity` and `sessions`, not `codex`.
+
+Codex:
+- Use `codex` when Codex is named centrally as the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` merely because the title contains `openai-codex`, `[codex]`, or a Codex-branded OAuth provider. Confirm the subject is Codex behavior rather than auth, sessions, docs, or another domain.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include `docs` alone.
+- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add non-allowed or broad demo/showcase labels.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, or names a model unless serving/routing/protocol behavior is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Notifications and chat integrations:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, embedded session path, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.
+- Do not substitute broad infrastructure topics for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
+Final output requirement, mandatory:
+- Do not stop after reasoning or prose.
+- If a structured-output tool named `final_json` is available, call `final_json` exactly once before ending.
+- If `final_json` is unavailable, output only `{"topics_of_interest":[...]}` and no other text.
+- The final value must use only exact allowed topic ids and must include no explanation fields.
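The final output requirement above can be checked mechanically. The following is a minimal sketch under stated assumptions, not code from the harness: `validate_final_output` is a hypothetical helper, and the allowed-id set is supplied by the caller rather than hard-coded here.

```python
import json


def validate_final_output(raw: str, allowed_topics: set[str]) -> list[str]:
    """Hypothetical check for the policy's final JSON shape; not the harness's own code."""
    # json.loads raises JSONDecodeError (a ValueError) if prose surrounds the JSON.
    payload = json.loads(raw)
    # The policy allows no explanation fields, so exactly one key is accepted.
    if not isinstance(payload, dict) or set(payload) != {"topics_of_interest"}:
        raise ValueError("expected exactly one key: topics_of_interest")
    topics = payload["topics_of_interest"]
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("topics_of_interest must be a list of strings")
    # Only exact allowed topic ids may appear.
    unknown = [t for t in topics if t not in allowed_topics]
    if unknown:
        raise ValueError(f"topic ids outside the allowed list: {unknown}")
    return topics
```

An empty list passes this check, which matches the rule to use 0 topics when no allowed topic is central.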
diff --git a/manual-hardcase-repair-20260614T092140Z/offset11-limit4.json b/manual-hardcase-repair-20260614T092140Z/offset11-limit4.json
new file mode 100644
index 0000000000000000000000000000000000000000..a5489cfeaa3a8fcb9b3b2e7c6e14d2cbeda9094b
--- /dev/null
+++ b/manual-hardcase-repair-20260614T092140Z/offset11-limit4.json
@@ -0,0 +1,110 @@
+{
+ "candidate": "manual-hardcase-repair-max4096",
+ "concurrency": 2,
+ "harness": "localpager-agent",
+ "mean_score": 1.0,
+ "offset": 11,
+ "routing_policy_path": "prompt-optimizer/out/manual-hardcase-repair-20260614T092140Z/routing_policy.md",
+ "routing_policy_sha256": "62ebdf8de7d4b60ac77034ef4d0316561937c6ee35f85b81edc5314482c77e7f",
+ "row_reports": [
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "acpx",
+ "codex",
+ "skills_plugins"
+ ],
+ "id": "openclaw-openclaw-82507",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "acpx",
+ "codex",
+ "skills_plugins"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/issues/82507",
+ "title": "[Feature]: ACPX Codex sandbox should inherit user-installed plugins (e.g. Superpowers)",
+ "true_positives": [
+ "acpx",
+ "codex",
+ "skills_plugins"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "self_hosted_inference",
+ "memory"
+ ],
+ "id": "openclaw-openclaw-80479",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "memory",
+ "self_hosted_inference"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/pull/80479",
+ "title": "feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)",
+ "true_positives": [
+ "memory",
+ "self_hosted_inference"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "local_model_providers",
+ "reliability"
+ ],
+ "id": "openclaw-openclaw-90146",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "local_model_providers",
+ "reliability"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/issues/90146",
+ "title": "google-vertex: Missing gemini-3.1-flash-lite in provider catalog causes silent failure instead of error",
+ "true_positives": [
+ "local_model_providers",
+ "reliability"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "docs"
+ ],
+ "id": "openclaw-openclaw-51849",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "docs"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/pull/51849",
+ "title": "Docs: add freeCodeCamp OpenClaw full tutorial to showcase",
+ "true_positives": [
+ "docs"
+ ]
+ }
+ ],
+ "rows": 4,
+ "scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ]
+}
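The per-row fields in the report above follow from comparing `gold_topics` with `predicted_topics`. A minimal sketch of that comparison is below. The function name `row_report` is hypothetical, and keeping hits in prediction order is an assumption inferred from the rows shown (for example `openclaw-openclaw-80479`). It does not model `loss`, `score`, or `over_label_count`.

```python
def row_report(gold_topics: list[str], predicted_topics: list[str]) -> dict[str, list[str]]:
    """Hypothetical reconstruction of the set comparison behind a row report."""
    gold = set(gold_topics)
    predicted = set(predicted_topics)
    return {
        # Predicted labels that are also gold, kept in prediction order.
        "true_positives": [t for t in predicted_topics if t in gold],
        # Predicted labels that are not gold.
        "false_positives": [t for t in predicted_topics if t not in gold],
        # Gold labels the prediction missed.
        "false_negatives": [t for t in gold_topics if t not in predicted],
    }
```

Called with the gold and predicted lists of row `openclaw-openclaw-80479`, this yields `memory` and `self_hosted_inference` as true positives and two empty error lists, which agrees with the report.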
diff --git a/manual-hardcase-repair-20260614T092140Z/offset11-limit4.stderr.log b/manual-hardcase-repair-20260614T092140Z/offset11-limit4.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/manual-hardcase-repair-20260614T092140Z/offset11-limit4.summary.json b/manual-hardcase-repair-20260614T092140Z/offset11-limit4.summary.json
new file mode 100644
index 0000000000000000000000000000000000000000..f1da32da26a3beb37b00bdaaf63edce2dc011471
--- /dev/null
+++ b/manual-hardcase-repair-20260614T092140Z/offset11-limit4.summary.json
@@ -0,0 +1,76 @@
+{
+ "candidate": "manual-hardcase-repair-max4096",
+ "exact_matches": 4,
+ "false_negatives": 0,
+ "false_positives": 0,
+ "mean_gold_labels": 2.0,
+ "mean_predicted_labels": 2.0,
+ "mean_score": 1.0,
+ "micro_f1": 1.0,
+ "over_label_events": 0,
+ "over_label_total": 0,
+ "per_topic": {
+ "acpx": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "codex": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "docs": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "local_model_providers": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "memory": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "reliability": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "self_hosted_inference": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "skills_plugins": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ }
+ },
+ "precision": 1.0,
+ "recall": 1.0,
+ "rows": 4,
+ "source_path": "prompt-optimizer/out/manual-hardcase-repair-20260614T092140Z/offset11-limit4.json",
+ "structural_failures": 0,
+ "true_positives": 8
+}
diff --git a/manual-hardcase-repair-20260614T092140Z/offset4-limit1-v2.json b/manual-hardcase-repair-20260614T092140Z/offset4-limit1-v2.json
new file mode 100644
index 0000000000000000000000000000000000000000..7046270daa7aa39c589d349013dccb1ff403fb1e
--- /dev/null
+++ b/manual-hardcase-repair-20260614T092140Z/offset4-limit1-v2.json
@@ -0,0 +1,35 @@
+{
+ "candidate": "manual-hardcase-repair-v2-max4096",
+ "concurrency": 2,
+ "harness": "localpager-agent",
+ "mean_score": 1.0,
+ "offset": 4,
+ "routing_policy_path": "prompt-optimizer/out/manual-hardcase-repair-20260614T092140Z/routing_policy.md",
+ "routing_policy_sha256": "abf55d5ed85a8f2d887707f4d13017660b9d52938a5a6163e2e519ebc17ee56b",
+ "row_reports": [
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "model_serving"
+ ],
+ "id": "openclaw-openclaw-79897",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "model_serving"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/issues/79897",
+ "title": "OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)",
+ "true_positives": [
+ "model_serving"
+ ]
+ }
+ ],
+ "rows": 1,
+ "scores": [
+ 1.0
+ ]
+}
diff --git a/manual-hardcase-repair-20260614T092140Z/offset4-limit1-v2.stderr.log b/manual-hardcase-repair-20260614T092140Z/offset4-limit1-v2.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/manual-hardcase-repair-20260614T092140Z/offset4-limit1-v2.summary.json b/manual-hardcase-repair-20260614T092140Z/offset4-limit1-v2.summary.json
new file mode 100644
index 0000000000000000000000000000000000000000..fe5b742595dcb7fd6b00f0059f8f556221326747
--- /dev/null
+++ b/manual-hardcase-repair-20260614T092140Z/offset4-limit1-v2.summary.json
@@ -0,0 +1,27 @@
+{
+ "candidate": "manual-hardcase-repair-v2-max4096",
+ "exact_matches": 1,
+ "false_negatives": 0,
+ "false_positives": 0,
+ "mean_gold_labels": 1.0,
+ "mean_predicted_labels": 1.0,
+ "mean_score": 1.0,
+ "micro_f1": 1.0,
+ "over_label_events": 0,
+ "over_label_total": 0,
+ "per_topic": {
+ "model_serving": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ }
+ },
+ "precision": 1.0,
+ "recall": 1.0,
+ "rows": 1,
+ "source_path": "prompt-optimizer/out/manual-hardcase-repair-20260614T092140Z/offset4-limit1-v2.json",
+ "structural_failures": 0,
+ "true_positives": 1
+}
diff --git a/manual-hardcase-repair-20260614T092140Z/offset4-limit1.json b/manual-hardcase-repair-20260614T092140Z/offset4-limit1.json
new file mode 100644
index 0000000000000000000000000000000000000000..8a66c6ac608682bede6bbb26580cb47040f49b3c
--- /dev/null
+++ b/manual-hardcase-repair-20260614T092140Z/offset4-limit1.json
@@ -0,0 +1,38 @@
+{
+ "candidate": "manual-hardcase-repair-max4096",
+ "concurrency": 2,
+ "harness": "localpager-agent",
+ "mean_score": 0.2857142857142857,
+ "offset": 4,
+ "routing_policy_path": "prompt-optimizer/out/manual-hardcase-repair-20260614T092140Z/routing_policy.md",
+ "routing_policy_sha256": "62ebdf8de7d4b60ac77034ef4d0316561937c6ee35f85b81edc5314482c77e7f",
+ "row_reports": [
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [
+ "self_hosted_inference"
+ ],
+ "gold_topics": [
+ "model_serving"
+ ],
+ "id": "openclaw-openclaw-79897",
+ "loss": 2.5,
+ "over_label_count": 1,
+ "predicted_topics": [
+ "model_serving",
+ "self_hosted_inference"
+ ],
+ "score": 0.2857142857142857,
+ "target": "https://github.com/openclaw/openclaw/issues/79897",
+ "title": "OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)",
+ "true_positives": [
+ "model_serving"
+ ]
+ }
+ ],
+ "rows": 1,
+ "scores": [
+ 0.2857142857142857
+ ]
+}
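The `loss` and `score` values in these row reports are consistent with `score = 1 / (1 + loss)`: a loss of 0.0 pairs with a score of 1.0, and a loss of 2.5 pairs with 0.2857142857142857. This relation is an inference from the reported pairs only, not a confirmed description of the localpager scoring function, and the sketch below does not explain how the loss of 2.5 is composed.

```python
def score_from_loss(loss: float) -> float:
    """Assumed relation, inferred only from the (loss, score) pairs in these reports."""
    if loss < 0:
        raise ValueError("loss is expected to be non-negative")
    return 1.0 / (1.0 + loss)
```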
diff --git a/manual-hardcase-repair-20260614T092140Z/offset4-limit1.stderr.log b/manual-hardcase-repair-20260614T092140Z/offset4-limit1.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/manual-hardcase-repair-20260614T092140Z/offset4-limit1.summary.json b/manual-hardcase-repair-20260614T092140Z/offset4-limit1.summary.json
new file mode 100644
index 0000000000000000000000000000000000000000..175ef4e9509590bf5cfd00af6bef171cec507426
--- /dev/null
+++ b/manual-hardcase-repair-20260614T092140Z/offset4-limit1.summary.json
@@ -0,0 +1,34 @@
+{
+ "candidate": "manual-hardcase-repair-max4096",
+ "exact_matches": 0,
+ "false_negatives": 0,
+ "false_positives": 1,
+ "mean_gold_labels": 1.0,
+ "mean_predicted_labels": 2.0,
+ "mean_score": 0.2857142857142857,
+ "micro_f1": 0.6666666666666666,
+ "over_label_events": 1,
+ "over_label_total": 1,
+ "per_topic": {
+ "model_serving": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "self_hosted_inference": {
+ "false_negatives": 0,
+ "false_positives": 1,
+ "precision": 0.0,
+ "recall": 0.0,
+ "true_positives": 0
+ }
+ },
+ "precision": 0.5,
+ "recall": 1.0,
+ "rows": 1,
+ "source_path": "prompt-optimizer/out/manual-hardcase-repair-20260614T092140Z/offset4-limit1.json",
+ "structural_failures": 0,
+ "true_positives": 1
+}
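The aggregate `precision`, `recall`, and `micro_f1` in the summary above can be reproduced from its counts with the standard micro-averaged definitions. The sketch below is illustrative rather than the harness implementation. The name `micro_metrics` is hypothetical, and returning 0.0 for an empty denominator is an assumption that matches the `self_hosted_inference` entry (recall 0.0 with no gold occurrences).

```python
def micro_metrics(true_positives: int, false_positives: int, false_negatives: int) -> tuple[float, float, float]:
    """Standard precision, recall, and F1 from pooled counts; 0.0 when undefined (assumption)."""
    predicted = true_positives + false_positives
    gold = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / gold if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1
```

With the counts from this summary (1 true positive, 1 false positive, 0 false negatives), precision is 0.5, recall is 1.0, and F1 is 2/3, in line with the reported `micro_f1` of 0.6666666666666666.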
diff --git a/manual-hardcase-repair-20260614T092140Z/offset46-limit4.json b/manual-hardcase-repair-20260614T092140Z/offset46-limit4.json
new file mode 100644
index 0000000000000000000000000000000000000000..5db3ba6553ee3e1a00f6d497505f73c2ed9f04c7
--- /dev/null
+++ b/manual-hardcase-repair-20260614T092140Z/offset46-limit4.json
@@ -0,0 +1,116 @@
+{
+ "candidate": "manual-hardcase-repair-max4096",
+ "concurrency": 2,
+ "harness": "localpager-agent",
+ "mean_score": 1.0,
+ "offset": 46,
+ "routing_policy_path": "prompt-optimizer/out/manual-hardcase-repair-20260614T092140Z/routing_policy.md",
+ "routing_policy_sha256": "62ebdf8de7d4b60ac77034ef4d0316561937c6ee35f85b81edc5314482c77e7f",
+ "row_reports": [
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "queueing",
+ "sessions",
+ "coding_agents"
+ ],
+ "id": "openclaw-openclaw-10467",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "queueing",
+ "sessions",
+ "coding_agents"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/issues/10467",
+ "title": "[Feature Request]: Multi-lane concurrency support for sub-agents via sessions_spawn",
+ "true_positives": [
+ "queueing",
+ "sessions",
+ "coding_agents"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "tool_calling",
+ "security"
+ ],
+ "id": "openclaw-openclaw-43246",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "tool_calling",
+ "security"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/pull/43246",
+ "title": "fix(message): deny same-provider cross-context sends by default [AI-assisted]",
+ "true_positives": [
+ "tool_calling",
+ "security"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "sessions",
+ "reliability"
+ ],
+ "id": "openclaw-openclaw-59878",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "sessions",
+ "reliability"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/issues/59878",
+ "title": "Session lane stuck in 'running' after run dies \u2014 sessions.abort + gateway restart fail to clear stale state",
+ "true_positives": [
+ "sessions",
+ "reliability"
+ ]
+ },
+ {
+ "error": null,
+ "false_negatives": [],
+ "false_positives": [],
+ "gold_topics": [
+ "model_serving",
+ "security",
+ "config"
+ ],
+ "id": "openclaw-openclaw-51667",
+ "loss": 0.0,
+ "over_label_count": 0,
+ "predicted_topics": [
+ "model_serving",
+ "security",
+ "config"
+ ],
+ "score": 1.0,
+ "target": "https://github.com/openclaw/openclaw/issues/51667",
+ "title": "Feature: Native Audio Input for Omni-Modal Models (skip STT transcription)",
+ "true_positives": [
+ "model_serving",
+ "security",
+ "config"
+ ]
+ }
+ ],
+ "rows": 4,
+ "scores": [
+ 1.0,
+ 1.0,
+ 1.0,
+ 1.0
+ ]
+}
diff --git a/manual-hardcase-repair-20260614T092140Z/offset46-limit4.stderr.log b/manual-hardcase-repair-20260614T092140Z/offset46-limit4.stderr.log
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/manual-hardcase-repair-20260614T092140Z/offset46-limit4.summary.json b/manual-hardcase-repair-20260614T092140Z/offset46-limit4.summary.json
new file mode 100644
index 0000000000000000000000000000000000000000..8e465aa08780820a373266e18aa66442bbf19f1e
--- /dev/null
+++ b/manual-hardcase-repair-20260614T092140Z/offset46-limit4.summary.json
@@ -0,0 +1,76 @@
+{
+ "candidate": "manual-hardcase-repair-max4096",
+ "exact_matches": 4,
+ "false_negatives": 0,
+ "false_positives": 0,
+ "mean_gold_labels": 2.5,
+ "mean_predicted_labels": 2.5,
+ "mean_score": 1.0,
+ "micro_f1": 1.0,
+ "over_label_events": 0,
+ "over_label_total": 0,
+ "per_topic": {
+ "coding_agents": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "config": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "model_serving": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "queueing": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "reliability": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ },
+ "security": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 2
+ },
+ "sessions": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 2
+ },
+ "tool_calling": {
+ "false_negatives": 0,
+ "false_positives": 0,
+ "precision": 1.0,
+ "recall": 1.0,
+ "true_positives": 1
+ }
+ },
+ "precision": 1.0,
+ "recall": 1.0,
+ "rows": 4,
+ "source_path": "prompt-optimizer/out/manual-hardcase-repair-20260614T092140Z/offset46-limit4.json",
+ "structural_failures": 0,
+ "true_positives": 10
+}
diff --git a/manual-hardcase-repair-20260614T092140Z/routing_policy.md b/manual-hardcase-repair-20260614T092140Z/routing_policy.md
new file mode 100644
index 0000000000000000000000000000000000000000..2ac8bd1e39edc94b26be81365f7d355f510fa26c
--- /dev/null
+++ b/manual-hardcase-repair-20260614T092140Z/routing_policy.md
@@ -0,0 +1,177 @@
+You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+If a structured-output tool named `final_json` is available, call it exactly once with:
+{"topics_of_interest":["topic_id"]}
+If no such tool is available, return only this final JSON object and no prose:
+{"topics_of_interest":["topic_id"]}
+
+Allowed topic ids:
+queueing, docs, notifications, sessions, gateway, reliability, memory,
+open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+model_serving, local_models, self_hosted_inference, telemetry_usage,
+exec_tools, sandboxing, browser_automation, cron_automation, config,
+security, mcp_tooling, tool_calling, auth_identity
+
+Task:
+Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+Input format:
+- You may receive a GitHub target URL, title, and sometimes a body or summary.
+- The title is the primary signal.
+- Use the first clear body summary only when the title is ambiguous.
+- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+Process:
+1. Read the title first.
+2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+3. Pick one primary topic.
+4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+6. Use 0 topics when no allowed topic is central.
+7. Never invent topic ids. Never output labels outside the allowed list.
+8. Output JSON only, or use the `final_json` tool if available.
+
+Core suppression rule:
+Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+ACP, ACPX, sessions, approvals:
+- Use `acp` when ACP is named centrally.
+- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
+- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.
+- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
+- Do not add `sessions` merely because the title says “ACP sessions” or mentions session context. Treat that as label spam unless session identity, lifecycle, routing, state, or persistent process identity is itself the bug or feature.
+- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.
+- `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` should be `acp` and `sessions`; the central bug is session identity/process sharing.
+
+Reliability, queueing, and lanes:
+- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.
+- Words like `lane`, `main lane`, `worker`, `subagent`, `before_prompt_build`, or internal execution paths do not imply `queueing`.
+- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.
+- `self-heal lane wedges` is `reliability`, not `queueing`.
+
+Auth and identity:
+- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or user/session identity for auth is central.
+- OAuth restoration is `auth_identity`.
+- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex-specific runtime behavior.
+- If OAuth or auth behavior is tied to an embedded/session path, include `sessions` when the embedded path or session identity is central.
+- `restore openai-codex OAuth on embedded path` should be `auth_identity` and `sessions`, not `codex`.
+
+Codex:
+- Use `codex` when Codex is named centrally as the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.
+- Do not add `codex` merely because the title contains `openai-codex`, `[codex]`, or a Codex-branded OAuth provider. Confirm the subject is Codex behavior rather than auth, sessions, docs, or another domain.
+
+Documentation:
+- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
+- Documentation-only PRs should usually include `docs` alone.
+- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
+- Do not add non-allowed or broad demo/showcase labels.
+- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+
+MCP and tool calling:
+- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
+- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
+- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
+
+Open-weight, local provider catalogs, and model serving:
+- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
+- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
+- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, or names a model unless serving/routing/protocol behavior is central.
+
+Local models and self-hosted inference:
+- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+- LM Studio is a strong signal for `local_models`.
+- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+
+Notifications and chat integrations:
+- Use `notifications` when notification behavior itself is central.
+- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+Cron:
+- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+Exec, sandboxing, approvals:
+- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+Memory:
+- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+Gateway and sessions:
+- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, embedded session path, or session-specific behavior is central.
+- “Outbound session identity” is `sessions`.
+- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+API surface and UI/TUI:
+- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+- If that reader is gateway-backed or gateway-owned, also include `gateway`.
+
+Skills and plugins:
+- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+Gateway and runtime:
+- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+Telemetry and usage:
+- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
+- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+Browser automation:
+- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+Policy, config, security:
+- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+
+Composite titles:
+- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.
+- Do not substitute broad infrastructure topics for a listed user-visible subject.
+
+Final suppression check:
+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
+
+Hard false-positive suppressions and recovery cases:
+- For OpenAI-compatible streaming, final usage chunks, base URL behavior, or protocol lifecycle with llama.cpp/Ollama/vLLM/TGI/LocalAI, use `model_serving` only unless local runtime setup, local model compatibility, or self-hosted provider setup is itself the subject. Do not add `local_models` or `self_hosted_inference` just because a local server name appears.
+- For a session lane stuck in `running` after a run dies, including `sessions.abort`, gateway restart, and stale state cleanup, use `sessions` and `reliability`. Do not add `gateway` unless gateway API, gateway routing, or gateway ownership is the subject.
+- Same-provider or cross-context send denial/default-deny message policy is `tool_calling` plus `security`. Do not add `chat_integrations` unless a named chat platform is central.
+- Multi-lane concurrency support for sub-agents via `sessions_spawn` is `queueing`, `sessions`, and `coding_agents`.
+- Native audio input for omni-modal models, especially when skipping STT transcription, is `model_serving`, `security`, and `config`.
+- Model, network, and MCP conformance checks are `config`, `security`, and `mcp_tooling`; do not add `local_model_providers` for generic model conformance policy.
+- Layered browser diagnostics are `browser_automation`, not `ui_tui`.
+- Cron announce or heartbeat target-channel Slack pushes are `notifications` plus `chat_integrations`; do not add `cron_automation` unless scheduling or cron lifecycle is the subject.
+- Stale token-count freshness such as `totalTokensFresh` is `sessions` plus `telemetry_usage`, not `ui_tui`.
+- Security vulnerability batches in skill scanner, SSRF, hook priority, and token verification are `security`, `hooks`, and `skills_plugins`; do not add `auth_identity` or `local_model_providers` unless login/provider catalog behavior is central.
+- Local Ollama embeddings blocked by proxy or NO_PROXY are `local_models` plus `self_hosted_inference`; do not add `security` or `memory` unless credential/security policy or memory behavior is the subject.
+- ACP configured binding that uses a parent channel ID for session key and causes shared persistent process identity is `acp` plus `sessions`, not `acpx`.
+- Explicit ACP agent runs with embedded backend visibility failure and stale final JSON state after `sessions_yield` are `acp` plus `acpx`, not `sessions`.
+- Duplicate ACP completion delivery through `CompletionDeliveryGate` is `acp`, `coding_agents`, and `reliability`, not `sessions`.
+- Plugin runtime features that expose ACP spawn or prompts are `acp` plus `coding_agents`, not `skills_plugins` unless plugin installation/discovery/availability is central.
+- Runtime recovery for heartbeat, Feishu, and exec sessions is `reliability`, `cron_automation`, and `exec_tools`; do not add `chat_integrations` merely because Feishu is named.
+- ACP/Codex child tasks that can be marked succeeded with progress-only output and no final deliverable are `acp`, `codex`, and `agent_runtime`, not `sessions`.
+- Runtime auth override APIs such as `registerProviderRuntimeAuthOverride` are `auth_identity` plus `api_surface`, not `skills_plugins`.
+- Local memory embeddings on Apple Silicon crashing gateway in ggml-metal or node-llama-cpp are `local_models`, `memory`, and `self_hosted_inference`; do not add `reliability` merely because the title says crash.
+- Tool-use/tool-result mismatch repair is `tool_calling`, `coding_agents`, and `reliability`, not `security`.
+- SSE-backed MCP sessions stale after server restart with `Session not found` are `mcp_tooling`, `sessions`, and `gateway`, not `reliability`.
+
+Strict cardinality budget:
+- Return at most two topic ids. This cap is higher priority than the examples above.
+- If three or more topics seem relevant, keep only the two most central maintainer-owned buckets.
+- Prefer concrete subsystem buckets over generic helpers. Drop `reliability`, `security`, `skills_plugins`, `auth_identity`, `gateway`, `acpx`, `ui_tui`, or `local_model_providers` when one of them is only a contextual third label.
+- Do not add a third label to acknowledge every noun in a composite title. The route should be sparse: two labels maximum, one label when one owner clearly dominates.
diff --git a/prompt-diffs/diffs/00-v9-1-seed__to__01-gepa-six.prompt.diff b/prompt-diffs/diffs/00-v9-1-seed__to__01-gepa-six.prompt.diff
new file mode 100644
index 0000000000000000000000000000000000000000..e9b3080f73fcc3c6b0ce95f93b8a1517663eb1a5
--- /dev/null
+++ b/prompt-diffs/diffs/00-v9-1-seed__to__01-gepa-six.prompt.diff
@@ -0,0 +1,232 @@
+--- 00-v9-1-seed.prompt.md
++++ 01-gepa-six.prompt.md
+@@ -34,196 +34,55 @@
+
+ Topic definitions and cue words:
+
+ __TOPIC_DESCRIPTIONS__
+
+-## Goal
++You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+-Choose the smallest topic set that would route this item to the right
+-maintainer interest bucket without hiding a central second concern.
++This is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.
+
+-This is a fuzzy multi-label task. Avoid both failure modes:
++Process:
+
+-- Over-labeling: adding topics for implementation details, tests, examples, or
+- side effects.
+-- Under-labeling: dropping a topic that is explicitly central to the title or
+- first clear problem statement.
++1. Read the title first.
++2. Identify the main user-visible problem, feature, or policy change.
++3. Pick one primary topic.
++4. Read only the first clear body summary if needed to disambiguate.
++5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
++6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.
++7. Return only exact allowed topic ids.
+
+-## Decision Process
++Do not over-label from keywords.
+
+-Do these checks internally, then call `final_json`.
++Important domain rules:
+
+-1. Read the title first. Identify the main user-visible problem or feature.
+-2. Pick the single best primary topic.
+-3. Scan the title and first clear body summary for explicit central facets:
+- protocol/runtime, provider/model serving, session/state, execution/tooling,
+- channel/UI, memory, security/auth, config, reliability, docs/tests.
+-4. Add a secondary topic only when removing it would misroute a maintainer who
+- follows that bucket.
+-5. Delete topics that are only file locations, validation work, examples,
+- dependencies, likely consequences, or broad words from comments.
+-6. Do a final cardinality check:
+- - 0 topics when no allowed topic is central.
+- - 1 topic for a single-focus item.
+- - 2 topics for normal cross-topic items.
+- - 3 topics when the title/body explicitly has three central facets.
+- - 4+ topics only for explicit multi-system coordination, not broad impact.
++- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
++- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
++- Example: “OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)” is only `model_serving`. The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.
++- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.
+
+-Do not optimize for one label. Optimize for the smallest complete set.
++Policy/config rules:
+
+-## Runtime Discipline
++- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
++- Do not map the word “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
++- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
++- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
++- Example: “Policy: add model, network, and MCP conformance checks” should be `mcp_tooling`, `config`, and `security`, not `model_serving`.
+
+-This is a routing task, not a full PR review.
++Cardinality guidance:
+
+-- Do not audit every allowed topic.
+-- Do not summarize the whole issue, comments, or diff.
+-- Do not inspect repository files when the title and GitHub context are enough.
+-- Stop once the primary topic and any explicit central secondary facets are
+- identified.
+-- Call `final_json` immediately after the final cardinality check.
++- Use 0 topics when no allowed topic is central.
++- Use 1 topic for a single-focus item.
++- Use 2 topics for normal cross-topic items.
++- Use 3 topics only when the title or first clear summary explicitly has three central facets.
++- Use 4+ topics only for explicit multi-system coordination.
+
+-## Evidence Priority
++Final suppression checks before output:
+
+-- Title is strongest.
+-- The first clear problem statement or summary in the body is next.
+-- Labels, changed files, comments, and diff can confirm or disambiguate, but
+- they must not broaden the topic set by themselves.
+-- Tests and docs files are weak evidence unless tests or docs are the subject.
+-- If context is truncated, classify from the visible central evidence and add a
+- caveat only when that materially limits confidence.
+-
+-## Central Facet Recovery
+-
+-Before final output, check whether any of these central facets were dropped:
+-
+-- A named protocol or protocol adapter can be central: ACP, ACPX, MCP, gateway
+- protocol, OpenAI-compatible streaming, Responses API, chat completions.
+-- Agent runner, backend, subagent, harness, heartbeat, lifecycle, or orchestration
+- behavior can be central as `agent_runtime` or `coding_agents`.
+-- Session identity, binding, persistence, transcript, resume, reset, or
+- parent/child state can be central as `sessions`.
+-- Execution control, shell commands, tool allow/deny, durable exec, or tool
+- invocation can be central as `exec_tools`, `approvals`, `sandboxing`, or
+- `tool_calling` depending on the exact behavior.
+-- Local/self-hosted model backends, local hardware, OpenAI-compatible local
+- serving, embeddings providers, and model endpoint behavior can be central as
+- `local_models`, `local_model_providers`, `model_serving`, or
+- `self_hosted_inference`.
+-- Named chat surfaces, browser/web UI, memory, security/auth, config, queueing,
+- notifications, and telemetry can be central when the title or summary makes
+- them the subject.
+-
+-If a facet is central but maps to a narrower allowed topic, use the narrower
+-topic instead of a broad fallback.
+-
+-## Enum Discipline
+-
+-- Output only exact allowed topic ids.
+-- Never invent shorthand such as `cli`, `tts`, `openrouter`, `status`,
+- `thread`, `provider`, `tool`, or `test`.
+-- If the closest word in the title is not allowed, map it to the nearest
+- allowed topic or omit it.
+-
+-## Positive Cues
+-
+-- Counts, usage, cost, tokens, metrics, diagnostics, traces, and status
+- reporting route to `telemetry_usage`.
+-- Subagents, coding-agent runs, harness behavior, approvals, sandboxing,
+- compaction, or agent orchestration route to `coding_agents`.
+-- Agent runner, embedded backend, lifecycle, heartbeat, attempt, orchestration,
+- or runtime state route to `agent_runtime` when the agent runtime itself is
+- central.
+-- LM Studio, Ollama, llama.cpp, GGUF, local hardware, local model compatibility,
+- local fallback, and local context behavior route to `local_models`.
+-- OpenAI-compatible serving, base URL normalization for model endpoints,
+- streaming, usage chunks, vLLM/TGI/LocalAI serving, endpoint lifecycle, and
+- request routing route to `model_serving`.
+-- Self-hosted embeddings, memory providers, speech services, private inference,
+- and locally operated inference services route to `self_hosted_inference`.
+-- Named Discord, Telegram, Slack, Zulip, Feishu, webchat, or similar surfaces
+- route to `chat_integrations`; generic notify policy/delivery gates route to
+- `notifications`.
+-- Chat UI display/status/footer behavior routes to `ui_tui` only when the
+- user-facing interface is central.
+-- Test-only contract follow-through routes to `tests_ci` only when the testing
+- surface itself is the subject; otherwise route by the runtime contract being
+- tested.
+-
+-## Hard Local Model Filters
+-
+-Before final output:
+-
+-- Delete `local_model_providers` unless the item explicitly centers a local,
+- self-hosted, or user-declared OpenAI-compatible backend: LM Studio, Ollama,
+- vLLM, TGI, LocalAI, llama.cpp, Atomic Chat, localhost/LAN, or private
+- inference.
+-- Never use `local_model_providers` for hosted provider catalogs/manifests,
+- hosted model availability, hosted auth, usage/billing UI, TTS/speech/image
+- behavior, or hosted APIs. Hosted catalog updates are `model_releases` and
+- sometimes `config`; hosted usage/billing/status work is `telemetry_usage` or
+- `ui_tui`.
+-- Use `local_models` only for concrete local/offline model execution or
+- compatibility; not model IDs, catalogs, lists, static entries, provider
+- manifests, or hosted provider availability. Keep it for local-model lean
+- filtering and local-model runtime crashes.
+-
+-## Over-Label Guardrails
+-
+-- `api_surface`: external API, CLI, or HTTP contracts only. Not internal
+- payloads/options/functions, status text, UI events, or ordinary command
+- behavior.
+-- `reliability`: operational failures such as timeout, crash, leak, retry,
+- stuck state, data loss, cleanup, or recovery. Not a generic bug tag.
+-- `sessions`: session lifecycle/state/storage/identity only. Not every item
+- mentioning session context or files.
+-- `local_model_providers`: provider setup/routing/auth/discovery/compatibility
+- only. Not every local endpoint issue.
+-- `config`: configuration behavior itself, not any feature with an option.
+-- `docs` and `tests_ci`: only when docs or test tooling is the subject.
+-- `security`: security boundary, credentials, auth hardening, SSRF, token
+- handling, or vulnerability only. Not every risky-looking behavior.
+-- `gateway`: gateway daemon, gateway API, startup, restart, or gateway-owned
+- routing only. Not every item that passes through the gateway.
+-
+-## Tie-Breakers
+-
+-- Count/usage/token/cost/metric/trace/diagnostic/status/footer-count features
+- are `telemetry_usage`, even if shown in UI or session status.
+-- Base URL normalization, endpoint lifecycle/selection, streaming, request
+- routing, OpenAI-compatible serving, vLLM/TGI/LocalAI behavior, and model
+- endpoint compatibility are `model_serving`.
+-- TTS, shell/exec, command, tool invocation, allowlist, and execution-control
+- behavior are `exec_tools` when the feature controls execution or spoken/tool
+- output.
+-- Thread/session isolation, per-session binding, fallback recovery state, and
+- lifecycle state are `sessions` when those boundaries are central.
+-- Structured tool result display, stdout rendering for tool results, pre-tool
+- text preservation, and tool-call transcript/content handling are
+- `tool_calling` when tool-call semantics are central.
+-- Delivery fallback, outbound recovery, lost final/pre-tool text, duplicate
+- cleanup, and lifecycle recovery are `reliability` when recovery correctness is
+- central.
+-
+-## False-Positive Suppression
+-
+-- Do not use `local_model_providers` for base URL normalization,
+- OpenRouter/OpenAI-compatible endpoint fixes, endpoint lifecycle, streaming,
+- usage chunks, or vLLM/TGI/LocalAI serving. Use `model_serving` unless provider
+- setup/auth/discovery/routing is central.
+-- Do not use `notifications` for named Discord/Telegram/Slack/Zulip/Feishu
+- behavior, ACP final/pre-tool text preservation, delivery fallback recovery, or
+- outbound recovery correctness. Use `chat_integrations` for named chat surfaces
+- and `reliability` for recovery/loss/fallback.
+-- Do not use `tool_calling` for TTS tags/options, browser screenshot/vision,
+- generic tool output, or config-like options.
+-- Do not use `api_surface` for parse helpers, CLI edge-case tests, token
+- parsing, status/footer display, internal command behavior, or local model
+- compatibility.
+-- Do not use `config` merely because a feature adds an option. Route by what
+- the option controls.
+-
+-## Target
++- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, or “chunk”, verify that the topic is actually the subject, not just context.
++- Prefer the narrower central topic over a broad fallback.
++- Never invent topic ids.
++- Output only the final JSON with the selected topic ids.## Target
+
+ `__TARGET__`
+
+ ## GitHub Context
+
diff --git a/prompt-diffs/diffs/00-v9-1-seed__to__01-gepa-six.routing_policy.diff b/prompt-diffs/diffs/00-v9-1-seed__to__01-gepa-six.routing_policy.diff
new file mode 100644
index 0000000000000000000000000000000000000000..ef162e0af9f58e151957c7f132241f64470f1c26
--- /dev/null
+++ b/prompt-diffs/diffs/00-v9-1-seed__to__01-gepa-six.routing_policy.diff
@@ -0,0 +1,221 @@
+--- 00-v9-1-seed.routing_policy.md
++++ 01-gepa-six.routing_policy.md
+@@ -1,185 +1,45 @@
+-## Goal
++You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+-Choose the smallest topic set that would route this item to the right
+-maintainer interest bucket without hiding a central second concern.
++This is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.
+
+-This is a fuzzy multi-label task. Avoid both failure modes:
++Process:
+
+-- Over-labeling: adding topics for implementation details, tests, examples, or
+- side effects.
+-- Under-labeling: dropping a topic that is explicitly central to the title or
+- first clear problem statement.
++1. Read the title first.
++2. Identify the main user-visible problem, feature, or policy change.
++3. Pick one primary topic.
++4. Read only the first clear body summary if needed to disambiguate.
++5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
++6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.
++7. Return only exact allowed topic ids.
+
+-## Decision Process
++Do not over-label from keywords.
+
+-Do these checks internally, then call `final_json`.
++Important domain rules:
+
+-1. Read the title first. Identify the main user-visible problem or feature.
+-2. Pick the single best primary topic.
+-3. Scan the title and first clear body summary for explicit central facets:
+- protocol/runtime, provider/model serving, session/state, execution/tooling,
+- channel/UI, memory, security/auth, config, reliability, docs/tests.
+-4. Add a secondary topic only when removing it would misroute a maintainer who
+- follows that bucket.
+-5. Delete topics that are only file locations, validation work, examples,
+- dependencies, likely consequences, or broad words from comments.
+-6. Do a final cardinality check:
+- - 0 topics when no allowed topic is central.
+- - 1 topic for a single-focus item.
+- - 2 topics for normal cross-topic items.
+- - 3 topics when the title/body explicitly has three central facets.
+- - 4+ topics only for explicit multi-system coordination, not broad impact.
++- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
++- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
++- Example: “OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)” is only `model_serving`. The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.
++- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.
+
+-Do not optimize for one label. Optimize for the smallest complete set.
++Policy/config rules:
+
+-## Runtime Discipline
++- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
++- Do not map the word “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
++- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
++- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
++- Example: “Policy: add model, network, and MCP conformance checks” should be `mcp_tooling`, `config`, and `security`, not `model_serving`.
+
+-This is a routing task, not a full PR review.
++Cardinality guidance:
+
+-- Do not audit every allowed topic.
+-- Do not summarize the whole issue, comments, or diff.
+-- Do not inspect repository files when the title and GitHub context are enough.
+-- Stop once the primary topic and any explicit central secondary facets are
+- identified.
+-- Call `final_json` immediately after the final cardinality check.
++- Use 0 topics when no allowed topic is central.
++- Use 1 topic for a single-focus item.
++- Use 2 topics for normal cross-topic items.
++- Use 3 topics only when the title or first clear summary explicitly has three central facets.
++- Use 4+ topics only for explicit multi-system coordination.
+
+-## Evidence Priority
++Final suppression checks before output:
+
+-- Title is strongest.
+-- The first clear problem statement or summary in the body is next.
+-- Labels, changed files, comments, and diff can confirm or disambiguate, but
+- they must not broaden the topic set by themselves.
+-- Tests and docs files are weak evidence unless tests or docs are the subject.
+-- If context is truncated, classify from the visible central evidence and add a
+- caveat only when that materially limits confidence.
+-
+-## Central Facet Recovery
+-
+-Before final output, check whether any of these central facets were dropped:
+-
+-- A named protocol or protocol adapter can be central: ACP, ACPX, MCP, gateway
+- protocol, OpenAI-compatible streaming, Responses API, chat completions.
+-- Agent runner, backend, subagent, harness, heartbeat, lifecycle, or orchestration
+- behavior can be central as `agent_runtime` or `coding_agents`.
+-- Session identity, binding, persistence, transcript, resume, reset, or
+- parent/child state can be central as `sessions`.
+-- Execution control, shell commands, tool allow/deny, durable exec, or tool
+- invocation can be central as `exec_tools`, `approvals`, `sandboxing`, or
+- `tool_calling` depending on the exact behavior.
+-- Local/self-hosted model backends, local hardware, OpenAI-compatible local
+- serving, embeddings providers, and model endpoint behavior can be central as
+- `local_models`, `local_model_providers`, `model_serving`, or
+- `self_hosted_inference`.
+-- Named chat surfaces, browser/web UI, memory, security/auth, config, queueing,
+- notifications, and telemetry can be central when the title or summary makes
+- them the subject.
+-
+-If a facet is central but maps to a narrower allowed topic, use the narrower
+-topic instead of a broad fallback.
+-
+-## Enum Discipline
+-
+-- Output only exact allowed topic ids.
+-- Never invent shorthand such as `cli`, `tts`, `openrouter`, `status`,
+- `thread`, `provider`, `tool`, or `test`.
+-- If the closest word in the title is not allowed, map it to the nearest
+- allowed topic or omit it.
+-
+-## Positive Cues
+-
+-- Counts, usage, cost, tokens, metrics, diagnostics, traces, and status
+- reporting route to `telemetry_usage`.
+-- Subagents, coding-agent runs, harness behavior, approvals, sandboxing,
+- compaction, or agent orchestration route to `coding_agents`.
+-- Agent runner, embedded backend, lifecycle, heartbeat, attempt, orchestration,
+- or runtime state route to `agent_runtime` when the agent runtime itself is
+- central.
+-- LM Studio, Ollama, llama.cpp, GGUF, local hardware, local model compatibility,
+- local fallback, and local context behavior route to `local_models`.
+-- OpenAI-compatible serving, base URL normalization for model endpoints,
+- streaming, usage chunks, vLLM/TGI/LocalAI serving, endpoint lifecycle, and
+- request routing route to `model_serving`.
+-- Self-hosted embeddings, memory providers, speech services, private inference,
+- and locally operated inference services route to `self_hosted_inference`.
+-- Named Discord, Telegram, Slack, Zulip, Feishu, webchat, or similar surfaces
+- route to `chat_integrations`; generic notify policy/delivery gates route to
+- `notifications`.
+-- Chat UI display/status/footer behavior routes to `ui_tui` only when the
+- user-facing interface is central.
+-- Test-only contract follow-through routes to `tests_ci` only when the testing
+- surface itself is the subject; otherwise route by the runtime contract being
+- tested.
+-
+-## Hard Local Model Filters
+-
+-Before final output:
+-
+-- Delete `local_model_providers` unless the item explicitly centers a local,
+- self-hosted, or user-declared OpenAI-compatible backend: LM Studio, Ollama,
+- vLLM, TGI, LocalAI, llama.cpp, Atomic Chat, localhost/LAN, or private
+- inference.
+-- Never use `local_model_providers` for hosted provider catalogs/manifests,
+- hosted model availability, hosted auth, usage/billing UI, TTS/speech/image
+- behavior, or hosted APIs. Hosted catalog updates are `model_releases` and
+- sometimes `config`; hosted usage/billing/status work is `telemetry_usage` or
+- `ui_tui`.
+-- Use `local_models` only for concrete local/offline model execution or
+- compatibility; not model IDs, catalogs, lists, static entries, provider
+- manifests, or hosted provider availability. Keep it for local-model lean
+- filtering and local-model runtime crashes.
+-
+-## Over-Label Guardrails
+-
+-- `api_surface`: external API, CLI, or HTTP contracts only. Not internal
+- payloads/options/functions, status text, UI events, or ordinary command
+- behavior.
+-- `reliability`: operational failures such as timeout, crash, leak, retry,
+- stuck state, data loss, cleanup, or recovery. Not a generic bug tag.
+-- `sessions`: session lifecycle/state/storage/identity only. Not every item
+- mentioning session context or files.
+-- `local_model_providers`: provider setup/routing/auth/discovery/compatibility
+- only. Not every local endpoint issue.
+-- `config`: configuration behavior itself, not any feature with an option.
+-- `docs` and `tests_ci`: only when docs or test tooling is the subject.
+-- `security`: security boundary, credentials, auth hardening, SSRF, token
+- handling, or vulnerability only. Not every risky-looking behavior.
+-- `gateway`: gateway daemon, gateway API, startup, restart, or gateway-owned
+- routing only. Not every item that passes through the gateway.
+-
+-## Tie-Breakers
+-
+-- Count/usage/token/cost/metric/trace/diagnostic/status/footer-count features
+- are `telemetry_usage`, even if shown in UI or session status.
+-- Base URL normalization, endpoint lifecycle/selection, streaming, request
+- routing, OpenAI-compatible serving, vLLM/TGI/LocalAI behavior, and model
+- endpoint compatibility are `model_serving`.
+-- TTS, shell/exec, command, tool invocation, allowlist, and execution-control
+- behavior are `exec_tools` when the feature controls execution or spoken/tool
+- output.
+-- Thread/session isolation, per-session binding, fallback recovery state, and
+- lifecycle state are `sessions` when those boundaries are central.
+-- Structured tool result display, stdout rendering for tool results, pre-tool
+- text preservation, and tool-call transcript/content handling are
+- `tool_calling` when tool-call semantics are central.
+-- Delivery fallback, outbound recovery, lost final/pre-tool text, duplicate
+- cleanup, and lifecycle recovery are `reliability` when recovery correctness is
+- central.
+-
+-## False-Positive Suppression
+-
+-- Do not use `local_model_providers` for base URL normalization,
+- OpenRouter/OpenAI-compatible endpoint fixes, endpoint lifecycle, streaming,
+- usage chunks, or vLLM/TGI/LocalAI serving. Use `model_serving` unless provider
+- setup/auth/discovery/routing is central.
+-- Do not use `notifications` for named Discord/Telegram/Slack/Zulip/Feishu
+- behavior, ACP final/pre-tool text preservation, delivery fallback recovery, or
+- outbound recovery correctness. Use `chat_integrations` for named chat surfaces
+- and `reliability` for recovery/loss/fallback.
+-- Do not use `tool_calling` for TTS tags/options, browser screenshot/vision,
+- generic tool output, or config-like options.
+-- Do not use `api_surface` for parse helpers, CLI edge-case tests, token
+- parsing, status/footer display, internal command behavior, or local model
+- compatibility.
+-- Do not use `config` merely because a feature adds an option. Route by what
+- the option controls.
+-
++- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, or “chunk”, verify that the topic is actually the subject, not just context.
++- Prefer the narrower central topic over a broad fallback.
++- Never invent topic ids.
++- Output only the final JSON with the selected topic ids.
\ No newline at end of file
diff --git a/prompt-diffs/diffs/01-gepa-six__to__02-previous-proper-best.prompt.diff b/prompt-diffs/diffs/01-gepa-six__to__02-previous-proper-best.prompt.diff
new file mode 100644
index 0000000000000000000000000000000000000000..94570e0e8217c4e72b1f0e29fb3c80a13adee5a8
--- /dev/null
+++ b/prompt-diffs/diffs/01-gepa-six__to__02-previous-proper-best.prompt.diff
@@ -0,0 +1,201 @@
+--- 01-gepa-six.prompt.md
++++ 02-previous-proper-best.prompt.md
+@@ -36,53 +36,159 @@
+
+ __TOPIC_DESCRIPTIONS__
+
+ You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+-This is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.
++Return only final JSON:
++{"topics_of_interest":["topic_id"]}
++
++Allowed topic ids:
++queueing, docs, notifications, sessions, gateway, reliability, memory,
++open_weight_models, local_model_providers, codex, api_surface, ui_tui,
++chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
++model_serving, local_models, self_hosted_inference, telemetry_usage,
++exec_tools, sandboxing, browser_automation, cron_automation, config,
++security, mcp_tooling, tool_calling
++
++Task:
++Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
++
++Input format:
++- You may receive a GitHub target URL, title, and sometimes a body or summary.
++- The title is the primary signal.
++- Use the first clear body summary only when the title is ambiguous.
++- Ignore examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+ Process:
+-
+ 1. Read the title first.
+-2. Identify the main user-visible problem, feature, or policy change.
++2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+ 3. Pick one primary topic.
+-4. Read only the first clear body summary if needed to disambiguate.
+-5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+-6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.
+-7. Return only exact allowed topic ids.
+-
+-Do not over-label from keywords.
+-
+-Important domain rules:
+-
+-- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+-- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+-- Example: “OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)” is only `model_serving`. The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.
+-- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.
+-
+-Policy/config rules:
+-
+-- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+-- Do not map the word “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+-- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+-- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+-- Example: “Policy: add model, network, and MCP conformance checks” should be `mcp_tooling`, `config`, and `security`, not `model_serving`.
+-
+-Cardinality guidance:
+-
+-- Use 0 topics when no allowed topic is central.
+-- Use 1 topic for a single-focus item.
+-- Use 2 topics for normal cross-topic items.
+-- Use 3 topics only when the title or first clear summary explicitly has three central facets.
+-- Use 4+ topics only for explicit multi-system coordination.
+-
+-Final suppression checks before output:
+-
+-- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, or “chunk”, verify that the topic is actually the subject, not just context.
+-- Prefer the narrower central topic over a broad fallback.
+-- Never invent topic ids.
+-- Output only the final JSON with the selected topic ids.## Target
++4. Add secondary topics only when they are explicit central maintainer-owned subjects.
++5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
++6. Use 0 topics when no allowed topic is central.
++7. Never invent topic ids.
++8. Output JSON only.
++
++Core suppression rule:
++Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
++
++Notifications:
++- Use `notifications` when notification behavior itself is central.
++- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
++- A title about “cron --announce and heartbeat target-channel Slack pushes” is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself the central subject.
++- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
++
++Chat integrations:
++- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
++- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
++- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
++
++Cron:
++- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
++- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
++
++Exec, sandboxing, approvals:
++- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
++- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:
++ - `exec_tools`
++ - `sandboxing`
++ - `approvals`
++- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.
++- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
++
++Reliability:
++- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
++- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
++- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
++
++Memory:
++- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
++- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
++
++Gateway and sessions:
++- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
++- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
++- “Outbound session identity” is `sessions`.
++- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
++
++Open-weight and local model providers:
++- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
++- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
++- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.
++
++Codex:
++- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
++- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
++- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
++
++API surface and UI/TUI:
++- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
++- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
++- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
++- If that reader is gateway-backed or gateway-owned, also include `gateway`.
++
++Skills and plugins:
++- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
++- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
++
++ACP, ACPX, approvals:
++- Use `acp` when ACP is named centrally.
++- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
++- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
++- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.
++
++Gateway and runtime:
++- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
++- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
++
++Model serving:
++- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
++- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
++- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
++
++Local models and self-hosted inference:
++- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
++- LM Studio is a strong signal for `local_models`.
++- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
++- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
++
++Telemetry and usage:
++- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
++
++Browser automation:
++- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
++- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
++
++Policy, config, security, MCP:
++- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
++- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
++- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
++- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
++
++Documentation:
++- Documentation-only PRs should usually include `docs` plus the central documented area.
++- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
++- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
++
++Queueing:
++- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
++- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
++
++Tool calling:
++- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
++- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
++
++Composite titles:
++- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
++- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
++- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
++- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
++
++Final suppression check:
++Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
++
++Output only JSON.## Target
+
+ `__TARGET__`
+
+ ## GitHub Context
+
diff --git a/prompt-diffs/diffs/01-gepa-six__to__02-previous-proper-best.routing_policy.diff b/prompt-diffs/diffs/01-gepa-six__to__02-previous-proper-best.routing_policy.diff
new file mode 100644
index 0000000000000000000000000000000000000000..7bd1684f9d1e1fbc695da7e161f440941e62058e
--- /dev/null
+++ b/prompt-diffs/diffs/01-gepa-six__to__02-previous-proper-best.routing_policy.diff
@@ -0,0 +1,184 @@
+--- 01-gepa-six.routing_policy.md
++++ 02-previous-proper-best.routing_policy.md
+@@ -1,45 +1,151 @@
+ You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+-This is a fuzzy multi-label routing task. Your goal is not to mention every related area. Your goal is to choose the minimum topic set that sends the item to the right maintainer bucket without dropping an explicit central second concern.
++Return only final JSON:
++{"topics_of_interest":["topic_id"]}
++
++Allowed topic ids:
++queueing, docs, notifications, sessions, gateway, reliability, memory,
++open_weight_models, local_model_providers, codex, api_surface, ui_tui,
++chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
++model_serving, local_models, self_hosted_inference, telemetry_usage,
++exec_tools, sandboxing, browser_automation, cron_automation, config,
++security, mcp_tooling, tool_calling
++
++Task:
++Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
++
++Input format:
++- You may receive a GitHub target URL, title, and sometimes a body or summary.
++- The title is the primary signal.
++- Use the first clear body summary only when the title is ambiguous.
++- Ignore examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+ Process:
++1. Read the title first.
++2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
++3. Pick one primary topic.
++4. Add secondary topics only when they are explicit central maintainer-owned subjects.
++5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
++6. Use 0 topics when no allowed topic is central.
++7. Never invent topic ids.
++8. Output JSON only.
+
+-1. Read the title first.
+-2. Identify the main user-visible problem, feature, or policy change.
+-3. Pick one primary topic.
+-4. Read only the first clear body summary if needed to disambiguate.
+-5. Add a secondary topic only when it is explicitly central and removing it would route the item away from a maintainer who must see it.
+-6. Remove topics that come only from symptoms, implementation details, tests, examples, files changed, broad impact, or incidental words.
+-7. Return only exact allowed topic ids.
++Core suppression rule:
++Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+-Do not over-label from keywords.
++Notifications:
++- Use `notifications` when notification behavior itself is central.
++- Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
++- A title about “cron --announce and heartbeat target-channel Slack pushes” is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself the central subject.
++- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+-Important domain rules:
++Chat integrations:
++- Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
++- Slack target-channel pushes and WhatsApp history are `chat_integrations`.
++- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
+
+-- OpenAI-compatible streaming, final usage chunks, stream lifecycle, endpoint compatibility, base URL behavior, vLLM/TGI/LocalAI/llama.cpp serving behavior, and request routing are `model_serving`.
+-- Do not add `telemetry_usage` merely because the title mentions usage, tokens, counts, cost, or chunks when those are symptoms of a model-serving protocol bug.
+-- Example: “OpenAI-compatible streaming with llama.cpp saves zero usage (stream closed before final usage chunk)” is only `model_serving`. The central issue is the OpenAI-compatible streaming/final usage chunk behavior, not telemetry reporting.
+-- Use `telemetry_usage` only when the metric, usage accounting/reporting, cost display, diagnostic count, trace, or status reporting surface is itself the feature or bug.
++Cron:
++- Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
++- Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+-Policy/config rules:
++Exec, sandboxing, approvals:
++- Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
++- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:
++ - `exec_tools`
++ - `sandboxing`
++ - `approvals`
++- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.
++- Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+
+-- Items about policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement usually include `config` when the policy/checking behavior is central.
+-- Do not map the word “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+-- Network policy, network conformance, access restrictions, outbound rules, or boundary checks can be `security` when they concern allowed/blocked network behavior.
+-- MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks route to `mcp_tooling`.
+-- Example: “Policy: add model, network, and MCP conformance checks” should be `mcp_tooling`, `config`, and `security`, not `model_serving`.
++Reliability:
++- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
++- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
++- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
+
+-Cardinality guidance:
++Memory:
++- Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
++- Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+-- Use 0 topics when no allowed topic is central.
+-- Use 1 topic for a single-focus item.
+-- Use 2 topics for normal cross-topic items.
+-- Use 3 topics only when the title or first clear summary explicitly has three central facets.
+-- Use 4+ topics only for explicit multi-system coordination.
++Gateway and sessions:
++- Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
++- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
++- “Outbound session identity” is `sessions`.
++- A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+
+-Final suppression checks before output:
++Open-weight and local model providers:
++- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
++- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
++- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.
+
+-- If a topic was added only because of a word like “usage”, “model”, “network”, “test”, “policy”, “status”, or “chunk”, verify that the topic is actually the subject, not just context.
+-- Prefer the narrower central topic over a broad fallback.
+-- Never invent topic ids.
+-- Output only the final JSON with the selected topic ids.+Codex:
++- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
++- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
++- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
++
++API surface and UI/TUI:
++- Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
++- Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
++- Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
++- If that reader is gateway-backed or gateway-owned, also include `gateway`.
++
++Skills and plugins:
++- Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
++- Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
++
++ACP, ACPX, approvals:
++- Use `acp` when ACP is named centrally.
++- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
++- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
++- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.
++
++Gateway and runtime:
++- Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
++- `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
++
++Model serving:
++- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
++- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
++- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
++
++Local models and self-hosted inference:
++- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
++- LM Studio is a strong signal for `local_models`.
++- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
++- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
++
++Telemetry and usage:
++- Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
++
++Browser automation:
++- Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
++- Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
++
++Policy, config, security, MCP:
++- Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
++- Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
++- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
++- Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
++
++Documentation:
++- Documentation-only PRs should usually include `docs` plus the central documented area.
++- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
++- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
++
++Queueing:
++- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
++- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
++
++Tool calling:
++- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
++- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
++
++Composite titles:
++- If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
++- Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
++- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
++- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
++
++Final suppression check:
++Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
++
++Output only JSON.
\ No newline at end of file
diff --git a/prompt-diffs/diffs/02-previous-proper-best__to__03-prop20-best.prompt.diff b/prompt-diffs/diffs/02-previous-proper-best__to__03-prop20-best.prompt.diff
new file mode 100644
index 0000000000000000000000000000000000000000..055226ff159bfb2f71683d586424324eb77484fe
--- /dev/null
+++ b/prompt-diffs/diffs/02-previous-proper-best__to__03-prop20-best.prompt.diff
@@ -0,0 +1,228 @@
+--- 02-previous-proper-best.prompt.md
++++ 03-prop20-best.prompt.md
+@@ -36,91 +36,120 @@
+
+ __TOPIC_DESCRIPTIONS__
+
+ You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+-Return only final JSON:
++If a structured-output tool named `final_json` is available, call it exactly once with:
++{"topics_of_interest":["topic_id"]}
++If no such tool is available, return only this final JSON object and no prose:
+ {"topics_of_interest":["topic_id"]}
+
+ Allowed topic ids:
+ queueing, docs, notifications, sessions, gateway, reliability, memory,
+ open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+ chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+ model_serving, local_models, self_hosted_inference, telemetry_usage,
+ exec_tools, sandboxing, browser_automation, cron_automation, config,
+-security, mcp_tooling, tool_calling
++security, mcp_tooling, tool_calling, auth_identity
+
+ Task:
+ Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+ Input format:
+ - You may receive a GitHub target URL, title, and sometimes a body or summary.
+ - The title is the primary signal.
+ - Use the first clear body summary only when the title is ambiguous.
+-- Ignore examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.
++- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+ Process:
+ 1. Read the title first.
+ 2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+ 3. Pick one primary topic.
+ 4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+ 5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+ 6. Use 0 topics when no allowed topic is central.
+-7. Never invent topic ids.
+-8. Output JSON only.
++7. Never invent topic ids. Never output labels outside the allowed list.
++8. Output JSON only, or use the `final_json` tool if available.
+
+ Core suppression rule:
+ Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+-Notifications:
++ACP, ACPX, sessions, approvals:
++- Use `acp` when ACP is named centrally.
++- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
++- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.
++- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
++- Do not add `sessions` merely because the title says “ACP sessions” or mentions session context. Treat that as label spam unless session identity, lifecycle, routing, state, or persistent process identity is itself the bug or feature.
++- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.
++- `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` should be `acp` and `sessions`; the central bug is session identity/process sharing.
++
++Reliability, queueing, and lanes:
++- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.
++- Words like `lane`, `main lane`, `worker`, `subagent`, `before_prompt_build`, or internal execution paths do not imply `queueing`.
++- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.
++- `self-heal lane wedges` is `reliability`, not `queueing`.
++
++Auth and identity:
++- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or user/session identity for auth is central.
++- OAuth restoration is `auth_identity`.
++- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex-specific runtime behavior.
++- If OAuth or auth behavior is tied to an embedded/session path, include `sessions` when the embedded path or session identity is central.
++- `restore openai-codex OAuth on embedded path` should be `auth_identity` and `sessions`, not `codex`.
++
++Codex:
++- Use `codex` when Codex is named centrally as the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.
++- Do not add `codex` merely because the title contains `openai-codex`, `[codex]`, or a Codex-branded OAuth provider. Confirm the subject is Codex behavior rather than auth, sessions, docs, or another domain.
++
++Documentation:
++- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
++- Documentation-only PRs should usually include `docs` alone.
++- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
++- Do not add non-allowed or broad demo/showcase labels.
++- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
++
++MCP and tool calling:
++- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
++- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
++- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
++
++Open-weight, local provider catalogs, and model serving:
++- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
++- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
++- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
++- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, or names a model unless serving/routing/protocol behavior is central.
++
++Local models and self-hosted inference:
++- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
++- LM Studio is a strong signal for `local_models`.
++- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
++- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
++
++Notifications and chat integrations:
+ - Use `notifications` when notification behavior itself is central.
+ - Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+-- A title about “cron --announce and heartbeat target-channel Slack pushes” is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself the central subject.
+-- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+-
+-Chat integrations:
+ - Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+ - Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+-- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
++- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+ Cron:
+ - Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+ - Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+ Exec, sandboxing, approvals:
+ - Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+-- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:
+- - `exec_tools`
+- - `sandboxing`
+- - `approvals`
+-- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.
++- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+ - Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+-
+-Reliability:
+-- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+-- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+-- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
+
+ Memory:
+ - Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+ - Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+ Gateway and sessions:
+ - Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+-- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
++- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, embedded session path, or session-specific behavior is central.
+ - “Outbound session identity” is `sessions`.
+ - A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+-
+-Open-weight and local model providers:
+-- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+-- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+-- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.
+-
+-Codex:
+-- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+-- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+-- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+ API surface and UI/TUI:
+ - Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+ - Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+ - Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+@@ -128,67 +157,35 @@
+
+ Skills and plugins:
+ - Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+ - Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+-ACP, ACPX, approvals:
+-- Use `acp` when ACP is named centrally.
+-- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
+-- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+-- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.
+-
+ Gateway and runtime:
+ - Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+ - `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+-Model serving:
+-- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+-- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
+-- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+-
+-Local models and self-hosted inference:
+-- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+-- LM Studio is a strong signal for `local_models`.
+-- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+-- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+-
+ Telemetry and usage:
+ - Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
++- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+ Browser automation:
+ - Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+ - Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+-Policy, config, security, MCP:
++Policy, config, security:
+ - Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+ - Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+-- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+ - Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+-
+-Documentation:
+-- Documentation-only PRs should usually include `docs` plus the central documented area.
+-- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+-- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+-
+-Queueing:
+-- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+-- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+-
+-Tool calling:
+-- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+-- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+ Composite titles:
+ - If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+ - Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+-- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+-- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
++- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.
++- Do not substitute broad infrastructure topics for a listed user-visible subject.
+
+ Final suppression check:
+-Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+-
+-Output only JSON.## Target
++Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.## Target
+
+ `__TARGET__`
+
+ ## GitHub Context
+
diff --git a/prompt-diffs/diffs/02-previous-proper-best__to__03-prop20-best.routing_policy.diff b/prompt-diffs/diffs/02-previous-proper-best__to__03-prop20-best.routing_policy.diff
new file mode 100644
index 0000000000000000000000000000000000000000..e7c6e4037003847d7122a00da3de9909a483d4fc
--- /dev/null
+++ b/prompt-diffs/diffs/02-previous-proper-best__to__03-prop20-best.routing_policy.diff
@@ -0,0 +1,219 @@
+--- 02-previous-proper-best.routing_policy.md
++++ 03-prop20-best.routing_policy.md
+@@ -1,88 +1,117 @@
+ You are classifying GitHub issues or pull requests into the smallest complete set of allowed topic ids.
+
+-Return only final JSON:
++If a structured-output tool named `final_json` is available, call it exactly once with:
++{"topics_of_interest":["topic_id"]}
++If no such tool is available, return only this final JSON object and no prose:
+ {"topics_of_interest":["topic_id"]}
+
+ Allowed topic ids:
+ queueing, docs, notifications, sessions, gateway, reliability, memory,
+ open_weight_models, local_model_providers, codex, api_surface, ui_tui,
+ chat_integrations, skills_plugins, acp, acpx, approvals, agent_runtime,
+ model_serving, local_models, self_hosted_inference, telemetry_usage,
+ exec_tools, sandboxing, browser_automation, cron_automation, config,
+-security, mcp_tooling, tool_calling
++security, mcp_tooling, tool_calling, auth_identity
+
+ Task:
+ Choose the minimum topic set that routes the GitHub item to the right maintainer bucket without dropping an explicitly central second or third concern.
+
+ Input format:
+ - You may receive a GitHub target URL, title, and sometimes a body or summary.
+ - The title is the primary signal.
+ - Use the first clear body summary only when the title is ambiguous.
+-- Ignore examples, tests, files changed, incidental implementation details, and broad impact unless they are the actual user-visible subject.
++- Ignore examples, tests, files changed, labels, target URL path, incidental implementation details, and broad impact unless they are the actual user-visible subject.
+
+ Process:
+ 1. Read the title first.
+ 2. Identify the main user-visible bug, feature, documentation change, policy change, or contract being changed.
+ 3. Pick one primary topic.
+ 4. Add secondary topics only when they are explicit central maintainer-owned subjects.
+ 5. Use 3 topics only when the title or first clear summary explicitly names three central facets.
+ 6. Use 0 topics when no allowed topic is central.
+-7. Never invent topic ids.
+-8. Output JSON only.
++7. Never invent topic ids. Never output labels outside the allowed list.
++8. Output JSON only, or use the `final_json` tool if available.
+
+ Core suppression rule:
+ Do not add a topic just because a related word appears. Confirm that the word is the subject, not a path, symptom, implementation detail, example, internal hook, broad ownership area, or label-spam keyword.
+
+-Notifications:
++ACP, ACPX, sessions, approvals:
++- Use `acp` when ACP is named centrally.
++- Use `acpx` when ACPX is explicitly named, or when the title is clearly about ACPX binding behavior.
++- In ACP titles, phrases like `per-binding`, `binding`, `configured binding`, or `per-agent` can indicate `acpx` when the feature/bug is about the binding system itself.
++- Use `approvals` when permission modes, approval modes, user approval behavior, or `permissionMode` policy is central.
++- Do not add `sessions` merely because the title says “ACP sessions” or mentions session context. Treat that as label spam unless session identity, lifecycle, routing, state, or persistent process identity is itself the bug or feature.
++- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`, not `sessions`.
++- `[Bug]: ACP configured binding uses parent channel ID for session key - all threads under same channel share one persistent Claude Code process` should be `acp` and `sessions`; the central bug is session identity/process sharing.
++
++Reliability, queueing, and lanes:
++- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, wedged state, timeout, self-healing behavior, or robustness failure.
++- Words like `lane`, `main lane`, `worker`, `subagent`, `before_prompt_build`, or internal execution paths do not imply `queueing`.
++- Use `queueing` only when queue, queued execution, queue lifecycle, steering in queues, or scheduling behavior is user-visible and central.
++- `self-heal lane wedges` is `reliability`, not `queueing`.
++
++Auth and identity:
++- Use `auth_identity` when authentication, OAuth, login, sign-in, tokens, identity propagation, account identity, credential identity, or user/session identity for auth is central.
++- OAuth restoration is `auth_identity`.
++- `openai-codex OAuth` is not automatically `codex`; classify it as `auth_identity` unless the actual subject is Codex-specific runtime behavior.
++- If OAuth or auth behavior is tied to an embedded/session path, include `sessions` when the embedded path or session identity is central.
++- `restore openai-codex OAuth on embedded path` should be `auth_identity` and `sessions`, not `codex`.
++
++Codex:
++- Use `codex` when Codex is named centrally as the product/runtime/setup being changed, including Codex startup, Docker Codex setup, Codex-specific runtime behavior, or Codex-specific bugs.
++- Do not add `codex` merely because the title contains `openai-codex`, `[codex]`, or a Codex-branded OAuth provider. Confirm the subject is Codex behavior rather than auth, sessions, docs, or another domain.
++
++Documentation:
++- Use `docs` for documentation-only PRs, tutorials, showcase additions, README changes, guides, examples, and docs pages.
++- Documentation-only PRs should usually include `docs` alone.
++- Add a second topic only when the documented area is explicitly central, such as `docs(queue): ...` => `docs`, `queueing`.
++- Do not add non-allowed or broad demo/showcase labels.
++- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
++
++MCP and tool calling:
++- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, MCP protocol checks, or MCP-specific integrations.
++- Use `tool_calling` for tool-call execution, tool-call APIs, tool selection, tool schema handling, parameter coercion for tool calls, or tool-call runtime behavior.
++- `fix(bundle-mcp): coerce stringified object/array params before MCP tool calls` is both `mcp_tooling` and `tool_calling`.
++
++Open-weight, local provider catalogs, and model serving:
++- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
++- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, named provider catalogs, or named provider/model-family support is central.
++- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, model-server compatibility, or automatic routing of model requests.
++- Do not add `model_serving` merely because a title says “model”, “provider”, “catalog”, or names a model unless serving/routing/protocol behavior is central.
++
++Local models and self-hosted inference:
++- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
++- LM Studio is a strong signal for `local_models`.
++- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
++- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
++
++Notifications and chat integrations:
+ - Use `notifications` when notification behavior itself is central.
+ - Strong notification signals: announce messages, heartbeat pushes, target-channel pushes, pushed-message identity overlays, notification delivery.
+-- A title about “cron --announce and heartbeat target-channel Slack pushes” is about `notifications`; do not classify it as `cron_automation` unless cron scheduling/lifecycle/execution is itself the central subject.
+-- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+-
+-Chat integrations:
+ - Use `chat_integrations` for Slack, WhatsApp, chat app delivery, chat history, chat target channels, and chat push behavior.
+ - Slack target-channel pushes and WhatsApp history are `chat_integrations`.
+-- Preserve chat topics in composite titles when a chat platform or chat history is one of the listed fixes.
++- Do not add `notifications` merely because the title mentions message sending, send denial, pushed messages, or delivery plumbing.
+
+ Cron:
+ - Use `cron_automation` only when cron scheduling, cron force-run, cron lifecycle, cron execution, or a cron deadlock is central.
+ - Do not add `cron_automation` merely because a notification path mentions `cron --announce`.
+
+ Exec, sandboxing, approvals:
+ - Use `exec_tools` for exec command/tool behavior, exec PATH fallback, and exec contract behavior.
+-- Exec v2 contract follow-through or contract enforcement should include all contract areas named by that domain:
+- - `exec_tools`
+- - `sandboxing`
+- - `approvals`
+-- A title like `test(exec): land exec v2 contract follow-through` should be `exec_tools`, `sandboxing`, and `approvals`.
++- Exec v2 contract follow-through or contract enforcement should include all named contract areas: `exec_tools`, `sandboxing`, and `approvals`.
+ - Do not replace sandbox/approval contract topics with `security` unless the title is actually about security policy, vulnerabilities, access restrictions, credentials, or network boundaries.
+-
+-Reliability:
+-- Use `reliability` when the central bug is a deadlock, hang, crash, race, liveness issue, stuck state, timeout, or robustness failure.
+-- Do not add `queueing` merely because a deadlock mentions a lane, main lane, worker, subagent, `before_prompt_build`, or an internal execution path.
+-- Queueing requires queue, queued execution, steering in queues, queue lifecycle, or scheduling behavior to be user-visible and central.
+
+ Memory:
+ - Use `memory` for memory, active-memory recall, embeddings, vector stores, embedding providers, memory providers, or memory behavior.
+ - Active-memory recall deadlocks should usually be `memory` plus `reliability`.
+
+ Gateway and sessions:
+ - Use `gateway` when gateway-owned behavior, gateway routing, guarded gateway behavior, gateway send denial, or gateway ownership is explicitly central.
+-- Use `sessions` when session identity, session lifecycle, session routing, session state, or session-specific behavior is central.
++- Use `sessions` when session identity, session lifecycle, session routing, session state, persistent process identity, embedded session path, or session-specific behavior is central.
+ - “Outbound session identity” is `sessions`.
+ - A title like `Pass outbound session identity into message_sending and surface guarded gateway send denial` should be `gateway` and `sessions`, not `notifications`.
+-
+-Open-weight and local model providers:
+-- Use `open_weight_models` when open-weight models, known model metadata, context windows, model catalogs, or open-weight model compatibility are central.
+-- Use `local_model_providers` when provider-specific local/open-weight integration, provider catalog metadata, known context windows for provider-backed models, or named provider/model-family support is central.
+-- A title like `feat(amazon-bedrock-mantle): add known context windows for open-weight Mantle models` should include both `open_weight_models` and `local_model_providers`.
+-
+-Codex:
+-- Use `codex` when Codex is named centrally, including Codex startup, Docker Codex OAuth, Codex-specific setup, Codex-specific runtime behavior, or Codex-specific bugs.
+-- Do not add `codex` just because the title starts with bracketed metadata like `[codex]`.
+-- `[codex] fix webchat full-message reader for truncated history` should not include `codex` unless the actual subject is Codex behavior.
+
+ API surface and UI/TUI:
+ - Use `api_surface` when the central subject is an API, reader contract, exposed interface, full-message reader, request/response shape, compatibility surface, or public integration behavior.
+ - Use `ui_tui` for webchat, TUI, UI views, terminal UI, display/readers used by the UI, or user-facing chat interface behavior.
+ - Webchat full-message reader behavior is both `api_surface` and `ui_tui`.
+@@ -90,62 +119,30 @@
+
+ Skills and plugins:
+ - Use `skills_plugins` only when user-installed plugins, plugin inheritance, Superpowers, skill/plugin discovery, plugin installation, or plugin availability is the requested feature or bug.
+ - Do not add `skills_plugins` merely because a Codex fix mentions startup plugins unless plugin availability or user-installed plugin behavior is central.
+
+-ACP, ACPX, approvals:
+-- Use `acp` when ACP is named centrally.
+-- Use `acpx` when ACPX is named centrally, or when ACP session behavior is specifically about ACPX bindings/session workflow concepts.
+-- Use `approvals` when permission modes, approval modes, user approval behavior, or permissionMode policy is central.
+-- `[Feature]: Per-binding and per-agent permissionMode for ACP sessions` should be `acp`, `approvals`, and `acpx`.
+-
+ Gateway and runtime:
+ - Use `agent_runtime` when the title is about runtimes, node-backed runtimes, agent execution runtimes, or runtime ownership.
+ - `ACP: add gateway-owned node-backed runtime` should be `acp`, `gateway`, and `agent_runtime`.
+
+-Model serving:
+-- Use `model_serving` when the central subject is serving endpoints, OpenAI-compatible request/response protocol behavior, Responses API behavior, streaming lifecycle, final usage chunks, base URL behavior, endpoint compatibility, request routing, or model-server compatibility.
+-- Thinking blocks, Responses API visibility, OpenAI-compatible response formatting, and streaming protocol behavior are `model_serving`.
+-- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+-
+-Local models and self-hosted inference:
+-- Use `local_models` when a local model app/provider/runtime is central, including LM Studio, Ollama, llama.cpp, vLLM, TGI, LocalAI, or similar local/self-hosted model providers.
+-- LM Studio is a strong signal for `local_models`.
+-- Use `self_hosted_inference` when the item is about using self-hosted inference servers such as llama.cpp, Ollama, vLLM, TGI, or LocalAI as inference providers.
+-- Do not add `model_serving` merely because a title says “openai-compatible”, “provider”, llama.cpp, Ollama, vLLM, TGI, or LocalAI unless serving protocol behavior is central.
+-
+ Telemetry and usage:
+ - Use `telemetry_usage` only when metric collection, usage accounting/reporting, cost display, diagnostic counts, traces, or status reporting surfaces are themselves the feature or bug.
++- Do not add `telemetry_usage` merely because a model-serving protocol bug mentions usage, tokens, counts, cost, or chunks.
+
+ Browser automation:
+ - Use `browser_automation` for browser diagnostics, browser automation layers, browser runtime behavior, and browser tooling issues.
+ - Do not add `gateway` for browser diagnostics unless gateway is explicitly the subject.
+
+-Policy, config, security, MCP:
++Policy, config, security:
+ - Use `config` for policy rules, conformance checks, quality gates, allowed behavior, or configuration-governed enforcement.
+ - Use `security` for network policy, network conformance, access restrictions, outbound rules, credential boundaries, vulnerabilities, or allowed/blocked security behavior.
+-- Use `mcp_tooling` for MCP conformance, MCP policy, MCP tool behavior, or MCP protocol checks.
+ - Do not map “model” in “model policy”, “model conformance”, or “model checks” to `model_serving` unless the item is actually about serving endpoints, streaming, endpoint lifecycle, routing, or model-server compatibility.
+-
+-Documentation:
+-- Documentation-only PRs should usually include `docs` plus the central documented area.
+-- A `docs(queue): ...` title usually includes both `docs` and `queueing`.
+-- Do not add `tool_calling` just because docs mention “tool boundaries” unless tool-call behavior itself is central.
+-
+-Queueing:
+-- Use `queueing` when queue, queueing, queued execution, steering in queues, queue lifecycle, or scheduling behavior is central.
+-- Suppress `queueing` for incidental words like lane, main lane, worker, subagent, internal path names, or internal hook names.
+-
+-Tool calling:
+-- Use `tool_calling` only for tool-call execution, tool-call APIs, tool selection, tool schema handling, or tool-call runtime behavior.
+-- Mentions of “tool boundaries” in docs about another system are usually context, not `tool_calling`.
+
+ Composite titles:
+ - If a title lists several independent fixes or features joined by `+`, `and`, commas, or semicolons, classify each central user-visible item up to the smallest complete set.
+ - Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+-- Example: `fix: Codex startup plugins + WhatsApp history & Docker Codex OAuth` => `codex`, `chat_integrations`.
+-- Do not substitute broad infrastructure topics like `gateway` for a listed user-visible subject.
++- Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.
++- Do not substitute broad infrastructure topics for a listed user-visible subject.
+
+ Final suppression check:
+-Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, or contract. Keep it only if that topic is actually a central maintainer-owned subject.
+-
+-Output only JSON.+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
\ No newline at end of file
diff --git a/prompt-diffs/diffs/03-prop20-best__to__04-hardcase-repair-v2.prompt.diff b/prompt-diffs/diffs/03-prop20-best__to__04-hardcase-repair-v2.prompt.diff
new file mode 100644
index 0000000000000000000000000000000000000000..e5d73d411f50d307d57677f56ef728c79165a374
--- /dev/null
+++ b/prompt-diffs/diffs/03-prop20-best__to__04-hardcase-repair-v2.prompt.diff
@@ -0,0 +1,39 @@
+--- 03-prop20-best.prompt.md
++++ 04-hardcase-repair-v2.prompt.md
+@@ -181,11 +181,35 @@
+ - Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+ - Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.
+ - Do not substitute broad infrastructure topics for a listed user-visible subject.
+
+ Final suppression check:
+-Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.## Target
++Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
++
++Hard false-positive suppressions and recovery cases:
++- For OpenAI-compatible streaming, final usage chunks, base URL behavior, or protocol lifecycle with llama.cpp/Ollama/vLLM/TGI/LocalAI, use `model_serving` only unless local runtime setup, local model compatibility, or self-hosted provider setup is itself the subject. Do not add `local_models` or `self_hosted_inference` just because a local server name appears.
++- For a session lane stuck in `running` after a run dies, including `sessions.abort`, gateway restart, and stale state cleanup, use `sessions` and `reliability`. Do not add `gateway` unless gateway API, gateway routing, or gateway ownership is the subject.
++- Same-provider or cross-context send denial/default-deny message policy is `tool_calling` plus `security`. Do not add `chat_integrations` unless a named chat platform is central.
++- Multi-lane concurrency support for sub-agents via `sessions_spawn` is `queueing`, `sessions`, and `coding_agents`.
++- Native audio input for omni-modal models, especially when skipping STT transcription, is `model_serving`, `security`, and `config`.
++- Model, network, and MCP conformance checks are `config`, `security`, and `mcp_tooling`; do not add `local_model_providers` for generic model conformance policy.
++- Layered browser diagnostics are `browser_automation`, not `ui_tui`.
++- Cron announce or heartbeat target-channel Slack pushes are `notifications` plus `chat_integrations`; do not add `cron_automation` unless scheduling or cron lifecycle is the subject.
++- Stale token-count freshness such as `totalTokensFresh` is `sessions` plus `telemetry_usage`, not `ui_tui`.
++- Security vulnerability batches in skill scanner, SSRF, hook priority, and token verification are `security`, `hooks`, and `skills_plugins`; do not add `auth_identity` or `local_model_providers` unless login/provider catalog behavior is central.
++- Local Ollama embeddings blocked by proxy or NO_PROXY are `local_models` plus `self_hosted_inference`; do not add `security` or `memory` unless credential/security policy or memory behavior is the subject.
++- ACP configured binding that uses a parent channel ID for session key and causes shared persistent process identity is `acp` plus `sessions`, not `acpx`.
++- Explicit ACP agent runs with embedded backend visibility failure and stale final JSON state after `sessions_yield` are `acp` plus `acpx`, not `sessions`.
++- Duplicate ACP completion delivery through `CompletionDeliveryGate` is `acp`, `coding_agents`, and `reliability`, not `sessions`.
++- Plugin runtime features that expose ACP spawn or prompts are `acp` plus `coding_agents`, not `skills_plugins` unless plugin installation/discovery/availability is central.
++- Runtime recovery for heartbeat, Feishu, and exec sessions is `reliability`, `cron_automation`, and `exec_tools`; do not add `chat_integrations` merely because Feishu is named.
++- ACP/Codex child tasks that can be marked succeeded with progress-only output and no final deliverable are `acp`, `codex`, and `agent_runtime`, not `sessions`.
++- Runtime auth override APIs such as `registerProviderRuntimeAuthOverride` are `auth_identity` plus `api_surface`, not `skills_plugins`.
++- Local memory embeddings on Apple Silicon crashing gateway in ggml-metal or node-llama-cpp are `local_models`, `memory`, and `self_hosted_inference`; do not add `reliability` merely because the title says crash.
++- Tool-use/tool-result mismatch repair is `tool_calling`, `coding_agents`, and `reliability`, not `security`.
++- SSE-backed MCP sessions stale after server restart with `Session not found` are `mcp_tooling`, `sessions`, and `gateway`, not `reliability`.
++## Target
+
+ `__TARGET__`
+
+ ## GitHub Context
+
diff --git a/prompt-diffs/diffs/03-prop20-best__to__04-hardcase-repair-v2.routing_policy.diff b/prompt-diffs/diffs/03-prop20-best__to__04-hardcase-repair-v2.routing_policy.diff
new file mode 100644
index 0000000000000000000000000000000000000000..d63d6e60b86b08e9b5f719c965c3ec675bf85449
--- /dev/null
+++ b/prompt-diffs/diffs/03-prop20-best__to__04-hardcase-repair-v2.routing_policy.diff
@@ -0,0 +1,32 @@
+--- 03-prop20-best.routing_policy.md
++++ 04-hardcase-repair-v2.routing_policy.md
+@@ -143,6 +143,29 @@
+ - Example: `fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock` => `exec_tools`, `browser_automation`, `cron_automation`.
+ - Example: `fix: self-heal lane wedges + restore openai-codex OAuth on embedded path` => `reliability`, `auth_identity`, `sessions`.
+ - Do not substitute broad infrastructure topics for a listed user-visible subject.
+
+ Final suppression check:
+-Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.+Before outputting, remove any topic added only because of words like usage, model, network, test, policy, status, tool, plugin, chunk, cron, gateway, send, lane, deadlock, Codex, security, contract, binding, session, showcase, tutorial, or demo. Keep it only if that topic is actually a central maintainer-owned subject.
++
++Hard false-positive suppressions and recovery cases:
++- For OpenAI-compatible streaming, final usage chunks, base URL behavior, or protocol lifecycle with llama.cpp/Ollama/vLLM/TGI/LocalAI, use `model_serving` only unless local runtime setup, local model compatibility, or self-hosted provider setup is itself the subject. Do not add `local_models` or `self_hosted_inference` just because a local server name appears.
++- For a session lane stuck in `running` after a run dies, including `sessions.abort`, gateway restart, and stale state cleanup, use `sessions` and `reliability`. Do not add `gateway` unless gateway API, gateway routing, or gateway ownership is the subject.
++- Same-provider or cross-context send denial/default-deny message policy is `tool_calling` plus `security`. Do not add `chat_integrations` unless a named chat platform is central.
++- Multi-lane concurrency support for sub-agents via `sessions_spawn` is `queueing`, `sessions`, and `coding_agents`.
++- Native audio input for omni-modal models, especially when skipping STT transcription, is `model_serving`, `security`, and `config`.
++- Model, network, and MCP conformance checks are `config`, `security`, and `mcp_tooling`; do not add `local_model_providers` for generic model conformance policy.
++- Layered browser diagnostics are `browser_automation`, not `ui_tui`.
++- Cron announce or heartbeat target-channel Slack pushes are `notifications` plus `chat_integrations`; do not add `cron_automation` unless scheduling or cron lifecycle is the subject.
++- Stale token-count freshness such as `totalTokensFresh` is `sessions` plus `telemetry_usage`, not `ui_tui`.
++- Security vulnerability batches in skill scanner, SSRF, hook priority, and token verification are `security`, `hooks`, and `skills_plugins`; do not add `auth_identity` or `local_model_providers` unless login/provider catalog behavior is central.
++- Local Ollama embeddings blocked by proxy or NO_PROXY are `local_models` plus `self_hosted_inference`; do not add `security` or `memory` unless credential/security policy or memory behavior is the subject.
++- ACP configured binding that uses a parent channel ID for session key and causes shared persistent process identity is `acp` plus `sessions`, not `acpx`.
++- Explicit ACP agent runs with embedded backend visibility failure and stale final JSON state after `sessions_yield` are `acp` plus `acpx`, not `sessions`.
++- Duplicate ACP completion delivery through `CompletionDeliveryGate` is `acp`, `coding_agents`, and `reliability`, not `sessions`.
++- Plugin runtime features that expose ACP spawn or prompts are `acp` plus `coding_agents`, not `skills_plugins` unless plugin installation/discovery/availability is central.
++- Runtime recovery for heartbeat, Feishu, and exec sessions is `reliability`, `cron_automation`, and `exec_tools`; do not add `chat_integrations` merely because Feishu is named.
++- ACP/Codex child tasks that can be marked succeeded with progress-only output and no final deliverable are `acp`, `codex`, and `agent_runtime`, not `sessions`.
++- Runtime auth override APIs such as `registerProviderRuntimeAuthOverride` are `auth_identity` plus `api_surface`, not `skills_plugins`.
++- Local memory embeddings on Apple Silicon crashing gateway in ggml-metal or node-llama-cpp are `local_models`, `memory`, and `self_hosted_inference`; do not add `reliability` merely because the title says crash.
++- Tool-use/tool-result mismatch repair is `tool_calling`, `coding_agents`, and `reliability`, not `security`.
++- SSE-backed MCP sessions stale after server restart with `Session not found` are `mcp_tooling`, `sessions`, and `gateway`, not `reliability`.
diff --git a/prompt-diffs/diffs/04-hardcase-repair-v2__to__05-cardinality-repair-v3.prompt.diff b/prompt-diffs/diffs/04-hardcase-repair-v2__to__05-cardinality-repair-v3.prompt.diff
new file mode 100644
index 0000000000000000000000000000000000000000..e0f7b16c60b3994a4325beddbfadabc712d95a9a
--- /dev/null
+++ b/prompt-diffs/diffs/04-hardcase-repair-v2__to__05-cardinality-repair-v3.prompt.diff
@@ -0,0 +1,19 @@
+--- 04-hardcase-repair-v2.prompt.md
++++ 05-cardinality-repair-v3.prompt.md
+@@ -205,10 +205,16 @@
+ - ACP/Codex child tasks that can be marked succeeded with progress-only output and no final deliverable are `acp`, `codex`, and `agent_runtime`, not `sessions`.
+ - Runtime auth override APIs such as `registerProviderRuntimeAuthOverride` are `auth_identity` plus `api_surface`, not `skills_plugins`.
+ - Local memory embeddings on Apple Silicon crashing gateway in ggml-metal or node-llama-cpp are `local_models`, `memory`, and `self_hosted_inference`; do not add `reliability` merely because the title says crash.
+ - Tool-use/tool-result mismatch repair is `tool_calling`, `coding_agents`, and `reliability`, not `security`.
+ - SSE-backed MCP sessions stale after server restart with `Session not found` are `mcp_tooling`, `sessions`, and `gateway`, not `reliability`.
++
++Strict cardinality budget:
++- Return at most two topic ids. This cap is higher priority than the examples above.
++- If three or more topics seem relevant, keep only the two most central maintainer-owned buckets.
++- Prefer concrete subsystem buckets over generic helpers. Drop `reliability`, `security`, `skills_plugins`, `auth_identity`, `gateway`, `acpx`, `ui_tui`, or `local_model_providers` when one of them is only a contextual third label.
++- Do not add a third label to acknowledge every noun in a composite title. The route should be sparse: two labels maximum, one label when one owner clearly dominates.
+ ## Target
+
+ `__TARGET__`
+
+ ## GitHub Context
diff --git a/prompt-diffs/diffs/04-hardcase-repair-v2__to__05-cardinality-repair-v3.routing_policy.diff b/prompt-diffs/diffs/04-hardcase-repair-v2__to__05-cardinality-repair-v3.routing_policy.diff
new file mode 100644
index 0000000000000000000000000000000000000000..5a612deaea840600cef89ad304f6f28499e42f5e
--- /dev/null
+++ b/prompt-diffs/diffs/04-hardcase-repair-v2__to__05-cardinality-repair-v3.routing_policy.diff
@@ -0,0 +1,14 @@
+--- 04-hardcase-repair-v2.routing_policy.md
++++ 05-cardinality-repair-v3.routing_policy.md
+@@ -167,5 +167,11 @@
+ - ACP/Codex child tasks that can be marked succeeded with progress-only output and no final deliverable are `acp`, `codex`, and `agent_runtime`, not `sessions`.
+ - Runtime auth override APIs such as `registerProviderRuntimeAuthOverride` are `auth_identity` plus `api_surface`, not `skills_plugins`.
+ - Local memory embeddings on Apple Silicon crashing gateway in ggml-metal or node-llama-cpp are `local_models`, `memory`, and `self_hosted_inference`; do not add `reliability` merely because the title says crash.
+ - Tool-use/tool-result mismatch repair is `tool_calling`, `coding_agents`, and `reliability`, not `security`.
+ - SSE-backed MCP sessions stale after server restart with `Session not found` are `mcp_tooling`, `sessions`, and `gateway`, not `reliability`.
++
++Strict cardinality budget:
++- Return at most two topic ids. This cap is higher priority than the examples above.
++- If three or more topics seem relevant, keep only the two most central maintainer-owned buckets.
++- Prefer concrete subsystem buckets over generic helpers. Drop `reliability`, `security`, `skills_plugins`, `auth_identity`, `gateway`, `acpx`, `ui_tui`, or `local_model_providers` when one of them is only a contextual third label.
++- Do not add a third label to acknowledge every noun in a composite title. The route should be sparse: two labels maximum, one label when one owner clearly dominates.
diff --git a/prompt-diffs/index.html b/prompt-diffs/index.html
new file mode 100644
index 0000000000000000000000000000000000000000..c2d13f11047d1b9f61045beb506528f87e6b481e
--- /dev/null
+++ b/prompt-diffs/index.html
@@ -0,0 +1,275 @@
+
+
+
+
+
+Prompt Diff Explorer
+
+
+
+
+
Prompt Diff Explorer
+
Choose any prompt version on the left and right. View either the editable routing policy or the full assembled prompt.