Baladithya Balamurugan

Wave 1: fix 8 failing tests + unblock Docker E2E + dep/doc debt

c11cf49 17 days ago

	{
	"loci": [
	{
	"name": "prune-vs-train-on-all",
	"one_line": "Does training on losing/failed branches (vs pruning to winners-only) better instill counterfactual foresight + introspection — and HOW must negatives be used to help rather than destabilize?",
	"flavor": "dialectical",
	"importance": 10,
	"uncertainty": 9,
	"disagreement": 9,
	"decision_impact": 10,
	"composite_score": 38,
	"source_budget": 15,
	"rationale": "The user's explicitly-named CENTRAL question. Genuine empirical fork: RAFT/positives-only is stable & competitive (2504.11343) and naive negative gradient destabilizes (2505.18830), vs negatives carry unique signal that improves agent tuning (2402.11651, 2503.14391, expert-failures). Resolving it changes the entire dataset-construction design (prune the tree vs keep it as typed signal). Must produce an argued position + concrete experiment, grounded in the repo's ADR-013 A0-A4 ladder."
	},
	{
	"name": "worldmodel-latent-deliberation",
	"one_line": "Can latent 'what-if' deliberation (predict next repo-state before acting) be trained into a SWE agent via an auxiliary next-state-prediction objective, or does it emerge from scale — and how do you measure it?",
	"flavor": "dialectical",
	"importance": 9,
	"uncertainty": 8,
	"disagreement": 7,
	"decision_impact": 9,
	"composite_score": 33,
	"source_budget": 12,
	"rationale": "The user's core GOAL (the 'world-model thinking' aim). Fork: LLMs are implicit world models / emerges from scale (2512.18832, 2411.08794) vs agents fail to USE world models for foresight without explicit training (2601.03905) + MuZero/Chain-of-World train it explicitly (1911.08265, 2603.03195). Decision-relevant: determines whether to add the aux loss + a deliberation token, and how to measure (calibration / foresight accuracy). Must map onto the repo's SDPO channel as the natural carrier."
	},
	{
	"name": "selfevolve-flywheel-vs-collapse",
	"one_line": "Does the closed-loop multi-model MCTS + self-distillation flywheel compound improvement, or collapse into reward-hacking / diversity-loss / human-trace entrenchment — and what design choices prevent collapse?",
	"flavor": "dialectical",
	"importance": 9,
	"uncertainty": 8,
	"disagreement": 8,
	"decision_impact": 9,
	"composite_score": 34,
	"source_budget": 11,
	"rationale": "Determines whether the whole genetic-algorithm flywheel is sound. Strong adversarial convergence (reward-hacking worsens with depth — RSI ICLR2026; collapse from closed-loop self-distillation — self-evolving survey §8.3; replay entrenches human distribution — Self-Play-SWE-RL 2512.18552) vs working flywheels (Socratic-SWE +7.8, DeepSWE, SWE-RL). Resolution = keep a true execution ORACLE + heterogeneous-model population as anti-collapse diversity. High decision impact on safeguards."
	},
	{
	"name": "credit-assignment-tree-as-process-signal",
	"one_line": "Does the multi-model tree's divergence structure give cheap, dense PROCESS-level credit assignment that beats outcome-only RL — without training a separate PRM?",
	"flavor": "technical",
	"importance": 8,
	"uncertainty": 6,
	"disagreement": 7,
	"decision_impact": 8,
	"composite_score": 29,
	"source_budget": 8,
	"rationale": "The mechanism that makes the idea pay off. Process-supervision helps (Let's-Verify 2305.20050, PRM 2211.14275, Cursor's own targeted-feedback motivation) vs outcome-only suffices (DeepSWE, SWE-RL, min-form 2504.15275). The tree manufactures process signal cheaply from divergence + auto-generated textual feedback (wiring into the SDPO hint hook). Counterfactual credit-assignment theory (2011.09464, 2306.16803) is the formal backbone. Technical synthesis, moderate uncertainty."
	},
	{
	"name": "eks-architecture-and-substrate-mapping",
	"one_line": "What is the concrete EKS-primary (+ SageMaker-hybrid) architecture, and what is the minimal delta to map the repo's ServerlessExecutor/ObjectStoreAllReduce/DiLoCo onto it?",
	"flavor": "technical",
	"importance": 10,
	"uncertainty": 4,
	"disagreement": 5,
	"decision_impact": 9,
	"composite_score": 28,
	"source_budget": 10,
	"rationale": "The explicit DELIVERABLE ('how we could do it on sagemaker and/or eks, eks primarily'). Lower uncertainty (AWS-documented patterns: JARK/verl-on-EKS, KubeRay, Karpenter, GPU time-slicing/MIG, gVisor/Kata sandboxes, HyperPod) but very high decision impact — the report must commit to a concrete design. Includes the EKSExecutor delta, the sandbox-fan-out, the outer/inner loop placement, and the EKS-vs-SageMaker hybrid split."
	}
	],
	"skip_loci": [
	{"name": "multimodel-tree-novelty-claim", "reason": "Resolved without depth: the honest position is the COMBINATION is novel, not the primitives (SWE-Search/tree-search use single models; Symphony mixes models for planning; Channel 3 already does flat multi-teacher). Folds into §1 framing, not a depth locus."},
	{"name": "which-RL-engine-trl-vs-verl-vs-prime-rl", "reason": "Already decided in repo (ADR-006: TRL hosts SDPO since it needs full logits; verl/PRIME-RL for scale-out). Engineering choice, reported in §6/§8, not a contested research locus."}
	]
	}