Baladithya Balamurugan
Wave 1: fix 8 failing tests + unblock Docker E2E + dep/doc debt
c11cf49

Scaffold — socratic-mcts-swe-worldmodel-8f6dea

User Prompt (VERBATIM — gospel)

The user is working in the composer-replication-framework repo. Across a transcript they developed an idea: take SWE-agent traces and do replay-simulation across all other models (Monte Carlo tree of work, every turn parallelized across multiple heterogeneous models / counterfactual "what if model B took over at step 5"), combine with Cursor Composer 2.5's "targeted RL with textual feedback" + dataset-building methods, to instill world-model latent "what-if" deliberation (simulate action A vs B before acting; predict next repo state; self-reflect). Framed as a genetic algorithm (population/fitness/selection/crossover/mutation). Central open question: PRUNE bad branches vs TRAIN-ON-ALL — which better instills introspection/counterfactual-foresight. Pipeline-shape question: two sections (dataset-building MCTS + RL) or one cohesive SFT/RL, or both.

Final instruction (verbatim): "use hyperreserach and workflows to dive into everything that was talked about and all the research that was documented in this project and see if we can do a fresh run and theorization and analysis of what we are trying to do and how we could do it on sagemaker and/or eks (eks primarily)"

Run config

  • vault_tag: socratic-mcts-swe-worldmodel-8f6dea
  • query_file_path: research/query-socratic-mcts-swe-worldmodel-8f6dea.md
  • modality: synthesize (defended thesis: how to build it + the prune-vs-all question) with strong compare + forecast + design elements
  • wrapper requirements: none (no prompt.txt, no wrapper_contract.json). User-prompt run. Final report at research/notes/final_report_socratic-mcts-swe-worldmodel-8f6dea.md.
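The two paths in the run config both appear to embed the vault_tag. A minimal sketch of a consistency check, assuming that naming pattern is intentional; the helper `embeds_tag` is hypothetical and not part of any existing tooling:

```python
from pathlib import PurePosixPath

# Values copied from the run config above.
vault_tag = "socratic-mcts-swe-worldmodel-8f6dea"
query_file_path = PurePosixPath(
    "research/query-socratic-mcts-swe-worldmodel-8f6dea.md"
)
final_report_path = PurePosixPath(
    "research/notes/final_report_socratic-mcts-swe-worldmodel-8f6dea.md"
)


def embeds_tag(path: PurePosixPath, tag: str) -> bool:
    """Hypothetical check: the file is Markdown and its stem ends with the tag."""
    return path.suffix == ".md" and path.stem.endswith(tag)


for p in (query_file_path, final_report_path):
    print(p, embeds_tag(p, vault_tag))
```

A check like this would catch a mistyped tag before the run writes its report to the wrong file.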

Modality classification rationale

Primary = SYNTHESIZE: the deliverable is a defended argument for HOW to build this system + a position on the prune-vs-train-on-all question, with evidence chains from both the repo and the literature. Strong secondary COMPARE (paradigm comparisons: Socratic-RL vs Socratic-SWE vs Composer 2.5 vs the proposed multi-model MCTS; prune vs all) and a DESIGN/FORECAST tail (concrete EKS-primary / SageMaker AWS architecture). Drafting style: defended thesis with evidence chains + a committed architecture recommendation + an explicit experimental design for the open question.

Tier rationale

FULL + argumentative (confirmed in step 1). The query is multi-part and dialectical: prune-vs-train-on-all is a genuine open research question that demands an argued, defended position (≥1 dialectical locus). It requires synthesis across (a) an unusually rich LOCAL corpus (research/01-12, ADR-001..014, the composer_replication package) and (b) external literature (Socratic-RL/SWE, world-model papers, MCTS/counterfactual-RL), plus a committed design deliverable (EKS-primary AWS architecture). citation_style=inline (public-deliverable-style report with a Sources list + repo path:line grounding). 11 required H2 headings + Opinionated Synthesis.

Grounding assets (LOCAL — unusually rich; this is a key differentiator for this run)

  • research/01-12 (Composer 2.5, DiLoCo, Monarch/TorchForge/OpenEnv, verl/TRL, trace-replay-distillation, FeatureDeletionEnv, SDPO hint-generator, SDPO+GRPO integration, blog delta, techreport mining, SDPO alignment indices, altered-model RL critique).
  • docs/adrs/ADR-001..014 (decision backbone; esp. 005 serverless/DiLoCo, 006 RL frameworks, 008 Dr.GRPO+SDPO, 009 hint generator, 010 FeatureDeletionEnv, 013 LMA ladder, 014 PO objective menu).
  • docs/COMPOSER_RECIPE_MAPPING.md, docs/research/* reconnaissance, docs/OVERVIEW.md.
  • composer_replication/ package: loss.py (compose_loss), opsd.py (generalized_jsd_loss), teacher_replay.py (multi-teacher replay = the direct ancestor of the MCTS idea), trainer/composer_trainer.py + data_collator.py, hint_generator.py, datagen/ (FeatureDeletionEnv), diloco/ + diloco/serverless/, ingestion/.
  • PROVENANCE GUARDRAIL: Channel 3 (multi-teacher trace-replay-DPO) is the framework's OWN addition, not Cursor's. Keep this honest in the report.

Wrapper requirements

  • save path: research/notes/final_report_socratic-mcts-swe-worldmodel-8f6dea.md
  • citation format: standard (arXiv IDs + repo path:line where a claim is grounded in local code)
  • terminal sections: none mandated; include an EKS architecture section + the prune-vs-all experimental design as load-bearing sections.
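
To make the two arms of the prune-vs-all design concrete, here is a minimal sketch of how the training sets might be built from one set of counterfactual branches. Everything in it is an assumption for illustration: the `Branch` schema, the scalar `fitness` score, the `keep_fraction` threshold, and the choice to match dataset size across arms are not taken from the repo or from any cited method.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """One counterfactual continuation of a trace (hypothetical schema)."""
    step: int       # turn at which an alternative model took over
    model: str      # placeholder model label
    fitness: float  # assumed scalar outcome score in [0, 1]


def prune_arm(branches, keep_fraction=0.5):
    """PRUNE arm: keep only the top-scoring fraction of branches."""
    ranked = sorted(branches, key=lambda b: b.fitness, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def train_on_all_arm(branches):
    """TRAIN-ON-ALL arm: keep every branch, good and bad."""
    return list(branches)


def matched_budget(arm_a, arm_b, seed=0):
    """Subsample both arms to the same size so that dataset size
    does not confound the comparison (an assumed design choice)."""
    n = min(len(arm_a), len(arm_b))
    rng = random.Random(seed)
    return rng.sample(arm_a, n), rng.sample(arm_b, n)


branches = [
    Branch(step=5, model="model-a", fitness=0.9),
    Branch(step=5, model="model-b", fitness=0.2),
    Branch(step=7, model="model-a", fitness=0.6),
    Branch(step=7, model="model-c", fitness=0.1),
]

pruned = prune_arm(branches)
everything = train_on_all_arm(branches)
pruned_eq, all_eq = matched_budget(pruned, everything)
print(len(pruned), len(everything), len(pruned_eq), len(all_eq))
```

With a matched budget, any difference between the arms is easier to attribute to which branches were seen rather than to how many.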