Baladithya Balamurugan

Wave 3: close the HIGH review findings (kill-switch wiring, HeldoutSplit, EKS entrypoint bug)

Architecture Decision Records

| # | Title | Status | Date |
|---|---|---|---|
| ADR-001 | GPU venue | accepted | — |
| ADR-002 | Trace source | accepted | — |
| ADR-003 | DiLoCo implementation | accepted | — |
| ADR-004 | ReplaySim normalization | accepted | — |
| ADR-005 | Serverless DiLoCo | accepted | — |
| ADR-006 | RL framework strategy: TRL + VeRL + PRIME-RL | accepted (amended-by ADR-008) | 2026-05-26 |
| ADR-007 | Self-distillation losses landscape | accepted | 2026-05-26 |
| ADR-008 | Target Dr. GRPO + host live SDPO channel in TRL trainer | accepted | 2026-05-29 |
| ADR-009 | Layered HintGenerator for SDPO textual feedback | accepted | 2026-05-29 |
| ADR-010 | FeatureDeletionEnv synthetic-data subsystem over OSS SWE substrates | accepted | 2026-05-29 |
| ADR-011 | Collator-emitted SDPO alignment indices (close strict-guard regression) | accepted (amends ADR-008) | 2026-05-29 |
| ADR-012 | Close open cross-family-review findings (KL/hint-routing/provenance/curriculum) | accepted (amends 008/009/010) | 2026-05-29 |
| ADR-013 | LMA integration — isolated-channel ladder (supersedes tie-in Phase-3 hyperparams) | accepted | 2026-05-29 |
| ADR-014 | Policy-optimization objective MENU: base RL objective selectable (default Dr.GRPO) over TRL 1.5.0 GRPOConfig (builds-on ADR-006/007/008) | accepted | 2026-05-30 |
| ADR-015 | Held-out disjoint eval + depth/generation kill-switch (run-level collapse safeguard #2): `HeldOutGuard` + `HeldoutSplit` in `composer_replication.safety` | accepted | 2026-06-09 |

Sorted by number ascending. ADRs are immutable once accepted; supersede or amend them with a new ADR rather than editing in place.
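The two conventions above (ascending order, and changes expressed as amend/supersede references to other ADRs) can be checked mechanically. The following is a minimal sketch, not part of the repository: `check_index`, `KNOWN`, and the row format are hypothetical names chosen for illustration, and the sample rows are copied from the index.

```python
import re

# Assumption: the index covers ADR-001 .. ADR-015, as listed in the table.
KNOWN = set(range(1, 16))


def check_index(rows, known=KNOWN):
    """Return a list of problems found in (adr_id, status) rows.

    An empty list means the rows are sorted ascending and every
    three-digit ADR number mentioned in a status exists in `known`.
    """
    problems = []
    numbers = [int(adr_id.split("-")[1]) for adr_id, _ in rows]
    if numbers != sorted(numbers):
        problems.append("rows are not sorted by number ascending")
    for adr_id, status in rows:
        # Matches both "ADR-008" and the short form "008/009/010".
        for ref in re.findall(r"\b(\d{3})\b", status):
            if int(ref) not in known:
                problems.append(f"{adr_id} references unknown ADR-{ref}")
    return problems


# Sample rows taken from the index table.
ROWS = [
    ("ADR-006", "accepted (amended-by ADR-008)"),
    ("ADR-008", "accepted"),
    ("ADR-011", "accepted (amends ADR-008)"),
    ("ADR-012", "accepted (amends 008/009/010)"),
]

if __name__ == "__main__":
    for problem in check_index(ROWS):
        print(problem)
```

Run as a script, it prints one line per problem and nothing when the index is consistent. A check like this could sit in CI so that a new ADR row cannot reference a number that was never recorded.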

Provenance note (ADR-014). ADR-014 also records the canonical correction that the framework's trace-replay-DPO channel (channel 3) is an additive research channel, NOT part of Cursor's Composer recipe: Composer's primary sources contain no DPO, preference pairs, reward models, or multiple teachers. Genuine replication is channels 1 (Dr.GRPO base) and 2 (SDPO). See docs/OVERVIEW.md for the honest three-channel summary.