Baladithya Balamurugan
Wave 3: close the HIGH review findings (kill-switch wiring, HeldoutSplit, EKS entrypoint bug)
bd0c358

Architecture Decision Records

| # | Title | Status | Date |
|---|-------|--------|------|
| ADR-001 | GPU venue | accepted | |
| ADR-002 | Trace source | accepted | |
| ADR-003 | DiLoCo implementation | accepted | |
| ADR-004 | ReplaySim normalization | accepted | |
| ADR-005 | Serverless DiLoCo | accepted | |
| ADR-006 | RL framework strategy: TRL + VeRL + PRIME-RL | accepted (amended-by ADR-008) | 2026-05-26 |
| ADR-007 | Self-distillation losses landscape | accepted | 2026-05-26 |
| ADR-008 | Target Dr. GRPO + host live SDPO channel in TRL trainer | accepted | 2026-05-29 |
| ADR-009 | Layered HintGenerator for SDPO textual feedback | accepted | 2026-05-29 |
| ADR-010 | FeatureDeletionEnv synthetic-data subsystem over OSS SWE substrates | accepted | 2026-05-29 |
| ADR-011 | Collator-emitted SDPO alignment indices (close strict-guard regression) | accepted (amends ADR-008) | 2026-05-29 |
| ADR-012 | Close open cross-family-review findings (KL/hint-routing/provenance/curriculum) | accepted (amends 008/009/010) | 2026-05-29 |
| ADR-013 | LMA integration — isolated-channel ladder (supersedes tie-in Phase-3 hyperparams) | accepted | 2026-05-29 |
| ADR-014 | Policy-optimization objective MENU: base RL objective selectable (default Dr.GRPO) over TRL 1.5.0 GRPOConfig (builds-on ADR-006/007/008) | accepted | 2026-05-30 |
| ADR-015 | Held-out disjoint eval + depth/generation kill-switch (run-level collapse safeguard #2): HeldOutGuard + HeldoutSplit in composer_replication.safety | accepted | 2026-06-09 |

Sorted by ADR number, ascending. ADRs are immutable once accepted; supersede or amend an ADR with a new one rather than editing it.
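The ordering convention can be checked mechanically. The following is a minimal sketch, not part of this repository: the helper names `adr_numbers` and `check_index` are hypothetical, and it assumes each index row begins with an `ADR-NNN` identifier (optionally preceded by a table pipe).

```python
import re

# Assumption: every index row starts with a three-digit "ADR-NNN" identifier.
ADR_ID = re.compile(r"^ADR-(\d{3})\b")


def adr_numbers(lines):
    """Extract ADR numbers from index rows, in file order."""
    numbers = []
    for line in lines:
        # Tolerate a leading table pipe and surrounding whitespace.
        cell = line.strip().lstrip("|").strip()
        match = ADR_ID.match(cell)
        if match:
            numbers.append(int(match.group(1)))
    return numbers


def check_index(lines):
    """Return a list of ordering problems; an empty list means the index
    is strictly ascending with no duplicate ADR numbers."""
    numbers = adr_numbers(lines)
    problems = []
    for prev, cur in zip(numbers, numbers[1:]):
        if cur == prev:
            problems.append(f"duplicate ADR-{cur:03d}")
        elif cur < prev:
            problems.append(f"ADR-{cur:03d} listed after ADR-{prev:03d}")
    return problems
```

A check like this could run in CI against the index file. It only verifies ordering; enforcing immutability of accepted ADRs would need a history-aware check, which is out of scope for this sketch.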

Provenance note (ADR-014). ADR-014 also records the canonical correction that the framework's trace-replay-DPO channel (channel 3) is an additive research channel, NOT part of Cursor's Composer recipe -- Composer's primary sources contain no DPO / preference pairs / reward models / multiple teachers. Genuine replication is channels 1 (Dr.GRPO base)