# Architecture Decision Records

| # | Title | Status | Date |
|---|-------|--------|------|
| [ADR-001](ADR-001-gpu-venue.md) | GPU venue | accepted | — |
| [ADR-002](ADR-002-trace-source.md) | Trace source | accepted | — |
| [ADR-003](ADR-003-diloco-impl.md) | DiLoCo implementation | accepted | — |
| [ADR-004](ADR-004-replaysim-normalization.md) | ReplaySim normalization | accepted | — |
| [ADR-005](ADR-005-serverless-diloco.md) | Serverless DiLoCo | accepted | — |
| [ADR-006](ADR-006-rl-frameworks.md) | RL framework strategy: TRL + VeRL + PRIME-RL | accepted (amended-by ADR-008) | 2026-05-26 |
| [ADR-007](ADR-007-self-distillation-losses.md) | Self-distillation losses landscape | accepted | 2026-05-26 |
| [ADR-008](ADR-008-drgrpo-sdpo-live-channel.md) | Target Dr. GRPO + host live SDPO channel in TRL trainer | accepted | 2026-05-29 |
| [ADR-009](ADR-009-layered-hint-generator.md) | Layered HintGenerator for SDPO textual feedback | accepted | 2026-05-29 |
| [ADR-010](ADR-010-feature-deletion-datagen.md) | FeatureDeletionEnv synthetic-data subsystem over OSS SWE substrates | accepted | 2026-05-29 |
| [ADR-011](ADR-011-sdpo-alignment-indices.md) | Collator-emitted SDPO alignment indices (close strict-guard regression) | accepted (amends ADR-008) | 2026-05-29 |
| [ADR-012](ADR-012-close-review-findings.md) | Close open cross-family-review findings (KL/hint-routing/provenance/curriculum) | accepted (amends 008/009/010) | 2026-05-29 |
| [ADR-013](ADR-013-lma-integration-channel-ladder.md) | LMA integration — isolated-channel ladder (supersedes tie-in Phase-3 hyperparams) | accepted | 2026-05-29 |
| [ADR-014](ADR-014-policy-optimization-objective-menu.md) | Policy-optimization objective MENU: base RL objective selectable (default Dr.GRPO) over TRL 1.5.0 GRPOConfig (builds-on ADR-006/007/008) | accepted | 2026-05-30 |
| [ADR-015](ADR-015-holdout-killswitch.md) | Held-out disjoint eval + depth/generation kill-switch (run-level collapse safeguard #2): `HeldOutGuard` + `HeldoutSplit` in `composer_replication.safety` | accepted | 2026-06-09 |

Sorted by number ascending. ADRs are immutable after `accepted`; supersede or amend rather than edit.

> **Provenance note (ADR-014).** ADR-014 also records the canonical correction that the
> framework's **trace-replay-DPO channel (channel 3) is an additive research channel, NOT
> part of Cursor's Composer recipe** -- Composer's primary sources contain no DPO / preference
> pairs / reward models / multiple teachers. Genuine replication is channels 1 (Dr.GRPO base)
> + 2 (SDPO). See [`docs/OVERVIEW.md`](../OVERVIEW.md) for the honest three-channel summary.