feat(wave-b): ADR-013 LMA integration + B4 end-to-end SDPO-fires proof + doc refresh

21647a4 28 days ago

	"""altered_minds — framework-side, generic LMA integration glue (ADR-013).

	This package is the model-agnostic scaffold that lets the Composer Replication
	Framework drive the sister project llm-mental-alterations (LMA): take a
	personality-altered SFT checkpoint and apply the framework's 3-channel RL to ask
	whether task-driven RL washes out, preserves, or AMPLIFIES the alteration's
	cognitive-distortion signature.

	Nothing here loads an LMA checkpoint, calls Modal, or spends budget — that is
	explicitly user-gated (ADR-013 "out of scope"). This package provides:

	- ``MMLUFormatReward``: structured-answer reward (final letter + format
	  only; never rationale style). Plus ``randomize_options`` and a logged
	  option distribution so an "always C" exploit is detectable.
	- ``dual_kl_logger``: logs KL(policy||altered_init) AND KL(policy||base)
	  each step: the washout/amplification instrument.
	- ``channel_ladder_configs``: the A0-A4 isolated-channel ladder that
	  REPLACES the old combined alpha=0.2/beta=0.4 recipe.

	See docs/adrs/ADR-013-lma-integration-channel-ladder.md.
	"""
	from __future__ import annotations

	from composer_replication.integrations.altered_minds.kl_logging import (
	    dual_kl_logger,
	    token_mean_kl,
	)
	from composer_replication.integrations.altered_minds.ladder import (
	    LADDER_KL_BETA,
	    channel_ladder_configs,
	)
	from composer_replication.integrations.altered_minds.reward import (
	    MMLUFormatReward,
	    parse_final_answer,
	    randomize_options,
	)

	__all__ = [
	    "MMLUFormatReward",
	    "parse_final_answer",
	    "randomize_options",
	    "dual_kl_logger",
	    "token_mean_kl",
	    "channel_ladder_configs",
	    "LADDER_KL_BETA",
	]
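
The docstring describes the dual-KL measurement only in words, so here is a minimal, dependency-free sketch of the idea: a token-averaged KL(policy||reference), computed once against the altered init and once against the base model. The names `sketch_token_mean_kl` and `sketch_dual_kl`, the list-of-log-probabilities input format, and the metric keys are all assumptions made for illustration. The signatures of the real `token_mean_kl` and `dual_kl_logger` are not shown in this file and may differ.

```python
import math
from typing import Dict, Sequence

# Hypothetical type alias: one row of vocabulary log-probs per token position.
LogProbs = Sequence[Sequence[float]]


def sketch_token_mean_kl(policy: LogProbs, ref: LogProbs) -> float:
    """Mean over token positions of KL(policy || ref).

    Illustrative only; not the package's token_mean_kl.
    """
    if len(policy) != len(ref):
        raise ValueError("policy and reference must cover the same tokens")
    if not policy:
        raise ValueError("need at least one token position")
    total = 0.0
    for p_row, q_row in zip(policy, ref):
        if len(p_row) != len(q_row):
            raise ValueError("vocabulary sizes differ")
        # KL(p || q) = sum_v p(v) * (log p(v) - log q(v))
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(p_row, q_row))
    return total / len(policy)


def sketch_dual_kl(
    policy: LogProbs, altered_init: LogProbs, base: LogProbs
) -> Dict[str, float]:
    """Return both drift measurements for one step (hypothetical key names)."""
    return {
        "kl_policy_altered_init": sketch_token_mean_kl(policy, altered_init),
        "kl_policy_base": sketch_token_mean_kl(policy, base),
    }


def log_dist(probs: Sequence[float]) -> list:
    """Convert a probability vector to log-probabilities."""
    return [math.log(p) for p in probs]


# Toy two-token, two-symbol example: the policy has not moved away from the
# altered init, but it differs from the base model.
policy = [log_dist([0.5, 0.5]), log_dist([0.9, 0.1])]
altered_init = [log_dist([0.5, 0.5]), log_dist([0.9, 0.1])]
base = [log_dist([0.25, 0.75]), log_dist([0.5, 0.5])]

metrics = sketch_dual_kl(policy, altered_init, base)
print(metrics)
```

In this toy case the KL to the altered init is zero and the KL to the base is positive, which is the pattern one would expect if the alteration were fully preserved. Tracking both numbers per step is what lets the two drifts be compared; how the real logger thresholds or reports them is not specified here.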