"""altered_minds: framework-side, generic LMA integration glue (ADR-013).

This package is the *model-agnostic* scaffold that lets the Composer Replication
Framework drive the sister project llm-mental-alterations (LMA): take a
personality-altered SFT checkpoint and apply the framework's 3-channel RL to ask
whether task-driven RL washes out, preserves, or AMPLIFIES the alteration's
cognitive-distortion signature.

Nothing here loads an LMA checkpoint, calls Modal, or spends budget; that is
explicitly user-gated (ADR-013 "out of scope"). This package provides:

- ``MMLUFormatReward``      : structured-answer reward (final letter + format
                              only; never rationale style). Plus
                              ``randomize_options`` and a logged option
                              distribution so an "always C" exploit is
                              detectable.
- ``dual_kl_logger``        : logs KL(policy||altered_init) AND KL(policy||base)
                              each step (the washout/amplification instrument).
- ``channel_ladder_configs``: the A0-A4 isolated-channel ladder that REPLACES
                              the old combined alpha=0.2/beta=0.4 recipe.

See docs/adrs/ADR-013-lma-integration-channel-ladder.md.
"""

from __future__ import annotations

from composer_replication.integrations.altered_minds.kl_logging import (
    dual_kl_logger,
    token_mean_kl,
)
from composer_replication.integrations.altered_minds.ladder import (
    LADDER_KL_BETA,
    channel_ladder_configs,
)
from composer_replication.integrations.altered_minds.reward import (
    MMLUFormatReward,
    parse_final_answer,
    randomize_options,
)

__all__ = [
    "MMLUFormatReward",
    "parse_final_answer",
    "randomize_options",
    "dual_kl_logger",
    "token_mean_kl",
    "channel_ladder_configs",
    "LADDER_KL_BETA",
]
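
The module docstring above names two instruments: a format-only answer reward with option randomization, and a dual KL log against both the altered init and the base model. The following is a minimal standalone sketch of those two ideas, not the package's implementation. Every `sketch_*` name, the `Answer: X` output format, the binary reward values, and the plain-list representation of per-token distributions are assumptions made for illustration; the real `MMLUFormatReward`, `randomize_options`, `token_mean_kl`, and `dual_kl_logger` may use different signatures and tensors.

```python
import math
import random
import re
from collections import Counter

LETTERS = "ABCD"


def sketch_parse_final_answer(text):
    """Return the letter from a trailing 'Answer: X' line, else None (assumed format)."""
    match = re.search(r"Answer:\s*([A-D])\s*$", text.strip())
    return match.group(1) if match else None


def sketch_format_reward(completion, correct_letter):
    """Score only the parsed final letter; the rationale text never affects the reward."""
    return 1.0 if sketch_parse_final_answer(completion) == correct_letter else 0.0


def sketch_randomize_options(options, correct_index, rng):
    """Shuffle the options and return (shuffled options, new index of the correct one)."""
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    return shuffled, order.index(correct_index)


def sketch_correct_letter_distribution(n_questions, seed=0):
    """Count where the correct option lands after shuffling, so a skew is visible in logs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_questions):
        _, idx = sketch_randomize_options(list(LETTERS), 0, rng)
        counts[LETTERS[idx]] += 1
    return counts


def sketch_kl(p, q):
    """KL(p || q) for two discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sketch_token_mean_kl(policy_dists, ref_dists):
    """Average the per-token KL(policy || reference) over one sequence."""
    per_token = [sketch_kl(p, q) for p, q in zip(policy_dists, ref_dists)]
    return sum(per_token) / len(per_token)


def sketch_dual_kl(policy_dists, altered_init_dists, base_dists):
    """Report KL to both reference models for a single training step."""
    return {
        "kl_policy_altered_init": sketch_token_mean_kl(policy_dists, altered_init_dists),
        "kl_policy_base": sketch_token_mean_kl(policy_dists, base_dists),
    }


if __name__ == "__main__":
    # Toy two-token sequence over a two-symbol vocabulary (made-up numbers).
    policy = [[0.7, 0.3], [0.5, 0.5]]
    altered_init = [[0.8, 0.2], [0.5, 0.5]]
    base = [[0.5, 0.5], [0.5, 0.5]]
    print(sketch_dual_kl(policy, altered_init, base))
    print(sketch_correct_letter_distribution(1000))
    print(sketch_format_reward("Some rationale.\nAnswer: C", "C"))
```

One plausible reading of the two logged values, under these assumptions: if KL to the altered init grows while KL to the base shrinks, the policy is drifting back toward the base model (washout), whereas a rising KL to the base suggests the policy is moving further from it. A roughly uniform correct-letter distribution after shuffling means a policy that always answers "C" cannot score well by position alone.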