Role-Drift in Compound AI Systems: Checkpoints

Companion checkpoints for the paper Role Drift in Compound AI Systems (Sean Cao et al., NeurIPS 2026 submission, in preparation). All checkpoints are LoRA adapters on Qwen2.5-{3B,7B}-Instruct base models, produced by REINFORCE training with a binary outcome reward.

Companion code repository: [GitHub link forthcoming]

Repo summary

  - 123 training runs spanning four RAG configurations (3B+3B, 3B+7B asymmetric, 7B+7B Independent, 7B+7B Shared) and a Verifier domain.
  - 906 per-epoch checkpoints (typically sft + rl_ep0..9 per run; some runs are partial / in-progress).
  - Schema version: 1.0. Generated at 2026-04-28T18:32:41.607530+00:00.

Top-level structure

.
├── README.md                      # this file
├── MANIFEST.json                  # machine-readable index of all runs
└── checkpoints/
    ├── README.md                  # archetype-level overview
    ├── rag_3b3b_canonical/        # 75 runs, 573 checkpoints
    ├── rag_3b7b_asymmetric/       # 13 runs, 143 checkpoints
    ├── rag_7b7b_indep/            # 18 runs, 100 checkpoints
    ├── rag_7b7b_shared/           # 11 runs, 54 checkpoints
    └── verifier/                  # 6 runs, 36 checkpoints
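
MANIFEST.json is the entry point for programmatic run selection. A minimal sketch for filtering it by archetype (the key names "runs", "archetype", and "run_name" are assumptions for illustration; inspect the file for the actual schema):

import json
from huggingface_hub import hf_hub_download

# Fetch only the manifest, then filter runs by archetype.
manifest_path = hf_hub_download("Sean13/role-drift-compound-systems",
                                "MANIFEST.json", repo_type="dataset")
with open(manifest_path) as f:
    manifest = json.load(f)

# Key names are assumptions; inspect MANIFEST.json for the actual schema.
for run in manifest.get("runs", []):
    if run.get("archetype") == "rag_7b7b_indep":
        print(run.get("run_name"))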

Each checkpoints/<archetype_dir>/<run_name>/ contains:

  - run_meta.json - full provenance: hyperparameters, paper section, source path, schema version.
  - sft/ - adapter checkpoint after the SFT initialization phase.
  - rl_ep0/, rl_ep1/, …, rl_ep9/ - adapter checkpoint after each RL epoch.

For asymmetric and 7B+7B Independent runs, each checkpoint subdirectory contains separate query_gen/ and reader/ adapter directories; for shared-LoRA runs, it contains a single adapter/ directory.
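
The two layouts can be distinguished programmatically. A minimal sketch (the adapter_dirs helper is hypothetical, not part of the released code):

from pathlib import Path

def adapter_dirs(ckpt_dir: str) -> dict:
    """Map role name -> adapter directory for one checkpoint (hypothetical helper)."""
    root = Path(ckpt_dir)
    if (root / "adapter").is_dir():
        # Shared-LoRA layout: a single adapter serves both roles.
        return {"shared": root / "adapter"}
    # Independent/asymmetric layout: one directory per role, e.g. query_gen/, reader/.
    return {d.name: d for d in root.iterdir() if d.is_dir()}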

How to load a checkpoint

from huggingface_hub import snapshot_download
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick a run from MANIFEST.json, then download only that run's final checkpoint.
local = snapshot_download(
    repo_id="Sean13/role-drift-compound-systems",
    repo_type="dataset",
    allow_patterns=["checkpoints/rag_7b7b_indep/<run_name>/rl_ep9/**",
                    "checkpoints/rag_7b7b_indep/<run_name>/run_meta.json"],
)

# For asymmetric/independent runs: load the reader adapter onto the base model.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct",
                                            torch_dtype="bfloat16")
model = PeftModel.from_pretrained(
    base, f"{local}/checkpoints/rag_7b7b_indep/<run_name>/rl_ep9/reader")
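
As a quick smoke test, the adapter-wrapped model can be queried through the standard transformers chat-template API (the prompt below is illustrative):

import torch

messages = [{"role": "user", "content": "What is retrieval-augmented generation?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens.
print(tokenizer.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))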

Provenance

Each run_meta.json records the original local path on the training cluster (/home/xycao/compound_sys/... or /orcd/pool/008/.../legacy_home_checkpoints/...). This allows full traceability back to the codebase commit.
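
A minimal sketch for inspecting that provenance, continuing from the loading snippet above (the key names are assumptions based on the fields listed in this README; print the full dict to see the actual schema):

import json

with open(f"{local}/checkpoints/rag_7b7b_indep/<run_name>/run_meta.json") as f:
    meta = json.load(f)

# Key names are assumptions; adjust to the actual schema.
print(meta.get("source_path"))
print(meta.get("schema_version"))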

License

CC-BY-4.0. Cite the paper if you use these checkpoints.

Reproduction

To reproduce these checkpoints from scratch, see the companion GitHub repository for the training code (experiments/search/search_agent.py, experiments/verifier/verifier_rl.py) and SLURM launchers (jobs/launch_*.sh).
