# Role Drift in Compound AI Systems – Checkpoints
Companion checkpoints for the paper *Role Drift in Compound AI Systems* (Sean Cao et al., NeurIPS 2026 submission, in preparation). All checkpoints are LoRA adapters on `Qwen2.5-{3B,7B}-Instruct` base models, produced by REINFORCE training with a binary outcome reward.
Companion code repository: [GitHub link forthcoming]
## Repo summary
- 123 training runs spanning four RAG configurations (3B+3B, 3B+7B asymmetric, 7B+7B Independent, 7B+7B Shared) and a Verifier domain.
- 906 per-epoch checkpoints (typically `sft` + `rl_ep0..9` per run; some runs are partial / in-progress).
- Schema version: `1.0`. Generated at `2026-04-28T18:32:41.607530+00:00`.
## Top-level structure
```
.
├── README.md                  # this file
├── MANIFEST.json              # machine-readable index of all runs
└── checkpoints/
    ├── README.md              # archetype-level overview
    ├── rag_3b3b_canonical/    # 75 runs, 573 checkpoints
    ├── rag_3b7b_asymmetric/   # 13 runs, 143 checkpoints
    ├── rag_7b7b_indep/        # 18 runs, 100 checkpoints
    ├── rag_7b7b_shared/       # 11 runs, 54 checkpoints
    └── verifier/              # 6 runs, 36 checkpoints
```
Each `checkpoints/<archetype_dir>/<run_name>/` contains:

- `run_meta.json` – full provenance: hyperparameters, paper section, source path, schema version.
- `sft/` – adapter checkpoint after the SFT initialization phase.
- `rl_ep0/`, `rl_ep1/`, …, `rl_ep9/` – adapter checkpoints after each RL epoch.

For asymmetric and 7B+7B Independent runs, each checkpoint subdirectory holds separate `query_gen/` and `reader/` adapter directories; shared-LoRA runs have a single `adapter/` directory.
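To select runs programmatically, you can filter `MANIFEST.json`. A minimal sketch, assuming the manifest is a list of per-run records with `archetype`, `run_name`, and `checkpoints` keys (these key names are assumptions, not a documented schema; inspect the file for the actual fields):

```python
import json

# MANIFEST.json is the machine-readable index of all runs. The key names
# below ("archetype", "run_name", "checkpoints") are assumptions -- check
# the file itself for the real schema.
with open("MANIFEST.json") as f:
    manifest = json.load(f)

# Example: all 7B+7B Independent runs that completed the final RL epoch.
complete = [
    run for run in manifest
    if run.get("archetype") == "rag_7b7b_indep"
    and "rl_ep9" in run.get("checkpoints", [])
]
print([run["run_name"] for run in complete])
```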
## How to load a checkpoint
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import snapshot_download

# Pick a run from MANIFEST.json, then download only the files you need.
local = snapshot_download(
    repo_id="Sean13/role-drift-compound-systems",
    repo_type="dataset",
    allow_patterns=[
        "checkpoints/rag_7b7b_indep/<run_name>/rl_ep9/**",
        "checkpoints/rag_7b7b_indep/<run_name>/run_meta.json",
    ],
)

# For asymmetric / 7B+7B Independent runs: load the reader adapter.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype="bfloat16"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
model = PeftModel.from_pretrained(
    base, f"{local}/checkpoints/rag_7b7b_indep/<run_name>/rl_ep9/reader"
)
```
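For shared-LoRA runs, point `PeftModel.from_pretrained` at the run's `adapter/` directory instead of `reader/` or `query_gen/`. Continuing from the snippet above, a quick smoke test of the loaded reader (the prompt is purely illustrative):

```python
# Generate from the adapted reader to verify the checkpoint loads cleanly.
messages = [{"role": "user", "content": "Who introduced the transformer architecture?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```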
## Provenance
Each `run_meta.json` records the original local path on the training cluster (`/home/xycao/compound_sys/...` or `/orcd/pool/008/.../legacy_home_checkpoints/...`), which allows full traceability back to the codebase commit.
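For example, after the download above you can read the metadata directly. The key names here are hypothetical; the README only guarantees that hyperparameters, paper section, source path, and schema version are recorded:

```python
import json
from pathlib import Path

# `local` is the snapshot_download() path from the loading example above.
meta = json.loads(
    (Path(local) / "checkpoints/rag_7b7b_indep/<run_name>/run_meta.json").read_text()
)
# Hypothetical keys -- inspect the JSON for the actual field names.
print(meta.get("source_path"), meta.get("schema_version"))
```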
## License
CC-BY-4.0. Cite the paper if you use these checkpoints.
## Reproduction
To reproduce these checkpoints from scratch, see the companion GitHub repository for the training code (`experiments/search/search_agent.py`, `experiments/verifier/verifier_rl.py`) and SLURM launchers (`jobs/launch_*.sh`).