---
license: cc-by-4.0
tags:
- role-drift
- rlhf
- rlvr
- lora
- reasoning
- rag
- verifier
- compound-ai-systems
---

# Role-Drift in Compound AI Systems — Checkpoints

Companion checkpoints for the paper *Role Drift in Compound AI Systems* (Sean Cao et al., NeurIPS 2026 submission, in preparation). All checkpoints are LoRA adapters on Qwen2.5-{3B,7B}-Instruct base models, produced by REINFORCE training with a binary outcome reward.

> **Companion code repository:** [GitHub link forthcoming]

## Repo summary

- **123 training runs** spanning four RAG configurations (3B+3B, 3B+7B asymmetric, 7B+7B Independent, 7B+7B Shared) and a Verifier domain.
- **906 per-epoch checkpoints** (typically `sft` + `rl_ep0..9` per run; some runs are partial / in-progress).
- **Schema version:** `1.0`. Generated at `2026-04-28T18:32:41.607530+00:00`.

## Top-level structure

```
.
├── README.md                  # this file
├── MANIFEST.json              # machine-readable index of all runs
└── checkpoints/
    ├── README.md              # archetype-level overview
    ├── rag_3b3b_canonical/    # 75 runs, 573 checkpoints
    ├── rag_3b7b_asymmetric/   # 13 runs, 143 checkpoints
    ├── rag_7b7b_indep/        # 18 runs, 100 checkpoints
    ├── rag_7b7b_shared/       # 11 runs, 54 checkpoints
    └── verifier/              # 6 runs, 36 checkpoints
```

Each `checkpoints/<archetype>/<run_name>/` directory contains:

- `run_meta.json` — full provenance: hyperparameters, paper section, source path, schema version.
- `sft/` — adapter checkpoint after the SFT initialization phase.
- `rl_ep0/`, `rl_ep1/`, …, `rl_ep9/` — adapter checkpoint after each RL epoch.

For asymmetric and 7B+7B Independent runs, each checkpoint subdirectory contains separate `query_gen/` and `reader/` adapter directories. For shared-LoRA runs, there is a single `adapter/` directory.

## How to load a checkpoint

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import snapshot_download

# Pick a run from MANIFEST.json and fill in its name here.
run_name = "<run_name>"

local = snapshot_download(
    repo_id="Sean13/role-drift-compound-systems",
    repo_type="dataset",
    allow_patterns=[
        f"checkpoints/rag_7b7b_indep/{run_name}/rl_ep9/**",
        f"checkpoints/rag_7b7b_indep/{run_name}/run_meta.json",
    ],
)

# For asymmetric / independent runs: load the reader adapter.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype="bfloat16"
)
model = PeftModel.from_pretrained(
    base, f"{local}/checkpoints/rag_7b7b_indep/{run_name}/rl_ep9/reader"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
```

Sketches for selecting a run programmatically from `MANIFEST.json` and for loading shared-LoRA runs appear at the end of this README.

## Provenance

Each `run_meta.json` records the original local path on the training cluster (`/home/xycao/compound_sys/...` or `/orcd/pool/008/.../legacy_home_checkpoints/...`), which allows full traceability back to the codebase commit.

## License

CC-BY-4.0. Cite the paper if you use these checkpoints.

## Reproduction

To reproduce these checkpoints from scratch, see the companion GitHub repository for the training code (`experiments/search/search_agent.py`, `experiments/verifier/verifier_rl.py`) and SLURM launchers (`jobs/launch_*.sh`).
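## Selecting a run from MANIFEST.json (sketch)

The loading example above assumes you already know a run name. A minimal way to choose one is to download only `MANIFEST.json` and filter it. The field names used below (`runs`, `archetype`, `run_name`) are illustrative assumptions; consult the actual schema (version `1.0`) in the downloaded manifest and adjust the keys accordingly.

```python
import json
from huggingface_hub import hf_hub_download

# Fetch only the manifest file (small) from the dataset repo.
manifest_path = hf_hub_download(
    repo_id="Sean13/role-drift-compound-systems",
    repo_type="dataset",
    filename="MANIFEST.json",
)
with open(manifest_path) as f:
    manifest = json.load(f)

# Hypothetical schema: a top-level "runs" list with "archetype" and
# "run_name" fields. Replace these keys with the real ones.
runs = [r for r in manifest["runs"] if r["archetype"] == "rag_7b7b_indep"]
print(f"{len(runs)} runs found for rag_7b7b_indep")
run_name = runs[0]["run_name"]
```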
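## Loading a shared-LoRA run (sketch)

For `rag_7b7b_shared` runs, both roles share one set of LoRA weights, so each checkpoint subdirectory contains a single `adapter/` directory instead of `query_gen/` and `reader/`. A minimal sketch, assuming a run name taken from `MANIFEST.json`:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM
from huggingface_hub import snapshot_download

run_name = "<run_name>"  # fill in from MANIFEST.json

local = snapshot_download(
    repo_id="Sean13/role-drift-compound-systems",
    repo_type="dataset",
    allow_patterns=[f"checkpoints/rag_7b7b_shared/{run_name}/rl_ep9/**"],
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype="bfloat16"
)
# Shared-LoRA runs ship one adapter/ directory per checkpoint.
model = PeftModel.from_pretrained(
    base, f"{local}/checkpoints/rag_7b7b_shared/{run_name}/rl_ep9/adapter"
)
```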