| --- |
| license: cc-by-4.0 |
| tags: |
| - role-drift |
| - rlhf |
| - rlvr |
| - lora |
| - reasoning |
| - rag |
| - verifier |
| - compound-ai-systems |
| --- |
| |
# Role-Drift in Compound AI Systems – Checkpoints
|
|
Companion checkpoints for the paper *Role Drift in Compound AI Systems* (Sean Cao et al., NeurIPS 2026 submission, in preparation). All checkpoints are LoRA adapters on Qwen2.5-{3B,7B}-Instruct base models, produced by REINFORCE training with a binary outcome reward.
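For orientation, here is a minimal sketch of the training objective, assuming the textbook REINFORCE estimator with a binary outcome reward r ∈ {0, 1}; the actual implementation (including any baseline or normalization) lives in the companion code repository, and the `baseline` argument below is an illustrative assumption:

```python
import torch

def reinforce_loss(token_logprobs: torch.Tensor,
                   reward: float,
                   baseline: float = 0.0) -> torch.Tensor:
    """Policy-gradient surrogate loss for one sampled rollout.

    token_logprobs: log pi(a_t | s_t) for each generated token.
    reward: binary outcome reward, 1.0 if the rollout succeeded else 0.0.
    baseline: optional variance-reduction term (assumed, not documented here).
    """
    # Minimizing this surrogate yields the REINFORCE gradient
    # -(r - b) * grad sum_t log pi(a_t | s_t).
    return -(reward - baseline) * token_logprobs.sum()
```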
|
|
| > **Companion code repository:** [GitHub link forthcoming] |
|
|
| ## Repo summary |
|
|
| - **123 training runs** spanning four RAG configurations (3B+3B, 3B+7B asymmetric, 7B+7B Independent, 7B+7B Shared) and a Verifier domain. |
| - **906 per-epoch checkpoints** (typically `sft` + `rl_ep0..9` per run; some runs are partial / in-progress). |
| - **Schema version**: `1.0`. Generated at `2026-04-28T18:32:41.607530+00:00`. |
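A minimal sketch for enumerating runs from `MANIFEST.json` without downloading the full dataset. The field names `runs`, `archetype`, and `run_name` are illustrative assumptions; consult the manifest itself for the actual schema-1.0 layout:

```python
import json
from huggingface_hub import hf_hub_download

# Fetch only the manifest; the checkpoints themselves stay remote.
path = hf_hub_download(
    repo_id="Sean13/role-drift-compound-systems",
    repo_type="dataset",
    filename="MANIFEST.json",
)
with open(path) as f:
    manifest = json.load(f)

# Key names below are assumptions, not the documented schema.
for run in manifest.get("runs", []):
    print(run.get("archetype"), run.get("run_name"))
```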
|
|
| ## Top-level structure |
|
|
| ``` |
| . |
| βββ README.md # this file |
| βββ MANIFEST.json # machine-readable index of all runs |
| βββ checkpoints/ |
| βββ README.md # archetype-level overview |
| βββ `rag_3b3b_canonical/`** β 75 runs, 573 checkpoints |
| βββ `rag_3b7b_asymmetric/`** β 13 runs, 143 checkpoints |
| βββ `rag_7b7b_indep/`** β 18 runs, 100 checkpoints |
| βββ `rag_7b7b_shared/`** β 11 runs, 54 checkpoints |
| βββ `verifier/`** β 6 runs, 36 checkpoints |
| ``` |
|
|
| Each `checkpoints/<archetype_dir>/<run_name>/` contains: |
- `run_meta.json` – full provenance: hyperparameters, paper section, source path, schema version (see the inspection sketch after this list).
- `sft/` – adapter checkpoint after the SFT initialization phase.
- `rl_ep0/`, `rl_ep1/`, …, `rl_ep9/` – adapter checkpoint after each RL epoch.
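A quick way to inspect a run's provenance once downloaded. The key names `schema_version` and `hyperparameters` are assumptions based on the fields listed above; check an actual `run_meta.json` for the exact schema:

```python
import json

# Substitute a real run name from MANIFEST.json for <run_name>.
with open("checkpoints/rag_3b3b_canonical/<run_name>/run_meta.json") as f:
    meta = json.load(f)

# Key names are illustrative assumptions, not the documented schema.
print(meta.get("schema_version"), meta.get("hyperparameters"))
```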
|
|
For asymmetric and 7B+7B Independent runs, each checkpoint subdirectory contains separate `query_gen/` and `reader/` adapter directories; for shared-LoRA runs, it contains a single `adapter/` directory.
|
|
| ## How to load a checkpoint |
|
|
```python
from huggingface_hub import snapshot_download
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick a run from MANIFEST.json and substitute it for <run_name> below.
local = snapshot_download(
    repo_id="Sean13/role-drift-compound-systems",
    repo_type="dataset",
    allow_patterns=[
        "checkpoints/rag_7b7b_indep/<run_name>/rl_ep9/**",
        "checkpoints/rag_7b7b_indep/<run_name>/run_meta.json",
    ],
)

# For asymmetric/independent runs: load the reader adapter on the 7B base.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype="bfloat16"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
model = PeftModel.from_pretrained(
    base, f"{local}/checkpoints/rag_7b7b_indep/<run_name>/rl_ep9/reader"
)
```
|
|
| ## Provenance |
|
|
| Each `run_meta.json` records the original local path on the training cluster (`/home/xycao/compound_sys/...` or `/orcd/pool/008/.../legacy_home_checkpoints/...`). This allows full traceability back to the codebase commit. |
|
|
| ## License |
|
|
Released under CC-BY-4.0. Please cite the paper if you use these checkpoints.
|
|
| ## Reproduction |
|
|
| To reproduce these checkpoints from scratch, see the companion GitHub repository for the training code (`experiments/search/search_agent.py`, `experiments/verifier/verifier_rl.py`) and SLURM launchers (`jobs/launch_*.sh`). |
|
|