
	---
	license: cc-by-4.0
	tags:
	- role-drift
	- rlhf
	- rlvr
	- lora
	- reasoning
	- rag
	- verifier
	- compound-ai-systems
	---

	# Role-Drift in Compound AI Systems — Checkpoints

	Companion checkpoints for the paper *Role Drift in Compound AI Systems* (Sean Cao et al., NeurIPS 2026 submission, in preparation). All checkpoints are LoRA adapters on Qwen2.5-{3B,7B}-Instruct base models, produced by REINFORCE training with binary outcome reward.

	> Companion code repository: [GitHub link forthcoming]

	## Repo summary

	- 123 training runs spanning four RAG configurations (3B+3B, 3B+7B asymmetric, 7B+7B Independent, 7B+7B Shared) and a Verifier domain.
	- 906 per-epoch checkpoints (typically `sft` + `rl_ep0..9` per run; some runs are partial / in-progress).
	- Schema version: `1.0`. Generated at `2026-04-28T18:32:41.607530+00:00`.

	## Top-level structure

	```
	.
	├── README.md                  # this file
	├── MANIFEST.json              # machine-readable index of all runs
	└── checkpoints/
	    ├── README.md              # archetype-level overview
	    ├── rag_3b3b_canonical/    # 75 runs, 573 checkpoints
	    ├── rag_3b7b_asymmetric/   # 13 runs, 143 checkpoints
	    ├── rag_7b7b_indep/        # 18 runs, 100 checkpoints
	    ├── rag_7b7b_shared/       # 11 runs, 54 checkpoints
	    └── verifier/              # 6 runs, 36 checkpoints
	```
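The per-archetype counts in the tree can be checked against the totals in the repo summary. The sketch below only re-adds the numbers printed in this README (it does not read `MANIFEST.json`, whose schema is not described here); the dictionary name `ARCHETYPES` is illustrative.

```python
# Per-archetype (runs, checkpoints) counts copied from the tree in this README.
ARCHETYPES = {
    "rag_3b3b_canonical": (75, 573),
    "rag_3b7b_asymmetric": (13, 143),
    "rag_7b7b_indep": (18, 100),
    "rag_7b7b_shared": (11, 54),
    "verifier": (6, 36),
}

total_runs = sum(runs for runs, _ in ARCHETYPES.values())
total_checkpoints = sum(ckpts for _, ckpts in ARCHETYPES.values())

# A complete run has sft + rl_ep0..rl_ep9 = 11 checkpoints, so an average
# below 11 indicates that some runs in that archetype are partial.
FULL_RUN = 11
for name, (runs, ckpts) in ARCHETYPES.items():
    print(f"{name}: {ckpts / runs:.1f} checkpoints per run (full run = {FULL_RUN})")

print(total_runs, total_checkpoints)
```

The totals agree with the 123 runs and 906 checkpoints quoted in the repo summary.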

	Each `checkpoints/<archetype_dir>/<run_name>/` contains:
	- `run_meta.json` — full provenance: hyperparameters, paper section, source path, schema version.
	- `sft/` — adapter checkpoint after the SFT initialization phase.
	- `rl_ep0/`, `rl_ep1/`, …, `rl_ep9/` — adapter checkpoint after each RL epoch.

	For asymmetric and 7B+7B Independent runs, each checkpoint subdirectory contains separate `query_gen/` and `reader/` adapter directories. Shared-LoRA runs contain a single `adapter/` directory instead.
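Because some runs are partial and the adapter layout differs by archetype, it can help to inspect a downloaded run before loading anything. The following is a minimal sketch that relies only on the directory names described above; the helper names `list_checkpoints` and `adapter_dirs` are hypothetical and not part of the repository. The layout of archetypes not mentioned above is not assumed: the helper simply reports whichever of the three adapter directories exist.

```python
from pathlib import Path


def list_checkpoints(run_dir):
    """Return checkpoint names found in a run: 'sft' first, then rl_ep0, rl_ep1, ..."""
    names = [p.name for p in Path(run_dir).iterdir() if p.is_dir()]
    rl = sorted(
        (n for n in names if n.startswith("rl_ep") and n[5:].isdigit()),
        key=lambda n: int(n[5:]),  # numeric sort so rl_ep10 would follow rl_ep9
    )
    return (["sft"] if "sft" in names else []) + rl


def adapter_dirs(checkpoint_dir):
    """Return {role: path} for the adapter directories present in one checkpoint."""
    ckpt = Path(checkpoint_dir)
    return {
        role: ckpt / role
        for role in ("query_gen", "reader", "adapter")
        if (ckpt / role).is_dir()
    }
```

For example, `adapter_dirs(run_dir / "rl_ep9")` on an independent run should return the `query_gen` and `reader` paths, while a shared-LoRA run should return only `adapter`.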

	## How to load a checkpoint

	```python
	import torch
	from huggingface_hub import snapshot_download
	from peft import PeftModel
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Pick an archetype directory and a run name from MANIFEST.json.
	run_dir = "checkpoints/rag_7b7b_indep/<run_name>"  # replace <run_name>

	# Download only the final RL checkpoint and the run metadata.
	local = snapshot_download(
	    repo_id="Sean13/role-drift-compound-systems",
	    repo_type="dataset",  # as given here; use "model" if the Hub lists this repo as a model
	    allow_patterns=[f"{run_dir}/rl_ep9/**", f"{run_dir}/run_meta.json"],
	)

	# Asymmetric / independent runs: load the reader adapter onto its base model.
	base_id = "Qwen/Qwen2.5-7B-Instruct"
	tokenizer = AutoTokenizer.from_pretrained(base_id)
	base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
	model = PeftModel.from_pretrained(base, f"{local}/{run_dir}/rl_ep9/reader")
	```
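To fetch several checkpoints of one run (for example, to compare `sft` against later RL epochs), the `allow_patterns` list can be built programmatically. This is a sketch under the directory layout described in this README; `checkpoint_patterns` is a hypothetical helper, not something shipped with the repository.

```python
def checkpoint_patterns(archetype_dir, run_name, checkpoints=("sft", "rl_ep9")):
    """Build snapshot_download allow_patterns for selected checkpoints of one run."""
    run_dir = f"checkpoints/{archetype_dir}/{run_name}"
    patterns = [f"{run_dir}/run_meta.json"]  # always include the provenance file
    patterns += [f"{run_dir}/{ckpt}/**" for ckpt in checkpoints]
    return patterns


# Every checkpoint name a complete run is expected to have: sft, rl_ep0, ..., rl_ep9.
ALL_CHECKPOINTS = ["sft"] + [f"rl_ep{i}" for i in range(10)]
```

The result can be passed as `allow_patterns=checkpoint_patterns(...)` to `snapshot_download`. Patterns for checkpoints that a partial run does not contain simply match nothing.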

	## Provenance

	Each `run_meta.json` records the original local path on the training cluster (`/home/xycao/compound_sys/...` or `/orcd/pool/008/.../legacy_home_checkpoints/...`). This allows full traceability back to the codebase commit.
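The exact key names inside `run_meta.json` are not listed in this README, so the sketch below makes no assumption about them: it loads the file and prints one truncated line per top-level key, which is usually enough to locate the source path and hyperparameters. `load_run_meta` and `summarize` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_run_meta(run_dir):
    """Load run_meta.json from a downloaded run directory."""
    with open(Path(run_dir) / "run_meta.json", encoding="utf-8") as f:
        return json.load(f)


def summarize(meta, max_len=60):
    """Return one 'key: value' line per top-level key, truncating long values."""
    lines = []
    for key in sorted(meta):
        text = json.dumps(meta[key])
        if len(text) > max_len:
            text = text[: max_len - 3] + "..."
        lines.append(f"{key}: {text}")
    return lines
```

Typical use is `print("\n".join(summarize(load_run_meta(run_dir))))` on a run directory obtained from `snapshot_download`.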

	## License

	CC-BY-4.0. Cite the paper if you use these checkpoints.

	## Reproduction

	To reproduce these checkpoints from scratch, see the companion GitHub repository for the training code (`experiments/search/search_agent.py`, `experiments/verifier/verifier_rl.py`) and SLURM launchers (`jobs/launch_*.sh`).
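For orientation, the objective named at the top of this README (REINFORCE with a binary outcome reward) can be written in a few lines. This is a generic textbook sketch in plain Python, not the code in `search_agent.py` or `verifier_rl.py`: whether the actual training uses a baseline, reward normalization, or a KL penalty is not stated here, and none is assumed. The function name `reinforce_loss` is illustrative.

```python
def reinforce_loss(token_logprobs, rewards):
    """Plain REINFORCE surrogate loss: mean over samples of -R_i * sum_t log pi(token_t).

    token_logprobs: list of per-token log-probability lists, one per sampled sequence.
    rewards: list of binary outcome rewards (0 or 1), one per sampled sequence.
    """
    if len(token_logprobs) != len(rewards):
        raise ValueError("need exactly one reward per sampled sequence")
    per_sample = [-r * sum(lps) for lps, r in zip(token_logprobs, rewards)]
    return sum(per_sample) / len(per_sample)


# Two sampled sequences: the first was judged correct (R=1), the second was not (R=0).
loss = reinforce_loss([[-0.5, -0.25], [-1.0, -1.0]], [1, 0])
print(loss)
```

With a binary reward and no baseline, sequences with reward 0 contribute nothing to the loss, so only successful samples have their log-probability pushed up. In real training the log-probabilities would be framework tensors so that the gradient flows into the LoRA parameters.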