---
license: cc-by-4.0
tags:
  - role-drift
  - rlhf
  - rlvr
  - lora
  - reasoning
  - rag
  - verifier
  - compound-ai-systems
---

# Role-Drift in Compound AI Systems — Checkpoints

Companion checkpoints for the paper *Role Drift in Compound AI Systems* (Sean Cao et al., NeurIPS 2026 submission, in preparation). All checkpoints are LoRA adapters on Qwen2.5-{3B,7B}-Instruct base models, produced by REINFORCE training with binary outcome reward.

> **Companion code repository:** [GitHub link forthcoming]

## Repo summary

- **123 training runs** spanning four RAG configurations (3B+3B, 3B+7B asymmetric, 7B+7B Independent, 7B+7B Shared) and a Verifier domain.
- **906 per-epoch checkpoints** (typically `sft` + `rl_ep0..9` per run; some runs are partial / in-progress).
- **Schema version**: `1.0`. Generated at `2026-04-28T18:32:41.607530+00:00`.

## Top-level structure

```
.
├── README.md                      # this file
├── MANIFEST.json                  # machine-readable index of all runs
└── checkpoints/
    ├── README.md                  # archetype-level overview
    ├── rag_3b3b_canonical/        # 75 runs, 573 checkpoints
    ├── rag_3b7b_asymmetric/       # 13 runs, 143 checkpoints
    ├── rag_7b7b_indep/            # 18 runs, 100 checkpoints
    ├── rag_7b7b_shared/           # 11 runs, 54 checkpoints
    └── verifier/                  # 6 runs, 36 checkpoints
```
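As a quick consistency check, the per-archetype counts in the tree above can be tallied against the totals in the repo summary. The dictionary below is transcribed by hand from this README (it is not read from `MANIFEST.json`), and the name `ARCHETYPES` is only illustrative.

```python
# (runs, checkpoints) per archetype directory, copied by hand from this README.
ARCHETYPES = {
    "rag_3b3b_canonical": (75, 573),
    "rag_3b7b_asymmetric": (13, 143),
    "rag_7b7b_indep": (18, 100),
    "rag_7b7b_shared": (11, 54),
    "verifier": (6, 36),
}

total_runs = sum(runs for runs, _ in ARCHETYPES.values())
total_checkpoints = sum(ckpts for _, ckpts in ARCHETYPES.values())
print(total_runs, total_checkpoints)  # 123 906
```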

Each `checkpoints/<archetype_dir>/<run_name>/` contains:
- `run_meta.json` — full provenance: hyperparameters, paper section, source path, schema version.
- `sft/` — adapter checkpoint after the SFT initialization phase.
- `rl_ep0/`, `rl_ep1/`, …, `rl_ep9/` — adapter checkpoint after each RL epoch.

For asymmetric and 7B+7B Independent runs, each checkpoint subdirectory contains separate `query_gen/` and `reader/` adapter directories. For shared-LoRA runs, each checkpoint subdirectory contains a single `adapter/` directory instead.
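The layout described above can be inspected on a downloaded run with a small helper. This is a minimal sketch, not part of the released code: the function names `list_checkpoints` and `adapter_dirs` are hypothetical, and because the adapter layout of the 3B+3B and Verifier runs is not spelled out here, the helper only reports which of the three known directory names it finds.

```python
from pathlib import Path


def list_checkpoints(run_dir):
    """Return checkpoint directory names in training order: sft, rl_ep0, rl_ep1, ..."""
    names = [p.name for p in Path(run_dir).iterdir() if p.is_dir()]
    epochs = sorted(
        (n for n in names if n.startswith("rl_ep") and n[5:].isdigit()),
        key=lambda n: int(n[5:]),  # numeric sort, so partial runs stay ordered
    )
    return (["sft"] if "sft" in names else []) + epochs


def adapter_dirs(checkpoint_dir):
    """Map role name -> adapter path for one checkpoint subdirectory."""
    ckpt = Path(checkpoint_dir)
    found = {
        role: ckpt / role
        for role in ("query_gen", "reader", "adapter")
        if (ckpt / role).is_dir()
    }
    if not found:
        raise FileNotFoundError(f"no known adapter directory under {ckpt}")
    return found
```

Since some runs are partial, iterating over `list_checkpoints(run_dir)` is safer than assuming that `rl_ep0` through `rl_ep9` all exist.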

## How to load a checkpoint

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import snapshot_download

# Pick a run from MANIFEST.json and substitute its name here.
run_name = "<run_name>"
run_dir = f"checkpoints/rag_7b7b_indep/{run_name}"

# Download only the final RL epoch and the run metadata.
local = snapshot_download(
    repo_id="Sean13/role-drift-compound-systems",
    repo_type="dataset",
    allow_patterns=[f"{run_dir}/rl_ep9/**", f"{run_dir}/run_meta.json"],
)

# For asymmetric/independent runs: load the reader adapter onto the 7B base.
base_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="bfloat16")
model = PeftModel.from_pretrained(base, f"{local}/{run_dir}/rl_ep9/reader")
```
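To fetch a different checkpoint (for example the `sft` adapter, or an intermediate RL epoch), only the `allow_patterns` list needs to change. The helper below is a hypothetical convenience, assuming the directory layout described in this README. It builds the patterns for one checkpoint of one run and rejects stage names that do not look like `sft` or `rl_ep<N>`.

```python
def checkpoint_patterns(archetype, run_name, stage):
    """Build allow_patterns for one checkpoint of one run, plus its run_meta.json."""
    is_rl_epoch = stage.startswith("rl_ep") and stage[5:].isdigit()
    if stage != "sft" and not is_rl_epoch:
        raise ValueError(f"unexpected stage name: {stage!r}")
    run_dir = f"checkpoints/{archetype}/{run_name}"
    return [f"{run_dir}/{stage}/**", f"{run_dir}/run_meta.json"]
```

The result can be passed directly as `allow_patterns=checkpoint_patterns("rag_7b7b_shared", "<run_name>", "sft")` in the `snapshot_download` call.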

## Provenance

Each `run_meta.json` records the original local path on the training cluster (`/home/xycao/compound_sys/...` or `/orcd/pool/008/.../legacy_home_checkpoints/...`). This allows full traceability back to the codebase commit.
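The provenance record can be read with the standard library. The sketch below assumes only that `run_meta.json` holds a JSON object at the top level. The exact field names are not listed in this README, so the code prints whatever keys are present instead of guessing them. The function names are illustrative.

```python
import json
from pathlib import Path


def load_run_meta(run_dir):
    """Load run_meta.json from a downloaded run directory as a dict."""
    with open(Path(run_dir) / "run_meta.json", encoding="utf-8") as f:
        return json.load(f)


def top_level_keys(meta):
    """Return the sorted top-level keys without assuming any field names."""
    return sorted(meta)
```

For example, `print(top_level_keys(load_run_meta(run_dir)))` shows which fields a given run records, where `run_dir` is the local path of a downloaded run.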

## License

CC-BY-4.0. Cite the paper if you use these checkpoints.

## Reproduction

To reproduce these checkpoints from scratch, see the companion GitHub repository for the training code (`experiments/search/search_agent.py`, `experiments/verifier/verifier_rl.py`) and SLURM launchers (`jobs/launch_*.sh`).