---
license: cc-by-4.0
tags:
  - role-drift
  - rlhf
  - rlvr
  - lora
  - reasoning
  - rag
  - verifier
  - compound-ai-systems
---

# Role-Drift in Compound AI Systems: Checkpoints

Companion checkpoints for the paper *Role Drift in Compound AI Systems* (Sean Cao et al., NeurIPS 2026 submission, in preparation). All checkpoints are LoRA adapters on Qwen2.5-3B-Instruct or Qwen2.5-7B-Instruct base models, initialized with an SFT phase and then trained with REINFORCE using a binary outcome reward.

> **Companion code repository:** [GitHub link forthcoming]

## Repo summary

- **123 training runs** spanning four RAG configurations (3B+3B, 3B+7B asymmetric, 7B+7B Independent, 7B+7B Shared) and a Verifier domain.
- **906 checkpoints** in total, one per training stage (typically `sft` plus `rl_ep0` through `rl_ep9` per run; some runs are partial or still in progress).
- **Schema version**: `1.0`. Generated at `2026-04-28T18:32:41.607530+00:00`.

## Top-level structure

```
.
├── README.md                      # this file
├── MANIFEST.json                  # machine-readable index of all runs
└── checkpoints/
    ├── README.md                  # archetype-level overview
    ├── rag_3b3b_canonical/        # 75 runs, 573 checkpoints
    ├── rag_3b7b_asymmetric/       # 13 runs, 143 checkpoints
    ├── rag_7b7b_indep/            # 18 runs, 100 checkpoints
    ├── rag_7b7b_shared/           # 11 runs, 54 checkpoints
    └── verifier/                  # 6 runs, 36 checkpoints
```
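The per-archetype counts above can be checked against a local snapshot by walking `checkpoints/`. The sketch below assumes only the layout described in this README: a run is any directory under an archetype directory, and a checkpoint is a run subdirectory named `sft` or `rl_ep<N>`. The function name `count_checkpoints` is hypothetical and not part of any released tooling.

```python
import re
from pathlib import Path

# Assumption: checkpoint directories are named `sft` or `rl_ep<N>`, as described in this README
CHECKPOINT_RE = re.compile(r"^(sft|rl_ep\d+)$")


def count_checkpoints(root: str) -> dict[str, tuple[int, int]]:
    """Return {archetype_dir: (n_runs, n_checkpoints)} for a local snapshot (hypothetical helper)."""
    counts: dict[str, tuple[int, int]] = {}
    ckpt_root = Path(root) / "checkpoints"
    # Skip plain files such as checkpoints/README.md
    for archetype in sorted(p for p in ckpt_root.iterdir() if p.is_dir()):
        runs = [r for r in archetype.iterdir() if r.is_dir()]
        # Count only stage directories; run_meta.json and other files are ignored
        n_ckpts = sum(
            1
            for r in runs
            for c in r.iterdir()
            if c.is_dir() and CHECKPOINT_RE.match(c.name)
        )
        counts[archetype.name] = (len(runs), n_ckpts)
    return counts
```

On a complete snapshot this should reproduce the numbers in the tree above (for example `(6, 36)` for `verifier`); a snapshot fetched with restrictive `allow_patterns` will naturally report fewer.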

Each `checkpoints/<archetype_dir>/<run_name>/` contains:
- `run_meta.json`: full provenance (hyperparameters, paper section, source path, schema version).
- `sft/`: adapter checkpoint after the SFT initialization phase.
- `rl_ep0/`, `rl_ep1/`, …, `rl_ep9/`: adapter checkpoint after each RL epoch.

For 3B+7B asymmetric and 7B+7B Independent runs, each checkpoint subdirectory contains separate `query_gen/` and `reader/` adapter directories. For shared-LoRA runs, it contains a single `adapter/` directory.
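Because the adapter layout inside a checkpoint directory depends on the run type, code that loads checkpoints generically has to branch on it. A minimal sketch, assuming only the two layouts described above; the function name `adapter_dirs` and the role key `"shared"` are hypothetical, and layouts not documented here are rejected rather than guessed.

```python
from pathlib import Path


def adapter_dirs(checkpoint_dir: str) -> dict[str, Path]:
    """Map role name -> adapter directory for one checkpoint such as `.../rl_ep9` (hypothetical helper)."""
    ckpt = Path(checkpoint_dir)
    # Asymmetric / 7B+7B Independent runs: one adapter per role
    roles = {name: ckpt / name for name in ("query_gen", "reader") if (ckpt / name).is_dir()}
    if roles:
        return roles
    # Shared-LoRA runs: a single adapter serves both roles ("shared" is an illustrative key)
    if (ckpt / "adapter").is_dir():
        return {"shared": ckpt / "adapter"}
    # Any other layout is not described in this README, so refuse to guess
    raise ValueError(f"unrecognized adapter layout in {ckpt}")
```

Raising on unknown layouts keeps a loading script from silently pointing PEFT at the wrong directory.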

## How to load a checkpoint

```python
import torch
from huggingface_hub import snapshot_download
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick an archetype directory and run name from MANIFEST.json
archetype = "rag_7b7b_indep"
run_name = "<run_name>"  # placeholder: replace with a run listed in MANIFEST.json
run_dir = f"checkpoints/{archetype}/{run_name}"

# Download only the final RL epoch of this run plus its provenance file
local = snapshot_download(
    repo_id="Sean13/role-drift-compound-systems",
    repo_type="dataset",
    allow_patterns=[f"{run_dir}/rl_ep9/**", f"{run_dir}/run_meta.json"],
)

# For asymmetric/independent runs each role has its own adapter; load the reader here
base_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, f"{local}/{run_dir}/rl_ep9/reader")
```
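To compare a run across training stages rather than only at `rl_ep9`, the `allow_patterns` list can be generated for any subset of stages. A sketch, assuming the stage names `sft` and `rl_ep0` through `rl_ep9` described above; `run_patterns` and `stage_key` are hypothetical helpers. For partial runs, patterns for stages that were never written should simply match nothing.

```python
from typing import Optional


def stage_key(stage: str) -> int:
    """Sort key: `sft` first, then `rl_ep<N>` by N (a plain string sort would put `sft` last)."""
    return -1 if stage == "sft" else int(stage.removeprefix("rl_ep"))


def run_patterns(archetype: str, run_name: str, stages: Optional[list[str]] = None) -> list[str]:
    """Build `allow_patterns` covering the selected stages of one run (hypothetical helper)."""
    if stages is None:
        # Full trajectory: SFT initialization, then RL epochs 0..9
        stages = ["sft"] + [f"rl_ep{i}" for i in range(10)]
    prefix = f"checkpoints/{archetype}/{run_name}"
    # Always fetch the provenance file alongside the adapter weights, stages in training order
    return [f"{prefix}/run_meta.json"] + [f"{prefix}/{s}/**" for s in sorted(stages, key=stage_key)]
```

The returned list can be passed as `allow_patterns=` to the `snapshot_download` call in the loading example, and `stage_key` can order the downloaded stage directories for per-epoch comparisons.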

## Provenance

Each `run_meta.json` records the original local path on the training cluster (`/home/xycao/compound_sys/...` or `/orcd/pool/008/.../legacy_home_checkpoints/...`). This allows full traceability back to the codebase commit.

## License

CC-BY-4.0. Cite the paper if you use these checkpoints.

## Reproduction

To reproduce these checkpoints from scratch, see the companion GitHub repository for the training code (`experiments/search/search_agent.py`, `experiments/verifier/verifier_rl.py`) and SLURM launchers (`jobs/launch_*.sh`).