McClain

5 days ago

	# UCL-CSSB/PlasmidRL-ICML

	Camera-ready artifacts for "Effects of Structural Reward Shaping on Biophysical Properties in RL-Trained Plasmid Generators" (ICML 2026, to appear).

	See `INDEX.md` for the full per-folder navigation. See `SFT_STALE.md` for data-status flags. This README is the 30-second summary.

	## Headline results

	We fine-tune `UCL-CSSB/PlasmidGPT` (Base) for whole-plasmid generation with Group Relative Policy Optimization (GRPO). Each model is evaluated on 4,000 generated sequences across 8 prompts under the analysis2 strict-QC pipeline.

	| Model | T | QC pass rate (8-prompt) |
	|---|---:|---:|
	| Base (`UCL-CSSB/PlasmidGPT`) | 1.0 | 4.275% |
	| SFT (`UCL-CSSB/PlasmidGPT-SFT`) | 1.0 | 10.975% |
	| RL = GRPO (`UCL-CSSB/PlasmidGPT-GRPO`) | 1.0 | 71.575% |

	Lift: ~16.7× over Base, ~6.5× over SFT.
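
The lift figures follow directly from the pass rates in the table above. A minimal check, using only the three percentages reported there (the variable names are illustrative):

```python
# QC pass rates (percent) copied from the headline table.
pass_rate = {"Base": 4.275, "SFT": 10.975, "GRPO": 71.575}

# Lift is the ratio of the GRPO pass rate to each comparison model's pass rate.
lift_over_base = pass_rate["GRPO"] / pass_rate["Base"]
lift_over_sft = pass_rate["GRPO"] / pass_rate["SFT"]

print(f"GRPO vs Base: {lift_over_base:.1f}x")  # GRPO vs Base: 16.7x
print(f"GRPO vs SFT:  {lift_over_sft:.1f}x")   # GRPO vs SFT:  6.5x
```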

	Rejection sampling top-K (M=50 trials × 8 prompts):

	| K | Base | SFT | GRPO |
	|---:|---:|---:|---:|
	| 1 | 4.25% | 9.75% | 76.75% |
	| 4 | 14.5% | 36.25% | 95.0% |
	| 16 | 38.75% | 76.25% | 99.0% |
	| 64 | 54.5% | 99.25% | 100% |
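
This summary does not spell out how each cell is computed. One plausible reading, assumed here and not confirmed by the README, is that a trial succeeds at K if at least one of its first K samples passes strict QC, and the cell reports the fraction of successful trials. The helper name `success_at_k` and the toy data below are hypothetical.

```python
from typing import Sequence


def success_at_k(trials: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of trials in which at least one of the first k samples passes QC.

    ASSUMPTION: this "any pass within the first K" definition is a guess at
    the metric; the README does not define it.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if not trials:
        raise ValueError("need at least one trial")
    hits = sum(any(samples[:k]) for samples in trials)
    return hits / len(trials)


# Toy data: 4 trials, each with 4 per-sample QC flags (True = pass).
toy_trials = [
    [False, False, True, False],
    [True, False, False, False],
    [False, False, False, False],
    [False, False, False, True],
]

for k in (1, 2, 4):
    print(k, success_at_k(toy_trials, k))
```

Under this reading the success rate can only stay flat or grow with K, which is consistent with every column of the table.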

	## Lineage (parallel post-training paths)

	```
	Base = UCL-CSSB/PlasmidGPT (= McClain/plasmidgpt-addgene-gpt2; same SHA, both public)
	├─→ SFT next-token loss → UCL-CSSB/PlasmidGPT-SFT (sha daeaabf)
	└─→ GRPO reward shaping → UCL-CSSB/PlasmidGPT-GRPO (sha db2462a)
	```

	Reward-component ablation models (`McClain/plasmidgpt-rl-{cds_only, length_only, no_cassette_bonus, no_length_prior, no_repeat_penalty}`) all branch from SFT.
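
The brace shorthand above stands for five separate repositories. A small sketch that expands it into full repo IDs (the variable names are illustrative; the variant names are copied from the line above):

```python
# Ablation variants listed in the brace shorthand.
ABLATION_VARIANTS = [
    "cds_only",
    "length_only",
    "no_cassette_bonus",
    "no_length_prior",
    "no_repeat_penalty",
]

# Expand the shorthand into one repo ID per variant.
ablation_repos = [f"McClain/plasmidgpt-rl-{variant}" for variant in ABLATION_VARIANTS]

for repo in ablation_repos:
    print(repo)
```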

	## Where to look

	- Per-claim sources — `INDEX.md` maps each paper Table/Figure to its bucket path
	- Continuation/surprisal benchmarks — `continuation_benchmark/eval_set_656/` (primary, 656 plasmids × 5 splits)
	- Rejection sampling — `rejection_topK/`, `rejection_v3/`, and the older `rejection_sampling_v2/` (Base+GRPO cells preserved; SFT cells moved to `deprecated/early_sft_checkpoint/` after model.safetensors fix)
	- MFE under DNA Mathews 2004 — `mfe/` with per-model + temperature-sweep folders
	- 8-prompt eval — `evaluation/eight_prompt/{Base, SFT, RL, ablations/...}/` with strict-QC artifacts
	- pLannotate ORI breakdown (Table 8 source) — `plannotate/RL/`
	- Reference panel — `reference/addgene_500/` (n=500)

	## Reproducibility

	- `models/pinned_shas.csv` — exact commit SHAs for the 8 surviving model repos
	- `code_snapshots/{PlasmidRL, analysis2, plasmid-rl-paper-2}.sha` — paper repo + analysis pipeline + training repo HEADs
	- Each per-cell `metadata.json` has the seed, sampling params, sha256 of outputs, and the analysis2 strict-QC pipeline name + thresholds
	- W&B training runs: `ucl-cssb/PlasmidRL` (Nov 2025 GRPO production) + `ucl-cssb/plasmid-rl-icml-revision` (March 2026 ablations)
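
Because each per-cell `metadata.json` records sha256 digests of its outputs, a downloaded cell can be checked locally. The sketch below assumes a hypothetical schema in which `metadata.json` has a `"sha256"` object mapping file names to hex digests; the real key names may differ, so adjust after inspecting an actual file. The function names and the demo file are also illustrative.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_cell(cell_dir: Path) -> dict[str, bool]:
    """Compare each listed file against its recorded digest.

    ASSUMPTION: metadata.json contains {"sha256": {"<file name>": "<hex digest>"}}.
    """
    metadata = json.loads((cell_dir / "metadata.json").read_text())
    return {
        name: sha256_of(cell_dir / name) == expected
        for name, expected in metadata["sha256"].items()
    }


# Self-contained demo on a throwaway directory (not real bucket data).
with tempfile.TemporaryDirectory() as tmp:
    cell = Path(tmp)
    (cell / "samples.fasta").write_text(">seq1\nACGT\n")
    recorded = {"sha256": {"samples.fasta": sha256_of(cell / "samples.fasta")}}
    (cell / "metadata.json").write_text(json.dumps(recorded))
    demo_result = verify_cell(cell)

print(demo_result)  # {'samples.fasta': True}
```

Streaming the file in chunks keeps memory use flat for large output files.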

	## License + citation

	Bucket data: CC-BY-4.0 (TBD — confirm before public release).
	Models: see individual repo cards.
	Citation: TBD on paper acceptance.
