UCL-CSSB/PlasmidRL-ICML

Camera-ready artifacts for "Effects of Structural Reward Shaping on Biophysical Properties in RL-Trained Plasmid Generators" (ICML 2026, to appear).

See INDEX.md for the full per-folder navigation. See SFT_STALE.md for data-status flags. This README is the 30-second summary.

Headline results

We apply Group Relative Policy Optimization (GRPO) to fine-tune UCL-CSSB/PlasmidGPT (Base) for whole-plasmid generation. Models are evaluated across 8 prompts on 4,000 sequences each under analysis2 strict QC.

| Model | T (sampling temperature) | QC pass rate (8-prompt) |
| --- | --- | --- |
| Base (`UCL-CSSB/PlasmidGPT`) | 1.0 | 4.275% |
| SFT (`UCL-CSSB/PlasmidGPT-SFT`) | 1.0 | 10.975% |
| RL = GRPO (`UCL-CSSB/PlasmidGPT-GRPO`) | 1.0 | 71.575% |

Lift: ~16.7× over Base, ~6.5× over SFT.
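The lift figures are plain ratios of the pass rates in the headline table. A minimal sketch of that arithmetic follows; the dictionary and function names are illustrative and not part of the bucket or the analysis2 pipeline.

```python
# QC pass rates (%) copied from the headline table (T = 1.0, 8-prompt eval).
pass_rate = {"Base": 4.275, "SFT": 10.975, "GRPO": 71.575}


def lift(model: str, baseline: str) -> float:
    """Ratio of two pass rates (dimensionless)."""
    return pass_rate[model] / pass_rate[baseline]


lift_over_base = lift("GRPO", "Base")
lift_over_sft = lift("GRPO", "SFT")

print(f"GRPO vs Base: {lift_over_base:.1f}x")
print(f"GRPO vs SFT:  {lift_over_sft:.1f}x")
```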

Rejection sampling top-K (M=50 trials × 8 prompts):

| K | Base | SFT | GRPO |
| --- | --- | --- | --- |
| 1 | 4.25% | 9.75% | 76.75% |
| 4 | 14.5% | 36.25% | 95.0% |
| 16 | 38.75% | 76.25% | 99.0% |
| 64 | 54.5% | 99.25% | 100% |
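For context when reading the top-K table, it can help to compare against a naive independence baseline. The sketch below rests on two assumptions that this README does not state: that a trial counts as a success when at least one of its K draws passes QC, and that draws are independent with a single pooled pass rate per model (taken here from the 8-prompt headline table). Under those assumptions the success probability is 1 - (1 - p)^K. The measured values need not match this baseline; a gap would suggest the independence assumption does not hold, for example if pass rates differ between prompts.

```python
def best_of_k(p: float, k: int) -> float:
    """P(at least one of k independent draws passes), each passing with prob. p."""
    return 1.0 - (1.0 - p) ** k


# ASSUMPTION: pooled single-draw pass rates, taken from the headline table.
single_draw = {"Base": 0.04275, "SFT": 0.10975, "GRPO": 0.71575}

for k in (1, 4, 16, 64):
    row = "  ".join(
        f"{name}={100 * best_of_k(p, k):5.1f}%" for name, p in single_draw.items()
    )
    print(f"K={k:<2d}  {row}")
```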

Lineage (parallel post-training paths)

Base = UCL-CSSB/PlasmidGPT  (= McClain/plasmidgpt-addgene-gpt2; same SHA, both public)
├─→ SFT next-token loss      → UCL-CSSB/PlasmidGPT-SFT  (sha daeaabf)
└─→ GRPO reward shaping       → UCL-CSSB/PlasmidGPT-GRPO  (sha db2462a)

Reward-component ablation models (McClain/plasmidgpt-rl-{cds_only, length_only, no_cassette_bonus, no_length_prior, no_repeat_penalty}) all branch from SFT.
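The brace shorthand above expands to five repository IDs. As a rough sketch, the lineage described in this section can be written as a child-to-parent map; the variable and function names are illustrative only, and nothing here contacts the Hub.

```python
BASE = "UCL-CSSB/PlasmidGPT"
SFT = "UCL-CSSB/PlasmidGPT-SFT"
GRPO = "UCL-CSSB/PlasmidGPT-GRPO"

# Ablation suffixes exactly as listed in the README.
ABLATIONS = [
    "cds_only",
    "length_only",
    "no_cassette_bonus",
    "no_length_prior",
    "no_repeat_penalty",
]
ablation_repos = [f"McClain/plasmidgpt-rl-{name}" for name in ABLATIONS]

# Child -> parent map: SFT and GRPO branch from Base, ablations from SFT.
parent = {BASE: None, SFT: BASE, GRPO: BASE}
parent.update({repo: SFT for repo in ablation_repos})


def ancestors(repo: str) -> list[str]:
    """Walk the parent map from a repo back to the root."""
    chain = []
    while parent[repo] is not None:
        repo = parent[repo]
        chain.append(repo)
    return chain


print(len(parent), "model repos")
print(ancestors(ablation_repos[0]))
```

The map holds 8 entries, which agrees with the 8 model repos pinned in models/pinned_shas.csv.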

Where to look

Per-claim sources — INDEX.md maps each paper Table/Figure to its bucket path
Continuation/surprisal benchmarks — continuation_benchmark/eval_set_656/ (primary, 656 plasmids × 5 splits)
Rejection sampling — rejection_topK/, rejection_v3/, and the older rejection_sampling_v2/ (Base+GRPO cells preserved; SFT cells moved to deprecated/early_sft_checkpoint/ after model.safetensors fix)
MFE under DNA Mathews 2004 — mfe/ with per-model + temperature-sweep folders
8-prompt eval — evaluation/eight_prompt/{Base, SFT, RL, ablations/...}/ with strict-QC artifacts
pLannotate ORI breakdown (Table 8 source) — plannotate/RL/
Reference panel — reference/addgene_500/ (n=500)

Reproducibility

models/pinned_shas.csv — exact commit SHAs for the 8 surviving model repos
code_snapshots/{PlasmidRL, analysis2, plasmid-rl-paper-2}.sha — paper repo + analysis pipeline + training repo HEADs
Each per-cell metadata.json has the seed, sampling params, sha256 of outputs, and the analysis2 strict-QC pipeline name + thresholds
W&B training runs: ucl-cssb/PlasmidRL (Nov 2025 GRPO production) + ucl-cssb/plasmid-rl-icml-revision (March 2026 ablations)
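Since each metadata.json records sha256 digests of its outputs, a downloaded cell can be checked locally. The sketch below is a minimal illustration, not the project's own tooling: the metadata layout it assumes (a top-level `"sha256"` key mapping file names to hex digests) is hypothetical, so adjust the key and structure to whatever the actual files contain.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_cell(cell_dir: Path, key: str = "sha256") -> dict[str, bool]:
    """Compare recorded digests with recomputed ones for one cell folder.

    ASSUMPTION: metadata.json looks like {"sha256": {"<file>": "<hex>", ...}}.
    The real schema may differ.
    """
    metadata = json.loads((cell_dir / "metadata.json").read_text())
    expected = metadata[key]
    return {
        name: sha256_of(cell_dir / name) == recorded.lower()
        for name, recorded in expected.items()
    }
```

A call such as `verify_cell(Path("path/to/cell"))` returns a per-file pass/fail map, where the path is a placeholder for a locally downloaded cell folder.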

License + citation

Bucket data: CC-BY-4.0 (TBD — confirm before public release). Models: see individual repo cards. Citation: TBD on paper acceptance.
