UCL-CSSB/PlasmidRL-ICML
Camera-ready artifacts for "Effects of Structural Reward Shaping on Biophysical Properties in RL-Trained Plasmid Generators" (ICML 2026, to appear).
See INDEX.md for the full per-folder navigation. See SFT_STALE.md for data-status flags. This README is the 30-second summary.
Headline results
We apply Group Relative Policy Optimization (GRPO) to fine-tune UCL-CSSB/PlasmidGPT (Base) for whole-plasmid generation, evaluated across 8 prompts on 4,000 sequences each under analysis2 strict QC.
| Model | T | QC pass rate (8-prompt) |
|---|---|---|
| Base (UCL-CSSB/PlasmidGPT) | 1.0 | 4.275% |
| SFT (UCL-CSSB/PlasmidGPT-SFT) | 1.0 | 10.975% |
| RL = GRPO (UCL-CSSB/PlasmidGPT-GRPO) | 1.0 | 71.575% |
Lift: ~16.7× over Base, ~6.5× over SFT.
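The lift figures are plain ratios of the 8-prompt QC pass rates. A minimal sketch of that arithmetic, with the rates copied from the table and the helper name `lift` chosen here purely for illustration:

```python
# QC pass rates in percent, copied from the headline table.
pass_rate = {"Base": 4.275, "SFT": 10.975, "GRPO": 71.575}


def lift(model: str, baseline: str) -> float:
    """Ratio of two pass rates; the percent units cancel."""
    return pass_rate[model] / pass_rate[baseline]


lift_over_base = lift("GRPO", "Base")
lift_over_sft = lift("GRPO", "SFT")

print(f"GRPO vs Base: {lift_over_base:.1f}x")  # 16.7x
print(f"GRPO vs SFT:  {lift_over_sft:.1f}x")   # 6.5x
```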
Rejection sampling top-K (M=50 trials × 8 prompts):
| K | Base | SFT | GRPO |
|---|---|---|---|
| 1 | 4.25% | 9.75% | 76.75% |
| 4 | 14.5% | 36.25% | 95.0% |
| 16 | 38.75% | 76.25% | 99.0% |
| 64 | 54.5% | 99.25% | 100% |
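One plausible reading of these numbers is that a trial counts as a success at K if at least one of its first K samples passes strict QC. This README does not spell out the scoring rule, so the sketch below is an assumption, not the pipeline's actual code. The function name and the nested-list layout are hypothetical, and the toy data is not drawn from the bucket.

```python
from typing import Sequence


def best_of_k_success_rate(passes: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of trials where at least one of the first k samples passes QC.

    ASSUMPTION: passes[t][i] is True if sample i of trial t passed QC.
    This layout is hypothetical and not taken from the bucket's files.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if not passes:
        raise ValueError("need at least one trial")
    hits = sum(1 for trial in passes if any(trial[:k]))
    return hits / len(passes)


# Toy data only: 4 trials with 4 samples each.
toy = [
    [False, False, True, False],
    [True, False, False, False],
    [False, False, False, False],
    [False, False, False, True],
]
rates = {k: best_of_k_success_rate(toy, k) for k in (1, 2, 4)}
print(rates)  # {1: 0.25, 2: 0.25, 4: 0.75}
```

If each of the 50 × 8 = 400 (trial, prompt) cells is scored once, each cell contributes 0.25 percentage points. Every entry in the table is a multiple of 0.25, which is consistent with that reading.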
Lineage (parallel post-training paths)
Base = UCL-CSSB/PlasmidGPT (= McClain/plasmidgpt-addgene-gpt2; same SHA, both public)
├─→ SFT next-token loss → UCL-CSSB/PlasmidGPT-SFT (sha daeaabf)
└─→ GRPO reward shaping → UCL-CSSB/PlasmidGPT-GRPO (sha db2462a)
Reward-component ablation models (McClain/plasmidgpt-rl-{cds_only, length_only, no_cassette_bonus, no_length_prior, no_repeat_penalty}) all branch from SFT.
Where to look
- Per-claim sources: `INDEX.md` maps each paper Table/Figure to its bucket path
- Continuation/surprisal benchmarks: `continuation_benchmark/eval_set_656/` (primary, 656 plasmids × 5 splits)
- Rejection sampling: `rejection_topK/`, `rejection_v3/`, and the older `rejection_sampling_v2/` (Base+GRPO cells preserved; SFT cells moved to `deprecated/early_sft_checkpoint/` after model.safetensors fix)
- MFE under DNA Mathews 2004: `mfe/` with per-model + temperature-sweep folders
- 8-prompt eval: `evaluation/eight_prompt/{Base, SFT, RL, ablations/...}/` with strict-QC artifacts
- pLannotate ORI breakdown (Table 8 source): `plannotate/RL/`
- Reference panel: `reference/addgene_500/` (n=500)
Reproducibility
- `models/pinned_shas.csv`: exact commit SHAs for the 8 surviving model repos
- `code_snapshots/{PlasmidRL, analysis2, plasmid-rl-paper-2}.sha`: paper repo + analysis pipeline + training repo HEADs
- Each per-cell `metadata.json` has the seed, sampling params, sha256 of outputs, and the analysis2 strict-QC pipeline name + thresholds
- W&B training runs: `ucl-cssb/PlasmidRL` (Nov 2025 GRPO production) + `ucl-cssb/plasmid-rl-icml-revision` (March 2026 ablations)
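Because each cell's `metadata.json` records sha256 digests of its outputs, a downloaded cell can be checked locally. The sketch below shows one way to do that. The JSON key `outputs_sha256` and its `{filename: hex_digest}` shape are assumptions made for illustration, so adjust them to the real schema after inspecting a `metadata.json`.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large outputs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_cell(cell_dir: Path) -> dict[str, bool]:
    """Map each listed output file to True if its digest matches the record.

    ASSUMPTION: metadata.json contains {"outputs_sha256": {filename: hex_digest}}.
    The key name and layout are hypothetical, not taken from the bucket.
    """
    meta = json.loads((cell_dir / "metadata.json").read_text())
    recorded = meta["outputs_sha256"]  # hypothetical key name
    return {
        name: sha256_of_file(cell_dir / name) == expected
        for name, expected in recorded.items()
    }
```

Calling `verify_cell(Path("path/to/cell"))` returns a per-file pass/fail map. Any `False` entry indicates a file that differs from what the metadata recorded.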
License + citation
Bucket data: CC-BY-4.0 (TBD — confirm before public release). Models: see individual repo cards. Citation: TBD on paper acceptance.