# UCL-CSSB/PlasmidRL-ICML

Camera-ready artifacts for **"Effects of Structural Reward Shaping on Biophysical Properties in RL-Trained Plasmid Generators"** (ICML 2026, to appear).

See `INDEX.md` for the full per-folder navigation and `SFT_STALE.md` for data-status flags. This README is the 30-second summary.

## Headline results

We apply Group Relative Policy Optimization (GRPO) to fine-tune `UCL-CSSB/PlasmidGPT` (Base) for whole-plasmid generation. Models are evaluated across 8 prompts, on 4,000 sequences each, under analysis2 strict QC.

| Model | Temperature (T) | QC pass rate (8-prompt) |
|---|---:|---:|
| Base (`UCL-CSSB/PlasmidGPT`) | 1.0 | **4.275%** |
| SFT (`UCL-CSSB/PlasmidGPT-SFT`) | 1.0 | **10.975%** |
| RL = GRPO (`UCL-CSSB/PlasmidGPT-GRPO`) | 1.0 | **71.575%** |

GRPO lifts the QC pass rate by ~16.7× over Base and ~6.5× over SFT.
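
The lift figures are plain ratios of the pass rates in the table above. A minimal check in Python, with the rates copied by hand (the `pass_rate` dictionary and `lift` helper are illustrative names, not part of the repository):

```python
# QC pass rates (%) copied from the headline table.
pass_rate = {"Base": 4.275, "SFT": 10.975, "GRPO": 71.575}


def lift(model: str, baseline: str) -> float:
    """Ratio of two pass rates; both are percentages, so the units cancel."""
    return pass_rate[model] / pass_rate[baseline]


lift_over_base = lift("GRPO", "Base")
lift_over_sft = lift("GRPO", "SFT")

print(f"GRPO vs Base: {lift_over_base:.1f}x")  # GRPO vs Base: 16.7x
print(f"GRPO vs SFT:  {lift_over_sft:.1f}x")   # GRPO vs SFT:  6.5x
```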

Rejection sampling top-K (M=50 trials × 8 prompts):

| K | Base | SFT | GRPO |
|---:|---:|---:|---:|
| 1 | 4.25% | 9.75% | **76.75%** |
| 4 | 14.5% | 36.25% | **95.0%** |
| 16 | 38.75% | 76.25% | **99.0%** |
| 64 | 54.5% | 99.25% | **100%** |
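
For reading the top-K table, one useful reference point is the success rate expected if every draw passed QC independently with the same probability p, namely 1 - (1 - p)^K. The sketch below computes that baseline from the K=1 row. It assumes a top-K trial counts as a success when at least one of the K samples passes strict QC, which this README does not spell out; `independent_top_k` and `k1_rate` are hypothetical names.

```python
def independent_top_k(p: float, k: int) -> float:
    """P(at least one of k i.i.d. draws passes), given per-draw pass probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 - (1.0 - p) ** k


# K=1 rates from the table, as fractions.
# ASSUMPTION: they stand in for a single per-draw pass probability per model.
k1_rate = {"Base": 0.0425, "SFT": 0.0975, "GRPO": 0.7675}

for k in (1, 4, 16, 64):
    cells = ", ".join(
        f"{model}={independent_top_k(p, k):.1%}" for model, p in k1_rate.items()
    )
    print(f"K={k:>2}: {cells}")
```

The measured values need not match this baseline: it pools all prompts into one probability, whereas the table aggregates per-prompt trials whose pass rates can differ.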

## Lineage (parallel post-training paths)

```
Base = UCL-CSSB/PlasmidGPT (= McClain/plasmidgpt-addgene-gpt2; same SHA, both public)
  ├─→ SFT next-token loss → UCL-CSSB/PlasmidGPT-SFT (sha daeaabf)
  └─→ GRPO reward shaping → UCL-CSSB/PlasmidGPT-GRPO (sha db2462a)
```

Reward-component ablation models (`McClain/plasmidgpt-rl-{cds_only, length_only, no_cassette_bonus, no_length_prior, no_repeat_penalty}`) all branch from SFT.

## Where to look

- **Per-claim sources**: `INDEX.md` maps each paper Table/Figure to its bucket path
- **Continuation/surprisal benchmarks**: `continuation_benchmark/eval_set_656/` (primary, 656 plasmids × 5 splits)
- **Rejection sampling**: `rejection_topK/`, `rejection_v3/`, and the older `rejection_sampling_v2/` (Base and GRPO cells preserved; SFT cells moved to `deprecated/early_sft_checkpoint/` after the `model.safetensors` fix)
- **MFE under DNA Mathews 2004**: `mfe/`, with per-model and temperature-sweep folders
- **8-prompt eval**: `evaluation/eight_prompt/{Base, SFT, RL, ablations/...}/`, with strict-QC artifacts
- **pLannotate ORI breakdown (Table 8 source)**: `plannotate/RL/`
- **Reference panel**: `reference/addgene_500/` (n=500)

## Reproducibility

- `models/pinned_shas.csv`: exact commit SHAs for the 8 surviving model repos
- `code_snapshots/{PlasmidRL, analysis2, plasmid-rl-paper-2}.sha`: HEAD SHAs of the training repo, the analysis pipeline, and the paper repo
- Each per-cell `metadata.json` records the seed, the sampling params, the sha256 of the outputs, and the analysis2 strict-QC pipeline name and thresholds
- W&B training runs: `ucl-cssb/PlasmidRL` (Nov 2025 GRPO production) and `ucl-cssb/plasmid-rl-icml-revision` (March 2026 ablations)
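
Because each cell's `metadata.json` carries sha256 digests of its outputs, a downloaded cell can be checked locally. The sketch below is one way to do that with the standard library only. The metadata layout it assumes, a mapping from relative output path to hex digest under a key named `outputs_sha256`, is hypothetical; adjust `key` to match the actual schema.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large outputs are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_cell(cell_dir: Path, key: str = "outputs_sha256") -> dict[str, bool]:
    """Return {relative_path: digest_matches} for one evaluation cell.

    ASSUMPTION: metadata.json stores {relative_path: hex_digest} under `key`.
    The key name and layout are illustrative, not the documented schema.
    """
    metadata = json.loads((cell_dir / "metadata.json").read_text())
    results = {}
    for rel_path, expected in metadata[key].items():
        target = cell_dir / rel_path
        # A missing file counts as a failed check rather than raising.
        results[rel_path] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

Calling `verify_cell(Path("some/cell/dir"))` and checking `all(result.values())` gives a single pass/fail per cell; the directory path here is a placeholder.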

## License + citation

- Bucket data: CC-BY-4.0 (TBD; confirm before public release).
- Models: see individual repo cards.
- Citation: TBD on paper acceptance.