# UCL-CSSB/PlasmidRL-ICML — index of canonical artifacts

ICML camera-ready artifacts for *Effects of Structural Reward Shaping on Biophysical Properties in RL-Trained Plasmid Generators*.

**Lineage** (parallel post-training paths from the same Base):

```
Base = UCL-CSSB/PlasmidGPT
├─→ SFT next-token loss → UCL-CSSB/PlasmidGPT-SFT (sha daeaabf, post-2026-05-02 cleanup)
└─→ GRPO reward shaping → UCL-CSSB/PlasmidGPT-GRPO (sha db2462a)
```

The five reward-ablation models branch separately from SFT. McClain/PlasmidGPT-RL also branches from SFT — kept in `deprecated/early_rl_lineage/` as appendix material.
## Headline numbers (analysis2 strict QC, 8-prompt eval, T-matched)

| Model | T=1.0 (n=4000) | T=0.95 (n=4000) | rejection_v3 (n=10K, 8-prompt @ sweep-optimal T) |
|---|---:|---:|---:|
| Base | **4.275%** | — | 3.99% (T=1.0) |
| SFT | **10.975%** | — | 10.87% (T=1.0) |
| GRPO | **71.575%** | 66.875% | 78.66% (T=1.15) |

Headline lift (GRPO/Base @ T=1.0): ~16.7×.
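The headline lift can be checked directly from the table; a minimal sketch, with pass rates hard-coded from the T=1.0 column above:

```python
# Strict-QC pass rates at T=1.0 from the headline table (fractions of n=4000).
pass_rate = {"Base": 0.04275, "SFT": 0.10975, "GRPO": 0.71575}

lift_vs_base = pass_rate["GRPO"] / pass_rate["Base"]
lift_vs_sft = pass_rate["GRPO"] / pass_rate["SFT"]

print(f"GRPO/Base lift: {lift_vs_base:.1f}x")  # ~16.7x
print(f"GRPO/SFT  lift: {lift_vs_sft:.1f}x")
```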
## Layout

```
README.md                     interim
INDEX.md                      this file
SFT_STALE.md                  status flags
analysis/                     truth-set CSVs + distribution metrics + plannotate
├── distribution_metrics.csv  per-cell length/GC/ORF/JSD/Jaccard
├── distribution/per_seq_{Base,SFT,RL}.csv
├── distribution_report.html
└── (table1_*, table4_*, ... pending manifest build)
continuation_benchmark/       held-out continuation + surprisal benchmarks
├── eval_set_656/             656 plasmids × 5 splits — primary eval
│   ├── summary.json
│   ├── per_split_{completion,surprisal}.csv
│   ├── per_plasmid_{completion,surprisal}.csv
│   ├── all_{completion,surprisal}.csv  (window-level)
│   ├── full_set.fasta
│   ├── metadata.tsv
│   └── report.html
├── heldout_eng_r3/           PLSDB-style (F1-F6 NCBI queries) engineered held-out
│   ├── summary.json
│   ├── all_{completion,surprisal}.csv
│   └── report.html
├── both_metric_eval/         47 archetype-matched (5 archetypes)
│   ├── summary.json
│   ├── joint_per_plasmid.csv
│   ├── per_plasmid_{completion,surprisal}.csv
│   ├── metadata.tsv
│   └── both_metric_candidates.fasta
├── validation_eval/          80-plasmid regression test (3 strata)
│   ├── summary.json
│   ├── per_plasmid_{completion,surprisal}.csv
│   ├── metadata.tsv
│   └── validation_set.fasta
└── holdout30_non_addgene/    29 curated non-Addgene
    ├── holdout30_non_addgene.csv
    └── holdout30_non_addgene.fasta
evaluation/                   generation outputs + QC
├── eight_prompt/{Base,SFT,RL}/  Table 1 sources at T=1.0
└── eight_prompt/ablations/{full_reward,5×reward_ablations}/  Table 7
mfe/                          MFE under DNA Mathews 2004 params
├── Base/                     n=4000 SFT-fixed Base, T=0.95 (paper original)
├── RL/                       n=4000 GRPO @ T=1.0 (= old GRPO_temp1.0)
├── ablations/{...}/          5 reward-ablation models
├── SFT_real/                 n=4000 SFT @ T=1.0 (replaces stale mfe/SFT/) — mean −0.148
├── SFT_circ10k_subset/       96 stratified, circular-folding for short seqs — mean −0.172
├── SFT_temp_sweep/           200/T at T={0.5, 0.8, 0.95, 1.0, 1.15, 1.3}
├── RL_t1.15_8prompt/         GRPO @ T=1.15 (sweep-optimal) — mean −0.155
└── RL_temp_sweep_2prompt/    GRPO across T={0.5, ..., 1.3}, 2-prompt protocol
rejection_sampling_v2/        ORIGINAL paper Table 4 source (2-prompt, plasmidkit-loose QC)
├── direct/{Base,SFT,GRPO}/   SFT cell still uses pre-fix checkpoint — see SFT_STALE.md
└── best_of_16/{Base,SFT,GRPO}/
rejection_v3/                 NEW — 8-prompt × 1250 = 10K, analysis2 strict QC
├── Base/                     metadata.json (3.99%)
├── SFT/                      metadata.json (10.87%)
└── GRPO/                     metadata.json (78.66% @ T=1.15)
rejection_topK/               NEW — top-K-of-K sampling success rate (M=50 trials, 8 prompts)
├── summary.json              success rate per (model, K∈{1,4,16,64})
├── success_per_model_K.csv
├── success_summary.csv       per-prompt breakdown
├── diversity.csv             Jaccard similarity of kept samples
├── ori_usage.csv             ORI breakdown of kept samples
├── amr_usage.csv             AMR breakdown of kept samples
├── per_attempt.csv           trial-level data
└── kept_samples.csv + .fasta all selected samples
plannotate/                   Table 8 — pLannotate-detected ORI breakdown
├── RL/                       GRPO @ T=1.0
└── {Base_t0.95, SFT_t0.95}/  supplementary (T=0.95 versions)
novelty_blastn/summary.csv    Table 2 (n=22/28/30 BLAST against Addgene)
reference/addgene_500/        Reference panel: plasmids.csv + metrics.csv + 3mer_freqs.json
models/pinned_shas.csv        8 model commit SHAs (SFT updated to daeaabf)
code_snapshots/               git SHAs (PlasmidRL, analysis2, plasmid-rl-paper-2)
manifests/                    pending — paper_v2_camera_ready.json + deprecated.json
deprecated/                   audit trail (v1 baselines, early RL lineage, old figures)
original_paper/               frozen pre-revision data
```
## Key new findings vs paper draft

1. **Headline lift is ~16.7×, not 2.7×** — from QC-pipeline tightening (analysis2 strict QC). The 71.6% RL number is unchanged; Base/SFT drop because the loose QC was overly permissive.
2. **Alignment tax on continuation logprob is real and replicated**: across the 656/47/80-plasmid evals, RL is **−2 to −3 nats per window worse than SFT** on continuation, and beats SFT on only 0–12% of plasmids. The paper's "evidence too thin to claim alignment tax" should flip to "alignment tax is measurable; RL trades next-token prediction for QC pass rate".
3. **Lineage is parallel, not serial**: GRPO trained from Base, not from SFT. The Abstract / §3.2 framing of "RL preserves the SFT-induced thermodynamic manifold" needs rewording — RL didn't inherit what it never saw. Both SFT and GRPO independently land near real-plasmid MFE / JSD / ORF length via different mechanisms.
4. **SFT generates plasmid-scale sequences** (mean length 5,441 bp, ORF 272 aa) — the old SFT data showed Base-like 1,970 bp due to the model.safetensors checkpoint-dispatch issue. SFT's MFE of −0.148 happens to match the paper's reported −0.149.
5. **Diversity convention**: the paper uses Jaccard *distance* (~1.0 = diverse); the new `distribution_metrics.csv` reports Jaccard *similarity* (low = diverse). Convert via `distance = 1 − similarity` when reading. The paper's "RL diversity 0.573" corresponds to the new "RL Jaccard similarity 0.426". Same signal.
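The convention mismatch in point 5 is a one-line conversion; a minimal sketch, with the RL values from the finding above (which agree up to rounding):

```python
# Paper convention: Jaccard *distance* (high = diverse).
# distribution_metrics.csv convention: Jaccard *similarity* (low = diverse).
def similarity_to_distance(similarity: float) -> float:
    return 1.0 - similarity

new_rl_similarity = 0.426   # from distribution_metrics.csv
paper_rl_distance = 0.573   # as printed in the paper

# Agreement to within rounding of the published figures.
assert abs(similarity_to_distance(new_rl_similarity) - paper_rl_distance) < 0.005
```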
## Ablation table (T=1.15, post-2026-05-05)

`evaluation/eight_prompt/ablations/manifest.json` is the source of truth. Pass rates at sweep-optimal T=1.15 (matching the rejection-sampling protocol):

| Ablation | Pass% | MFE (kcal/mol/bp) | Diversity (1−Jaccard) | T=0.95 pass% (deprecated) |
|---|---:|---:|---:|---:|
| full_reward | **78.35%** | −0.165 | 0.585 | 66.88% |
| no_repeat_penalty | 75.15% | −0.151 | 0.419 | 72.17% |
| no_length_prior | 72.15% | −0.140 | 0.459 | 71.38% |
| no_cassette_bonus | 44.52% | −0.170 | 0.369 | 19.80% |
| length_only | 37.90% | −0.131 | 0.772 | 34.73% |
| cds_only | 1.73% | −0.130 | 0.861 | 2.40% |
| **Addgene baseline** | — | — | **0.925** | — |

- Pass rate from `evaluation/eight_prompt/ablations/{cell}/qc/qc_summary.csv` (n=4000 generations per cell, analysis2 strict QC).
- MFE from `evaluation/eight_prompt/ablations/{cell}/mfe/mfe_summary.json` (n=200 random subset per cell at T=1.15; cds_only n=69 because only 69 sequences pass strict QC; computed with ViennaRNA 2.7.2 / Mathews 2004 DNA params, 1000 bp window).
- Diversity: 1 − mean pairwise 21-mer Jaccard similarity (n=200 sampled from passing sequences, except cds_only n=69). Addgene baseline (n=200 from reference) = 0.9245.
- Per-cell `metadata.json` carries seed, sampling params, model SHA, and sha256 of every output file.

Cassette-bonus removal is still the largest single-component drop (78.35 → 44.52 = 33.8 pp). T=0.95 data is preserved at `deprecated/ablations_t0.95/` (bucket-state archive) and `deprecated/ablations_t0.95_source/` (strict-QC source files at T=0.95).
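The diversity column can be reproduced from raw sequences with a sketch like the following (assumes plain string sequences; the actual pipeline's n=200 sampling from passing sequences is omitted):

```python
from itertools import combinations

def kmer_set(seq: str, k: int = 21) -> set:
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def diversity(seqs: list[str], k: int = 21) -> float:
    """1 - mean pairwise Jaccard similarity of k-mer sets (higher = more diverse)."""
    sets = [kmer_set(s, k) for s in seqs]
    sims = [len(a & b) / len(a | b) for a, b in combinations(sets, 2)]
    return 1.0 - sum(sims) / len(sims)
```

Identical sequences give diversity 0; sequences sharing no 21-mers give 1, matching the Addgene baseline reading of ~0.925 as "nearly disjoint k-mer content".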
## Pending decisions

- [ ] Table 4 protocol: keep the paper's 2-prompt v2 (plasmidkit-loose QC) or switch to 8-prompt rejection_v3 (analysis2 strict QC)?
- [ ] Table 5 source: the smaller 11-plasmid set (paper-original numbers reproduce) or the larger 656-plasmid eval_set?
- [ ] MFE protocol: 8-prompt full or 2-prompt protocol for the camera-ready Table 6 row?
- [ ] Table 7 row 1 number: 66.875% (GRPO @ T=0.95) recommended; awaiting final paper-text confirmation.