# UCL-CSSB/PlasmidRL-ICML — index of canonical artifacts
ICML camera-ready artifacts for *Effects of Structural Reward Shaping on Biophysical Properties in RL-Trained Plasmid Generators*.

**Lineage** (two parallel post-training paths from the same Base):
```
Base = UCL-CSSB/PlasmidGPT
├─→ SFT next-token loss → UCL-CSSB/PlasmidGPT-SFT (sha daeaabf, post-2026-05-02 cleanup)
└─→ GRPO reward shaping → UCL-CSSB/PlasmidGPT-GRPO (sha db2462a)
```
The five reward-ablation models each branch separately from SFT. McClain/PlasmidGPT-RL also branches from SFT; it is kept in `deprecated/early_rl_lineage/` as appendix material.
## Headline numbers (analysis2 strict QC, 8-prompt eval, T-matched)
| Model | T=1.0 (n=4000) | T=0.95 (n=4000) | rejection_v3 (n=10K, 8-prompt @ sweep-optimal T) |
|---|---:|---:|---:|
| Base | **4.275%** | — | 3.99% (T=1.0) |
| SFT | **10.975%** | — | 10.87% (T=1.0) |
| GRPO | **71.575%** | 66.875% | 78.66% (T=1.15) |

Headline lift (GRPO/Base @ T=1.0): 71.575 / 4.275 ≈ 16.7×.
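
The lift figure can be re-derived from the table as a sanity check. The sketch below is illustrative only: the dictionary and the `lift` helper are hypothetical names, and the pass rates are copied by hand from the T=1.0 column rather than read from any file in this bucket.

```python
# Pass rates (%) at T=1.0, copied by hand from the headline table.
pass_rate_t1 = {"Base": 4.275, "SFT": 10.975, "GRPO": 71.575}


def lift(numerator, denominator, rates=pass_rate_t1):
    """Fold-change between two pass rates (unitless ratio)."""
    return rates[numerator] / rates[denominator]


grpo_over_base = lift("GRPO", "Base")
grpo_over_sft = lift("GRPO", "SFT")

print(f"GRPO/Base @ T=1.0: {grpo_over_base:.1f}x")  # 16.7x
print(f"GRPO/SFT  @ T=1.0: {grpo_over_sft:.1f}x")   # 6.5x
```

The GRPO/SFT ratio is not quoted elsewhere in this index; it is shown only to illustrate that the same helper applies to any pair of cells.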
## Layout
```
README.md interim
INDEX.md this file
SFT_STALE.md status flags
analysis/ truth-set CSVs + distribution metrics + plannotate
├── distribution_metrics.csv per-cell length/GC/ORF/JSD/Jaccard
├── distribution/per_seq_{Base,SFT,RL}.csv
├── distribution_report.html
└── (table1_*, table4_*, ... pending manifest build)
continuation_benchmark/ held-out continuation + surprisal benchmarks
├── eval_set_656/ 656 plasmids × 5 splits — primary eval
│ ├── summary.json
│ ├── per_split_{completion,surprisal}.csv
│ ├── per_plasmid_{completion,surprisal}.csv
│ ├── all_{completion,surprisal}.csv (window-level)
│ ├── full_set.fasta
│ ├── metadata.tsv
│ └── report.html
├── heldout_eng_r3/ PLSDB-style (F1-F6 NCBI queries) engineered held-out
│ ├── summary.json
│ ├── all_{completion,surprisal}.csv
│ └── report.html
├── both_metric_eval/ 47 archetype-matched (5 archetypes)
│ ├── summary.json
│ ├── joint_per_plasmid.csv
│ ├── per_plasmid_{completion,surprisal}.csv
│ ├── metadata.tsv
│ └── both_metric_candidates.fasta
├── validation_eval/ 80-plasmid regression test (3 strata)
│ ├── summary.json
│ ├── per_plasmid_{completion,surprisal}.csv
│ ├── metadata.tsv
│ └── validation_set.fasta
└── holdout30_non_addgene/ 29 curated non-Addgene
├── holdout30_non_addgene.csv
└── holdout30_non_addgene.fasta
evaluation/ generation outputs + QC
├── eight_prompt/{Base,SFT,RL}/ Table 1 sources at T=1.0
└── eight_prompt/ablations/{full_reward,5×reward_ablations}/ Table 7
mfe/ MFE under DNA Mathews 2004 params
├── Base/ n=4000 SFT-fixed Base, T=0.95 (paper original)
├── RL/ n=4000 GRPO @ T=1.0 (= old GRPO_temp1.0)
├── ablations/{...}/ 5 reward-ablation models
├── SFT_real/ n=4000 SFT @ T=1.0 (replaces stale mfe/SFT/) — mean −0.148
├── SFT_circ10k_subset/ 96 stratified, circular-folding for short seqs — mean −0.172
├── SFT_temp_sweep/ 200/T at T={0.5, 0.8, 0.95, 1.0, 1.15, 1.3}
├── RL_t1.15_8prompt/ GRPO @ T=1.15 (sweep-optimal) — mean −0.155
└── RL_temp_sweep_2prompt/ GRPO across T={0.5, ..., 1.3}, 2-prompt protocol
rejection_sampling_v2/ ORIGINAL paper Table 4 source (2-prompt, plasmidkit-loose QC)
├── direct/{Base,SFT,GRPO}/ SFT cell still uses pre-fix checkpoint — see SFT_STALE.md
└── best_of_16/{Base,SFT,GRPO}/
rejection_v3/ NEW — 8-prompt × 1250 = 10K, analysis2 strict QC
├── Base/ metadata.json (3.99%)
├── SFT/ metadata.json (10.87%)
└── GRPO/ metadata.json (78.66% @ T=1.15)
rejection_topK/ NEW — top-K-of-K sampling success rate (M=50 trials, 8 prompts)
├── summary.json success rate per (model, K∈{1,4,16,64})
├── success_per_model_K.csv
├── success_summary.csv per-prompt breakdown
├── diversity.csv Jaccard similarity of kept samples
├── ori_usage.csv ORI breakdown of kept samples
├── amr_usage.csv AMR breakdown of kept samples
├── per_attempt.csv trial-level data
└── kept_samples.csv + .fasta all selected samples
plannotate/ Table 8 — pLannotate-detected ORI breakdown
├── RL/ GRPO @ T=1.0
└── {Base_t0.95, SFT_t0.95}/ supplementary (T=0.95 versions)
novelty_blastn/summary.csv Table 2 (n=22/28/30 BLAST against Addgene)
reference/addgene_500/ Reference panel: plasmids.csv + metrics.csv + 3mer_freqs.json
models/pinned_shas.csv 8 model commit SHAs (SFT updated to daeaabf)
code_snapshots/ git SHAs (PlasmidRL, analysis2, plasmid-rl-paper-2)
manifests/ pending — paper_v2_camera_ready.json + deprecated.json
deprecated/ audit trail (v1 baselines, early RL lineage, old figures)
original_paper/ frozen pre-revision data
```
## Key new findings vs paper draft
1. **Headline lift is ~16.7×, not 2.7×.** The change comes from tightening the QC pipeline (analysis2 strict QC). The 71.6% RL number is unchanged; the Base and SFT pass rates drop because the loose QC was overly permissive.
2. **Alignment tax on continuation logprob is real and replicated**: across the 656/47/80-plasmid evals, RL's continuation log-probability is **2 to 3 nats per window lower than SFT's**. RL beats SFT in only 0–12% of plasmids. The paper's "evidence too thin to claim alignment tax" should flip to "alignment tax is measurable; RL trades next-token prediction for QC pass rate".
3. **Lineage is parallel, not serial**: GRPO trained from Base, not from SFT. Abstract / §3.2 framing of "RL preserves the SFT-induced thermodynamic manifold" needs rewording — RL didn't inherit what it never saw. Both SFT and GRPO independently land near real-plasmid MFE / JSD / ORF length via different mechanisms.
4. **SFT generates plasmid-scale sequences** (mean length 5,441 bp, ORF 272 aa) — old SFT data showed Base-like 1,970 bp due to the model.safetensors checkpoint dispatch issue. SFT's MFE of −0.148 happens to match the paper's reported −0.149.
5. **Diversity convention**: the paper uses Jaccard *distance* (values near 1.0 = diverse). The new `distribution_metrics.csv` reports Jaccard *similarity* (low = diverse). Convert via `distance = 1 − similarity` when reading. The paper's "RL diversity 0.573" corresponds to the new "RL Jaccard similarity 0.426" (1 − 0.426 = 0.574, matching to within rounding). Same signal.
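
The convention change in item 5 is easy to get backwards, so a minimal conversion sketch may help. The function name is hypothetical, and the two values are the ones quoted in item 5, not read from `distribution_metrics.csv`.

```python
def jaccard_distance(similarity):
    """Convert a Jaccard similarity in [0, 1] to a Jaccard distance."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError(f"similarity must lie in [0, 1], got {similarity}")
    return 1.0 - similarity


paper_rl_diversity = 0.573   # paper convention: distance, high = diverse
new_rl_similarity = 0.426    # new CSV convention: similarity, low = diverse

converted = jaccard_distance(new_rl_similarity)
gap = abs(converted - paper_rl_diversity)

# The two numbers agree up to rounding in the third decimal place.
print(f"converted distance = {converted:.3f}, gap vs paper = {gap:.3f}")
```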
## Ablation table (T=1.15, post-2026-05-05)
`evaluation/eight_prompt/ablations/manifest.json` is the source of truth. Pass rates at sweep-optimal T=1.15 (matching rejection-sampling protocol):
| Ablation | Pass% | MFE (kcal/mol/bp) | Diversity (1−Jaccard) | T=0.95 pass% (deprecated) |
|---|---:|---:|---:|---:|
| full_reward | **78.35%** | −0.165 | 0.585 | 66.88% |
| no_repeat_penalty | 75.15% | −0.151 | 0.419 | 72.17% |
| no_length_prior | 72.15% | −0.140 | 0.459 | 71.38% |
| no_cassette_bonus | 44.52% | −0.170 | 0.369 | 19.80% |
| length_only | 37.90% | −0.131 | 0.772 | 34.73% |
| cds_only | 1.73% | −0.130 | 0.861 | 2.40% |
| **Addgene baseline** | — | — | **0.925** | — |
- Pass rate from `evaluation/eight_prompt/ablations/{cell}/qc/qc_summary.csv` (n=4000 generations per cell, analysis2 strict QC).
- MFE from `evaluation/eight_prompt/ablations/{cell}/mfe/mfe_summary.json` (n=200 random subset per cell at T=1.15; cds_only n=69 because only 69 sequences pass strict QC; computed with ViennaRNA 2.7.2 / Mathews 2004 DNA params, 1000 bp window).
- Diversity: 1 − mean pairwise 21-mer Jaccard similarity (n=200 sampled from passing sequences, except cds_only n=69). Addgene baseline (n=200 from reference) = 0.9245.
- Per-cell `metadata.json` carries seed, sampling params, model SHA, and sha256 of every output file.
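
As a rough illustration of the diversity metric described in the notes above (1 − mean pairwise k-mer Jaccard similarity), the sketch below uses plain Python sets. It is an assumption-laden reconstruction, not the analysis2 implementation: all function names are hypothetical, and it ignores reverse complements and plasmid circularity, which the real pipeline may or may not handle. The toy call uses k=3 so the result can be checked by hand; the table uses k=21.

```python
from itertools import combinations


def kmer_set(seq, k=21):
    """Set of all overlapping k-mers in a linear sequence (empty if len < k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diversity(seqs, k=21):
    """1 − mean pairwise k-mer Jaccard similarity over all sequence pairs."""
    sets = [kmer_set(s, k) for s in seqs]
    pairs = list(combinations(sets, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    mean_sim = sum(jaccard_similarity(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_sim


# Toy check with k=3: two identical sequences plus one unrelated sequence.
# Pairwise similarities are 1, 0, 0, so diversity = 1 − 1/3 = 2/3.
toy_diversity = diversity(["ACGTACGT", "ACGTACGT", "TTTTTTTT"], k=3)
print(f"toy diversity = {toy_diversity:.3f}")
```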

Cassette-bonus removal is still the largest single-component drop (78.35% → 44.52%, a fall of 33.8 pp). T=0.95 data are preserved at `deprecated/ablations_t0.95/` (bucket-state archive) and `deprecated/ablations_t0.95_source/` (strict-QC source files at T=0.95).
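
The percentage-point drops relative to `full_reward` can be recomputed from the table. This is a sketch with values copied by hand from the Pass% column; treating the three `no_*` cells as the "single-component removal" ablations is an assumption based on their names, not something the manifest is known to state.

```python
# Pass% at T=1.15, copied by hand from the ablation table.
pass_pct_t115 = {
    "full_reward": 78.35,
    "no_repeat_penalty": 75.15,
    "no_length_prior": 72.15,
    "no_cassette_bonus": 44.52,
    "length_only": 37.90,
    "cds_only": 1.73,
}

baseline = pass_pct_t115["full_reward"]

# Drop in percentage points versus the full reward, per ablation cell.
drops_pp = {
    name: round(baseline - pct, 2)
    for name, pct in pass_pct_t115.items()
    if name != "full_reward"
}

# Assumption: cells named "no_*" each remove exactly one reward component.
single_component = [name for name in drops_pp if name.startswith("no_")]
largest_single = max(single_component, key=drops_pp.get)

for name, drop in sorted(drops_pp.items(), key=lambda kv: kv[1]):
    print(f"{name:>18}: -{drop:.2f} pp")
print(f"largest single-component drop: {largest_single}")
```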
## Pending decisions
- [ ] Table 4 protocol: keep paper's 2-prompt v2 (plasmidkit-loose QC) or switch to 8-prompt rejection_v3 (analysis2 strict QC)?
- [ ] Table 5 source: smaller 11-plasmid (paper-original numbers reproduce) or larger 656-plasmid eval_set?
- [ ] MFE protocol: 8-prompt full or 2-prompt protocol for the camera-ready Table 6 row?
- [ ] Table 7 row 1 number: 66.875% (GRPO @ T=0.95) recommended; awaiting final paper-text confirmation.
