UCL-CSSB/PlasmidRL-ICML / INDEX.md

McClain

4 days ago


	# UCL-CSSB/PlasmidRL-ICML — index of canonical artifacts

	ICML camera-ready artifacts for *Effects of Structural Reward Shaping on Biophysical Properties in RL-Trained Plasmid Generators*.

	Lineage (parallel post-training paths from the same Base):

	```
	Base = UCL-CSSB/PlasmidGPT
	├─→ SFT next-token loss → UCL-CSSB/PlasmidGPT-SFT (sha daeaabf, post-2026-05-02 cleanup)
	└─→ GRPO reward shaping → UCL-CSSB/PlasmidGPT-GRPO (sha db2462a)
	```

	The five reward-ablation models branch separately from SFT. McClain/PlasmidGPT-RL also branches from SFT; it is kept in `deprecated/early_rl_lineage/` as appendix material.

	## Headline numbers (analysis2 strict QC, 8-prompt eval, T-matched)

	| Model | T=1.0 (n=4000) | T=0.95 (n=4000) | rejection_v3 (n=10K, 8-prompt @ sweep-optimal T) |
	|---|---:|---:|---:|
	| Base | 4.275% | — | 3.99% (T=1.0) |
	| SFT | 10.975% | — | 10.87% (T=1.0) |
	| GRPO | 71.575% | 66.875% | 78.66% (T=1.15) |

	Headline lift (GRPO/Base @ T=1.0): ~16.7×.
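The lift is the ratio of QC pass rates at T=1.0 in the table above. The check below is a minimal arithmetic sketch. Its dictionary is a hand transcription of the table, not a file in the bucket.

```python
# Pass rates (%) at T=1.0 (n=4000 each), transcribed from the headline table.
pass_rate_t1 = {"Base": 4.275, "SFT": 10.975, "GRPO": 71.575}


def lift(numerator: str, denominator: str, rates: dict) -> float:
    """Fold change between two models' QC pass rates (unitless)."""
    return rates[numerator] / rates[denominator]


grpo_over_base = lift("GRPO", "Base", pass_rate_t1)
grpo_over_sft = lift("GRPO", "SFT", pass_rate_t1)
print(f"GRPO/Base: {grpo_over_base:.1f}x")  # GRPO/Base: 16.7x
print(f"GRPO/SFT: {grpo_over_sft:.1f}x")    # GRPO/SFT: 6.5x
```

The same ratio against SFT (about 6.5x) is shown for context. It is derived from the table and not quoted elsewhere in this index.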

	## Layout

	```
	README.md interim
	INDEX.md this file
	SFT_STALE.md status flags

	analysis/ truth-set CSVs + distribution metrics + plannotate
	├── distribution_metrics.csv per-cell length/GC/ORF/JSD/Jaccard
	├── distribution/per_seq_{Base,SFT,RL}.csv
	├── distribution_report.html
	└── (table1_, table4_, ... pending manifest build)

	continuation_benchmark/ held-out continuation + surprisal benchmarks
	├── eval_set_656/ 656 plasmids × 5 splits — primary eval
	│ ├── summary.json
	│ ├── per_split_{completion,surprisal}.csv
	│ ├── per_plasmid_{completion,surprisal}.csv
	│ ├── all_{completion,surprisal}.csv (window-level)
	│ ├── full_set.fasta
	│ ├── metadata.tsv
	│ └── report.html
	├── heldout_eng_r3/ PLSDB-style (F1-F6 NCBI queries) engineered held-out
	│ ├── summary.json
	│ ├── all_{completion,surprisal}.csv
	│ └── report.html
	├── both_metric_eval/ 47 archetype-matched (5 archetypes)
	│ ├── summary.json
	│ ├── joint_per_plasmid.csv
	│ ├── per_plasmid_{completion,surprisal}.csv
	│ ├── metadata.tsv
	│ └── both_metric_candidates.fasta
	├── validation_eval/ 80-plasmid regression test (3 strata)
	│ ├── summary.json
	│ ├── per_plasmid_{completion,surprisal}.csv
	│ ├── metadata.tsv
	│ └── validation_set.fasta
	└── holdout30_non_addgene/ 29 curated non-Addgene
	  ├── holdout30_non_addgene.csv
	  └── holdout30_non_addgene.fasta

	evaluation/ generation outputs + QC
	├── eight_prompt/{Base,SFT,RL}/ Table 1 sources at T=1.0
	└── eight_prompt/ablations/{full_reward,5×reward_ablations}/ Table 7

	mfe/ MFE under DNA Mathews 2004 params
	├── Base/ n=4000 SFT-fixed Base, T=0.95 (paper original)
	├── RL/ n=4000 GRPO @ T=1.0 (= old GRPO_temp1.0)
	├── ablations/{...}/ 5 reward-ablation models
	├── SFT_real/ n=4000 SFT @ T=1.0 (replaces stale mfe/SFT/) — mean −0.148
	├── SFT_circ10k_subset/ 96 stratified, circular-folding for short seqs — mean −0.172
	├── SFT_temp_sweep/ 200/T at T={0.5, 0.8, 0.95, 1.0, 1.15, 1.3}
	├── RL_t1.15_8prompt/ GRPO @ T=1.15 (sweep-optimal) — mean −0.155
	└── RL_temp_sweep_2prompt/ GRPO across T={0.5, ..., 1.3}, 2-prompt protocol

	rejection_sampling_v2/ ORIGINAL paper Table 4 source (2-prompt, plasmidkit-loose QC)
	├── direct/{Base,SFT,GRPO}/ SFT cell still uses pre-fix checkpoint — see SFT_STALE.md
	└── best_of_16/{Base,SFT,GRPO}/

	rejection_v3/ NEW — 8-prompt × 1250 = 10K, analysis2 strict QC
	├── Base/ metadata.json (3.99%)
	├── SFT/ metadata.json (10.87%)
	└── GRPO/ metadata.json (78.66% @ T=1.15)

	rejection_topK/ NEW — top-K-of-K sampling success rate (M=50 trials, 8 prompts)
	├── summary.json success rate per (model, K∈{1,4,16,64})
	├── success_per_model_K.csv
	├── success_summary.csv per-prompt breakdown
	├── diversity.csv Jaccard similarity of kept samples
	├── ori_usage.csv ORI breakdown of kept samples
	├── amr_usage.csv AMR breakdown of kept samples
	├── per_attempt.csv trial-level data
	└── kept_samples.csv + .fasta all selected samples

	plannotate/ Table 8 — pLannotate-detected ORI breakdown
	├── RL/ GRPO @ T=1.0
	└── {Base_t0.95, SFT_t0.95}/ supplementary (T=0.95 versions)

	novelty_blastn/summary.csv Table 2 (n=22/28/30 BLAST against Addgene)

	reference/addgene_500/ Reference panel: plasmids.csv + metrics.csv + 3mer_freqs.json

	models/pinned_shas.csv 8 model commit SHAs (SFT updated to daeaabf)

	code_snapshots/ git SHAs (PlasmidRL, analysis2, plasmid-rl-paper-2)

	manifests/ pending — paper_v2_camera_ready.json + deprecated.json

	deprecated/ audit trail (v1 baselines, early RL lineage, old figures)

	original_paper/ frozen pre-revision data
	```
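`rejection_topK/` reports a success rate per (model, K) over M=50 trials. The sketch below shows one way such a rate could be aggregated. It rests on two assumptions this index does not confirm:

- Each trial reduces to a list of per-sample QC pass/fail booleans (hypothetical; the real `per_attempt.csv` schema may differ).
- A trial succeeds when at least one of its K samples passes.

The independence curve 1 − (1 − p)^K is a hypothetical reference point only and is not one of the bucket's outputs.

```python
from typing import List, Sequence


def trial_succeeds(qc_passes: Sequence[bool]) -> bool:
    """Assumed rule: a trial of K samples succeeds if any sample passes QC."""
    return any(qc_passes)


def success_rate(trials: Sequence[Sequence[bool]]) -> float:
    """Fraction of trials (e.g. M=50 per model and K) containing a passing sample."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(trial_succeeds(t) for t in trials) / len(trials)


def independent_baseline(p: float, k: int) -> float:
    """Expected success rate if each of k samples passes independently with probability p."""
    return 1.0 - (1.0 - p) ** k


# Toy trials with K=4 samples each (made-up booleans, not bucket data).
toy_trials: List[List[bool]] = [
    [False, False, True, False],
    [False, False, False, False],
    [True, True, False, False],
    [False, True, False, False],
]
print(success_rate(toy_trials))  # 0.75

# Reference curve for a per-sample pass rate such as Base's 4.275% at T=1.0.
for k in (1, 4, 16, 64):
    print(k, round(independent_baseline(0.04275, k), 3))
```

Comparing the empirical rate to this curve shows whether passing samples cluster within trials, for example by prompt.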

	## Key new findings vs paper draft

	1. Headline lift is ~16.7×, not 2.7×. The change comes from tightening the QC pipeline (analysis2 strict QC): the 71.6% RL number is unchanged, while Base and SFT drop because the loose QC was overly permissive.

	2. The alignment tax on continuation log-probability is real and replicated. Across the 656-, 47-, and 80-plasmid evals, RL scores 2 to 3 nats per window lower than SFT on continuation, and RL beats SFT on only 0–12% of plasmids. The paper's "evidence too thin to claim alignment tax" should flip to "alignment tax is measurable; RL trades next-token prediction for QC pass rate".

	3. Lineage is parallel, not serial: GRPO was trained from Base, not from SFT. The Abstract / §3.2 framing that "RL preserves the SFT-induced thermodynamic manifold" needs rewording, because RL cannot inherit what it never saw. SFT and GRPO each land near real-plasmid MFE / JSD / ORF length independently, via different mechanisms.

	4. SFT generates plasmid-scale sequences (mean length 5,441 bp, ORF 272 aa). The old SFT data showed Base-like lengths (1,970 bp) because of the model.safetensors checkpoint dispatch issue. SFT's MFE of −0.148 happens to match the paper's reported −0.149.

	5. Diversity convention: the paper reports Jaccard distance (~1.0 = diverse), while the new `distribution_metrics.csv` reports Jaccard similarity (low = diverse). Convert with `distance = 1 − similarity` when reading. The paper's "RL diversity 0.573" corresponds to the new "RL Jaccard similarity 0.426" (1 − 0.426 = 0.574, equal up to rounding), so the signal is the same.
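Finding 2 rests on two summary numbers: the mean per-window log-probability gap between RL and SFT, and the fraction of plasmids on which RL beats SFT. The sketch below computes both under stated assumptions, not the confirmed analysis:

- The input is a hypothetical mapping from plasmid ID to aligned lists of per-window log-probabilities (nats) for each model.
- A "win" means RL's mean window log-probability exceeds SFT's on that plasmid.

The real `per_plasmid_completion.csv` / `per_plasmid_surprisal.csv` columns may be named and aggregated differently.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical structure: plasmid_id -> (RL window log-probs, SFT window log-probs),
# in nats, scored on the same continuation windows.
WindowScores = Dict[str, Tuple[List[float], List[float]]]


def alignment_tax(scores: WindowScores) -> Tuple[float, float]:
    """Return (mean per-window RL - SFT gap in nats, fraction of plasmids RL wins).

    A negative gap means RL assigns lower log-probability than SFT,
    i.e. RL pays an alignment tax on continuation.
    """
    gaps: List[float] = []
    rl_wins = 0
    for plasmid_id, (rl, sft) in scores.items():
        if not rl or len(rl) != len(sft):
            raise ValueError(f"{plasmid_id}: window lists must be non-empty and aligned")
        gaps.extend(r - s for r, s in zip(rl, sft))
        # Assumed win rule: RL's mean window log-prob beats SFT's on this plasmid.
        if mean(rl) > mean(sft):
            rl_wins += 1
    return mean(gaps), rl_wins / len(scores)


# Toy numbers for illustration only (not bucket data).
toy = {
    "p1": ([-310.0, -295.5], [-307.5, -293.0]),
    "p2": ([-280.0, -301.0], [-278.0, -298.0]),
    "p3": ([-250.0, -259.5], [-251.0, -259.0]),
}
gap, win_frac = alignment_tax(toy)
print(f"mean gap {gap:.2f} nats/window, RL wins {win_frac:.0%} of plasmids")
```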



	## Ablation table (T=1.15, post-2026-05-05)

	`evaluation/eight_prompt/ablations/manifest.json` is the source of truth. Pass rates are reported at the sweep-optimal T=1.15 (matching the rejection-sampling protocol):

	| Ablation | Pass% | MFE (kcal/mol/bp) | Diversity (1−Jaccard) | T=0.95 pass% (deprecated) |
	|---|---:|---:|---:|---:|
	| full_reward | 78.35% | −0.165 | 0.585 | 66.88% |
	| no_repeat_penalty | 75.15% | −0.151 | 0.419 | 72.17% |
	| no_length_prior | 72.15% | −0.140 | 0.459 | 71.38% |
	| no_cassette_bonus | 44.52% | −0.170 | 0.369 | 19.80% |
	| length_only | 37.90% | −0.131 | 0.772 | 34.73% |
	| cds_only | 1.73% | −0.130 | 0.861 | 2.40% |
	| Addgene baseline | — | — | 0.925 | — |

	- Pass rate from `evaluation/eight_prompt/ablations/{cell}/qc/qc_summary.csv` (n=4000 generations per cell, analysis2 strict QC).
	- MFE from `evaluation/eight_prompt/ablations/{cell}/mfe/mfe_summary.json` (n=200 random subset per cell at T=1.15; cds_only n=69 because only 69 sequences pass strict QC; computed with ViennaRNA 2.7.2 / Mathews 2004 DNA params, 1000 bp window).
	- Diversity: 1 − mean pairwise 21-mer Jaccard similarity (n=200 sampled from passing sequences, except cds_only n=69). Addgene baseline (n=200 from reference) = 0.9245.
	- Per-cell `metadata.json` carries seed, sampling params, model SHA, and sha256 of every output file.
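The MFE and diversity metrics above can be sketched as follows. These assumptions are not confirmed by this index:

- "1000 bp window" means non-overlapping chunks folded independently, with the summed MFE divided by total length to give kcal/mol/bp.
- k-mers are taken from the forward strand only.
- Diversity averages Jaccard similarity over all unordered pairs.

The folding function is injected, so the sketch runs without ViennaRNA. In practice one would pass a wrapper around ViennaRNA's Python `RNA.fold` (which returns a `(structure, mfe)` pair) after loading the DNA Mathews 2004 parameter set.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Set


def windowed_mfe_per_bp(seq: str, fold: Callable[[str], float], window: int = 1000) -> float:
    """Sum MFE over non-overlapping windows, then normalize by sequence length.

    `fold` maps a DNA string to its MFE in kcal/mol (assumed interface).
    """
    if not seq:
        raise ValueError("empty sequence")
    total = sum(fold(seq[i:i + window]) for i in range(0, len(seq), window))
    return total / len(seq)


def kmer_set(seq: str, k: int = 21) -> Set[str]:
    """Distinct k-mers on the forward strand (no reverse-complement merging)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """|A & B| / |A | B|; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def diversity(seqs: Sequence[str], k: int = 21) -> float:
    """1 - mean pairwise k-mer Jaccard similarity over all unordered pairs."""
    sets: List[Set[str]] = [kmer_set(s, k) for s in seqs]
    pairs = list(combinations(sets, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return 1.0 - sum(jaccard_similarity(a, b) for a, b in pairs) / len(pairs)


def toy_fold(window_seq: str) -> float:
    """Stand-in fold function for the demo; NOT a thermodynamic model."""
    return -0.2 * len(window_seq)


print(windowed_mfe_per_bp("ACGT" * 600, toy_fold))  # -0.2 (2400 bp -> 3 windows)
print(diversity(["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"], k=3))
```

Under these assumptions the table's "Diversity (1−Jaccard)" column is `diversity(...)` with k=21. It is computed on n=200 passing sequences per cell (n=69 for cds_only).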

	Cassette-bonus removal is still the largest single-component drop (78.35 → 44.52 = 33.8 pp). T=0.95 data are preserved at `deprecated/ablations_t0.95/` (bucket-state archive) and `deprecated/ablations_t0.95_source/` (strict-QC source files at T=0.95).
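The percentage-point drops are plain differences from the `full_reward` row at T=1.15. The helper below reproduces them from the ablation table. Its values are transcribed by hand; `manifest.json` remains the source of truth. "Single-component" is taken here to mean the three removals of one reward term, with `length_only` and `cds_only` excluded.

```python
# T=1.15 pass rates (%) transcribed from the ablation table.
pass_rate_t115 = {
    "full_reward": 78.35,
    "no_repeat_penalty": 75.15,
    "no_length_prior": 72.15,
    "no_cassette_bonus": 44.52,
    "length_only": 37.90,
    "cds_only": 1.73,
}


def drops_vs_full(rates: dict, reference: str = "full_reward") -> dict:
    """Percentage-point drop of each ablation relative to the reference cell."""
    ref = rates[reference]
    return {name: round(ref - rate, 2) for name, rate in rates.items() if name != reference}


drops = drops_vs_full(pass_rate_t115)
for name, pp in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} -{pp:.2f} pp")

# Only ablations that remove a single reward term.
single_component = ("no_repeat_penalty", "no_length_prior", "no_cassette_bonus")
largest = max(single_component, key=drops.get)
print(largest, drops[largest])  # no_cassette_bonus 33.83
```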

	## Pending decisions

	- [ ] Table 4 protocol: keep paper's 2-prompt v2 (plasmidkit-loose QC) or switch to 8-prompt rejection_v3 (analysis2 strict QC)?
	- [ ] Table 5 source: the smaller 11-plasmid set (paper-original numbers reproduce) or the larger 656-plasmid eval_set?
	- [ ] MFE protocol: 8-prompt full or 2-prompt protocol for the camera-ready Table 6 row?
	- [ ] Table 7 row 1 number: 66.875% (GRPO @ T=0.95) recommended; awaiting final paper-text confirmation.

Xet Storage Details

Size: 9.28 kB
Xet hash: f93c3eca349aa95cb74d9e696536b14750503a56a9aa5bd51c39c000d275252c