UCL-CSSB/PlasmidRL-ICML / SFT_STALE.md

McClain

7 days ago

	# Bucket data status — what's verified, what's pending

	Issue (root-caused 2026-05-02): the HF repo `UCL-CSSB/PlasmidGPT-SFT` contained two safetensors files, and `AutoModelForCausalLM.from_pretrained` loaded by default the one that was a duplicate of the Base weights. Cleanup commit `daeaabf` swapped them, which changed the pinned SHA from `5748e6f9` to `daeaabf0`. `models/pinned_shas.csv` has been updated.
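
A minimal sketch of how a loader could guard against this class of problem: check that a repo listing contains exactly one weights file, and look up the pinned SHA before loading. The CSV column names (`repo_id`, `sha`) and the example file names are assumptions for illustration; the real layout of `models/pinned_shas.csv` is not shown here.

```python
import csv
import io


def weight_files(filenames):
    """Return the .safetensors files in a repo listing, sorted."""
    return sorted(f for f in filenames if f.endswith(".safetensors"))


def assert_single_weight_file(filenames):
    """Raise unless the listing holds exactly one .safetensors file."""
    found = weight_files(filenames)
    if len(found) != 1:
        raise ValueError(f"expected exactly one .safetensors file, found {found}")
    return found[0]


def pinned_sha(csv_text, repo_id):
    """Look up the pinned commit SHA for repo_id.

    Assumes hypothetical columns `repo_id` and `sha`.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["repo_id"] == repo_id:
            return row["sha"]
    raise KeyError(repo_id)


# Illustrative CSV content only; the SHA prefix is the one quoted in this note.
EXAMPLE_CSV = "repo_id,sha\nUCL-CSSB/PlasmidGPT-SFT,daeaabf0\n"
sha = pinned_sha(EXAMPLE_CSV, "UCL-CSSB/PlasmidGPT-SFT")

# Hypothetical listing that reproduces the original problem (two weight files).
try:
    assert_single_weight_file(["config.json", "model.safetensors", "base_copy.safetensors"])
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
```

The resulting SHA could then be passed as the `revision` argument of `AutoModelForCausalLM.from_pretrained` so that loads do not track the default branch. The Hub may require the full commit hash rather than the 8-character prefix quoted here.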

	## Verified clean (post-2026-05-04 audit)

	- `evaluation/eight_prompt/{Base,SFT,RL}/` — analysis2 strict QC; 4.275 / 10.975 / 71.575 (Base / SFT / RL)
	- `evaluation/eight_prompt/ablations/full_reward/` — GRPO @ T=0.95 = 66.875%
	- `evaluation/eight_prompt/ablations/{cds_only,length_only,no_cassette_bonus,no_length_prior,no_repeat_penalty}/` — Table 7 rows 2-6
	- `analysis/distribution_metrics.csv` + `analysis/distribution/per_seq_*.csv` — Table 6 source
	- `continuation_benchmark/eval_set_656/` — 656 plasmids × 5 splits (PRIMARY Table 5 source)
	- `continuation_benchmark/heldout_eng_r3/` — PLSDB-style F1–F6 NCBI queries
	- `continuation_benchmark/{both_metric_eval, validation_eval, holdout30_non_addgene}/` — additional held-out evals
	- `mfe/SFT_real/` — replaces stale; mean −0.148 (matches paper −0.149)
	- `mfe/{SFT_circ10k_subset, SFT_temp_sweep, RL_t1.15_8prompt, RL_temp_sweep_2prompt}/` — additional MFE coverage
	- `rejection_v3/{Base,SFT,GRPO}/` — 8-prompt × 1250 = 10K, analysis2 strict QC, sweep-optimal T
	- `rejection_topK/` — M=50 attempts × K∈{1,4,16,64} success rates
	- `plannotate/{RL, Base_t0.95, SFT_t0.95}/` — Table 8 sources
	- `novelty_blastn/summary.csv` — Table 2
	- `reference/addgene_500/`, `original_paper/`, `models/`, `code_snapshots/` — auxiliary
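
The entries above use shell-style brace groups such as `{Base,SFT,RL}`. As a rough sketch, they can be expanded into concrete paths for scripted checks. The helper name `expand_braces` is hypothetical, handles only non-nested groups, and leaves glob characters such as `*` untouched.

```python
import itertools
import re

_BRACES = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern):
    """Expand non-nested {a,b,c} groups into a list of concrete paths."""
    # re.split with one capture group alternates: literal, group, literal, ...
    parts = _BRACES.split(pattern)
    choices = [
        [opt.strip() for opt in part.split(",")] if i % 2 else [part]
        for i, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


eight_prompt_dirs = expand_braces("evaluation/eight_prompt/{Base,SFT,RL}/")
heldout_dirs = expand_braces(
    "continuation_benchmark/{both_metric_eval, validation_eval, holdout30_non_addgene}/"
)
```

Whitespace after commas inside a group is stripped, so the `continuation_benchmark/` entry expands to three clean directory names.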

	## SFT-stale files NOT yet replaced

	- `rejection_sampling_v2/direct/SFT/` and `rejection_sampling_v2/best_of_16/SFT/` — the original Table 4 SFT cells. Kept in place for paper reproducibility, but superseded by `rejection_v3/SFT/` if the camera-ready uses the new 8-prompt protocol. The old numbers (7.15% / 32.4%) came from the pre-fix SFT checkpoint; the new numbers (10.87% / —) come from the corrected checkpoint.
	- `evaluation/temperature_sweep/SFT_t0.95/` — generations from the broken (pre-fix) checkpoint; appendix material only
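
One way an analysis script could avoid reading these directories by accident is a small prefix lookup, sketched below. The prefixes and the `rejection_v3/SFT/` replacement come from the list above; the function name and the example file names are hypothetical.

```python
# Stale prefix -> replacement directory (None where no replacement is listed).
STALE_PREFIXES = {
    "rejection_sampling_v2/direct/SFT/": "rejection_v3/SFT/",
    "rejection_sampling_v2/best_of_16/SFT/": "rejection_v3/SFT/",
    "evaluation/temperature_sweep/SFT_t0.95/": None,
}


def stale_status(path):
    """Return (is_stale, replacement_dir_or_None) for a bucket-relative path."""
    for prefix, replacement in STALE_PREFIXES.items():
        if path.startswith(prefix):
            return True, replacement
    return False, None


# Hypothetical file names, used only to exercise the lookup.
status_sft = stale_status("rejection_sampling_v2/direct/SFT/generations.fasta")
status_base = stale_status("rejection_sampling_v2/direct/Base/generations.fasta")
```
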

	The original `continuation_benchmark/{completion,surprisal}_benchmark.csv` files (a small 11-plasmid set) are kept as legacy data. Their numbers reproduce paper Table 5 (Base −12.449, RL −10.966), but the new `eval_set_656/` covers a much larger set (656 plasmids × 5 splits) and is the primary Table 5 source.

	## Unaffected — known good throughout

	- `evaluation/eight_prompt/Base/`, `mfe/Base/`, `mfe/RL/` (= old `GRPO_temp1.0`), `mfe/ablations/*/`
	- `rejection_sampling_v2/direct/{Base,GRPO}/`, `rejection_sampling_v2/best_of_16/{Base,GRPO}/`
	- `evaluation/eight_prompt/ablations//` (ablation models from McClain/plasmidgpt-rl-)
	- `plannotate/RL/`, `novelty_blastn/`, `reference/`, `original_paper/`
