UCL-CSSB/PlasmidRL-ICML / SFT_STALE.md

Bucket data status — what's verified, what's pending

Issue (root-caused 2026-05-02): HF repo UCL-CSSB/PlasmidGPT-SFT had two safetensors files, and AutoModelForCausalLM.from_pretrained defaulted to a Base-clone duplicate. Cleanup commit daeaabf swapped them. The repo SHA changed from 5748e6f9 to daeaabf0, and models/pinned_shas.csv has been updated.
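
One way to guard against the same ambiguity when reloading the checkpoint is to pin the revision explicitly and to check how many weight files the repo exposes at that revision. The sketch below is illustrative only, not the project's loader: the helper names `competing_weight_files` and `load_pinned` are hypothetical, and it assumes the SHA recorded in models/pinned_shas.csv is passed in by the caller. `revision` is a standard argument of `from_pretrained`, and `list_repo_files` is a `huggingface_hub` function.

```python
import re
from typing import Iterable, List

# Shards of one checkpoint (model-00001-of-00002.safetensors) legitimately come
# in groups, so they are not counted as competing weight files.
_SHARD = re.compile(r"-\d{5}-of-\d{5}\.safetensors$")


def competing_weight_files(filenames: Iterable[str]) -> List[str]:
    """Return top-level, non-sharded .safetensors files, sorted by name."""
    return sorted(
        name
        for name in filenames
        if name.endswith(".safetensors")
        and "/" not in name
        and not _SHARD.search(name)
    )


def load_pinned(repo_id: str, revision: str):
    """Load a causal LM at an explicit revision, refusing ambiguous repos.

    Imports are local so the pure helper above works without these libraries.
    """
    from huggingface_hub import list_repo_files
    from transformers import AutoModelForCausalLM

    candidates = competing_weight_files(list_repo_files(repo_id, revision=revision))
    if len(candidates) > 1:
        raise RuntimeError(
            f"{repo_id}@{revision} exposes several weight files: {candidates}"
        )
    return AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)
```

A call would look like `load_pinned("UCL-CSSB/PlasmidGPT-SFT", revision=sha)`, with `sha` read from models/pinned_shas.csv. The column layout of that file is not described here, so the lookup is left to the caller, and the model may need extra loading arguments that this sketch does not cover.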

Verified clean (post-2026-05-04 audit)

  • evaluation/eight_prompt/{Base,SFT,RL}/ — analysis2 strict QC; 4.275 / 10.975 / 71.575
  • evaluation/eight_prompt/ablations/full_reward/ — GRPO @ T=0.95 = 66.875%
  • evaluation/eight_prompt/ablations/{cds_only,length_only,no_cassette_bonus,no_length_prior,no_repeat_penalty}/ — Table 7 rows 2-6
  • analysis/distribution_metrics.csv + analysis/distribution/per_seq_*.csv — Table 6 source
  • continuation_benchmark/eval_set_656/ — 656 plasmids × 5 splits (PRIMARY Table 5 source)
  • continuation_benchmark/heldout_eng_r3/ — PLSDB-style F1–F6 NCBI queries
  • continuation_benchmark/{both_metric_eval, validation_eval, holdout30_non_addgene}/ — additional held-out evals
  • mfe/SFT_real/ — replaces stale; mean −0.148 (matches paper −0.149)
  • mfe/{SFT_circ10k_subset, SFT_temp_sweep, RL_t1.15_8prompt, RL_temp_sweep_2prompt}/ — additional MFE coverage
  • rejection_v3/{Base,SFT,GRPO}/ — 8-prompt × 1250 = 10K, analysis2 strict QC, sweep-optimal T
  • rejection_topK/ — M=50 attempts × K∈{1,4,16,64} success rates
  • plannotate/{RL, Base_t0.95, SFT_t0.95}/ — Table 8 sources
  • novelty_blastn/summary.csv — Table 2
  • reference/addgene_500/, original_paper/, models/, code_snapshots/ — auxiliary

SFT-stale files NOT yet replaced

  • rejection_sampling_v2/direct/SFT/ and rejection_sampling_v2/best_of_16/SFT/ — the original Table 4 SFT cells. Kept in place for paper reproducibility but superseded by rejection_v3/SFT/ if camera-ready uses the new 8-prompt protocol. Old numbers (7.15% / 32.4%) used pre-fix SFT checkpoint; new (10.87% / —) uses corrected checkpoint.
  • evaluation/temperature_sweep/SFT_t0.95/ — generations with broken checkpoint; appendix material only
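
For scripts that consume the bucket, the stale paths listed above can be encoded as a small lookup so that SFT-stale inputs are flagged before they reach an analysis. This is a minimal sketch under stated assumptions: the function name `stale_status` and the mapping are hypothetical, paths are taken to be bucket-relative, and the replacement for the rejection-sampling cells applies only if the camera-ready version adopts the new 8-prompt protocol.

```python
from typing import Optional, Tuple

# Prefixes copied from the "SFT-stale files NOT yet replaced" list.
# Value = superseding prefix if one is named in this note, else None.
STALE_PREFIXES = {
    "rejection_sampling_v2/direct/SFT/": "rejection_v3/SFT/",
    "rejection_sampling_v2/best_of_16/SFT/": "rejection_v3/SFT/",
    "evaluation/temperature_sweep/SFT_t0.95/": None,  # no replacement listed
}


def stale_status(path: str) -> Tuple[bool, Optional[str]]:
    """Return (is_stale, replacement_prefix) for a bucket-relative path."""
    normalized = path.lstrip("/")
    for prefix, replacement in STALE_PREFIXES.items():
        if normalized.startswith(prefix):
            return True, replacement
    return False, None
```

Keeping the replacement alongside each prefix lets a caller either redirect to the corrected data or stop with a clear message when none exists.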

The original continuation_benchmark/{completion,surprisal}_benchmark.csv (small 11-plasmid set) is kept as legacy data. Its numbers reproduce paper Table 5 (Base −12.449, RL −10.966), but the new eval_set_656/ (656 plasmids × 5 splits) is the larger, primary Table 5 source.

Unaffected — known good throughout

  • evaluation/eight_prompt/Base/, mfe/Base/, mfe/RL/ (= old GRPO_temp1.0), mfe/ablations/*/
  • rejection_sampling_v2/direct/{Base,GRPO}/, rejection_sampling_v2/best_of_16/{Base,GRPO}/
  • evaluation/eight_prompt/ablations/*/ (ablation models from McClain/plasmidgpt-rl-*)
  • plannotate/RL/, novelty_blastn/, reference/, original_paper/
