Bucket data status — what's verified, what's pending
Issue (root-caused 2026-05-02): the HF repo UCL-CSSB/PlasmidGPT-SFT contained two safetensors files, and AutoModelForCausalLM.from_pretrained defaulted to the one that was a duplicate of the Base weights. Cleanup commit daeaabf swapped them, changing the repo SHA from 5748e6f9 to daeaabf0. models/pinned_shas.csv has been updated.
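To guard against silently loading the pre-fix weights again, one option is to pass the pinned commit explicitly via the `revision` argument of `from_pretrained`. The sketch below is illustrative only: the column names `repo_id` and `sha` are assumptions about the layout of models/pinned_shas.csv, the helper names are hypothetical, and the Hub may require the full 40-character commit hash rather than the 8-character prefixes quoted above.

```python
import csv
import io

# Pre-fix SHA prefix quoted in the note above; anything starting with it is stale.
STALE_SFT_PREFIX = "5748e6f9"


def parse_pinned_shas(csv_text: str) -> dict[str, str]:
    """Map repo id -> pinned commit SHA.

    ASSUMPTION: the CSV has header columns named `repo_id` and `sha`;
    adjust to the real layout of models/pinned_shas.csv.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["repo_id"].strip(): row["sha"].strip() for row in reader}


def check_not_stale(sha: str, stale_prefix: str = STALE_SFT_PREFIX) -> str:
    """Return `sha` unchanged, or raise if it points at the pre-fix commit."""
    if sha.startswith(stale_prefix):
        raise ValueError(f"{sha} is the pre-fix SFT revision; update the pin")
    return sha


def load_pinned_model(repo_id: str, sha: str):
    """Load a checkpoint at an explicit revision (needs `transformers` and network)."""
    from transformers import AutoModelForCausalLM  # imported lazily on purpose

    return AutoModelForCausalLM.from_pretrained(
        repo_id, revision=check_not_stale(sha)
    )
```

A typical call would read the CSV text from disk, look up `"UCL-CSSB/PlasmidGPT-SFT"` in the parsed mapping, and pass the result to `load_pinned_model`.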
Verified clean (post-2026-05-04 audit)
- evaluation/eight_prompt/{Base,SFT,RL}/: analysis2 strict QC; 4.275 / 10.975 / 71.575
- evaluation/eight_prompt/ablations/full_reward/: GRPO @ T=0.95 = 66.875%
- evaluation/eight_prompt/ablations/{cds_only,length_only,no_cassette_bonus,no_length_prior,no_repeat_penalty}/: Table 7 rows 2-6
- analysis/distribution_metrics.csv + analysis/distribution/per_seq_*.csv: Table 6 source
- continuation_benchmark/eval_set_656/: 656 plasmids × 5 splits (PRIMARY Table 5 source)
- continuation_benchmark/heldout_eng_r3/: PLSDB-style F1–F6 NCBI queries
- continuation_benchmark/{both_metric_eval, validation_eval, holdout30_non_addgene}/: additional held-out evals
- mfe/SFT_real/: replaces stale; mean −0.148 (matches paper −0.149)
- mfe/{SFT_circ10k_subset, SFT_temp_sweep, RL_t1.15_8prompt, RL_temp_sweep_2prompt}/: additional MFE coverage
- rejection_v3/{Base,SFT,GRPO}/: 8-prompt × 1250 = 10K, analysis2 strict QC, sweep-optimal T
- rejection_topK/: M=50 attempts × K∈{1,4,16,64} success rates
- plannotate/{RL, Base_t0.95, SFT_t0.95}/: Table 8 sources
- novelty_blastn/summary.csv: Table 2
- reference/addgene_500/, original_paper/, models/, code_snapshots/: auxiliary
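The entries above use shell-style brace shorthand such as `{Base,SFT,RL}`. When auditing a local copy of the bucket, it can help to expand that shorthand into concrete paths and check that each one exists. The following is a minimal sketch under stated assumptions: `expand_braces` and `missing_paths` are hypothetical helpers, not part of the repository, nested braces are not supported, and `*` wildcards (as in `per_seq_*.csv`) are not expanded.

```python
import itertools
import re
from pathlib import Path

# Matches one non-nested {a,b,c} group and captures its contents.
_BRACE = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern: str) -> list[str]:
    """Expand non-nested {a,b,c} groups into every concrete path."""
    # re.split with one capture group alternates: literal, group, literal, ...
    parts = _BRACE.split(pattern)
    choices = [
        [alt.strip() for alt in part.split(",")] if i % 2 else [part]
        for i, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


def missing_paths(root: Path, patterns: list[str]) -> list[str]:
    """Return expanded paths that do not exist under `root` (a local checkout)."""
    return [
        path
        for pattern in patterns
        for path in expand_braces(pattern)
        if not (root / path).exists()
    ]
```

For example, `expand_braces("rejection_v3/{Base,SFT,GRPO}/")` yields the three model directories, which `missing_paths` can then test against a local download directory of your choosing.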
SFT-stale files NOT yet replaced
- rejection_sampling_v2/direct/SFT/ and rejection_sampling_v2/best_of_16/SFT/: the original Table 4 SFT cells. Kept in place for paper reproducibility, but superseded by rejection_v3/SFT/ if the camera-ready version uses the new 8-prompt protocol. The old numbers (7.15% / 32.4%) used the pre-fix SFT checkpoint; the new ones (10.87% / n/a) use the corrected checkpoint.
- evaluation/temperature_sweep/SFT_t0.95/: generations made with the broken checkpoint; appendix material only
The original continuation_benchmark/{completion,surprisal}_benchmark.csv files (a small 11-plasmid set) are kept as legacy data. Their numbers reproduce paper Table 5 (Base −12.449, RL −10.966), but the new eval_set_656/ covers 656 plasmids × 5 splits and is the primary Table 5 source.
Unaffected — known good throughout
- evaluation/eight_prompt/Base/, mfe/Base/, mfe/RL/ (= old GRPO_temp1.0), mfe/ablations/*/
- rejection_sampling_v2/direct/{Base,GRPO}/, rejection_sampling_v2/best_of_16/{Base,GRPO}/
- evaluation/eight_prompt/ablations/*/ (ablation models from McClain/plasmidgpt-rl-*)
- plannotate/RL/, novelty_blastn/, reference/, original_paper/