Master progress dashboard — 2026-04-27 ~04:40 UTC

Every experiment that has run, is running, or is queued across the H100 and the lab cluster, including the architecture-mode (LLaVA / unified-NTP / unified-MDLM / diffusion) and contrastive (aux pair, aligner loss) ablations.

1. Live processes (H100)

PID	Job	Elapsed	ETA
100474	`launch_bench_vllm.sh` orchestrator	since Apr 26	runs until last task completes
137805	T1 reasoning expansion (`build_reasoning_traces.py`)	53 min	~10 min remaining (281/333)
139902	T3 zs_raw vLLM bench	35 min	~4.5 h
100544	watcher → `post_bench_pipeline.sh`	idle	fires when bench grid exits
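The watcher row describes a "wait for a process to exit, then fire the next script" pattern. A minimal sketch of that pattern is shown here, assuming a POSIX host; `pid_alive` and `wait_then_run` are hypothetical helper names, and this is not the actual watcher running as PID 100544.

```python
import os
import subprocess
import time


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_then_run(pid: int, cmd: list[str], poll_s: float = 30.0) -> int:
    """Block until `pid` has exited, then run `cmd` and return its exit code."""
    while pid_alive(pid):
        time.sleep(poll_s)
    return subprocess.run(cmd, check=False).returncode


# Example call (not executed here), assuming the orchestrator PID is the one
# to watch and that the script is reachable from the working directory:
#   wait_then_run(100474, ["bash", "post_bench_pipeline.sh"])
```

Polling by PID is simple but can be fooled by PID reuse on long waits; a lock file or exit-marker file written by the bench grid would be a sturdier trigger.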

2. T1 — enhancer_generation

#	Variant	Host	n	parse	gc_err	len_ratio	Cells	Sample path
1	zs_raw (full)	H100	372,210	0.9996	0.116	1.64	7-cell	`runs/exp_t1_grid_separatedQA_20260426_h100_vllm_full/zs_raw/`
2	zs_enriched (full)	H100	372,210	0.9997	0.126	1.67	7-cell	`runs/exp_t1_grid_separatedQA_20260426_h100_vllm_full/zs_enriched/`
3	zs_raw TRUNCATED (max=200, superseded)	H100	372,210	0.9996	0.124	0.72	7-cell	`runs/exp_t1_grid_separatedQA_20260426_h100_vllm_TRUNCATED/zs_raw/`
4	zs_raw smoke (n=64 Ex)	lab	64	1.0	0.093	1.83	Ex	`_lab_results/runs/exp_t1_grid_separatedQA_20260424_154915/zs_raw/`
5	zs_enriched smoke (n=64)	lab	64	1.0	0.096	1.62	Ex	`…/zs_enriched/`
6	lora_raw smoke [COLLAPSED]	lab	64	1.0	0.070	3.64 🚨	Ex	`…/lora_raw/`
7	lora_enriched smoke [COLLAPSED]	lab	64	1.0	0.102	3.90 🚨	Ex	`…/lora_enriched/`
8	fusion-SFT (Stage 1)	H100	—	—	—	—	—	QUEUED (auto post-bench)
9	NTv3-MDLM (Stage 5)	H100	—	—	—	—	—	QUEUED
10	Reasoning expansion (Tier 2)	H100	281 / 333	0 leaks	rich rationales	—	7-cell	`data/reasoning_traces/train.enhancer_generation.reasoning.jsonl`

3. T2 — pair_prediction

#	Variant	Host	n	accuracy	F1	precision	recall	Cells	Sample path
1	zs_raw (full)	H100	744,420	0.500	0.0001	0.65	~0	7-cell	`runs/exp_t2_grid_separatedQA_20260426_h100_vllm_full/zs_raw/`
2	zs_enriched (full)	H100	744,420	0.500	0.002	0.58	0.001	7-cell	`runs/exp_t2_grid_separatedQA_20260426_h100_vllm_full/zs_enriched/`
3	asym pair NTv3+NT-v2 aux=none	lab	128	0.773	0.808	0.701	0.953	Ex	`_lab_results/runs/exp_t2_pair_aux_none_20260425_192434_prod/`
4	asym pair aux=supcon_pair	lab	128	0.719	0.710	0.733	0.688	Ex	`…/exp_t2_pair_aux_supcon_pair_20260425_192434_prod/`
5	asym pair aux=tier_aware_supcon	lab	128	0.711	0.776	0.634	1.000	Ex	`…/exp_t2_pair_aux_tier_aware_supcon_20260425_192434_prod/`
6	fusion-SFT (Stage 2)	H100	—	—	—	—	—	—	QUEUED
7	Galaxy regen (enhancer TFBS scan)	lab/galaxy	—	—	—	—	—	—	PROVISIONED (lab patched script in `dec7a3e`); not yet launched
8	Reasoning expansion (Tier 2)	H100	—	gated on #7	—	—	—	—	DEFERRED (Stage 3e)
9	NTv3-direct (Stage 6)	H100	—	—	—	—	—	—	QUEUED

⚠️ T2 zero-shot is degenerate: the model almost always predicts not_paired, so recall ≈ 0 and accuracy sits at 0.500. Tool enrichment gives only a marginal lift (F1 0.0001 → 0.002); the missing enhancer-side TFBS scan is the bottleneck. Lab's asym-pair smokes (n=128 Ex) reach F1 = 0.81 with aux=none, which suggests the architecture works, but the full benchmark is still pending.
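The degenerate zero-shot numbers can be checked arithmetically. The sketch below is not the repo's scoring code: it recomputes F1 from the precision and recall reported in the T2 table, then shows why a predictor that always answers not_paired lands at accuracy 0.5 with F1 = 0. The balanced 500/500 label split in the second part is a hypothetical, chosen because the reported accuracy is 0.500.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the T2 table.
reported = {
    "zs_enriched": (0.58, 0.001),
    "aux=none": (0.701, 0.953),
    "aux=supcon_pair": (0.733, 0.688),
    "aux=tier_aware_supcon": (0.634, 1.000),
}
recomputed = {name: round(f1_score(p, r), 3) for name, (p, r) in reported.items()}


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from raw confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Hypothetical balanced set where the model always answers not_paired.
always_negative = confusion_metrics(tp=0, fp=0, fn=500, tn=500)
```

The recomputed values agree with the F1 column to three decimals (0.002, 0.808, 0.710, 0.776), and `always_negative` gives accuracy 0.5 with recall and F1 of 0.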

4. T3 — enhancer_editing

#	Variant	Host	Status	Sample path
1	zs_raw bench (full ~372k)	H100	RUNNING (PID 139902, 35 min in, ETA ~4.5 h)	`runs/exp_t3_grid_separatedQA_20260426_h100_vllm_full/zs_raw/`
2	zs_enriched bench (full ~372k)	H100	queued behind #1	same parent dir / `zs_enriched/`
3	fusion-SFT (Stage 3, heuristic gold)	H100	queued (auto, post-bench)	`runs/exp_t3_fusion_sft_20260427_h100/`
4	reasoning-only ablation (Stage 3b)	H100	queued	`runs/exp_t3_fusion_sft_reasonly_20260427_h100/`
5	multi-turn RFT (Stage 3c, --rounds 4)	H100	queued	`runs/exp_t3_fusion_sft_rft_20260427_h100/`
6	post-RFT reasoning expansion (Stage 3d)	H100	queued (gated on #5)	`data/reasoning_traces/train.enhancer_editing.reasoning.jsonl`
7	RFT-from-joint ablation	lab	proposed in `t3_post_v5_followups.md` §1	—
8	Loop-SFT on post-RFT	lab	proposed	—

5. Joint multitask (the headline)

#	Variant	Host	n	Status	Path
1	Joint multitask balanced 35k×3 (Stage 4)	H100	105k train	QUEUED	input: `data/prod_samples/train.joint_multitask_balanced.strat7c.n105k.jsonl` (992MB, 35k T1 + 35k T2 + 35k T3)
2	Score adapter on T1 raw / enriched	H100	372k	queued	`predict_t1_{raw,enriched}/genqual.json`
3	Score adapter on T2 raw / enriched	H100	744k	queued	`predict_t2_{raw,enriched}/metrics.json`
4	Score adapter on T3 raw / enriched	H100	372k	queued	`predict_t3_{raw,enriched}/genqual_t3_oracle.json`

6. Architecture-mode ablation (Table 3 Phase 2 — `llava` vs `unified+ntp` vs `unified+mdlm` vs `diffusion`)

Status: DEFERRED to lab cluster.

The DNA-output-head ablation surface is wired (`scripts/train_fusion_sft.py --architecture-mode {llava,unified,diffusion}`, `--dna-loss-kind {mdlm,ntp}`, `--dna-loss-weight λ`); see `docs/unified_multimodal_lm_survey.md` for the survey behind it. Currently:

Mode	Status	Where wired
`llava` (default; LLM head emits DNA as text tokens)	In every fusion-SFT call on H100	`slurm/post_bench_pipeline_h100_v5.sh` Stages 1/2/3/4 use `--architecture-mode llava`
`unified+ntp` (DNA head with plain CE on the DNA vocab)	Wired but not launched	`slurm/run_unified_arch_ablation.sh`
`unified+mdlm` (DNA head with LLaDA ELBO + 1/t reweight)	wired, not launched	same launcher
`diffusion` (LLaDA full diffusion)	NOT YET WIRED (per `train_fusion_sft.py:88`: "Phase 3 = diffusion (LLaDA, not yet wired)")	future
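The difference between the `unified+ntp` and `unified+mdlm` rows comes down to the loss applied at the DNA head. The sketch below contrasts plain cross-entropy with a LLaDA-style masked objective (cross-entropy on masked positions only, reweighted by 1/t and normalised by sequence length). It is an illustration in pure Python under those assumptions, with hypothetical function names, and is not the code in `train_fusion_sft.py`; logits are assumed to be already aligned with their targets.

```python
import math


def log_softmax(row: list[float]) -> list[float]:
    """Numerically stable log-softmax over one position's logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def ntp_loss(logits: list[list[float]], targets: list[int]) -> float:
    """Plain cross-entropy averaged over every DNA position."""
    nll = [-log_softmax(row)[y] for row, y in zip(logits, targets)]
    return sum(nll) / len(targets)


def mdlm_loss(logits: list[list[float]], targets: list[int],
              masked: list[bool], t: float) -> float:
    """Cross-entropy on masked positions only, scaled by 1/t, divided by length."""
    if not 0.0 < t <= 1.0:
        raise ValueError("t must be in (0, 1]")
    nll = [-log_softmax(row)[y]
           for row, y, is_masked in zip(logits, targets, masked) if is_masked]
    return sum(nll) / (t * len(targets))


# Toy check: uniform logits over a 4-letter DNA vocab, 8 positions,
# masking ratio t = 0.5 with exactly half the positions masked.
uniform = [[0.0, 0.0, 0.0, 0.0] for _ in range(8)]
targets = [0, 1, 2, 3, 0, 1, 2, 3]
masked = [True, False] * 4
ntp_value = ntp_loss(uniform, targets)
mdlm_value = mdlm_loss(uniform, targets, masked, t=0.5)
```

In this toy case both losses equal log 4: the 1/t factor compensates for only a fraction t of positions contributing, which is what keeps the masked objective on the same scale as the full-sequence one.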

Lab action item added: launch `slurm/run_unified_arch_ablation.sh` on a non-H100 node (the H100 stays focused on the headline runs). The single sbatch runs three jobs on T1: `llava` (control), `unified+ntp`, and `unified+mdlm`. ETA is ~10 h per architecture on a lab GPU. Already documented in `docs/minimal_publishable_suite.md` §4e.
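One way the three ablation arms could map onto the flag surface named in this section is sketched below. The flag names and choices come from the dashboard; the parser itself, the `--dna-loss-weight` default of 1.0, and the arm-to-flag mapping in `ARMS` are assumptions, and the real `train_fusion_sft.py` and launcher may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the three flags; the real script likely defines more."""
    p = argparse.ArgumentParser(prog="train_fusion_sft.py")
    p.add_argument("--architecture-mode",
                   choices=["llava", "unified", "diffusion"], default="llava")
    p.add_argument("--dna-loss-kind", choices=["mdlm", "ntp"], default=None)
    p.add_argument("--dna-loss-weight", type=float, default=1.0)  # assumed default
    return p


# Assumed mapping from ablation arm to command-line flags.
ARMS = {
    "llava": ["--architecture-mode", "llava"],
    "unified+ntp": ["--architecture-mode", "unified", "--dna-loss-kind", "ntp"],
    "unified+mdlm": ["--architecture-mode", "unified", "--dna-loss-kind", "mdlm"],
}

parser = build_parser()
parsed = {arm: parser.parse_args(argv) for arm, argv in ARMS.items()}
```

Keeping the arm definitions in one table like `ARMS` makes it harder for the control and the two unified arms to drift apart in any flag other than the ones under test.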

7. Contrastive / aux-loss ablations

7a. T2 pair-aux contrastive (DONE — lab smoke)

Three variants (Table 1 sub-figure / Table 3 row), all smoke-tested at n=128 Ex (rows 3–5 in the T2 table above). The full-set re-run will fire after the galaxy regen lands, so that the new T2 enriched JSONL feeds both the asym-pair model and the fusion-SFT stack.

7b. Aligner loss ablation (3 contrastive variants, T1 trimodal)

`slurm/run_aligner_loss_ablation.sh` runs three loss variants (InfoNCE / supcon / tier-aware-supcon-style) for the trimodal aligner (promoter↔enhancer↔expression). Status: wired, not launched. Lab side. Documented in `docs/minimal_publishable_suite.md` §4b.
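For reference, the InfoNCE variant in its standard form can be sketched in a few lines. This is a generic, single-direction version (anchor to positive, for example promoter to enhancer) in pure Python with a hypothetical temperature of 0.07. How the trimodal aligner combines modality pairs, and how the supcon variants use labels, is not specified in this dashboard and is not modelled here.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors: list[list[float]], positives: list[list[float]],
             temperature: float = 0.07) -> float:
    """Mean over anchors of -log softmax(sim / temperature) at the matching index.

    positives[i] is the true partner of anchors[i]; every other row in
    `positives` acts as an in-batch negative.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        m = max(logits)
        lse = m + math.log(sum(math.exp(v - m) for v in logits))
        losses.append(lse - logits[i])
    return sum(losses) / len(losses)


# Perfectly aligned pairs give a loss near zero; indistinguishable
# positives give log(batch size).
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
uninformative = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
```

The two toy batches bracket the useful range: `aligned` is close to 0, while `uninformative` equals log 2 for a batch of two, the value an aligner that cannot tell partners apart would produce.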

7c. Multi-encoder grid (NTv3 vs HyenaDNA vs Caduceus)

`slurm/run_multi_encoder_grid.sh` is the DNA-encoder ablation at the T1 / T2 layer. Wired, not launched (lab side); NTv3-650M is the current default everywhere.

8. Oracle + supporting infra

Asset	Host	Status	Path
DeepSTARR-7cell oracle (`val_pearson_mean=0.136`, weak-but-aggregable)	lab	DONE	`_lab_results/runs/exp_oracle_ds_7cell_fdr_both_20260424_162210/oracle.pt`
Enformer oracle (Table 4 cross-oracle)	lab	not built	—
Sei oracle (Table 4)	lab	not built	—
Joint multitask balanced 105k JSONL	H100	DONE (35k × 3 verified)	`data/prod_samples/train.joint_multitask_balanced.strat7c.n105k.jsonl`
Test JSONLs (T1+T2+T3 full)	H100	DONE	`data/prod_full_test/jsonl/test.{enhancer_generation,pair_prediction,enhancer_editing}.jsonl`

9. External baselines (the comparison gap)

Model	Status	Priority	Doc
TACO (Lin et al. NeurIPS 2024) — T3 paper precedent	NOT STARTED	HIGH	`t3_post_v5_followups.md §5`
HyenaDNA — T2 fluency baseline	NOT STARTED	HIGH	same
DNABERT-2 / NT-v2 — encoder baselines	wired as encoders only; head not trained	MEDIUM	same
CtrlDNA — T1 conditional gen	NOT STARTED	MEDIUM	same
Evo / Evo2 — large fluency	NOT STARTED	LOW	same

Lab action item: TACO + HyenaDNA, ~1 day each.

10. SV-GSPO (RL) + Loop-SFT — pipeline state

Component	Status
SV-GSPO outcome reward for T3 (was buggy, fixed in `e133cf1`)	code synced, not yet trained
SV-GSPO ablation grid (Table 2: cost-aware / k₃-KL / DAPO / KL=0 / no-group-norm)	pipeline wired, not yet launched
Loop-SFT on heuristic-gold trajectories	pipeline wired, not launched
Loop-SFT on post-RFT trajectories (T3 only)	proposed in `t3_post_v5_followups.md §3`

11. Branch + HF state

HEAD on mllm-integrate-server2: f304894 (merge lab's dec7a3e regen_t2 PYTHON_BIN fix)
                                4 commits ahead of mllm-integrate
                                0 commits behind  (lab fully caught up)

HF mirror: explcre/dnathinker-checkpoints (last push 04:10 UTC)
  runs/exp_t1_grid_*_full/zs_{raw,enriched}/metrics.json
  runs/exp_t2_grid_*_full/zs_{raw,enriched}/metrics.json
  data/reasoning_traces/train.enhancer_generation.reasoning.jsonl  (live, 281 rows)
  data/reasoning_traces/smoke_5rows_{t1,t2,t3}_postsanitize.jsonl
  data/reasoning_traces/post_rft_{contract_fixture,smoke}.jsonl
  docs/{lab_message_v2, t3_metrics_quickref, t3_post_v5_followups, experiment_chain_v5_unified}.md
  results/h100_snapshot.md

12. Total ETA to headline

Step	Wall-clock
T3 zs_raw + enriched bench	~10 h from now
Stage 0c oracle scoring on T3 zs preds	~30 min after bench
Stages 1/2/3 + score-adapter (T1, T2, T3 fusion-SFT + per-cell oracle)	~22 h
Stage 3b (T3 reasoning-only)	~3 h
Stage 3c (T3 multi-turn RFT + retrain)	~5 h
Stage 3d (T3 reasoning expansion 333 rows)	~30 min IO-bound (in parallel with Stage 4)
Stage 3f (T1 reasoning)	continuous, +333/day
Stage 4 (joint multitask 105k)	~10 h
Stages 5+6 (NTv3-only baselines)	~4 h
Stage 7 (aggregator + final HF push)	minutes

Total H100 post-bench: ~36 h. With the lab cluster handling the arch-mode, aligner, contrastive, TACO, and HyenaDNA work in parallel, the headline submission should land in ~3–4 days.

13. Critical gates (what's blocking)

Gate	Blocker	Unblocks
Galaxy T2 enhancer regen	lab launches `slurm/regen_t2_enriched_with_enhancer_scan.sh`; ~8h CPU	T2 bench rerun, T2 fusion-SFT, T2 reasoning expansion (Stage 3e), proper T2 row in Table 1
T3 RFT runs (Stage 3c)	needs Stage 3 done	T3 reasoning expansion (Stage 3d), post-RFT row in Table 1 T3 column
Reasoning accumulation (Tier 2)	OpenRouter 1000/day cap per key	"Reasoning model" rows for T1 (now), T2 (after regen), T3 (after RFT). Multi-key parallel = lab side.
TACO + HyenaDNA	lab work	external baseline rows in Table 1 — reviewers will ask