	# Master progress dashboard — 2026-04-27 ~04:40 UTC

	This dashboard lists every experiment that has run, is running, or is
	queued across the H100 and the lab cluster, including the
	architecture-mode (LLaVA / unified-NTP / unified-MDLM / diffusion) and
	contrastive (aux pair, aligner loss) ablations.

	## 1. Live processes (H100)

	| PID | Job | Elapsed | ETA |
	|---|---|---|---|
	| 100474 | `launch_bench_vllm.sh` orchestrator | since Apr 26 | runs until last task completes |
	| 137805 | T1 reasoning expansion (`build_reasoning_traces.py`) | 53 min | ~10 min remaining (281/333) |
	| 139902 | T3 zs_raw vLLM bench | 35 min | ~4.5 h |
	| 100544 | watcher → `post_bench_pipeline.sh` | idle | fires when bench grid exits |
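The "~10 min remaining" figure for the reasoning-expansion job is consistent with a constant-throughput extrapolation from the progress counter. A minimal sketch, assuming the per-row rate stays flat; the helper name `eta_minutes` is illustrative and not part of the repo:

```python
def eta_minutes(done: int, total: int, elapsed_min: float) -> float:
    """Linear extrapolation of the remaining wall-clock minutes."""
    if done <= 0:
        raise ValueError("need at least one completed item to extrapolate")
    rate = done / elapsed_min  # items completed per minute so far
    return (total - done) / rate


# T1 reasoning expansion: 281 of 333 rows after 53 minutes.
remaining = eta_minutes(done=281, total=333, elapsed_min=53)
print(f"~{remaining:.1f} min remaining")
```

The same helper applies to any row that reports a `done/total` counter and an elapsed time.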

	## 2. T1 — enhancer_generation

	| # | Variant | Host | n | parse | gc_err | len_ratio | Cells | Sample path |
	|---|---|---|---|---|---|---|---|---|
	| 1 | zs_raw (full) | H100 | 372,210 | 0.9996 | 0.116 | 1.64 | 7-cell | `runs/exp_t1_grid_separatedQA_20260426_h100_vllm_full/zs_raw/` |
	| 2 | zs_enriched (full) | H100 | 372,210 | 0.9997 | 0.126 | 1.67 | 7-cell | `runs/exp_t1_grid_separatedQA_20260426_h100_vllm_full/zs_enriched/` |
	| 3 | zs_raw TRUNCATED (max=200, superseded) | H100 | 372,210 | 0.9996 | 0.124 | 0.72 | 7-cell | `runs/exp_t1_grid_separatedQA_20260426_h100_vllm_TRUNCATED/zs_raw/` |
	| 4 | zs_raw smoke (n=64 Ex) | lab | 64 | 1.0 | 0.093 | 1.83 | Ex | `_lab_results/runs/exp_t1_grid_separatedQA_20260424_154915/zs_raw/` |
	| 5 | zs_enriched smoke (n=64) | lab | 64 | 1.0 | 0.096 | 1.62 | Ex | `…/zs_enriched/` |
	| 6 | lora_raw smoke [COLLAPSED] | lab | 64 | 1.0 | 0.070 | 3.64 🚨 | Ex | `…/lora_raw/` |
	| 7 | lora_enriched smoke [COLLAPSED] | lab | 64 | 1.0 | 0.102 | 3.90 🚨 | Ex | `…/lora_enriched/` |
	| 8 | fusion-SFT (Stage 1) | H100 | — | — | — | — | — | QUEUED (auto post-bench) |
	| 9 | NTv3-MDLM (Stage 5) | H100 | — | — | — | — | — | QUEUED |
	| 10 | Reasoning expansion (Tier 2) | H100 | 281 / 333 | 0 leaks | rich rationales | — | 7-cell | `data/reasoning_traces/train.enhancer_generation.reasoning.jsonl` |

	## 3. T2 — pair_prediction

	| # | Variant | Host | n | accuracy | F1 | precision | recall | Cells | Sample path |
	|---|---|---|---|---|---|---|---|---|---|
	| 1 | zs_raw (full) | H100 | 744,420 | 0.500 | 0.0001 | 0.65 | ~0 | 7-cell | `runs/exp_t2_grid_separatedQA_20260426_h100_vllm_full/zs_raw/` |
	| 2 | zs_enriched (full) | H100 | 744,420 | 0.500 | 0.002 | 0.58 | 0.001 | 7-cell | `runs/exp_t2_grid_separatedQA_20260426_h100_vllm_full/zs_enriched/` |
	| 3 | asym pair NTv3+NT-v2 aux=none | lab | 128 | 0.773 | 0.808 | 0.701 | 0.953 | Ex | `_lab_results/runs/exp_t2_pair_aux_none_20260425_192434_prod/` |
	| 4 | asym pair aux=supcon_pair | lab | 128 | 0.719 | 0.710 | 0.733 | 0.688 | Ex | `…/exp_t2_pair_aux_supcon_pair_20260425_192434_prod/` |
	| 5 | asym pair aux=tier_aware_supcon | lab | 128 | 0.711 | 0.776 | 0.634 | 1.000 | Ex | `…/exp_t2_pair_aux_tier_aware_supcon_20260425_192434_prod/` |
	| 6 | fusion-SFT (Stage 2) | H100 | — | — | — | — | — | — | QUEUED |
	| 7 | Galaxy regen (enhancer TFBS scan) | lab/galaxy | — | — | — | — | — | — | PROVISIONED (lab patched script in `dec7a3e`); not yet launched |
	| 8 | Reasoning expansion (Tier 2) | H100 | — | gated on #7 | — | — | — | — | DEFERRED (Stage 3e) |
	| 9 | NTv3-direct (Stage 6) | H100 | — | — | — | — | — | — | QUEUED |

	⚠️ T2 zero-shot is degenerate: the model almost always predicts `not_paired`, so recall ≈ 0 and accuracy stays at 0.500. Tool enrichment gives only a marginal lift (F1 0.0001 → 0.002); the missing enhancer-side TFBS scan is the bottleneck. The lab's asym-pair smokes (n=128 Ex) reach F1=0.81, which suggests the architecture works, but the full benchmark is still pending.
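The F1 column can be cross-checked from the precision and recall columns with the usual harmonic mean, F1 = 2PR / (P + R). The sketch below recomputes it for the rows whose precision and recall are given numerically; `f1_score` is an illustrative helper, not the repo's metric code:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the T2 table.
rows = {
    "zs_enriched (full)": (0.58, 0.001),
    "asym pair aux=none": (0.701, 0.953),
    "asym pair aux=supcon_pair": (0.733, 0.688),
    "asym pair aux=tier_aware_supcon": (0.634, 1.000),
}
for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

This also illustrates the degenerate case: with recall near zero, F1 collapses toward zero even though precision is 0.58.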

	## 4. T3 — enhancer_editing

	| # | Variant | Host | Status | Sample path |
	|---|---|---|---|---|
	| 1 | zs_raw bench (full ~372k) | H100 | RUNNING (PID 139902, 35 min in, ETA ~4.5 h) | `runs/exp_t3_grid_separatedQA_20260426_h100_vllm_full/zs_raw/` |
	| 2 | zs_enriched bench (full ~372k) | H100 | queued behind #1 | same parent dir / `zs_enriched/` |
	| 3 | fusion-SFT (Stage 3, heuristic gold) | H100 | queued (auto, post-bench) | `runs/exp_t3_fusion_sft_20260427_h100/` |
	| 4 | reasoning-only ablation (Stage 3b) | H100 | queued | `runs/exp_t3_fusion_sft_reasonly_20260427_h100/` |
	| 5 | multi-turn RFT (Stage 3c, --rounds 4) | H100 | queued | `runs/exp_t3_fusion_sft_rft_20260427_h100/` |
	| 6 | post-RFT reasoning expansion (Stage 3d) | H100 | queued (gated on #5) | `data/reasoning_traces/train.enhancer_editing.reasoning.jsonl` |
	| 7 | RFT-from-joint ablation | lab | proposed in `t3_post_v5_followups.md` §1 | — |
	| 8 | Loop-SFT on post-RFT | lab | proposed | — |

	## 5. Joint multitask (the headline)

	| # | Variant | Host | n | Status | Path |
	|---|---|---|---|---|---|
	| 1 | Joint multitask balanced 35k×3 (Stage 4) | H100 | 105k train | QUEUED | input: `data/prod_samples/train.joint_multitask_balanced.strat7c.n105k.jsonl` (992 MB, 35k T1 + 35k T2 + 35k T3) |
	| 2 | Score adapter on T1 raw / enriched | H100 | 372k | queued | `predict_t1_{raw,enriched}/genqual.json` |
	| 3 | Score adapter on T2 raw / enriched | H100 | 744k | queued | `predict_t2_{raw,enriched}/metrics.json` |
	| 4 | Score adapter on T3 raw / enriched | H100 | 372k | queued | `predict_t3_{raw,enriched}/genqual_t3_oracle.json` |
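The 35k × 3 balance of the joint training file can be checked by counting rows per task. The sketch below assumes each JSONL row carries a `task` key holding the task name; that key is an assumption, since the file's schema is not shown here, and both helper names are illustrative:

```python
import json
from collections import Counter
from typing import Iterable


def count_tasks(lines: Iterable[str], key: str = "task") -> Counter:
    """Count JSONL rows per task label, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[json.loads(line)[key]] += 1
    return counts


def is_balanced(counts: Counter, expected_per_task: int) -> bool:
    """True when every task has exactly the expected number of rows."""
    return bool(counts) and all(n == expected_per_task for n in counts.values())


# Tiny in-memory stand-in for the 105k-row file (2 rows per task here).
demo = [
    json.dumps({"task": t})
    for t in ["enhancer_generation", "pair_prediction", "enhancer_editing"] * 2
]
counts = count_tasks(demo)
print(counts, is_balanced(counts, expected_per_task=2))
```

On the real file, passing an open file handle to `count_tasks` and using `expected_per_task=35_000` would perform the same check without loading the file into memory.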

	## 6. Architecture-mode ablation (Table 3 Phase 2 — `llava` vs `unified+ntp` vs `unified+mdlm` vs `diffusion`)

	Status: DEFERRED to lab cluster.

	The DNA-output-head ablation surface is wired (`scripts/train_fusion_sft.py
	--architecture-mode {llava,unified,diffusion}`, `--dna-loss-kind {mdlm,ntp}`,
	`--dna-loss-weight λ`) — see `docs/unified_multimodal_lm_survey.md` for the
	survey behind it. Currently:

	| Mode | Status | Where wired |
	|---|---|---|
	| `llava` (default; LLM head emits DNA as text tokens) | in every fusion-SFT call on H100 | `slurm/post_bench_pipeline_h100_v5.sh` Stages 1/2/3/4 use `--architecture-mode llava` |
	| `unified+ntp` (DNA head with plain CE on the DNA vocab) | wired, not launched | `slurm/run_unified_arch_ablation.sh` |
	| `unified+mdlm` (DNA head with LLaDA ELBO + 1/t reweight) | wired, not launched | same launcher |
	| `diffusion` (LLaDA full diffusion) | NOT YET WIRED (per `train_fusion_sft.py:88`: "Phase 3 = diffusion (LLaDA, not yet wired)") | future |

	Lab action item added: launch `slurm/run_unified_arch_ablation.sh`
	on a non-H100 node (the H100 stays focused on the headline runs).
	Three jobs in one sbatch: llava (control) / unified+ntp / unified+mdlm
	on T1. ETA ~10 h per arch on a lab GPU. Already documented in
	`docs/minimal_publishable_suite.md §4e`.
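The three flags quoted above combine into the mode labels used in the table. The following is a hypothetical `argparse` sketch of that mapping: the flag names and the `llava` default come from this dashboard, while the other defaults and the `mode_label` helper are assumptions and may not match `scripts/train_fusion_sft.py`:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Flags as named in the dashboard; defaults other than llava are assumed."""
    p = argparse.ArgumentParser(description="Sketch of the ablation flags")
    p.add_argument("--architecture-mode",
                   choices=["llava", "unified", "diffusion"], default="llava")
    p.add_argument("--dna-loss-kind", choices=["mdlm", "ntp"], default="ntp")
    p.add_argument("--dna-loss-weight", type=float, default=1.0)
    return p


def mode_label(args: argparse.Namespace) -> str:
    """Map parsed flags to the table labels (llava, unified+ntp, ...)."""
    if args.architecture_mode == "unified":
        return f"unified+{args.dna_loss_kind}"
    return args.architecture_mode


args = build_parser().parse_args(
    ["--architecture-mode", "unified", "--dna-loss-kind", "mdlm",
     "--dna-loss-weight", "0.5"]
)
print(mode_label(args))
```

In this sketch the DNA loss kind only affects the label when the architecture mode is `unified`, which matches how the table names the variants.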

	## 7. Contrastive / aux-loss ablations

	### 7a. T2 pair-aux contrastive (DONE — lab smoke)

	Three variants (Table 1 sub-figure / Table 3 row), all smoke-tested at
	n=128 Ex (rows 3–5 in the T2 table above). The full-set re-run will
	fire after the galaxy regen lands, so that the new T2 enriched JSONL
	feeds both the asym-pair model and the fusion-SFT stack.

	### 7b. Aligner loss ablation (3 contrastive variants, T1 trimodal)

	`slurm/run_aligner_loss_ablation.sh`: three loss variants
	(InfoNCE / supcon / tier-aware-supcon-style) for the trimodal aligner
	(promoter↔enhancer↔expression). Status: wired, not launched (lab
	side). Documented in `docs/minimal_publishable_suite.md §4b`.
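For orientation, the first of the three loss variants can be illustrated with a generic, one-directional InfoNCE between two batches of embeddings. This is a textbook-style sketch in pure Python, not the repo's trimodal aligner; the function names, the temperature of 0.1, and the toy inputs are all assumptions, and the supcon variants are not shown:

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors: Sequence[Sequence[float]],
             positives: Sequence[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: positives[i] matches anchors[i]; every other
    row of `positives` in the batch acts as a negative."""
    a = [_normalize(v) for v in anchors]
    b = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # cross-entropy with target index i
    return total / len(a)


eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = info_nce(eye, eye)                  # each anchor matches its positive
shifted = info_nce(eye, eye[1:] + eye[:1])    # positives rotated out of place
print(f"aligned={aligned:.5f} shifted={shifted:.3f}")
```

The loss is close to zero when each anchor is most similar to its own positive and grows large when the pairing is broken, which is the behaviour a contrastive aligner relies on.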

	### 7c. Multi-encoder grid (NTv3 vs HyenaDNA vs Caduceus)

	`slurm/run_multi_encoder_grid.sh` — DNA-encoder ablation at the T1 /
	T2 layer. Wired; NTv3-650M is the current default everywhere.
	Lab side, not launched.

	## 8. Oracle + supporting infra

	| Asset | Host | Status | Path |
	|---|---|---|---|
	| DeepSTARR-7cell oracle (`val_pearson_mean=0.136`, weak-but-aggregable) | lab | DONE | `_lab_results/runs/exp_oracle_ds_7cell_fdr_both_20260424_162210/oracle.pt` |
	| Enformer oracle (Table 4 cross-oracle) | lab | not built | — |
	| Sei oracle (Table 4) | lab | not built | — |
	| Joint multitask balanced 105k JSONL | H100 | DONE (35k × 3 verified) | `data/prod_samples/train.joint_multitask_balanced.strat7c.n105k.jsonl` |
	| Test JSONLs (T1+T2+T3 full) | H100 | DONE | `data/prod_full_test/jsonl/test.{enhancer_generation,pair_prediction,enhancer_editing}.jsonl` |

	## 9. External baselines (the comparison gap)

	| Model | Status | Priority | Doc |
	|---|---|---|---|
	| TACO (Lin et al. NeurIPS 2024), T3 paper precedent | NOT STARTED | HIGH | `t3_post_v5_followups.md §5` |
	| HyenaDNA, T2 fluency baseline | NOT STARTED | HIGH | same |
	| DNABERT-2 / NT-v2, encoder baselines | wired as encoders only; head not trained | MEDIUM | same |
	| CtrlDNA, T1 conditional gen | NOT STARTED | MEDIUM | same |
	| Evo / Evo2, large fluency | NOT STARTED | LOW | same |

	Lab action item: TACO + HyenaDNA, ~1 day each.

	## 10. SV-GSPO (RL) + Loop-SFT — pipeline state

	| Component | Status |
	|---|---|
	| SV-GSPO outcome reward for T3 (was buggy, fixed in `e133cf1`) | code synced, not yet trained |
	| SV-GSPO ablation grid (Table 2: cost-aware / k₃-KL / DAPO / KL=0 / no-group-norm) | pipeline wired, not yet launched |
	| Loop-SFT on heuristic-gold trajectories | pipeline wired, not launched |
	| Loop-SFT on post-RFT trajectories (T3 only) | proposed in `t3_post_v5_followups.md §3` |

	## 11. Branch + HF state

	```
	HEAD on mllm-integrate-server2: f304894 (merge lab's dec7a3e regen_t2 PYTHON_BIN fix)
	4 commits ahead of mllm-integrate
	0 commits behind (lab fully caught up)

	HF mirror: explcre/dnathinker-checkpoints (last push 04:10 UTC)
	runs/exp_t1_grid_*_full/zs_{raw,enriched}/metrics.json
	runs/exp_t2_grid_*_full/zs_{raw,enriched}/metrics.json
	data/reasoning_traces/train.enhancer_generation.reasoning.jsonl (live, 281 rows)
	data/reasoning_traces/smoke_5rows_{t1,t2,t3}_postsanitize.jsonl
	data/reasoning_traces/post_rft_{contract_fixture,smoke}.jsonl
	docs/{lab_message_v2, t3_metrics_quickref, t3_post_v5_followups, experiment_chain_v5_unified}.md
	results/h100_snapshot.md
	```

	## 12. Total ETA to headline

	| Step | Wall-clock |
	|---|---|
	| T3 zs_raw + enriched bench | ~10 h from now |
	| Stage 0c oracle scoring on T3 zs preds | ~30 min after bench |
	| Stages 1/2/3 + score-adapter (T1, T2, T3 fusion-SFT + per-cell oracle) | ~22 h |
	| Stage 3b (T3 reasoning-only) | ~3 h |
	| Stage 3c (T3 multi-turn RFT + retrain) | ~5 h |
	| Stage 3d (T3 reasoning expansion 333 rows) | ~30 min IO-bound (in parallel with Stage 4) |
	| Stage 3f (T1 reasoning) | continuous, +333/day |
	| Stage 4 (joint multitask 105k) | ~10 h |
	| Stages 5+6 (NTv3-only baselines) | ~4 h |
	| Stage 7 (aggregator + final HF push) | minutes |

	Total H100 post-bench: ~36 h. With lab cluster handling
	arch-mode + aligner + contrastive + TACO + HyenaDNA in parallel,
	the headline submission lands in ~3–4 days.
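As a sanity check on the total, the post-bench rows of the table can be summed as if they ran strictly one after another. In the sketch below the hour values are copied from the table; treating them as fully serial is an assumption. The serial sum comes out above the ~36 h quoted, so that figure presumably assumes some overlap between stages, which is an inference and not something the dashboard states:

```python
# Hours per post-bench stage, copied from the table (~30 min = 0.5 h).
# Stage 3d runs in parallel with Stage 4 and Stage 3f is continuous,
# so both are left out; Stage 7 ("minutes") is treated as negligible.
stage_hours = {
    "Stage 0c oracle scoring": 0.5,
    "Stages 1/2/3 + score-adapter": 22,
    "Stage 3b": 3,
    "Stage 3c": 5,
    "Stage 4": 10,
    "Stages 5+6": 4,
}
serial_total = sum(stage_hours.values())
print(f"strictly serial post-bench total: {serial_total} h")
```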

	## 13. Critical gates (what's blocking)

	| Gate | Blocker | Unblocks |
	|---|---|---|
	| Galaxy T2 enhancer regen | lab launches `slurm/regen_t2_enriched_with_enhancer_scan.sh`; ~8 h CPU | T2 bench rerun, T2 fusion-SFT, T2 reasoning expansion (Stage 3e), proper T2 row in Table 1 |
	| T3 RFT runs (Stage 3c) | needs Stage 3 done | T3 reasoning expansion (Stage 3d), post-RFT row in Table 1 T3 column |
	| Reasoning accumulation (Tier 2) | OpenRouter 1000/day cap per key | "Reasoning model" rows for T1 (now), T2 (after regen), T3 (after RFT). Multi-key parallel = lab side. |
	| TACO + HyenaDNA | lab work | external baseline rows in Table 1; reviewers will ask |
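The first two gates form a small dependency graph, and a topological sort gives one valid launch order. The sketch below transcribes only the edges stated in the table and uses the standard-library `graphlib` module (Python 3.9+); the node names are shortened labels chosen for illustration:

```python
from graphlib import TopologicalSorter

# node -> set of prerequisites, transcribed from the gates table.
deps = {
    "T2 bench rerun": {"Galaxy T2 enhancer regen"},
    "T2 fusion-SFT": {"Galaxy T2 enhancer regen"},
    "T2 reasoning expansion (Stage 3e)": {"Galaxy T2 enhancer regen"},
    "T3 RFT (Stage 3c)": {"T3 fusion-SFT (Stage 3)"},
    "T3 reasoning expansion (Stage 3d)": {"T3 RFT (Stage 3c)"},
}

# static_order() yields every node after all of its prerequisites.
order = list(TopologicalSorter(deps).static_order())
for step in order:
    print(step)
```

Nodes without an entry of their own, such as the galaxy regen, appear first because nothing in the table blocks them.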
