# Master progress dashboard (2026-04-27 ~04:40 UTC)

Every experiment that has been run, is running, or is queued, across
the H100 and the lab cluster. Includes the architecture-mode (LLaVA /
unified-NTP / unified-MDLM / diffusion) and contrastive (aux pair,
aligner loss) ablations.

## 1. Live processes (H100)

| PID | Job | Elapsed | ETA |
|---|---|---|---|
| 100474 | `launch_bench_vllm.sh` orchestrator | since Apr 26 | runs until last task completes |
| 137805 | T1 reasoning expansion (`build_reasoning_traces.py`) | 53 min | ~10 min remaining (281/333) |
| 139902 | T3 zs_raw vLLM bench | 35 min | ~4.5 h |
| 100544 | watcher → `post_bench_pipeline.sh` | idle | fires when bench grid exits |
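
The T1 reasoning-expansion ETA in the table is consistent with a linear extrapolation from the rows finished so far. A minimal sketch, assuming a constant per-row rate (the helper name `remaining_minutes` is illustrative and not part of the repo):

```python
def remaining_minutes(elapsed_min: float, done: int, total: int) -> float:
    """Linear ETA: assume remaining rows take as long per row as finished ones."""
    if done <= 0:
        raise ValueError("need at least one finished row to extrapolate")
    return elapsed_min * (total - done) / done


# T1 reasoning expansion: 281 of 333 rows after 53 minutes.
eta = remaining_minutes(53, 281, 333)
print(f"~{eta:.0f} min remaining")
```

With the figures from the table this gives roughly 10 minutes, matching the listed ETA.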

## 2. T1: enhancer_generation

| # | Variant | Host | n | parse | gc_err | len_ratio | Cells | Sample path |
|---|---|---|---|---|---|---|---|---|
| 1 | **zs_raw (full)** | H100 | 372,210 | 0.9996 | 0.116 | **1.64** | 7-cell | `runs/exp_t1_grid_separatedQA_20260426_h100_vllm_full/zs_raw/` |
| 2 | **zs_enriched (full)** | H100 | 372,210 | 0.9997 | 0.126 | **1.67** | 7-cell | `runs/exp_t1_grid_separatedQA_20260426_h100_vllm_full/zs_enriched/` |
| 3 | zs_raw TRUNCATED (max=200, **superseded**) | H100 | 372,210 | 0.9996 | 0.124 | 0.72 | 7-cell | `runs/exp_t1_grid_separatedQA_20260426_h100_vllm_TRUNCATED/zs_raw/` |
| 4 | zs_raw smoke (n=64 Ex) | lab | 64 | 1.0 | 0.093 | 1.83 | Ex | `_lab_results/runs/exp_t1_grid_separatedQA_20260424_154915/zs_raw/` |
| 5 | zs_enriched smoke (n=64) | lab | 64 | 1.0 | 0.096 | 1.62 | Ex | `…/zs_enriched/` |
| 6 | **lora_raw smoke [COLLAPSED]** | lab | 64 | 1.0 | 0.070 | **3.64** 🚨 | Ex | `…/lora_raw/` |
| 7 | **lora_enriched smoke [COLLAPSED]** | lab | 64 | 1.0 | 0.102 | **3.90** 🚨 | Ex | `…/lora_enriched/` |
| 8 | **fusion-SFT (Stage 1)** | H100 | - | - | - | - | - | **QUEUED** (auto post-bench) |
| 9 | **NTv3-MDLM (Stage 5)** | H100 | - | - | - | - | - | **QUEUED** |
| 10 | **Reasoning expansion (Tier 2)** | H100 | 281 / 333 | 0 leaks | rich rationales | - | 7-cell | `data/reasoning_traces/train.enhancer_generation.reasoning.jsonl` |

## 3. T2: pair_prediction

| # | Variant | Host | n | accuracy | F1 | precision | recall | Cells | Sample path |
|---|---|---|---|---|---|---|---|---|---|
| 1 | **zs_raw (full)** | H100 | 744,420 | 0.500 | 0.0001 | 0.65 | ~0 | 7-cell | `runs/exp_t2_grid_separatedQA_20260426_h100_vllm_full/zs_raw/` |
| 2 | **zs_enriched (full)** | H100 | 744,420 | 0.500 | 0.002 | 0.58 | 0.001 | 7-cell | `runs/exp_t2_grid_separatedQA_20260426_h100_vllm_full/zs_enriched/` |
| 3 | asym pair NTv3+NT-v2 aux=**none** | lab | 128 | 0.773 | **0.808** | 0.701 | 0.953 | Ex | `_lab_results/runs/exp_t2_pair_aux_none_20260425_192434_prod/` |
| 4 | asym pair aux=**supcon_pair** | lab | 128 | 0.719 | 0.710 | 0.733 | 0.688 | Ex | `…/exp_t2_pair_aux_supcon_pair_20260425_192434_prod/` |
| 5 | asym pair aux=**tier_aware_supcon** | lab | 128 | 0.711 | 0.776 | 0.634 | **1.000** | Ex | `…/exp_t2_pair_aux_tier_aware_supcon_20260425_192434_prod/` |
| 6 | **fusion-SFT (Stage 2)** | H100 | - | - | - | - | - | - | **QUEUED** |
| 7 | **Galaxy regen (enhancer TFBS scan)** | lab/galaxy | - | - | - | - | - | - | **PROVISIONED** (lab patched script in `dec7a3e`); not yet launched |
| 8 | **Reasoning expansion (Tier 2)** | H100 | - | gated on #7 | - | - | - | - | **DEFERRED** (Stage 3e) |
| 9 | **NTv3-direct (Stage 6)** | H100 | - | - | - | - | - | - | **QUEUED** |

⚠️ **T2 zero-shot is degenerate**: model trivially predicts `not_paired` β†’ recall β‰ˆ 0. Tool-enriched gives marginal lift but the missing enhancer-side TFBS scan is the bottleneck. Lab's asym-pair smokes (n=128 Ex) reach F1=0.81 β€” proves the architecture works, full benchmark pending.

## 4. T3: enhancer_editing

| # | Variant | Host | Status | Sample path |
|---|---|---|---|---|
| 1 | **zs_raw bench (full ~372k)** | H100 | **RUNNING** (PID 139902, 35 min in, ETA ~4.5 h) | `runs/exp_t3_grid_separatedQA_20260426_h100_vllm_full/zs_raw/` |
| 2 | zs_enriched bench (full ~372k) | H100 | queued behind #1 | same parent dir / `zs_enriched/` |
| 3 | fusion-SFT (Stage 3, heuristic gold) | H100 | queued (auto, post-bench) | `runs/exp_t3_fusion_sft_20260427_h100/` |
| 4 | **reasoning-only ablation (Stage 3b)** | H100 | queued | `runs/exp_t3_fusion_sft_reasonly_20260427_h100/` |
| 5 | **multi-turn RFT (Stage 3c, --rounds 4)** | H100 | queued | `runs/exp_t3_fusion_sft_rft_20260427_h100/` |
| 6 | **post-RFT reasoning expansion (Stage 3d)** | H100 | queued (gated on #5) | `data/reasoning_traces/train.enhancer_editing.reasoning.jsonl` |
| 7 | RFT-from-joint ablation | lab | proposed in `t3_post_v5_followups.md` §1 | - |
| 8 | Loop-SFT on post-RFT | lab | proposed | - |

## 5. Joint multitask (the headline)

| # | Variant | Host | n | Status | Path |
|---|---|---|---|---|---|
| 1 | **Joint multitask balanced 35k×3** (Stage 4) | H100 | 105k train | **QUEUED** | input: `data/prod_samples/train.joint_multitask_balanced.strat7c.n105k.jsonl` (992MB, 35k T1 + 35k T2 + 35k T3) |
| 2 | Score adapter on T1 raw / enriched | H100 | 372k | queued | `predict_t1_{raw,enriched}/genqual.json` |
| 3 | Score adapter on T2 raw / enriched | H100 | 744k | queued | `predict_t2_{raw,enriched}/metrics.json` |
| 4 | Score adapter on T3 raw / enriched | H100 | 372k | queued | `predict_t3_{raw,enriched}/genqual_t3_oracle.json` |
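
One way to confirm the 35k-per-task balance of the joint JSONL is to count rows per task label. A minimal sketch, assuming each row carries its task under a key named `task` (the key name and the helper names are assumptions; the actual schema is not shown in this dashboard):

```python
import json
from collections import Counter
from typing import Iterable


def count_tasks(lines: Iterable[str], key: str = "task") -> Counter:
    """Count JSONL rows per task label; `key` is a hypothetical field name."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            counts[json.loads(line).get(key, "<missing>")] += 1
    return counts


def is_balanced(counts: Counter, expected_per_task: int) -> bool:
    """True when every task has exactly the expected number of rows."""
    return bool(counts) and all(n == expected_per_task for n in counts.values())


# Tiny in-memory stand-in for the real file (2 rows per task instead of 35k).
sample = [
    '{"task": "enhancer_generation"}', '{"task": "pair_prediction"}',
    '{"task": "enhancer_editing"}', '{"task": "enhancer_generation"}',
    '{"task": "pair_prediction"}', '{"task": "enhancer_editing"}',
]
counts = count_tasks(sample)
print(dict(counts), is_balanced(counts, expected_per_task=2))
```

On the real file, the same functions would take an open file handle and `expected_per_task=35_000`.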

## 6. Architecture-mode ablation (Table 3 Phase 2: `llava` vs `unified+ntp` vs `unified+mdlm` vs `diffusion`)

**Status: DEFERRED to lab cluster.**

The DNA-output-head ablation surface is wired (`scripts/train_fusion_sft.py
--architecture-mode {llava,unified,diffusion}`, `--dna-loss-kind {mdlm,ntp}`,
`--dna-loss-weight λ`); see `docs/unified_multimodal_lm_survey.md` for the
survey behind it. Current state:

| Mode | Status | Where wired |
|---|---|---|
| `llava` (default; LLM head emits DNA as text tokens) | **In every fusion-SFT call on H100** | `slurm/post_bench_pipeline_h100_v5.sh` Stages 1/2/3/4 use `--architecture-mode llava` |
| `unified+ntp` (DNA head with plain CE on the DNA vocab) | **Wired but not launched** | `slurm/run_unified_arch_ablation.sh` |
| `unified+mdlm` (DNA head with LLaDA ELBO + 1/t reweight) | wired, not launched | same launcher |
| `diffusion` (LLaDA full diffusion) | **NOT YET WIRED** (per `train_fusion_sft.py:88`: "Phase 3 = diffusion (LLaDA, not yet wired)") | future |
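
The flag surface and the mode table can be mirrored in a few lines of `argparse`. This is an illustration only, not the contents of `scripts/train_fusion_sft.py`: the flag names and choices come from the text above, while the `mdlm` default for `--dna-loss-kind`, the default weight of 1.0, and the helper names are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser for the three architecture-mode flags."""
    parser = argparse.ArgumentParser(description="Illustrative arch-mode flags")
    parser.add_argument("--architecture-mode",
                        choices=["llava", "unified", "diffusion"], default="llava")
    parser.add_argument("--dna-loss-kind",
                        choices=["mdlm", "ntp"], default="mdlm")  # assumed default
    parser.add_argument("--dna-loss-weight",
                        type=float, default=1.0)  # lambda; assumed default
    return parser


def ablation_label(args: argparse.Namespace) -> str:
    """Map parsed flags to the mode names used in the table above."""
    if args.architecture_mode == "diffusion":
        raise NotImplementedError("diffusion mode is not yet wired")
    if args.architecture_mode == "unified":
        return f"unified+{args.dna_loss_kind}"
    return "llava"  # the DNA loss flags are ignored in this sketch


args = build_parser().parse_args(
    ["--architecture-mode", "unified", "--dna-loss-kind", "ntp"]
)
print(ablation_label(args))
```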

**Lab action item** added: launch `slurm/run_unified_arch_ablation.sh`
on a non-H100 node (the H100 stays focused on the headline runs).
Three jobs in one sbatch: llava (control) / unified+ntp / unified+mdlm
on T1. ETA ~10h per arch on a lab GPU. Already documented in
`docs/minimal_publishable_suite.md §4e`.

## 7. Contrastive / aux-loss ablations

### 7a. T2 pair-aux contrastive (DONE, lab smoke)

3 variants (Table 1 sub-figure / Table 3 row), all smoke-tested at
n=128 Ex (rows 3–5 in the T2 table above). The full-set re-run will
fire after the galaxy regen lands so the new T2 enriched JSONL feeds
both the asym-pair model and the fusion-SFT stack.

### 7b. Aligner loss ablation (3 contrastive variants, T1 trimodal)

`slurm/run_aligner_loss_ablation.sh`: three loss variants
(infoNCE / supcon / tier-aware-supcon-style) for the trimodal aligner
(promoter↔enhancer↔expression). Status: **wired, not launched**. Lab
side. Documented in `docs/minimal_publishable_suite.md §4b`.
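
For reference, the first of the three variants, InfoNCE, can be sketched in a few lines. This is the textbook one-directional form over cosine similarities for a single modality pair, in plain Python for readability. The temperature value, and whether the launcher's implementation is symmetric or couples all three modalities at once, are not stated here and are assumptions.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature: float = 0.1) -> float:
    """Mean over i of -log softmax_j(sim(a_i, p_j) / temperature)[i].

    positives[i] is the match for anchors[i]; every other positives[j]
    in the batch serves as a negative for anchors[i].
    """
    losses = []
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy check: aligned pairs score far lower than swapped pairs.
enh = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(enh, [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce(enh, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```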

### 7c. Multi-encoder grid (NTv3 vs HyenaDNA vs Caduceus)

`slurm/run_multi_encoder_grid.sh`: DNA-encoder ablation at the T1 /
T2 layer. **Wired**; NTv3-650M is the current default everywhere.
Lab side, not launched.

## 8. Oracle + supporting infra

| Asset | Host | Status | Path |
|---|---|---|---|
| DeepSTARR-7cell oracle (`val_pearson_mean=0.136`, weak-but-aggregable) | lab | DONE | `_lab_results/runs/exp_oracle_ds_7cell_fdr_both_20260424_162210/oracle.pt` |
| Enformer oracle (Table 4 cross-oracle) | lab | not built | - |
| Sei oracle (Table 4) | lab | not built | - |
| Joint multitask balanced 105k JSONL | H100 | DONE (35k × 3 verified) | `data/prod_samples/train.joint_multitask_balanced.strat7c.n105k.jsonl` |
| Test JSONLs (T1+T2+T3 full) | H100 | DONE | `data/prod_full_test/jsonl/test.{enhancer_generation,pair_prediction,enhancer_editing}.jsonl` |

## 9. External baselines (the comparison gap)

| Model | Status | Priority | Doc |
|---|---|---|---|
| TACO (Lin et al. NeurIPS 2024): T3 paper precedent | **NOT STARTED** | **HIGH** | `t3_post_v5_followups.md §5` |
| HyenaDNA: T2 fluency baseline | NOT STARTED | **HIGH** | same |
| DNABERT-2 / NT-v2: encoder baselines | wired as encoders only; head not trained | MEDIUM | same |
| CtrlDNA: T1 conditional gen | NOT STARTED | MEDIUM | same |
| Evo / Evo2: large fluency | NOT STARTED | LOW | same |

Lab action item: TACO + HyenaDNA, ~1 day each.

## 10. SV-GSPO (RL) + Loop-SFT: pipeline state

| Component | Status |
|---|---|
| SV-GSPO outcome reward for T3 (was buggy, **fixed in `e133cf1`**) | code synced, not yet trained |
| SV-GSPO ablation grid (Table 2: cost-aware / k₃-KL / DAPO / KL=0 / no-group-norm) | pipeline wired, **not yet launched** |
| Loop-SFT on heuristic-gold trajectories | pipeline wired, not launched |
| Loop-SFT on post-RFT trajectories (T3 only) | proposed in `t3_post_v5_followups.md §3` |

## 11. Branch + HF state

```
HEAD on mllm-integrate-server2: f304894 (merge lab's dec7a3e regen_t2 PYTHON_BIN fix)
                                4 commits ahead of mllm-integrate
                                0 commits behind  (lab fully caught up)

HF mirror: explcre/dnathinker-checkpoints (last push 04:10 UTC)
  runs/exp_t1_grid_*_full/zs_{raw,enriched}/metrics.json
  runs/exp_t2_grid_*_full/zs_{raw,enriched}/metrics.json
  data/reasoning_traces/train.enhancer_generation.reasoning.jsonl  (live, 281 rows)
  data/reasoning_traces/smoke_5rows_{t1,t2,t3}_postsanitize.jsonl
  data/reasoning_traces/post_rft_{contract_fixture,smoke}.jsonl
  docs/{lab_message_v2, t3_metrics_quickref, t3_post_v5_followups, experiment_chain_v5_unified}.md
  results/h100_snapshot.md
```

## 12. Total ETA to headline

| Step | Wall-clock |
|---|---|
| T3 zs_raw + enriched bench | ~10 h from now |
| Stage 0c oracle scoring on T3 zs preds | ~30 min after bench |
| Stages 1/2/3 + score-adapter (T1, T2, T3 fusion-SFT + per-cell oracle) | ~22 h |
| Stage 3b (T3 reasoning-only) | ~3 h |
| Stage 3c (T3 multi-turn RFT + retrain) | ~5 h |
| Stage 3d (T3 reasoning expansion 333 rows) | ~30 min IO-bound (in parallel with Stage 4) |
| Stage 3f (T1 reasoning) | continuous, +333/day |
| Stage 4 (joint multitask 105k) | ~10 h |
| Stages 5+6 (NTv3-only baselines) | ~4 h |
| Stage 7 (aggregator + final HF push) | minutes |

**Total H100 post-bench: ~36 h**. With lab cluster handling
arch-mode + aligner + contrastive + TACO + HyenaDNA in parallel,
the headline submission lands in **~3–4 days**.

## 13. Critical gates (what's blocking)

| Gate | Blocker | Unblocks |
|---|---|---|
| **Galaxy T2 enhancer regen** | lab launches `slurm/regen_t2_enriched_with_enhancer_scan.sh`; ~8h CPU | T2 bench rerun, T2 fusion-SFT, T2 reasoning expansion (Stage 3e), proper T2 row in Table 1 |
| **T3 RFT runs (Stage 3c)** | needs Stage 3 done | T3 reasoning expansion (Stage 3d), post-RFT row in Table 1 T3 column |
| **Reasoning accumulation (Tier 2)** | OpenRouter 1000/day cap per key | "Reasoning model" rows for T1 (now), T2 (after regen), T3 (after RFT). Multi-key parallel = lab side. |
| **TACO + HyenaDNA** | lab work | external baseline rows in Table 1; reviewers will ask |
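
The gating relationships in the table form a small dependency graph, and a topological sort gives one valid launch order. A minimal sketch using the standard-library `graphlib` (Python 3.9+); the node names paraphrase the table, and only the edges stated in it are encoded:

```python
from graphlib import TopologicalSorter

# Each key maps to the set of gates that must finish before it can start.
deps = {
    "T2 bench rerun": {"Galaxy T2 enhancer regen"},
    "T2 fusion-SFT": {"Galaxy T2 enhancer regen"},
    "T2 reasoning expansion (Stage 3e)": {"Galaxy T2 enhancer regen"},
    "T3 RFT (Stage 3c)": {"T3 fusion-SFT (Stage 3)"},
    "T3 reasoning expansion (Stage 3d)": {"T3 RFT (Stage 3c)"},
}

# static_order() yields every node with its prerequisites first.
order = list(TopologicalSorter(deps).static_order())
for step in order:
    print(step)
```

Any order it prints places the galaxy regen ahead of all three T2 items and Stage 3 ahead of Stage 3c ahead of Stage 3d; the relative order of the independent T2 and T3 chains is arbitrary.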