# Master progress dashboard, 2026-04-27 ~04:40 UTC
Every experiment that has run, is running, or is queued, across the
H100 and the lab cluster. Includes architecture-mode (LLaVA / unified-NTP /
unified-MDLM / diffusion) and contrastive (aux pair, aligner loss)
ablations.
## 1. Live processes (H100)
| PID | Job | Elapsed | ETA |
|---|---|---|---|
| 100474 | `launch_bench_vllm.sh` orchestrator | since Apr 26 | runs until last task completes |
| 137805 | T1 reasoning expansion (`build_reasoning_traces.py`) | 53 min | ~10 min remaining (281/333) |
| 139902 | T3 zs_raw vLLM bench | 35 min | ~4.5 h |
| 100544 | watcher: `post_bench_pipeline.sh` | idle | fires when bench grid exits |
## 2. T1: enhancer_generation
| # | Variant | Host | n | parse | gc_err | len_ratio | Cells | Sample path |
|---|---|---|---|---|---|---|---|---|
| 1 | **zs_raw (full)** | H100 | 372,210 | 0.9996 | 0.116 | **1.64** | 7-cell | `runs/exp_t1_grid_separatedQA_20260426_h100_vllm_full/zs_raw/` |
| 2 | **zs_enriched (full)** | H100 | 372,210 | 0.9997 | 0.126 | **1.67** | 7-cell | `runs/exp_t1_grid_separatedQA_20260426_h100_vllm_full/zs_enriched/` |
| 3 | zs_raw TRUNCATED (max=200, **superseded**) | H100 | 372,210 | 0.9996 | 0.124 | 0.72 | 7-cell | `runs/exp_t1_grid_separatedQA_20260426_h100_vllm_TRUNCATED/zs_raw/` |
| 4 | zs_raw smoke (n=64 Ex) | lab | 64 | 1.0 | 0.093 | 1.83 | Ex | `_lab_results/runs/exp_t1_grid_separatedQA_20260424_154915/zs_raw/` |
| 5 | zs_enriched smoke (n=64) | lab | 64 | 1.0 | 0.096 | 1.62 | Ex | `…/zs_enriched/` |
| 6 | **lora_raw smoke [COLLAPSED]** | lab | 64 | 1.0 | 0.070 | **3.64** 🚨 | Ex | `…/lora_raw/` |
| 7 | **lora_enriched smoke [COLLAPSED]** | lab | 64 | 1.0 | 0.102 | **3.90** 🚨 | Ex | `…/lora_enriched/` |
| 8 | **fusion-SFT (Stage 1)** | H100 | - | - | - | - | - | **QUEUED** (auto post-bench) |
| 9 | **NTv3-MDLM (Stage 5)** | H100 | - | - | - | - | - | **QUEUED** |
| 10 | **Reasoning expansion (Tier 2)** | H100 | 281 / 333 | 0 leaks | rich rationales | - | 7-cell | `data/reasoning_traces/train.enhancer_generation.reasoning.jsonl` |
## 3. T2: pair_prediction
| # | Variant | Host | n | accuracy | F1 | precision | recall | Cells | Sample path |
|---|---|---|---|---|---|---|---|---|---|
| 1 | **zs_raw (full)** | H100 | 744,420 | 0.500 | 0.0001 | 0.65 | ~0 | 7-cell | `runs/exp_t2_grid_separatedQA_20260426_h100_vllm_full/zs_raw/` |
| 2 | **zs_enriched (full)** | H100 | 744,420 | 0.500 | 0.002 | 0.58 | 0.001 | 7-cell | `runs/exp_t2_grid_separatedQA_20260426_h100_vllm_full/zs_enriched/` |
| 3 | asym pair NTv3+NT-v2 aux=**none** | lab | 128 | 0.773 | **0.808** | 0.701 | 0.953 | Ex | `_lab_results/runs/exp_t2_pair_aux_none_20260425_192434_prod/` |
| 4 | asym pair aux=**supcon_pair** | lab | 128 | 0.719 | 0.710 | 0.733 | 0.688 | Ex | `…/exp_t2_pair_aux_supcon_pair_20260425_192434_prod/` |
| 5 | asym pair aux=**tier_aware_supcon** | lab | 128 | 0.711 | 0.776 | 0.634 | **1.000** | Ex | `…/exp_t2_pair_aux_tier_aware_supcon_20260425_192434_prod/` |
| 6 | **fusion-SFT (Stage 2)** | H100 | - | - | - | - | - | - | **QUEUED** |
| 7 | **Galaxy regen (enhancer TFBS scan)** | lab/galaxy | - | - | - | - | - | - | **PROVISIONED** (lab patched script in `dec7a3e`); not yet launched |
| 8 | **Reasoning expansion (Tier 2)** | H100 | - | gated on #7 | - | - | - | - | **DEFERRED** (Stage 3e) |
| 9 | **NTv3-direct (Stage 6)** | H100 | - | - | - | - | - | - | **QUEUED** |

⚠️ **T2 zero-shot is degenerate**: the model trivially predicts `not_paired`, so recall ≈ 0. Tool enrichment gives a marginal lift, but the missing enhancer-side TFBS scan is the bottleneck. The lab's asym-pair smokes (n=128 Ex) reach F1=0.81, which shows the architecture works; the full benchmark is pending.
## 4. T3: enhancer_editing
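
The F1 column in the T2 table can be cross-checked from the precision and recall columns, which also makes the degenerate case concrete: with recall near zero, F1 collapses regardless of precision. A minimal sketch, assuming F1 is the standard harmonic mean (the helper name `f1_score` is illustrative, not taken from the repo):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the T2 table above.
rows = {
    "zs_enriched (full)": (0.58, 0.001),
    "asym pair aux=none": (0.701, 0.953),
    "asym pair aux=supcon_pair": (0.733, 0.688),
    "asym pair aux=tier_aware_supcon": (0.634, 1.000),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

The recomputed values agree with the table to three decimals (0.002, 0.808, 0.710, 0.776).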
| # | Variant | Host | Status | Sample path |
|---|---|---|---|---|
| 1 | **zs_raw bench (full ~372k)** | H100 | **RUNNING** (PID 139902, 35 min in, ETA ~4.5 h) | `runs/exp_t3_grid_separatedQA_20260426_h100_vllm_full/zs_raw/` |
| 2 | zs_enriched bench (full ~372k) | H100 | queued behind #1 | same parent dir / `zs_enriched/` |
| 3 | fusion-SFT (Stage 3, heuristic gold) | H100 | queued (auto, post-bench) | `runs/exp_t3_fusion_sft_20260427_h100/` |
| 4 | **reasoning-only ablation (Stage 3b)** | H100 | queued | `runs/exp_t3_fusion_sft_reasonly_20260427_h100/` |
| 5 | **multi-turn RFT (Stage 3c, --rounds 4)** | H100 | queued | `runs/exp_t3_fusion_sft_rft_20260427_h100/` |
| 6 | **post-RFT reasoning expansion (Stage 3d)** | H100 | queued (gated on #5) | `data/reasoning_traces/train.enhancer_editing.reasoning.jsonl` |
| 7 | RFT-from-joint ablation | lab | proposed in `t3_post_v5_followups.md` §1 | - |
| 8 | Loop-SFT on post-RFT | lab | proposed | - |
## 5. Joint multitask (the headline)
| # | Variant | Host | n | Status | Path |
|---|---|---|---|---|---|
| 1 | **Joint multitask balanced 35k×3** (Stage 4) | H100 | 105k train | **QUEUED** | input: `data/prod_samples/train.joint_multitask_balanced.strat7c.n105k.jsonl` (992MB, 35k T1 + 35k T2 + 35k T3) |
| 2 | Score adapter on T1 raw / enriched | H100 | 372k | queued | `predict_t1_{raw,enriched}/genqual.json` |
| 3 | Score adapter on T2 raw / enriched | H100 | 744k | queued | `predict_t2_{raw,enriched}/metrics.json` |
| 4 | Score adapter on T3 raw / enriched | H100 | 372k | queued | `predict_t3_{raw,enriched}/genqual_t3_oracle.json` |
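
The 35k × 3 balance of the joint JSONL is straightforward to re-verify before Stage 4 launches. The sketch below counts rows per task and checks that every task has the expected count. It assumes each JSONL row carries its task name under a key called `task`; that key name is hypothetical and should be swapped for whatever the real schema uses.

```python
import json
from collections import Counter
from typing import Iterable


def count_tasks(lines: Iterable[str], task_key: str = "task") -> Counter:
    """Count rows per task in an iterable of JSONL lines (task_key is an assumed field name)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        counts[json.loads(line)[task_key]] += 1
    return counts


def is_balanced(counts: Counter, expected_per_task: int, n_tasks: int = 3) -> bool:
    """True when exactly n_tasks tasks are present and each has expected_per_task rows."""
    return len(counts) == n_tasks and all(
        c == expected_per_task for c in counts.values()
    )


# Tiny in-memory stand-in for the real file (2 rows per task instead of 35k).
tasks = ("enhancer_generation", "pair_prediction", "enhancer_editing")
demo_lines = [json.dumps({"task": t}) for t in tasks for _ in range(2)]
demo_counts = count_tasks(demo_lines)
print(dict(demo_counts), is_balanced(demo_counts, expected_per_task=2))
```

Against the real file, the same functions would be called as `count_tasks(open(path))` with `expected_per_task=35_000`.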
## 6. Architecture-mode ablation (Table 3 Phase 2: `llava` vs `unified+ntp` vs `unified+mdlm` vs `diffusion`)
**Status: DEFERRED to lab cluster.**
The DNA-output-head ablation surface is wired (`scripts/train_fusion_sft.py
--architecture-mode {llava,unified,diffusion}`, `--dna-loss-kind {mdlm,ntp}`,
`--dna-loss-weight λ`); see `docs/unified_multimodal_lm_survey.md` for the
survey behind it. Currently:
| Mode | Status | Where wired |
|---|---|---|
| `llava` (default; LLM head emits DNA as text tokens) | **In every fusion-SFT call on H100** | `slurm/post_bench_pipeline_h100_v5.sh` Stages 1/2/3/4 use `--architecture-mode llava` |
| `unified+ntp` (DNA head with plain CE on the DNA vocab) | **Wired but not launched** | `slurm/run_unified_arch_ablation.sh` |
| `unified+mdlm` (DNA head with LLaDA ELBO + 1/t reweight) | wired, not launched | same launcher |
| `diffusion` (LLaDA full diffusion) | **NOT YET WIRED** (per `train_fusion_sft.py:88`: "Phase 3 = diffusion (LLaDA, not yet wired)") | future |
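
The mapping from the three flags to the four table rows can be sketched with `argparse`. This is an illustration of the flag surface described above, not the contents of `train_fusion_sft.py`: the flag names and choices come from this section, while the defaults for `--dna-loss-kind` and `--dna-loss-weight` and the helper `resolve_mode` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the ablation flags named in this section."""
    parser = argparse.ArgumentParser(description="Sketch of the architecture-mode flags")
    parser.add_argument(
        "--architecture-mode",
        choices=["llava", "unified", "diffusion"],
        default="llava",  # llava is described as the default mode
    )
    parser.add_argument(
        "--dna-loss-kind",
        choices=["mdlm", "ntp"],
        default=None,  # assumed: only meaningful in unified mode
    )
    parser.add_argument(
        "--dna-loss-weight",
        type=float,
        default=1.0,  # assumed default for the lambda weight
    )
    return parser


def resolve_mode(args: argparse.Namespace) -> str:
    """Map parsed flags to the row labels used in the table above."""
    if args.architecture_mode == "diffusion":
        raise NotImplementedError("diffusion mode is not yet wired")
    if args.architecture_mode == "unified":
        if args.dna_loss_kind is None:
            raise ValueError("unified mode needs --dna-loss-kind {mdlm,ntp}")
        return f"unified+{args.dna_loss_kind}"
    return "llava"


demo_args = build_parser().parse_args(
    ["--architecture-mode", "unified", "--dna-loss-kind", "mdlm", "--dna-loss-weight", "0.5"]
)
print(resolve_mode(demo_args), demo_args.dna_loss_weight)
```

Failing fast on `diffusion` keeps a mistyped launcher from silently falling back to another mode while that path is unwired.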

**Lab action item** added: launch `slurm/run_unified_arch_ablation.sh`
on a non-H100 node (the H100 stays focused on the headline runs).
It runs three jobs in one sbatch, all on T1: llava (control), unified+ntp,
and unified+mdlm. ETA ~10 h per architecture on a lab GPU. Already documented in
`docs/minimal_publishable_suite.md §4e`.
## 7. Contrastive / aux-loss ablations
### 7a. T2 pair-aux contrastive (DONE, lab smoke)
3 variants (Table 1 sub-figure / Table 3 row), all smoke-tested at
n=128 Ex (rows 3-5 in the T2 table above). The full-set re-run will
fire after the galaxy regen lands so the new T2 enriched JSONL feeds
both the asym-pair model and the fusion-SFT stack.
### 7b. Aligner loss ablation (3 contrastive variants, T1 trimodal)
`slurm/run_aligner_loss_ablation.sh`: three loss variants
(infoNCE / supcon / tier-aware-supcon-style) for the trimodal aligner
(promoter / enhancer / expression). Status: **wired, not launched**. Lab
side. Documented in `docs/minimal_publishable_suite.md §4b`.
### 7c. Multi-encoder grid (NTv3 vs HyenaDNA vs Caduceus)
`slurm/run_multi_encoder_grid.sh`: DNA-encoder ablation at the T1 /
T2 layer. **Wired**; NTv3-650M is the current default everywhere.
Lab side, not launched.
## 8. Oracle + supporting infra
| Asset | Host | Status | Path |
|---|---|---|---|
| DeepSTARR-7cell oracle (`val_pearson_mean=0.136`, weak-but-aggregable) | lab | DONE | `_lab_results/runs/exp_oracle_ds_7cell_fdr_both_20260424_162210/oracle.pt` |
| Enformer oracle (Table 4 cross-oracle) | lab | not built | - |
| Sei oracle (Table 4) | lab | not built | - |
| Joint multitask balanced 105k JSONL | H100 | DONE (35k × 3 verified) | `data/prod_samples/train.joint_multitask_balanced.strat7c.n105k.jsonl` |
| Test JSONLs (T1+T2+T3 full) | H100 | DONE | `data/prod_full_test/jsonl/test.{enhancer_generation,pair_prediction,enhancer_editing}.jsonl` |
## 9. External baselines (the comparison gap)
| Model | Status | Priority | Doc |
|---|---|---|---|
| TACO (Lin et al. NeurIPS 2024): T3 paper precedent | **NOT STARTED** | **HIGH** | `t3_post_v5_followups.md §5` |
| HyenaDNA: T2 fluency baseline | NOT STARTED | **HIGH** | same |
| DNABERT-2 / NT-v2: encoder baselines | wired as encoders only; head not trained | MEDIUM | same |
| CtrlDNA: T1 conditional gen | NOT STARTED | MEDIUM | same |
| Evo / Evo2: large fluency | NOT STARTED | LOW | same |

Lab action item: TACO + HyenaDNA, ~1 day each.
## 10. SV-GSPO (RL) + Loop-SFT: pipeline state
| Component | Status |
|---|---|
| SV-GSPO outcome reward for T3 (was buggy, **fixed in `e133cf1`**) | code synced, not yet trained |
| SV-GSPO ablation grid (Table 2: cost-aware / kβ-KL / DAPO / KL=0 / no-group-norm) | pipeline wired, **not yet launched** |
| Loop-SFT on heuristic-gold trajectories | pipeline wired, not launched |
| Loop-SFT on post-RFT trajectories (T3 only) | proposed in `t3_post_v5_followups.md §3` |
## 11. Branch + HF state
```
HEAD on mllm-integrate-server2: f304894 (merge lab's dec7a3e regen_t2 PYTHON_BIN fix)
4 commits ahead of mllm-integrate
0 commits behind (lab fully caught up)
HF mirror: explcre/dnathinker-checkpoints (last push 04:10 UTC)
runs/exp_t1_grid_*_full/zs_{raw,enriched}/metrics.json
runs/exp_t2_grid_*_full/zs_{raw,enriched}/metrics.json
data/reasoning_traces/train.enhancer_generation.reasoning.jsonl (live, 281 rows)
data/reasoning_traces/smoke_5rows_{t1,t2,t3}_postsanitize.jsonl
data/reasoning_traces/post_rft_{contract_fixture,smoke}.jsonl
docs/{lab_message_v2, t3_metrics_quickref, t3_post_v5_followups, experiment_chain_v5_unified}.md
results/h100_snapshot.md
```
## 12. Total ETA to headline
| Step | Wall-clock |
|---|---|
| T3 zs_raw + enriched bench | ~10 h from now |
| Stage 0c oracle scoring on T3 zs preds | ~30 min after bench |
| Stages 1/2/3 + score-adapter (T1, T2, T3 fusion-SFT + per-cell oracle) | ~22 h |
| Stage 3b (T3 reasoning-only) | ~3 h |
| Stage 3c (T3 multi-turn RFT + retrain) | ~5 h |
| Stage 3d (T3 reasoning expansion 333 rows) | ~30 min IO-bound (in parallel with Stage 4) |
| Stage 3f (T1 reasoning) | continuous, +333/day |
| Stage 4 (joint multitask 105k) | ~10 h |
| Stages 5+6 (NTv3-only baselines) | ~4 h |
| Stage 7 (aggregator + final HF push) | minutes |

**Total H100 post-bench: ~36 h**. With lab cluster handling
arch-mode + aligner + contrastive + TACO + HyenaDNA in parallel,
the headline submission lands in **~3-4 days**.
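
As a sanity check on the wall-clock budget, the per-step estimates from the table above can be summed directly. The sketch below treats the post-bench stages as strictly serial and leaves out Stage 3d (stated to run in parallel with Stage 4), Stage 3f (continuous), and Stage 7 (minutes). The serial assumption is ours, not something the table states.

```python
# Post-bench step estimates in hours, copied from the table above.
post_bench_hours = {
    "Stage 0c oracle scoring": 0.5,
    "Stages 1/2/3 + score-adapter": 22,
    "Stage 3b (T3 reasoning-only)": 3,
    "Stage 3c (T3 multi-turn RFT + retrain)": 5,
    "Stage 4 (joint multitask 105k)": 10,
    "Stages 5+6 (NTv3-only baselines)": 4,
}
bench_hours = 10  # T3 zs_raw + enriched bench, "from now"

serial_total = sum(post_bench_hours.values())
print(f"serial post-bench total: {serial_total} h")
print(f"bench + serial post-bench: {bench_hours + serial_total} h "
      f"(~{(bench_hours + serial_total) / 24:.1f} days)")
```

Under these assumptions the serial sum comes out above the ~36 h quoted, so the quoted figure presumably relies on some stage overlap that the table does not spell out; either way the H100 portion fits inside the ~3-4 day window.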
## 13. Critical gates (what's blocking)
| Gate | Blocker | Unblocks |
|---|---|---|
| **Galaxy T2 enhancer regen** | lab launches `slurm/regen_t2_enriched_with_enhancer_scan.sh`; ~8h CPU | T2 bench rerun, T2 fusion-SFT, T2 reasoning expansion (Stage 3e), proper T2 row in Table 1 |
| **T3 RFT runs (Stage 3c)** | needs Stage 3 done | T3 reasoning expansion (Stage 3d), post-RFT row in Table 1 T3 column |
| **Reasoning accumulation (Tier 2)** | OpenRouter 1000/day cap per key | "Reasoning model" rows for T1 (now), T2 (after regen), T3 (after RFT). Multi-key parallel = lab side. |
| **TACO + HyenaDNA** | lab work | external baseline rows in Table 1; reviewers will ask |
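
The reasoning-accumulation gate is a simple rate limit, so the effect of adding keys can be estimated ahead of time. The sketch below assumes roughly three API calls per reasoning row, a figure inferred only from the 1000/day cap sitting next to the +333 rows/day rate and not confirmed anywhere in this dashboard; the function names and the 999-row target are likewise hypothetical.

```python
import math


def rows_per_day(daily_call_cap: int, calls_per_row: int, n_keys: int = 1) -> int:
    """Rows that fit under a per-key daily call cap, summed over independent keys."""
    return (daily_call_cap // calls_per_row) * n_keys


def days_to_finish(rows_remaining: int, daily_rows: int) -> int:
    """Whole days needed to produce rows_remaining at daily_rows per day."""
    return math.ceil(rows_remaining / daily_rows)


CAP_PER_KEY = 1000   # OpenRouter cap per key, from the gate table
CALLS_PER_ROW = 3    # ASSUMPTION: inferred from 1000/day vs +333 rows/day

single_key = rows_per_day(CAP_PER_KEY, CALLS_PER_ROW)
three_keys = rows_per_day(CAP_PER_KEY, CALLS_PER_ROW, n_keys=3)
print(single_key, three_keys)
print(days_to_finish(999, single_key), days_to_finish(999, three_keys))
```

With one key the estimate reproduces the 333 rows/day pace; the multi-key case scales linearly only if each key has its own independent cap.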