# Master progress dashboard (2026-04-27, ~04:40 UTC)

Every experiment that has been run, is running, or is queued, across the
H100 and the lab cluster. Includes architecture-mode (LLaVA / unified-NTP /
unified-MDLM / diffusion) and contrastive (aux pair, aligner loss)
ablations.

## 1. Live processes (H100)

| PID | Job | Elapsed | ETA |
|---|---|---|---|
| 100474 | `launch_bench_vllm.sh` orchestrator | since Apr 26 | runs until last task completes |
| 137805 | T1 reasoning expansion (`build_reasoning_traces.py`) | 53 min | ~10 min remaining (281/333) |
| 139902 | T3 zs_raw vLLM bench | 35 min | ~4.5 h |
| 100544 | watcher: `post_bench_pipeline.sh` | idle | fires when bench grid exits |

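The ~10 min ETA for PID 137805 matches a simple linear extrapolation from the rows finished so far. Below is a minimal sketch that assumes throughput stays constant. `linear_eta_minutes` is a hypothetical helper and is not part of `build_reasoning_traces.py`:

```python
def linear_eta_minutes(done: int, total: int, elapsed_min: float) -> float:
    """Minutes remaining, assuming the rows-per-minute rate seen so far holds."""
    if done <= 0 or elapsed_min <= 0:
        raise ValueError("need some completed rows and elapsed time to estimate a rate")
    rate = done / elapsed_min            # rows per minute observed so far
    return (total - done) / rate         # minutes needed for the remaining rows

# PID 137805: 281 of 333 rows after 53 min.
print(f"{linear_eta_minutes(281, 333, 53):.1f} min")  # 9.8 min
```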
## 2. T1: enhancer_generation

| # | Variant | Host | n | parse | gc_err | len_ratio | Cells | Sample path |
|---|---|---|---|---|---|---|---|---|
| 1 | **zs_raw (full)** | H100 | 372,210 | 0.9996 | 0.116 | **1.64** | 7-cell | `runs/exp_t1_grid_separatedQA_20260426_h100_vllm_full/zs_raw/` |
| 2 | **zs_enriched (full)** | H100 | 372,210 | 0.9997 | 0.126 | **1.67** | 7-cell | `runs/exp_t1_grid_separatedQA_20260426_h100_vllm_full/zs_enriched/` |
| 3 | zs_raw TRUNCATED (max=200, **superseded**) | H100 | 372,210 | 0.9996 | 0.124 | 0.72 | 7-cell | `runs/exp_t1_grid_separatedQA_20260426_h100_vllm_TRUNCATED/zs_raw/` |
| 4 | zs_raw smoke (n=64 Ex) | lab | 64 | 1.0 | 0.093 | 1.83 | Ex | `_lab_results/runs/exp_t1_grid_separatedQA_20260424_154915/zs_raw/` |
| 5 | zs_enriched smoke (n=64) | lab | 64 | 1.0 | 0.096 | 1.62 | Ex | `…/zs_enriched/` |
| 6 | **lora_raw smoke [COLLAPSED]** | lab | 64 | 1.0 | 0.070 | **3.64** 🚨 | Ex | `…/lora_raw/` |
| 7 | **lora_enriched smoke [COLLAPSED]** | lab | 64 | 1.0 | 0.102 | **3.90** 🚨 | Ex | `…/lora_enriched/` |
| 8 | **fusion-SFT (Stage 1)** | H100 | - | - | - | - | - | **QUEUED** (auto, post-bench) |
| 9 | **NTv3-MDLM (Stage 5)** | H100 | - | - | - | - | - | **QUEUED** |
| 10 | **Reasoning expansion (Tier 2)** | H100 | 281 / 333 | 0 leaks | rich rationales | - | 7-cell | `data/reasoning_traces/train.enhancer_generation.reasoning.jsonl` |

## 3. T2: pair_prediction

| # | Variant | Host | n | accuracy | F1 | precision | recall | Cells | Sample path |
|---|---|---|---|---|---|---|---|---|---|
| 1 | **zs_raw (full)** | H100 | 744,420 | 0.500 | 0.0001 | 0.65 | ~0 | 7-cell | `runs/exp_t2_grid_separatedQA_20260426_h100_vllm_full/zs_raw/` |
| 2 | **zs_enriched (full)** | H100 | 744,420 | 0.500 | 0.002 | 0.58 | 0.001 | 7-cell | `runs/exp_t2_grid_separatedQA_20260426_h100_vllm_full/zs_enriched/` |
| 3 | asym pair NTv3+NT-v2 aux=**none** | lab | 128 | 0.773 | **0.808** | 0.701 | 0.953 | Ex | `_lab_results/runs/exp_t2_pair_aux_none_20260425_192434_prod/` |
| 4 | asym pair aux=**supcon_pair** | lab | 128 | 0.719 | 0.710 | 0.733 | 0.688 | Ex | `…/exp_t2_pair_aux_supcon_pair_20260425_192434_prod/` |
| 5 | asym pair aux=**tier_aware_supcon** | lab | 128 | 0.711 | 0.776 | 0.634 | **1.000** | Ex | `…/exp_t2_pair_aux_tier_aware_supcon_20260425_192434_prod/` |
| 6 | **fusion-SFT (Stage 2)** | H100 | - | - | - | - | - | - | **QUEUED** |
| 7 | **Galaxy regen (enhancer TFBS scan)** | lab/galaxy | - | - | - | - | - | - | **PROVISIONED** (lab patched the script in `dec7a3e`); not yet launched |
| 8 | **Reasoning expansion (Tier 2)** | H100 | - | gated on #7 | - | - | - | - | **DEFERRED** (Stage 3e) |
| 9 | **NTv3-direct (Stage 6)** | H100 | - | - | - | - | - | - | **QUEUED** |

⚠️ **T2 zero-shot is degenerate**: the model almost always predicts `not_paired`, so recall ≈ 0. Tool enrichment gives only a marginal lift, and the missing enhancer-side TFBS scan is the bottleneck. The lab's asym-pair smokes (n=128 Ex) reach F1 = 0.81. This indicates the architecture can learn the task, but the full benchmark is still pending.

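On a balanced test set, a model that answers `not_paired` almost every time produces the degenerate zero-shot pattern: accuracy 0.500, recall and F1 near 0, yet non-trivial precision. The sketch below reproduces that arithmetic on a toy confusion matrix. It rests on two assumptions. First, it assumes a 50/50 positive/negative split: 744,420 = 2 × 372,210 hints at this, but the table does not state it. Second, it assumes the positive label is named `paired`. `binary_metrics` is a hypothetical helper, not the project's scorer.

```python
def binary_metrics(y_true, y_pred, positive="paired"):
    """Accuracy / precision / recall / F1 for one positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Balanced toy set: 1,000 positives and 1,000 negatives. The model says
# "paired" only 3 times (2 of them correct) and `not_paired` everywhere else.
y_true = ["paired"] * 1000 + ["not_paired"] * 1000
y_pred = ["paired"] * 2 + ["not_paired"] * 998 + ["paired"] * 1 + ["not_paired"] * 999
print({k: round(v, 4) for k, v in binary_metrics(y_true, y_pred).items()})
# accuracy ~0.50, precision ~0.67, recall 0.002, F1 ~0.004
```

Precision stays respectable because the few positive calls are mostly right. Recall, and therefore F1, collapses, and accuracy alone hides the failure.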
## 4. T3: enhancer_editing

| # | Variant | Host | Status | Sample path |
|---|---|---|---|---|
| 1 | **zs_raw bench (full, ~372k)** | H100 | **RUNNING** (PID 139902, 35 min in, ETA ~4.5 h) | `runs/exp_t3_grid_separatedQA_20260426_h100_vllm_full/zs_raw/` |
| 2 | zs_enriched bench (full, ~372k) | H100 | queued behind #1 | same parent dir, `zs_enriched/` |
| 3 | fusion-SFT (Stage 3, heuristic gold) | H100 | queued (auto, post-bench) | `runs/exp_t3_fusion_sft_20260427_h100/` |
| 4 | **reasoning-only ablation (Stage 3b)** | H100 | queued | `runs/exp_t3_fusion_sft_reasonly_20260427_h100/` |
| 5 | **multi-turn RFT (Stage 3c, `--rounds 4`)** | H100 | queued | `runs/exp_t3_fusion_sft_rft_20260427_h100/` |
| 6 | **post-RFT reasoning expansion (Stage 3d)** | H100 | queued (gated on #5) | `data/reasoning_traces/train.enhancer_editing.reasoning.jsonl` |
| 7 | RFT-from-joint ablation | lab | proposed in `t3_post_v5_followups.md` §1 | - |
| 8 | Loop-SFT on post-RFT | lab | proposed | - |

## 5. Joint multitask (the headline)

| # | Variant | Host | n | Status | Path |
|---|---|---|---|---|---|
| 1 | **Joint multitask balanced 35k×3** (Stage 4) | H100 | 105k train | **QUEUED** | input: `data/prod_samples/train.joint_multitask_balanced.strat7c.n105k.jsonl` (992 MB; 35k T1 + 35k T2 + 35k T3) |
| 2 | Score adapter on T1 raw / enriched | H100 | 372k | queued | `predict_t1_{raw,enriched}/genqual.json` |
| 3 | Score adapter on T2 raw / enriched | H100 | 744k | queued | `predict_t2_{raw,enriched}/metrics.json` |
| 4 | Score adapter on T3 raw / enriched | H100 | 372k | queued | `predict_t3_{raw,enriched}/genqual_t3_oracle.json` |

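The sketch below re-checks the 35k × 3 composition of the Stage 4 input before launch. It assumes each JSONL row carries its task name in a top-level `task` field. That field name is hypothetical. The three task names are the ones used in the test JSONL filenames.

```python
import json
from collections import Counter
from pathlib import Path

# Task names as used in the test JSONL filenames; the `task` field name is hypothetical.
TASKS = ("enhancer_generation", "pair_prediction", "enhancer_editing")

def count_tasks(jsonl_path, field="task"):
    """Count rows per task label in a JSONL file, skipping blank lines."""
    counts = Counter()
    with Path(jsonl_path).open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                counts[json.loads(line)[field]] += 1
    return counts

def check_balanced(counts, expected_per_task, tasks=TASKS):
    """True iff every task has exactly `expected_per_task` rows and no other labels appear."""
    return set(counts) == set(tasks) and all(counts[t] == expected_per_task for t in tasks)

# Stage 4 input: expect 35k rows per task (105k total), e.g.
# check_balanced(count_tasks("data/prod_samples/train.joint_multitask_balanced.strat7c.n105k.jsonl"), 35_000)
```

Under the field-name assumption, the commented call returns `True` only if the file matches the 35k T1 + 35k T2 + 35k T3 split listed above.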
## 6. Architecture-mode ablation (Table 3 Phase 2: `llava` vs `unified+ntp` vs `unified+mdlm` vs `diffusion`)

**Status: DEFERRED to lab cluster.**

The DNA-output-head ablation surface is wired (`scripts/train_fusion_sft.py
--architecture-mode {llava,unified,diffusion}`, `--dna-loss-kind {mdlm,ntp}`,
`--dna-loss-weight λ`); see `docs/unified_multimodal_lm_survey.md` for the
survey behind it. Currently:

| Mode | Status | Where wired |
|---|---|---|
| `llava` (default; LLM head emits DNA as text tokens) | **In every fusion-SFT call on H100** | `slurm/post_bench_pipeline_h100_v5.sh` Stages 1/2/3/4 use `--architecture-mode llava` |
| `unified+ntp` (DNA head with plain CE on the DNA vocab) | **Wired but not launched** | `slurm/run_unified_arch_ablation.sh` |
| `unified+mdlm` (DNA head with LLaDA ELBO + 1/t reweight) | wired, not launched | same launcher |
| `diffusion` (LLaDA full diffusion) | **NOT YET WIRED** (per `train_fusion_sft.py:88`: "Phase 3 = diffusion (LLaDA, not yet wired)") | future |

**Lab action item** added: launch `slurm/run_unified_arch_ablation.sh`
on a non-H100 node (the H100 stays focused on the headline runs).
Three jobs in one sbatch: llava (control) / unified+ntp / unified+mdlm
on T1. ETA ~10 h per arch on a lab GPU. Already documented in
`docs/minimal_publishable_suite.md §4e`.

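Here is a NumPy sketch of the two DNA-head objectives behind `--dna-loss-kind`, in the form they are usually defined:

- **`ntp`**: plain next-token CE over every DNA position.
- **`mdlm`**: the LLaDA masked-diffusion bound. Each token is masked with probability t and only masked positions contribute. The sum is reweighted by 1/t and normalized by sequence length.

This sketch is not taken from `train_fusion_sft.py`. The function names are hypothetical, and the logits are assumed to come from a model that already saw the masked input. The `combined_loss` form (text loss + λ · DNA loss) is an assumption about how `--dna-loss-weight` is applied.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable log-softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def token_log_likelihood(logits: np.ndarray, targets: np.ndarray) -> np.ndarray:
    # log p(target_i) at each position i; logits has shape (L, V), targets (L,).
    return log_softmax(logits)[np.arange(len(targets)), targets]

def ntp_loss(logits, targets):
    # `ntp`: cross-entropy averaged over every DNA position.
    return float(-token_log_likelihood(logits, targets).mean())

def mdlm_loss(logits, targets, masked, t):
    # `mdlm` (LLaDA-style): only masked positions contribute; the sum is
    # reweighted by 1/t (t = masking ratio drawn for this sequence) and
    # normalized by the sequence length L.
    ll = token_log_likelihood(logits, targets)
    return float(-(ll * masked).sum() / (t * len(targets)))

def combined_loss(text_loss, dna_loss, dna_loss_weight):
    # Assumed combination for --dna-loss-weight λ: L = L_text + λ · L_dna.
    return text_loss + dna_loss_weight * dna_loss

# Toy example: 8 DNA positions over a 4-letter vocabulary (A, C, G, T).
rng = np.random.default_rng(0)
L, V = 8, 4
targets = rng.integers(0, V, size=L)
logits = rng.normal(size=(L, V))
t = rng.uniform(1e-3, 1.0)          # t ~ U(0, 1], kept away from 0
masked = rng.random(L) < t          # each position masked independently with prob. t
print(ntp_loss(logits, targets), mdlm_loss(logits, targets, masked, t))
```

A quick sanity check: with uniform logits over the 4-letter vocabulary, both losses equal log 4 whenever the masked fraction equals t. The 1/t factor rescales the masked-only sum back to a per-token scale.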
## 7. Contrastive / aux-loss ablations

### 7a. T2 pair-aux contrastive (DONE, lab smoke)

3 variants (Table 1 sub-figure / Table 3 row), all smoke-tested at
n=128 Ex (rows 3-5 in the T2 table above). The full-set re-run will
fire after the galaxy regen lands, so that the new T2 enriched JSONL feeds
both the asym-pair model and the fusion-SFT stack.

### 7b. Aligner loss ablation (3 contrastive variants, T1 trimodal)

`slurm/run_aligner_loss_ablation.sh`: three loss variants
(infoNCE / supcon / tier-aware-supcon-style) for the trimodal aligner
(promoter-enhancer-expression). Status: **wired, not launched**. Lab
side. Documented in `docs/minimal_publishable_suite.md §4b`.

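For reference, here is the first of the three aligner variants (infoNCE) in its common symmetric, CLIP-style form. This NumPy sketch treats row i of two embedding batches as the positive pair and every other row as a negative. The trimodal wrapper averages the three pairwise terms. That is one hypothetical way to cover promoter, enhancer and expression, and not necessarily what `run_aligner_loss_ablation.sh` does. The supcon and tier-aware variants, which allow several positives per anchor, are not shown. The temperature default is an assumption.

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Unit-normalize each row so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def _logsumexp(a: np.ndarray, axis: int) -> np.ndarray:
    # Stable log-sum-exp along one axis.
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def info_nce(a: np.ndarray, b: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: a[i] and b[i] are positives, all other rows are negatives."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature                           # (N, N) similarity matrix
    diag = np.diag(logits)                                   # positive-pair scores
    loss_ab = (_logsumexp(logits, axis=1) - diag).mean()     # a -> b: softmax over each row
    loss_ba = (_logsumexp(logits, axis=0) - diag).mean()     # b -> a: softmax over each column
    return float(0.5 * (loss_ab + loss_ba))

def trimodal_info_nce(p, e, x, temperature: float = 0.07) -> float:
    """Hypothetical trimodal variant: mean of the three pairwise InfoNCE terms."""
    return (info_nce(p, e, temperature) + info_nce(p, x, temperature)
            + info_nce(e, x, temperature)) / 3

rng = np.random.default_rng(0)
promoter, enhancer = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
print(info_nce(promoter, enhancer), info_nce(promoter, promoter))  # matched pairs score lower
```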
### 7c. Multi-encoder grid (NTv3 vs HyenaDNA vs Caduceus)

`slurm/run_multi_encoder_grid.sh`: DNA-encoder ablation at the T1 /
T2 layer. **Wired**; NTv3-650M is the current default everywhere.
Lab side, not launched.

## 8. Oracle + supporting infra

| Asset | Host | Status | Path |
|---|---|---|---|
| DeepSTARR-7cell oracle (`val_pearson_mean=0.136`, weak but aggregable) | lab | DONE | `_lab_results/runs/exp_oracle_ds_7cell_fdr_both_20260424_162210/oracle.pt` |
| Enformer oracle (Table 4 cross-oracle) | lab | not built | - |
| Sei oracle (Table 4) | lab | not built | - |
| Joint multitask balanced 105k JSONL | H100 | DONE (35k × 3 verified) | `data/prod_samples/train.joint_multitask_balanced.strat7c.n105k.jsonl` |
| Test JSONLs (T1+T2+T3 full) | H100 | DONE | `data/prod_full_test/jsonl/test.{enhancer_generation,pair_prediction,enhancer_editing}.jsonl` |

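The oracle's `val_pearson_mean=0.136` reads as a Pearson correlation computed per cell type and averaged over the 7 cells. Below is a minimal NumPy sketch of that aggregation. It assumes an unweighted mean across cells, and the oracle's actual validation code may differ.

```python
import numpy as np

def per_cell_pearson(pred: np.ndarray, obs: np.ndarray) -> np.ndarray:
    """Pearson r for each column (cell type); inputs have shape (n_sequences, n_cells)."""
    pc = pred - pred.mean(axis=0)
    oc = obs - obs.mean(axis=0)
    return (pc * oc).sum(axis=0) / np.sqrt((pc ** 2).sum(axis=0) * (oc ** 2).sum(axis=0))

def pearson_mean(pred: np.ndarray, obs: np.ndarray) -> float:
    """Unweighted mean of the per-cell correlations (assumed aggregation)."""
    return float(per_cell_pearson(pred, obs).mean())

# Toy data: 100 sequences x 7 cell types, weakly correlated by construction.
rng = np.random.default_rng(0)
obs = rng.normal(size=(100, 7))
pred = 0.2 * obs + rng.normal(size=(100, 7))
print(per_cell_pearson(pred, obs).round(3), round(pearson_mean(pred, obs), 3))
```

A single mean can hide a wide spread between cells, so it is worth logging the per-cell vector alongside it.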
## 9. External baselines (the comparison gap)

| Model | Status | Priority | Doc |
|---|---|---|---|
| TACO (Lin et al., NeurIPS 2024): T3 paper precedent | **NOT STARTED** | **HIGH** | `t3_post_v5_followups.md §5` |
| HyenaDNA: T2 fluency baseline | NOT STARTED | **HIGH** | same |
| DNABERT-2 / NT-v2: encoder baselines | wired as encoders only; head not trained | MEDIUM | same |
| CtrlDNA: T1 conditional gen | NOT STARTED | MEDIUM | same |
| Evo / Evo2: large fluency | NOT STARTED | LOW | same |

Lab action item: TACO + HyenaDNA, ~1 day each.

## 10. SV-GSPO (RL) + Loop-SFT: pipeline state

| Component | Status |
|---|---|
| SV-GSPO outcome reward for T3 (was buggy, **fixed in `e133cf1`**) | code synced, not yet trained |
| SV-GSPO ablation grid (Table 2: cost-aware / k3-KL / DAPO / KL=0 / no-group-norm) | pipeline wired, **not yet launched** |
| Loop-SFT on heuristic-gold trajectories | pipeline wired, not launched |
| Loop-SFT on post-RFT trajectories (T3 only) | proposed in `t3_post_v5_followups.md §3` |

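The Table 2 ablations vary two ingredients of GSPO. The first is group-normalized advantages: each prompt's sampled responses are scored against their own group's mean and std. The second is the length-normalized, sequence-level importance ratio, with PPO-style clipping applied per sequence.

This is a sketch of vanilla GSPO under those assumptions, not the project's SV-GSPO. All function names are hypothetical. Reading `no-group-norm` as "subtract the group mean but skip the std division" is only one possible interpretation of that ablation.

```python
import numpy as np

def group_advantages(rewards, group_norm: bool = True, eps: float = 1e-6) -> np.ndarray:
    """rewards: (n_prompts, group_size) outcome rewards, one group of responses per prompt."""
    r = np.asarray(rewards, dtype=float)
    centered = r - r.mean(axis=1, keepdims=True)     # group-relative baseline
    if not group_norm:
        return centered                               # one reading of "no-group-norm": no std scaling
    return centered / (r.std(axis=1, keepdims=True) + eps)

def gspo_sequence_ratio(logp_new, logp_old) -> float:
    """(pi_new(y|x) / pi_old(y|x)) ** (1/|y|), computed from per-token log-probs."""
    return float(np.exp(np.mean(np.asarray(logp_new) - np.asarray(logp_old))))

def gspo_objective(ratios, advantages, clip_eps: float) -> float:
    """PPO-style clipped surrogate applied per sequence (to be maximized)."""
    s = np.asarray(ratios, dtype=float)
    a = np.asarray(advantages, dtype=float)
    return float(np.mean(np.minimum(s * a, np.clip(s, 1 - clip_eps, 1 + clip_eps) * a)))

# One prompt, four sampled edits scored by a binary outcome reward.
adv = group_advantages([[1.0, 0.0, 0.0, 1.0]])
print(adv.round(3))   # approximately [[1, -1, -1, 1]]
```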
## 11. Branch + HF state

```
HEAD on mllm-integrate-server2: f304894 (merge lab's dec7a3e regen_t2 PYTHON_BIN fix)
4 commits ahead of mllm-integrate
0 commits behind (lab fully caught up)

HF mirror: explcre/dnathinker-checkpoints (last push 04:10 UTC)
runs/exp_t1_grid_*_full/zs_{raw,enriched}/metrics.json
runs/exp_t2_grid_*_full/zs_{raw,enriched}/metrics.json
data/reasoning_traces/train.enhancer_generation.reasoning.jsonl (live, 281 rows)
data/reasoning_traces/smoke_5rows_{t1,t2,t3}_postsanitize.jsonl
data/reasoning_traces/post_rft_{contract_fixture,smoke}.jsonl
docs/{lab_message_v2, t3_metrics_quickref, t3_post_v5_followups, experiment_chain_v5_unified}.md
results/h100_snapshot.md
```

## 12. Total ETA to headline

| Step | Wall-clock |
|---|---|
| T3 zs_raw + enriched bench | ~10 h from now |
| Stage 0c oracle scoring on T3 zs preds | ~30 min after bench |
| Stages 1/2/3 + score-adapter (T1, T2, T3 fusion-SFT + per-cell oracle) | ~22 h |
| Stage 3b (T3 reasoning-only) | ~3 h |
| Stage 3c (T3 multi-turn RFT + retrain) | ~5 h |
| Stage 3d (T3 reasoning expansion, 333 rows) | ~30 min, IO-bound (in parallel with Stage 4) |
| Stage 3f (T1 reasoning) | continuous, +333/day |
| Stage 4 (joint multitask 105k) | ~10 h |
| Stages 5+6 (NTv3-only baselines) | ~4 h |
| Stage 7 (aggregator + final HF push) | minutes |

**Total H100 post-bench: ~36 h**. With the lab cluster handling
arch-mode + aligner + contrastive + TACO + HyenaDNA in parallel,
the headline submission lands in **~3-4 days**.

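Adding up the per-stage estimates gives a cross-check on the ~36 h figure. Run strictly back to back, the listed H100 post-bench rows sum to about 44.5 h. The ~36 h total therefore presumably assumes more overlap between stages than the table spells out, since the table only marks Stage 3d as parallel. The arithmetic is below, with the exclusions stated in the comments:

```python
# H100 post-bench durations in hours, copied from the table above.
# Excluded: Stage 3d (runs in parallel with Stage 4), Stage 3f (continuous),
# Stage 7 (minutes).
POST_BENCH_HOURS = {
    "Stage 0c (oracle scoring on T3 zs preds)": 0.5,
    "Stages 1/2/3 + score-adapter": 22.0,
    "Stage 3b": 3.0,
    "Stage 3c": 5.0,
    "Stage 4": 10.0,
    "Stages 5+6": 4.0,
}
BENCH_HOURS = 10.0  # T3 zs_raw + enriched bench, still running

sequential_post_bench = sum(POST_BENCH_HOURS.values())
print(f"post-bench, strictly sequential: {sequential_post_bench:.1f} h")            # 44.5 h
print(f"including the bench: {(BENCH_HOURS + sequential_post_bench) / 24:.1f} days")  # 2.3 days
```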
## 13. Critical gates (what's blocking)

| Gate | Blocker | Unblocks |
|---|---|---|
| **Galaxy T2 enhancer regen** | lab launches `slurm/regen_t2_enriched_with_enhancer_scan.sh`; ~8 h CPU | T2 bench rerun, T2 fusion-SFT, T2 reasoning expansion (Stage 3e), proper T2 row in Table 1 |
| **T3 RFT runs (Stage 3c)** | needs Stage 3 done | T3 reasoning expansion (Stage 3d), post-RFT row in Table 1 T3 column |
| **Reasoning accumulation (Tier 2)** | OpenRouter cap of 1000 requests/day per key | "Reasoning model" rows for T1 (now), T2 (after regen), T3 (after RFT). Multi-key parallel = lab side. |
| **TACO + HyenaDNA** | lab work | external baseline rows in Table 1; reviewers will ask for these |

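The OpenRouter gate is a per-key daily budget, so spreading Tier 2 reasoning calls over several keys (the lab-side "multi-key parallel" item) is a small packing problem. Below is a minimal round-robin sketch. The key names, `calls_per_row`, and the function itself are hypothetical, and nothing here calls the OpenRouter API.

```python
from itertools import cycle

def schedule_rows(row_ids, api_keys, daily_cap_per_key, calls_per_row=1):
    """Assign rows to keys round-robin without exceeding any key's daily request budget.

    Returns (assignments: key -> list of row ids, deferred: rows left for the next day).
    """
    budget = {k: daily_cap_per_key for k in api_keys}
    assignments = {k: [] for k in api_keys}
    deferred = []
    keys = cycle(api_keys)
    for row in row_ids:
        for _ in range(len(api_keys)):        # try each key at most once per row
            k = next(keys)
            if budget[k] >= calls_per_row:
                budget[k] -= calls_per_row
                assignments[k].append(row)
                break
        else:                                  # every key is out of budget today
            deferred.append(row)
    return assignments, deferred

assigned, deferred = schedule_rows(range(5), ["key_a", "key_b"], daily_cap_per_key=2)
print(assigned, deferred)
```

As an illustration only: if a row took three calls, one 1000-call key would cover 333 rows a day, the same rate as Stage 3f. The document does not say how many calls a row actually needs.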