Master progress dashboard - 2026-04-27 ~04:40 UTC
Every experiment that has been run, is running, or is queued across the
H100 and the lab cluster. Includes architecture-mode (LLaVA / unified-NTP /
unified-MDLM / diffusion) and contrastive (aux pair, aligner loss)
ablations.
1. Live processes (H100)
| PID | Job | Elapsed | ETA |
|---|---|---|---|
| 100474 | launch_bench_vllm.sh orchestrator | since Apr 26 | runs until last task completes |
| 137805 | T1 reasoning expansion (build_reasoning_traces.py) | 53 min | ~10 min remaining (281/333) |
| 139902 | T3 zs_raw vLLM bench | 35 min | ~4.5 h |
| 100544 | watcher: post_bench_pipeline.sh | idle | fires when bench grid exits |
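The "~10 min remaining (281/333)" figure for PID 137805 can be sanity-checked by linear extrapolation from the progress counter. The following is a minimal sketch, assuming a constant per-row rate; the helper name `eta_minutes` is hypothetical and is not part of the pipeline.

```python
def eta_minutes(done: int, total: int, elapsed_min: float) -> float:
    """Linear extrapolation: remaining rows divided by the observed rows/min rate."""
    if done <= 0 or elapsed_min <= 0:
        raise ValueError("need completed rows and positive elapsed time to estimate a rate")
    rate = done / elapsed_min        # rows per minute observed so far
    return (total - done) / rate     # minutes left if that rate holds


# Figures taken from the T1 reasoning expansion row: 281 of 333 rows in 53 min.
remaining = eta_minutes(done=281, total=333, elapsed_min=53)
print(f"~{remaining:.0f} min remaining")
```

The same extrapolation is only a rough guide for the vLLM bench rows, where throughput may vary with sequence length.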
2. T1: enhancer_generation
| # | Variant | Host | n | parse | gc_err | len_ratio | Cells | Sample path |
|---|---|---|---|---|---|---|---|---|
| 1 | zs_raw (full) | H100 | 372,210 | 0.9996 | 0.116 | 1.64 | 7-cell | runs/exp_t1_grid_separatedQA_20260426_h100_vllm_full/zs_raw/ |
| 2 | zs_enriched (full) | H100 | 372,210 | 0.9997 | 0.126 | 1.67 | 7-cell | runs/exp_t1_grid_separatedQA_20260426_h100_vllm_full/zs_enriched/ |
| 3 | zs_raw TRUNCATED (max=200, superseded) | H100 | 372,210 | 0.9996 | 0.124 | 0.72 | 7-cell | runs/exp_t1_grid_separatedQA_20260426_h100_vllm_TRUNCATED/zs_raw/ |
| 4 | zs_raw smoke (n=64 Ex) | lab | 64 | 1.0 | 0.093 | 1.83 | Ex | _lab_results/runs/exp_t1_grid_separatedQA_20260424_154915/zs_raw/ |
| 5 | zs_enriched smoke (n=64) | lab | 64 | 1.0 | 0.096 | 1.62 | Ex | …/zs_enriched/ |
| 6 | lora_raw smoke [COLLAPSED] | lab | 64 | 1.0 | 0.070 | 3.64 🚨 | Ex | …/lora_raw/ |
| 7 | lora_enriched smoke [COLLAPSED] | lab | 64 | 1.0 | 0.102 | 3.90 🚨 | Ex | …/lora_enriched/ |
| 8 | fusion-SFT (Stage 1) | H100 | - | - | - | - | - | QUEUED (auto post-bench) |
| 9 | NTv3-MDLM (Stage 5) | H100 | - | - | - | - | - | QUEUED |
| 10 | Reasoning expansion (Tier 2) | H100 | 281 / 333 | 0 leaks | rich rationales | - | 7-cell | data/reasoning_traces/train.enhancer_generation.reasoning.jsonl |
3. T2: pair_prediction
| # | Variant | Host | n | accuracy | F1 | precision | recall | Cells | Sample path |
|---|---|---|---|---|---|---|---|---|---|
| 1 | zs_raw (full) | H100 | 744,420 | 0.500 | 0.0001 | 0.65 | ~0 | 7-cell | runs/exp_t2_grid_separatedQA_20260426_h100_vllm_full/zs_raw/ |
| 2 | zs_enriched (full) | H100 | 744,420 | 0.500 | 0.002 | 0.58 | 0.001 | 7-cell | runs/exp_t2_grid_separatedQA_20260426_h100_vllm_full/zs_enriched/ |
| 3 | asym pair NTv3+NT-v2 aux=none | lab | 128 | 0.773 | 0.808 | 0.701 | 0.953 | Ex | _lab_results/runs/exp_t2_pair_aux_none_20260425_192434_prod/ |
| 4 | asym pair aux=supcon_pair | lab | 128 | 0.719 | 0.710 | 0.733 | 0.688 | Ex | …/exp_t2_pair_aux_supcon_pair_20260425_192434_prod/ |
| 5 | asym pair aux=tier_aware_supcon | lab | 128 | 0.711 | 0.776 | 0.634 | 1.000 | Ex | …/exp_t2_pair_aux_tier_aware_supcon_20260425_192434_prod/ |
| 6 | fusion-SFT (Stage 2) | H100 | - | - | - | - | - | - | QUEUED |
| 7 | Galaxy regen (enhancer TFBS scan) | lab/galaxy | - | - | - | - | - | - | PROVISIONED (lab patched script in dec7a3e); not yet launched |
| 8 | Reasoning expansion (Tier 2) | H100 | - | gated on #7 | - | - | - | - | DEFERRED (Stage 3e) |
| 9 | NTv3-direct (Stage 6) | H100 | - | - | - | - | - | - | QUEUED |
⚠️ T2 zero-shot is degenerate: the model almost always predicts not_paired, so recall ≈ 0 and accuracy stays at 0.500. Tool enrichment gives a marginal lift, but the missing enhancer-side TFBS scan is the bottleneck. The lab's asym-pair smokes (n=128 Ex) reach F1 = 0.81, which indicates the architecture works; the full benchmark is pending.
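The F1 column in the T2 table follows from the reported precision and recall as their harmonic mean, which is why near-zero recall drives the zero-shot F1 to ~0 despite precision around 0.6. A minimal sketch that recomputes F1 for the rows with numeric recall; the helper `f1_score` and the dictionary layout are illustrative, not the pipeline's metric code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the T2 table above.
rows = {
    "zs_enriched (full)": (0.58, 0.001, 0.002),
    "asym pair aux=none": (0.701, 0.953, 0.808),
    "asym pair aux=supcon_pair": (0.733, 0.688, 0.710),
    "asym pair aux=tier_aware_supcon": (0.634, 1.000, 0.776),
}

for name, (precision, recall, reported) in rows.items():
    recomputed = f1_score(precision, recall)
    print(f"{name}: recomputed F1={recomputed:.3f}, reported F1={reported:.3f}")
```

All four recomputed values agree with the reported F1 to within rounding of the three-decimal inputs.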
4. T3: enhancer_editing
| # | Variant | Host | Status | Sample path |
|---|---|---|---|---|
| 1 | zs_raw bench (full ~372k) | H100 | RUNNING (PID 139902, 35 min in, ETA ~4.5 h) | runs/exp_t3_grid_separatedQA_20260426_h100_vllm_full/zs_raw/ |
| 2 | zs_enriched bench (full ~372k) | H100 | queued behind #1 | same parent dir / zs_enriched/ |
| 3 | fusion-SFT (Stage 3, heuristic gold) | H100 | queued (auto, post-bench) | runs/exp_t3_fusion_sft_20260427_h100/ |
| 4 | reasoning-only ablation (Stage 3b) | H100 | queued | runs/exp_t3_fusion_sft_reasonly_20260427_h100/ |
| 5 | multi-turn RFT (Stage 3c, --rounds 4) | H100 | queued | runs/exp_t3_fusion_sft_rft_20260427_h100/ |
| 6 | post-RFT reasoning expansion (Stage 3d) | H100 | queued (gated on #5) | data/reasoning_traces/train.enhancer_editing.reasoning.jsonl |
| 7 | RFT-from-joint ablation | lab | proposed in t3_post_v5_followups.md §1 | - |
| 8 | Loop-SFT on post-RFT | lab | proposed | - |
5. Joint multitask (the headline)
| # | Variant | Host | n | Status | Path |
|---|---|---|---|---|---|
| 1 | Joint multitask balanced 35k×3 (Stage 4) | H100 | 105k train | QUEUED | input: data/prod_samples/train.joint_multitask_balanced.strat7c.n105k.jsonl (992MB, 35k T1 + 35k T2 + 35k T3) |
| 2 | Score adapter on T1 raw / enriched | H100 | 372k | queued | predict_t1_{raw,enriched}/genqual.json |
| 3 | Score adapter on T2 raw / enriched | H100 | 744k | queued | predict_t2_{raw,enriched}/metrics.json |
| 4 | Score adapter on T3 raw / enriched | H100 | 372k | queued | predict_t3_{raw,enriched}/genqual_t3_oracle.json |
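The 35k + 35k + 35k balance of the joint training file can be re-checked with a per-task row count. The sketch below assumes each JSONL row carries a `task` field whose value is one of the three task names; that field name is a guess, not something confirmed by the pipeline, and the helpers `count_tasks` and `is_balanced` are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_tasks(lines: Iterable[str], key: str = "task") -> Counter:
    """Count rows per task in an iterable of JSONL lines (the `key` name is assumed)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        counts[json.loads(line).get(key, "<missing>")] += 1
    return counts


def is_balanced(counts: Counter, expected_per_task: int, n_tasks: int = 3) -> bool:
    """True when exactly n_tasks tasks are present and each has the expected row count."""
    return len(counts) == n_tasks and all(c == expected_per_task for c in counts.values())


# Tiny in-memory stand-in for the 105k file: two rows per task.
tasks = ("enhancer_generation", "pair_prediction", "enhancer_editing")
sample = [json.dumps({"task": t}) for t in tasks * 2]
counts = count_tasks(sample)
print(dict(counts), is_balanced(counts, expected_per_task=2))
```

For the real file, the same functions could be fed `open(path)` with `expected_per_task=35_000`, since a file object iterates line by line.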
6. Architecture-mode ablation (Table 3 Phase 2: llava vs unified+ntp vs unified+mdlm vs diffusion)
Status: DEFERRED to lab cluster.
The DNA-output-head ablation surface is wired (scripts/train_fusion_sft.py --architecture-mode {llava,unified,diffusion}, --dna-loss-kind {mdlm,ntp},
--dna-loss-weight λ); see docs/unified_multimodal_lm_survey.md for the
survey behind it. Currently:
| Mode | Status | Where wired |
|---|---|---|
| llava (default; LLM head emits DNA as text tokens) | In every fusion-SFT call on H100 | slurm/post_bench_pipeline_h100_v5.sh Stages 1/2/3/4 use --architecture-mode llava |
| unified+ntp (DNA head with plain CE on the DNA vocab) | Wired but not launched | slurm/run_unified_arch_ablation.sh |
| unified+mdlm (DNA head with LLaDA ELBO + 1/t reweight) | wired, not launched | same launcher |
| diffusion (LLaDA full diffusion) | NOT YET WIRED (per train_fusion_sft.py:88: "Phase 3 = diffusion (LLaDA, not yet wired)") | future |
Lab action item added: launch slurm/run_unified_arch_ablation.sh
on a non-H100 node (the H100 stays focused on the headline runs).
Three jobs in one sbatch: llava (control) / unified+ntp / unified+mdlm
on T1. ETA ~10 h per architecture on a lab GPU. Already documented in
docs/minimal_publishable_suite.md §4e.
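How the three flags combine into the four ablation arms can be sketched with a small argparse stand-in. This is not the parser in scripts/train_fusion_sft.py: only the flag names and their choices come from the text above, while the defaults for `--dna-loss-kind` and `--dna-loss-weight` and the helper `ablation_label` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Stand-in parser exposing only the three ablation flags named in this section."""
    parser = argparse.ArgumentParser(description="architecture-mode ablation sketch")
    parser.add_argument("--architecture-mode",
                        choices=["llava", "unified", "diffusion"],
                        default="llava")          # llava is the stated default
    parser.add_argument("--dna-loss-kind",
                        choices=["mdlm", "ntp"],
                        default="ntp")            # default value is an assumption
    parser.add_argument("--dna-loss-weight",
                        type=float,
                        default=1.0)              # lambda; default is an assumption
    return parser


def ablation_label(args: argparse.Namespace) -> str:
    """Map parsed flags to the arm names used in the table (hypothetical helper)."""
    if args.architecture_mode == "llava":
        return "llava"                            # DNA emitted as text tokens; no DNA head
    if args.architecture_mode == "unified":
        return f"unified+{args.dna_loss_kind}"    # DNA head trained with ntp or mdlm loss
    raise NotImplementedError("diffusion mode is not yet wired")


args = build_parser().parse_args(
    ["--architecture-mode", "unified", "--dna-loss-kind", "mdlm", "--dna-loss-weight", "0.5"]
)
print(ablation_label(args), args.dna_loss_weight)
```

In this sketch the loss kind only matters in unified mode, which matches the table: llava has no separate DNA head, and diffusion is rejected until it is wired.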
7. Contrastive / aux-loss ablations
7a. T2 pair-aux contrastive (DONE, lab smoke)
3 variants (Table 1 sub-figure / Table 3 row), all smoke-tested at
n=128 Ex (rows 3-5 in the T2 table above). The full-set re-run will
fire after the galaxy regen lands so the new T2 enriched JSONL feeds
both the asym-pair model and the fusion-SFT stack.
7b. Aligner loss ablation (3 contrastive variants, T1 trimodal)
slurm/run_aligner_loss_ablation.sh: three loss variants
(infoNCE / supcon / tier-aware-supcon-style) for the trimodal aligner
(promoter-enhancer-expression). Status: wired, not launched. Lab
side. Documented in docs/minimal_publishable_suite.md §4b.
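For orientation, the first of the three variants (InfoNCE) can be sketched in a few lines. This is a generic, one-directional, two-modality InfoNCE in pure Python, not the code behind slurm/run_aligner_loss_ablation.sh; the function name `info_nce`, the temperature value, and the toy embeddings are all assumptions.

```python
import math
from typing import List, Sequence


def info_nce(anchors: Sequence[Sequence[float]],
             positives: Sequence[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: positives[i] is the match for anchors[i]; other rows are negatives."""

    def normalize(v: Sequence[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of this anchor to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]   # equals -log softmax at the true pair
    return total / len(a)


# Toy example: matched pairs give a lower loss than mismatched ones.
enhancer = [[1.0, 0.0], [0.0, 1.0]]
promoter_matched = [[1.0, 0.0], [0.0, 1.0]]
promoter_shuffled = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(enhancer, promoter_matched) < info_nce(enhancer, promoter_shuffled))
```

A supervised-contrastive (supcon) variant would differ mainly in allowing several positives per anchor, defined by shared labels rather than by index.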
7c. Multi-encoder grid (NTv3 vs HyenaDNA vs Caduceus)
slurm/run_multi_encoder_grid.sh: DNA-encoder ablation at the T1 /
T2 layer. Wired; NTv3-650M is the current default everywhere.
Lab side, not launched.
8. Oracle + supporting infra
| Asset | Host | Status | Path |
|---|---|---|---|
| DeepSTARR-7cell oracle (val_pearson_mean=0.136, weak-but-aggregable) | lab | DONE | _lab_results/runs/exp_oracle_ds_7cell_fdr_both_20260424_162210/oracle.pt |
| Enformer oracle (Table 4 cross-oracle) | lab | not built | - |
| Sei oracle (Table 4) | lab | not built | - |
| Joint multitask balanced 105k JSONL | H100 | DONE (35k × 3 verified) | data/prod_samples/train.joint_multitask_balanced.strat7c.n105k.jsonl |
| Test JSONLs (T1+T2+T3 full) | H100 | DONE | data/prod_full_test/jsonl/test.{enhancer_generation,pair_prediction,enhancer_editing}.jsonl |
9. External baselines (the comparison gap)
| Model | Status | Priority | Doc |
|---|---|---|---|
| TACO (Lin et al. NeurIPS 2024): T3 paper precedent | NOT STARTED | HIGH | t3_post_v5_followups.md §5 |
| HyenaDNA: T2 fluency baseline | NOT STARTED | HIGH | same |
| DNABERT-2 / NT-v2: encoder baselines | wired as encoders only; head not trained | MEDIUM | same |
| CtrlDNA: T1 conditional gen | NOT STARTED | MEDIUM | same |
| Evo / Evo2: large fluency | NOT STARTED | LOW | same |
Lab action item: TACO + HyenaDNA, ~1 day each.
10. SV-GSPO (RL) + Loop-SFT: pipeline state
| Component | Status |
|---|---|
| SV-GSPO outcome reward for T3 (was buggy, fixed in e133cf1) | code synced, not yet trained |
| SV-GSPO ablation grid (Table 2: cost-aware / kβ-KL / DAPO / KL=0 / no-group-norm) | pipeline wired, not yet launched |
| Loop-SFT on heuristic-gold trajectories | pipeline wired, not launched |
| Loop-SFT on post-RFT trajectories (T3 only) | proposed in t3_post_v5_followups.md §3 |
11. Branch + HF state
HEAD on mllm-integrate-server2: f304894 (merge lab's dec7a3e regen_t2 PYTHON_BIN fix)
4 commits ahead of mllm-integrate
0 commits behind (lab fully caught up)
HF mirror: explcre/dnathinker-checkpoints (last push 04:10 UTC)
runs/exp_t1_grid_*_full/zs_{raw,enriched}/metrics.json
runs/exp_t2_grid_*_full/zs_{raw,enriched}/metrics.json
data/reasoning_traces/train.enhancer_generation.reasoning.jsonl (live, 281 rows)
data/reasoning_traces/smoke_5rows_{t1,t2,t3}_postsanitize.jsonl
data/reasoning_traces/post_rft_{contract_fixture,smoke}.jsonl
docs/{lab_message_v2, t3_metrics_quickref, t3_post_v5_followups, experiment_chain_v5_unified}.md
results/h100_snapshot.md
12. Total ETA to headline
| Step | Wall-clock |
|---|---|
| T3 zs_raw + enriched bench | ~10 h from now |
| Stage 0c oracle scoring on T3 zs preds | ~30 min after bench |
| Stages 1/2/3 + score-adapter (T1, T2, T3 fusion-SFT + per-cell oracle) | ~22 h |
| Stage 3b (T3 reasoning-only) | ~3 h |
| Stage 3c (T3 multi-turn RFT + retrain) | ~5 h |
| Stage 3d (T3 reasoning expansion 333 rows) | ~30 min IO-bound (in parallel with Stage 4) |
| Stage 3f (T1 reasoning) | continuous, +333/day |
| Stage 4 (joint multitask 105k) | ~10 h |
| Stages 5+6 (NTv3-only baselines) | ~4 h |
| Stage 7 (aggregator + final HF push) | minutes |
Total H100 post-bench: ~36 h. With the lab cluster handling
arch-mode + aligner + contrastive + TACO + HyenaDNA in parallel,
the headline submission lands in ~3-4 days.
13. Critical gates (what's blocking)
| Gate | Blocker | Unblocks |
|---|---|---|
| Galaxy T2 enhancer regen | lab launches slurm/regen_t2_enriched_with_enhancer_scan.sh; ~8h CPU | T2 bench rerun, T2 fusion-SFT, T2 reasoning expansion (Stage 3e), proper T2 row in Table 1 |
| T3 RFT runs (Stage 3c) | needs Stage 3 done | T3 reasoning expansion (Stage 3d), post-RFT row in Table 1 T3 column |
| Reasoning accumulation (Tier 2) | OpenRouter 1000/day cap per key | "Reasoning model" rows for T1 (now), T2 (after regen), T3 (after RFT). Multi-key parallel = lab side. |
| TACO + HyenaDNA | lab work | external baseline rows in Table 1; reviewers will ask |