
Master progress dashboard: 2026-04-27 ~04:40 UTC

Every experiment that's been run, is running, or is queued, across H100 + lab cluster. Includes architecture-mode (LLaVA / unified-NTP / unified-MDLM / diffusion) and contrastive (aux pair, aligner loss) ablations.

1. Live processes (H100)

| PID | Job | Elapsed | ETA |
|---|---|---|---|
| 100474 | launch_bench_vllm.sh orchestrator | since Apr 26 | runs until last task completes |
| 137805 | T1 reasoning expansion (build_reasoning_traces.py) | 53 min | ~10 min remaining (281/333) |
| 139902 | T3 zs_raw vLLM bench | 35 min | ~4.5 h |
| 100544 | watcher → post_bench_pipeline.sh | idle | fires when bench grid exits |
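
The "~10 min remaining" figure for the T1 reasoning expansion is consistent with a simple linear extrapolation from the progress counter. The sketch below assumes the remaining rows run at the average rate observed so far; the function name `eta_minutes` is hypothetical and not part of the repository.

```python
def eta_minutes(done: int, total: int, elapsed_min: float) -> float:
    """Linear-extrapolation ETA: remaining items at the average rate so far."""
    if done <= 0 or elapsed_min <= 0:
        raise ValueError("need completed items and elapsed time to estimate a rate")
    rate = done / elapsed_min        # items per minute so far
    return (total - done) / rate     # minutes left at that rate


# Numbers from the live-process table: 281 of 333 rows after 53 min.
remaining = eta_minutes(done=281, total=333, elapsed_min=53)
print(f"{remaining:.1f} min remaining")  # 9.8 min remaining
```

The estimate degrades when throughput is bursty (for example under API rate limits), so it is best read as a rough guide.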

2. T1: enhancer_generation

| # | Variant | Host | n | parse | gc_err | len_ratio | Cells | Sample path |
|---|---|---|---|---|---|---|---|---|
| 1 | zs_raw (full) | H100 | 372,210 | 0.9996 | 0.116 | 1.64 | 7-cell | runs/exp_t1_grid_separatedQA_20260426_h100_vllm_full/zs_raw/ |
| 2 | zs_enriched (full) | H100 | 372,210 | 0.9997 | 0.126 | 1.67 | 7-cell | runs/exp_t1_grid_separatedQA_20260426_h100_vllm_full/zs_enriched/ |
| 3 | zs_raw TRUNCATED (max=200, superseded) | H100 | 372,210 | 0.9996 | 0.124 | 0.72 | 7-cell | runs/exp_t1_grid_separatedQA_20260426_h100_vllm_TRUNCATED/zs_raw/ |
| 4 | zs_raw smoke (n=64 Ex) | lab | 64 | 1.0 | 0.093 | 1.83 | Ex | _lab_results/runs/exp_t1_grid_separatedQA_20260424_154915/zs_raw/ |
| 5 | zs_enriched smoke (n=64) | lab | 64 | 1.0 | 0.096 | 1.62 | Ex | …/zs_enriched/ |
| 6 | lora_raw smoke [COLLAPSED] | lab | 64 | 1.0 | 0.070 | 3.64 🚨 | Ex | …/lora_raw/ |
| 7 | lora_enriched smoke [COLLAPSED] | lab | 64 | 1.0 | 0.102 | 3.90 🚨 | Ex | …/lora_enriched/ |
| 8 | fusion-SFT (Stage 1) | H100 | - | - | - | - | - | QUEUED (auto post-bench) |
| 9 | NTv3-MDLM (Stage 5) | H100 | - | - | - | - | - | QUEUED |
| 10 | Reasoning expansion (Tier 2) | H100 | 281 / 333 | 0 leaks | rich rationales | - | 7-cell | data/reasoning_traces/train.enhancer_generation.reasoning.jsonl |

3. T2: pair_prediction

| # | Variant | Host | n | accuracy | F1 | precision | recall | Cells | Sample path |
|---|---|---|---|---|---|---|---|---|---|
| 1 | zs_raw (full) | H100 | 744,420 | 0.500 | 0.0001 | 0.65 | ~0 | 7-cell | runs/exp_t2_grid_separatedQA_20260426_h100_vllm_full/zs_raw/ |
| 2 | zs_enriched (full) | H100 | 744,420 | 0.500 | 0.002 | 0.58 | 0.001 | 7-cell | runs/exp_t2_grid_separatedQA_20260426_h100_vllm_full/zs_enriched/ |
| 3 | asym pair NTv3+NT-v2 aux=none | lab | 128 | 0.773 | 0.808 | 0.701 | 0.953 | Ex | _lab_results/runs/exp_t2_pair_aux_none_20260425_192434_prod/ |
| 4 | asym pair aux=supcon_pair | lab | 128 | 0.719 | 0.710 | 0.733 | 0.688 | Ex | …/exp_t2_pair_aux_supcon_pair_20260425_192434_prod/ |
| 5 | asym pair aux=tier_aware_supcon | lab | 128 | 0.711 | 0.776 | 0.634 | 1.000 | Ex | …/exp_t2_pair_aux_tier_aware_supcon_20260425_192434_prod/ |
| 6 | fusion-SFT (Stage 2) | H100 | - | - | - | - | - | - | QUEUED |
| 7 | Galaxy regen (enhancer TFBS scan) | lab/galaxy | - | - | - | - | - | - | PROVISIONED (lab patched script in dec7a3e); not yet launched |
| 8 | Reasoning expansion (Tier 2) | H100 | - | gated on #7 | - | - | - | - | DEFERRED (Stage 3e) |
| 9 | NTv3-direct (Stage 6) | H100 | - | - | - | - | - | - | QUEUED |

⚠️ T2 zero-shot is degenerate: the model trivially predicts not_paired, so recall ≈ 0. Tool enrichment gives a marginal lift, but the missing enhancer-side TFBS scan is the bottleneck. The lab's asym-pair smokes (n=128 Ex) reach F1=0.81, which indicates the architecture works at smoke scale; the full benchmark is pending.
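
To see why near-zero recall collapses F1 regardless of precision, the F1 column of the T2 table can be recomputed from its precision and recall columns. This is a minimal sketch using the standard harmonic-mean definition; the values are copied from rows 2 to 5 of the table, and the helper name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the T2 table, rows 2-5.
rows = {
    "zs_enriched (full)": (0.58, 0.001, 0.002),
    "aux=none": (0.701, 0.953, 0.808),
    "aux=supcon_pair": (0.733, 0.688, 0.710),
    "aux=tier_aware_supcon": (0.634, 1.000, 0.776),
}
for name, (p, r, reported) in rows.items():
    print(f"{name}: computed F1={f1_score(p, r):.3f}, reported {reported:.3f}")
```

The recomputed values agree with the reported ones to three decimals, and the zero-shot row shows that a precision of 0.58 contributes almost nothing once recall is 0.001.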

4. T3: enhancer_editing

| # | Variant | Host | Status | Sample path |
|---|---|---|---|---|
| 1 | zs_raw bench (full ~372k) | H100 | RUNNING (PID 139902, 35 min in, ETA ~4.5 h) | runs/exp_t3_grid_separatedQA_20260426_h100_vllm_full/zs_raw/ |
| 2 | zs_enriched bench (full ~372k) | H100 | queued behind #1 | same parent dir / zs_enriched/ |
| 3 | fusion-SFT (Stage 3, heuristic gold) | H100 | queued (auto, post-bench) | runs/exp_t3_fusion_sft_20260427_h100/ |
| 4 | reasoning-only ablation (Stage 3b) | H100 | queued | runs/exp_t3_fusion_sft_reasonly_20260427_h100/ |
| 5 | multi-turn RFT (Stage 3c, --rounds 4) | H100 | queued | runs/exp_t3_fusion_sft_rft_20260427_h100/ |
| 6 | post-RFT reasoning expansion (Stage 3d) | H100 | queued (gated on #5) | data/reasoning_traces/train.enhancer_editing.reasoning.jsonl |
| 7 | RFT-from-joint ablation | lab | proposed in t3_post_v5_followups.md §1 | - |
| 8 | Loop-SFT on post-RFT | lab | proposed | - |

5. Joint multitask (the headline)

| # | Variant | Host | n | Status | Path |
|---|---|---|---|---|---|
| 1 | Joint multitask balanced 35k×3 (Stage 4) | H100 | 105k train | QUEUED | input: data/prod_samples/train.joint_multitask_balanced.strat7c.n105k.jsonl (992MB, 35k T1 + 35k T2 + 35k T3) |
| 2 | Score adapter on T1 raw / enriched | H100 | 372k | queued | predict_t1_{raw,enriched}/genqual.json |
| 3 | Score adapter on T2 raw / enriched | H100 | 744k | queued | predict_t2_{raw,enriched}/metrics.json |
| 4 | Score adapter on T3 raw / enriched | H100 | 372k | queued | predict_t3_{raw,enriched}/genqual_t3_oracle.json |

6. Architecture-mode ablation (Table 3 Phase 2: llava vs unified+ntp vs unified+mdlm vs diffusion)

Status: DEFERRED to lab cluster.

The DNA-output-head ablation surface is wired (scripts/train_fusion_sft.py --architecture-mode {llava,unified,diffusion}, --dna-loss-kind {mdlm,ntp}, --dna-loss-weight λ); see docs/unified_multimodal_lm_survey.md for the survey behind it. Currently:

| Mode | Status | Where wired |
|---|---|---|
| llava (default; LLM head emits DNA as text tokens) | In every fusion-SFT call on H100 | slurm/post_bench_pipeline_h100_v5.sh Stages 1/2/3/4 use --architecture-mode llava |
| unified+ntp (DNA head with plain CE on the DNA vocab) | Wired but not launched | slurm/run_unified_arch_ablation.sh |
| unified+mdlm (DNA head with LLaDA ELBO + 1/t reweight) | wired, not launched | same launcher |
| diffusion (LLaDA full diffusion) | NOT YET WIRED (per train_fusion_sft.py:88: "Phase 3 = diffusion (LLaDA, not yet wired)") | future |

Lab action item added: launch slurm/run_unified_arch_ablation.sh on a non-H100 node (the H100 stays focused on the headline runs). Three jobs in one sbatch: llava (control) / unified+ntp / unified+mdlm on T1. ETA ~10 h per arch on a lab GPU. Already documented in docs/minimal_publishable_suite.md §4e.
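
The flag surface described in this section can be illustrated with a small argparse sketch. The flag names, their choices, and llava being the default come from the text above; the other defaults, the `validate` helper, and the way the arms are labelled are assumptions made for illustration and may differ from what scripts/train_fusion_sft.py actually does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Flag names and choices follow the dashboard text; defaults for the two
    # DNA-loss flags are assumptions made for this sketch.
    parser = argparse.ArgumentParser(description="fusion-SFT flag sketch (hypothetical)")
    parser.add_argument("--architecture-mode",
                        choices=["llava", "unified", "diffusion"], default="llava")
    parser.add_argument("--dna-loss-kind", choices=["mdlm", "ntp"], default="ntp")
    parser.add_argument("--dna-loss-weight", type=float, default=1.0)
    return parser


def validate(args: argparse.Namespace) -> str:
    """Return a label for the ablation arm, rejecting the arm that is not wired."""
    if args.architecture_mode == "diffusion":
        raise NotImplementedError("diffusion (LLaDA) is not yet wired")
    if args.architecture_mode == "unified":
        return f"unified+{args.dna_loss_kind}"
    return "llava"  # in this sketch the DNA-loss flags are ignored for llava


args = build_parser().parse_args(
    ["--architecture-mode", "unified", "--dna-loss-kind", "mdlm", "--dna-loss-weight", "0.5"]
)
print(validate(args))  # unified+mdlm
```

Under these assumptions the three arms of the planned sbatch map to no flags (llava control), `--architecture-mode unified --dna-loss-kind ntp`, and `--architecture-mode unified --dna-loss-kind mdlm`.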

7. Contrastive / aux-loss ablations

7a. T2 pair-aux contrastive (DONE, lab smoke)

3 variants (Table 1 sub-figure / Table 3 row), all smoke-tested at n=128 Ex (rows 3–5 in the T2 table above). The full-set re-run will fire after the galaxy regen lands so the new T2 enriched JSONL feeds both the asym-pair model and the fusion-SFT stack.

7b. Aligner loss ablation (3 contrastive variants, T1 trimodal)

slurm/run_aligner_loss_ablation.sh: three loss variants (infoNCE / supcon / tier-aware-supcon-style) for the trimodal aligner (promoter↔enhancer↔expression). Status: wired, not launched. Lab side. Documented in docs/minimal_publishable_suite.md §4b.
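
For readers unfamiliar with the first of the three variants, here is a minimal pure-Python sketch of the InfoNCE objective on a similarity matrix. It is not the repository's implementation: the pairing convention (row i's positive is column i, for example promoter i against enhancer i), the temperature of 0.07, and the function name are all assumptions.

```python
import math
from typing import Sequence


def info_nce(sim: Sequence[Sequence[float]], temperature: float = 0.07) -> float:
    """Mean InfoNCE loss over a batch.

    sim[i][j] is the similarity between anchor i and candidate j; the positive
    for anchor i is candidate i, and every other column acts as a negative.
    """
    losses = []
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denom - logits[i])  # -log softmax at the positive
    return sum(losses) / len(losses)


aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # positives dominate
uniform = [[0.5, 0.5, 0.5]] * 3                                # no signal at all
print(info_nce(aligned), info_nce(uniform))
```

With no signal the loss equals log(3), the entropy of a uniform guess over three candidates; with well-separated positives it approaches zero. The supcon variants generalise this by allowing several positives per anchor.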

7c. Multi-encoder grid (NTv3 vs HyenaDNA vs Caduceus)

slurm/run_multi_encoder_grid.sh: DNA-encoder ablation at the T1 / T2 layer. Wired; NTv3-650M is the current default everywhere. Lab side, not launched.

8. Oracle + supporting infra

| Asset | Host | Status | Path |
|---|---|---|---|
| DeepSTARR-7cell oracle (val_pearson_mean=0.136, weak-but-aggregable) | lab | DONE | _lab_results/runs/exp_oracle_ds_7cell_fdr_both_20260424_162210/oracle.pt |
| Enformer oracle (Table 4 cross-oracle) | lab | not built | - |
| Sei oracle (Table 4) | lab | not built | - |
| Joint multitask balanced 105k JSONL | H100 | DONE (35k × 3 verified) | data/prod_samples/train.joint_multitask_balanced.strat7c.n105k.jsonl |
| Test JSONLs (T1+T2+T3 full) | H100 | DONE | data/prod_full_test/jsonl/test.{enhancer_generation,pair_prediction,enhancer_editing}.jsonl |

9. External baselines (the comparison gap)

| Model | Status | Priority | Doc |
|---|---|---|---|
| TACO (Lin et al. NeurIPS 2024), T3 paper precedent | NOT STARTED | HIGH | t3_post_v5_followups.md §5 |
| HyenaDNA, T2 fluency baseline | NOT STARTED | HIGH | same |
| DNABERT-2 / NT-v2, encoder baselines | wired as encoders only; head not trained | MEDIUM | same |
| CtrlDNA, T1 conditional gen | NOT STARTED | MEDIUM | same |
| Evo / Evo2, large fluency | NOT STARTED | LOW | same |

Lab action item: TACO + HyenaDNA, ~1 day each.

10. SV-GSPO (RL) + Loop-SFT: pipeline state

| Component | Status |
|---|---|
| SV-GSPO outcome reward for T3 (was buggy, fixed in e133cf1) | code synced, not yet trained |
| SV-GSPO ablation grid (Table 2: cost-aware / k₃-KL / DAPO / KL=0 / no-group-norm) | pipeline wired, not yet launched |
| Loop-SFT on heuristic-gold trajectories | pipeline wired, not launched |
| Loop-SFT on post-RFT trajectories (T3 only) | proposed in t3_post_v5_followups.md §3 |

11. Branch + HF state

HEAD on mllm-integrate-server2: f304894 (merge lab's dec7a3e regen_t2 PYTHON_BIN fix)
                                4 commits ahead of mllm-integrate
                                0 commits behind  (lab fully caught up)

HF mirror: explcre/dnathinker-checkpoints (last push 04:10 UTC)
  runs/exp_t1_grid_*_full/zs_{raw,enriched}/metrics.json
  runs/exp_t2_grid_*_full/zs_{raw,enriched}/metrics.json
  data/reasoning_traces/train.enhancer_generation.reasoning.jsonl  (live, 281 rows)
  data/reasoning_traces/smoke_5rows_{t1,t2,t3}_postsanitize.jsonl
  data/reasoning_traces/post_rft_{contract_fixture,smoke}.jsonl
  docs/{lab_message_v2, t3_metrics_quickref, t3_post_v5_followups, experiment_chain_v5_unified}.md
  results/h100_snapshot.md

12. Total ETA to headline

| Step | Wall-clock |
|---|---|
| T3 zs_raw + enriched bench | ~10 h from now |
| Stage 0c oracle scoring on T3 zs preds | ~30 min after bench |
| Stages 1/2/3 + score-adapter (T1, T2, T3 fusion-SFT + per-cell oracle) | ~22 h |
| Stage 3b (T3 reasoning-only) | ~3 h |
| Stage 3c (T3 multi-turn RFT + retrain) | ~5 h |
| Stage 3d (T3 reasoning expansion 333 rows) | ~30 min IO-bound (in parallel with Stage 4) |
| Stage 3f (T1 reasoning) | continuous, +333/day |
| Stage 4 (joint multitask 105k) | ~10 h |
| Stages 5+6 (NTv3-only baselines) | ~4 h |
| Stage 7 (aggregator + final HF push) | minutes |

Total H100 post-bench: ~36 h. With lab cluster handling arch-mode + aligner + contrastive + TACO + HyenaDNA in parallel, the headline submission lands in ~3–4 days.
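
As a sanity check on the roll-up, the post-bench rows of the ETA table can be summed directly. The hours below are copied from the table; the split into "core" and "follow-up" stages is an assumption made for this sketch, not something the dashboard states. Under that grouping, Stages 1/2/3, 4, and 5+6 alone reproduce the ~36 h figure, while also running Stage 0c, 3b, and 3c strictly in series would give about 44.5 h.

```python
# Hours copied from the ETA table. Stages 3d, 3f, and 7 are left out because
# the table marks them as parallel, continuous, or taking only minutes.
core_h = {"stages_1_2_3": 22.0, "stage_4": 10.0, "stages_5_6": 4.0}
followup_h = {"stage_0c": 0.5, "stage_3b": 3.0, "stage_3c": 5.0}  # grouping is an assumption

core_total = sum(core_h.values())
serial_total = core_total + sum(followup_h.values())
print(f"core: {core_total:.1f} h, fully serial: {serial_total:.1f} h")
# core: 36.0 h, fully serial: 44.5 h
```

Which stages overlap in practice determines where the real figure falls between the two totals.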

13. Critical gates (what's blocking)

| Gate | Blocker | Unblocks |
|---|---|---|
| Galaxy T2 enhancer regen | lab launches slurm/regen_t2_enriched_with_enhancer_scan.sh; ~8 h CPU | T2 bench rerun, T2 fusion-SFT, T2 reasoning expansion (Stage 3e), proper T2 row in Table 1 |
| T3 RFT runs (Stage 3c) | needs Stage 3 done | T3 reasoning expansion (Stage 3d), post-RFT row in Table 1 T3 column |
| Reasoning accumulation (Tier 2) | OpenRouter 1000/day cap per key | "Reasoning model" rows for T1 (now), T2 (after regen), T3 (after RFT). Multi-key parallel = lab side. |
| TACO + HyenaDNA | lab work | external baseline rows in Table 1; reviewers will ask |