# Note to lab – H100-side update v2, 2026-04-27 ~04:00 UTC

## Branch state

* `mllm-integrate-server2` is **13 commits ahead** of
  `mllm-integrate` since the last merge (commit `43682fe`).
* Lab's `mllm-integrate` HEAD has not advanced since `43682fe` (the
  previous merge into main); please pull / merge to pick up the v5 work.

```
e133cf1 SV-GSPO T3 reward fix + post-v5 follow-ups
ffb0c5f T3 v5 propagated to paper_outline + minimal_publishable_suite
25504fd T3 multi-turn rejection sampling + clear metrics quickref
3e65c96 T3 solid: post-RFT JSONL → reasoning expansion handoff
bb6704e Global input sanitiser (label leaks, proxy scores, cell-type expand)
179903c Reasoning-trace generator (OpenRouter Ling-2.6-1T)
945dc55 adapter→eval bridge: predict_fusion.py + post-bench wiring
183e645 T3 RFT (rejection fine-tuning) → Stage B
46e29d7 H100 results snapshot @ 01:50 UTC
4b03b42 T3 reasoning-only SFT (mask_assistant_dna_span)
b5c9a86 docs: T3 evaluation design + PWM supplementary
af44fa4 T3 oracle-based eval (objective satisfaction)
b2a32be h100_progress: plan v4-final
```

To pull on lab cluster:

```bash
git fetch origin mllm-integrate-server2
git merge origin/mllm-integrate-server2 -m "merge v5 from H100"
# or, if you want a clean history:
git rebase origin/mllm-integrate-server2
```

## Action items, ordered by urgency

### 1. SV-GSPO outcome reward – pull before next RL run (CRITICAL)

`regureasoner/rl/reward_shaper.py:outcome_enhancer_editing` was
**training the agent on the wrong T3 objective** (an edit-distance
window in `[1, 60]`). Under v5, the headline T3 metric is *objective
satisfaction* (`within_budget` AND `length_preserved` AND
`target_motif_present`); see `docs/t3_metrics_quickref.md`.

Fixed in commit `e133cf1`. The new reward is the average of three
binary checks aligned with `eval_t3_oracle.py`. **Any in-flight or
upcoming SV-GSPO run on T3 must be on a checkout that includes this
commit**; otherwise the headline number suffers.
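
For reference, the shape of the fixed reward. This is a minimal
sketch, not the actual `outcome_enhancer_editing` implementation: the
check semantics come from `docs/t3_metrics_quickref.md`, but the
function signature and the edit-distance helper here are illustrative:

```python
def _edits(a: str, b: str) -> int:
    # Stand-in distance: substitutions plus a length penalty. With
    # length preserved this reduces to Hamming distance; substitute
    # whatever edit-distance helper the repo actually uses.
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def t3_outcome_reward(edited: str, original: str,
                      target_motif: str, edit_budget: int) -> float:
    """Average of the three binary T3 objective-satisfaction checks."""
    within_budget = _edits(edited, original) <= edit_budget
    length_preserved = len(edited) == len(original)
    target_motif_present = target_motif in edited
    return (within_budget + length_preserved + target_motif_present) / 3.0
```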

If you have a T3 SV-GSPO run currently queued or training: stop it,
rebase, restart. 56/56 unit tests pass on the new reward.

### 2. T2 enriched dataset regen – needs galaxy CPU (BLOCKER on T2 quality)

The current prod T2 enriched JSONL has a TFBS scan only on the
**promoter**; the enhancer side gets only a GC% line. That defeats
T2's premise: the model can't reason about shared TFBS hits if only
one side is scanned.

The fix already exists:
`tools/pe_grounding_tools.py:_template_tfbs_sequence_names_for_example`
returns `["input_promoter", "input_enhancer"]` for T2; it just wasn't
run on the prod data.
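
The gist of that helper, paraphrased (a sketch only; the real function
takes the full example dict, and the non-T2 branch here is a guess):

```python
def tfbs_sequence_names_for_task(task: str) -> list[str]:
    # T2 (pair prediction) needs TFBS scans on BOTH sides so the model
    # can reason about shared hits between promoter and enhancer.
    if task == "pair_prediction":  # T2
        return ["input_promoter", "input_enhancer"]
    # Other tasks scan their single input sequence (illustrative name).
    return ["input_sequence"]
```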

Launcher committed at
`regureasoner_loop/slurm/regen_t2_enriched_with_enhancer_scan.sh`.
It drives the parent's `PEDatasetReasoningPipeline` in
`template_tools` mode (no LLM, disk-cached fimo).

H100 can't run this (raw CSVs + compiled FIMO live on the lab cluster
only). Suggested galaxy invocation (CPU-rich, ~8 cores):

```bash
cd /home/pengchx3/text-dna/biomodel_reasoning_calling_study2
git checkout origin/mllm-integrate-server2
for i in $(seq 0 7); do
  SHARD_INDEX=$i NUM_SHARDS=8 \
    bash regureasoner_loop/slurm/regen_t2_enriched_with_enhancer_scan.sh &
done
wait

# Output:
#   /dev/shm/dnathinker/data/t2_regen_enhancer_scan/jsonl/{train,test}.pair_prediction.jsonl
# Push to HF when done so H100 can pick it up:
python regureasoner_loop/scripts/sync_checkpoints.py \
  --src /dev/shm/dnathinker/data/t2_regen_enhancer_scan/jsonl \
  --dest data/prod_full_test_v2_enhancer_scan/jsonl \
  --repo-id explcre/dnathinker-checkpoints
```

After that lands on HF, the H100 will rebench T2 with proper enhancer
TFBS context. ETA on galaxy: ~8h sharded (~744k rows / 8 shards ×
~0.3s per row average). Second passes hit the fimo cache.
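
The `SHARD_INDEX`/`NUM_SHARDS` split presumably amounts to a
round-robin partition by row index; a sketch under that assumption
(only the env-var names are confirmed by the launcher above):

```python
import os

SHARD = int(os.environ.get("SHARD_INDEX", "0"))
NUM_SHARDS = int(os.environ.get("NUM_SHARDS", "1"))

def rows_for_this_shard(rows):
    # Disjoint across shards; the union over all shards covers
    # every row exactly once.
    for i, row in enumerate(rows):
        if i % NUM_SHARDS == SHARD:
            yield row
```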

### 3. T3 RFT-from-joint ablation – extra Table 3 row (NICE-TO-HAVE)

The current pipeline runs T3 RFT against the Stage-3 (T3-only)
adapter. A worthwhile ablation: run RFT against the Stage-4 joint
adapter. Does the joint-trained generator produce candidates with a
higher mean objective margin, or do format artefacts dominate? It's a
one-flag change:

```bash
STAGE_4=runs/exp_joint_multitask_${STAMP}/final/pytorch_model.bin
python regureasoner_loop/scripts/rft_t3.py \
  --adapter-state-dict "$STAGE_4" \
  --train-jsonl data/prod_samples/train.enhancer_editing.strat7c.n35k.jsonl \
  --oracle-path runs/exp_oracle_ds_7cell_min/oracle.pt \
  --output-jsonl runs/exp_t3_rft_from_joint_${STAMP}/rft_filtered_train.jsonl \
  --candidates 4 --rounds 4 --temp-ramp 0.15
# Re-train T3 fusion-SFT on the result for the ablation row.
```
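
To read off the "mean objective margin" comparison afterwards, a
sketch (the `oracle_margin` field name is hypothetical; substitute
whatever continuous per-candidate score `rft_t3.py` actually records,
and point the baseline path at the Stage-3 RFT run's output):

```python
import json
import statistics

def mean_margin(jsonl_path: str, field: str = "oracle_margin") -> float:
    with open(jsonl_path) as f:
        return statistics.mean(json.loads(line)[field] for line in f)

base = mean_margin("runs/exp_t3_rft_stage3/rft_filtered_train.jsonl")
joint = mean_margin("runs/exp_t3_rft_from_joint/rft_filtered_train.jsonl")
print(f"stage-3 {base:.4f}  joint {joint:.4f}  delta {joint - base:+.4f}")
```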

Cost: ~6h serial after Stage 4 (joint multitask) finishes on H100.
If the lab has a spare GPU, this one is yours.

Details in `docs/t3_post_v5_followups.md` §1.

### 4. Loop-SFT for T3 – swap data source (NICE-TO-HAVE)

No code change. The T3 trajectory dataset for Loop-SFT should source
from the post-RFT JSONL (oracle-validated candidates) instead of the
heuristic gold:

```bash
python regureasoner_loop/scripts/expand_loop_trajectories.py \
  --source runs/exp_t3_fusion_sft_${STAMP}/rft_filtered_train.jsonl \
  --out data/trajectories/train.enhancer_editing.rft.jsonl
TASK=enhancer_editing \
TRAIN_JSONL=data/trajectories/train.enhancer_editing.rft.jsonl \
... \
bash regureasoner_loop/slurm/run_train_loop_sft.sh
```

This one is lab-side: H100 doesn't have the OpenRouter throughput for
trajectory expansion at the 35k-row scale (the free tier is 1000
requests/day per key; fine for 333-row reasoning ablations, not for
full Loop-SFT data).

### 5. External baselines for paper headline – TACO + HyenaDNA (CRITICAL)

The paper currently has only internal baselines (zero-shot LLM,
fusion-SFT, our NTv3-direct). Reviewers will ask "where's the SOTA
comparison?". Two must-add baselines:

* **TACO** (Lin et al., NeurIPS 2024) – the T3 paper precedent. Their
  repo is public; drop in our DeepSTARR-7cell oracle, run their
  trainer on our T3 train split, eval with `eval_t3_oracle.py`.
  ~1 day.
* **HyenaDNA** (Nguyen et al., NeurIPS 2023) – T2 fluency baseline.
  Already wired as an encoder in our stack; needs head training only.
  ~1 day.

Both are lab-side because they need cluster GPUs.

Details + concrete recipes in `docs/t3_post_v5_followups.md` §5.

### 6. Pull from HF – new artifacts available

H100 just pushed (04:00 UTC):

```
data/reasoning_traces/train.enhancer_generation.reasoning.jsonl   (in flight, ~62 rows so far, target 333)
data/reasoning_traces/smoke_5rows_{t1,t2,t3}_postsanitize.jsonl   (per-task quality samples for inspection)
data/reasoning_traces/post_rft_contract_fixture.jsonl             (synthetic post-RFT row used in unit test)
data/reasoning_traces/post_rft_smoke.jsonl                        (real OpenRouter rationale on synthetic post-RFT input)
```

Repo: `explcre/dnathinker-checkpoints`. Inspect the smoke files to
verify rationale quality + sanitiser correctness before wider rollout.
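
A quick way to eyeball them once pulled. A sketch only: the
`rationale` field and the leak patterns below are assumptions about
the JSONL schema, not confirmed names:

```python
import json

PATHS = [f"data/reasoning_traces/smoke_5rows_{t}_postsanitize.jsonl"
         for t in ("t1", "t2", "t3")]

for path in PATHS:
    with open(path) as f:
        for line in f:
            row = json.loads(line)
            # Sanitiser check: nothing the sanitiser should have
            # stripped ought to survive in the serialised row.
            for pat in ("label", "proxy_score"):  # illustrative patterns
                if pat in line.lower():
                    print(f"!! possible leak {pat!r} in {path}")
            print(path, "->", str(row.get("rationale", row))[:120])
```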

### 7. Reasoning-trace daily loop – coordinate API keys

The 1000 req/day OpenRouter free-tier cap means **one key drives
~333 rows/task/day**. With the user's primary key on H100 we'll build
T1/T2/T3 reasoning at ~1k rows/day combined (3 tasks × ~333 rows).

If you have spare OpenRouter accounts, run:

```bash
OPENROUTER_API_KEY=<lab key> bash regureasoner_loop/slurm/build_reasoning_traces_loop.sh --daemon
```

on a CPU box (zero GPU). Each shard is a separate run; the script
auto-resumes by id, so multiple boxes running with different keys
won't duplicate work as long as they share an output JSONL.
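
The resume-by-id behaviour presumably amounts to something like this
(a sketch; the `id` field name and JSONL layout are assumptions):

```python
import json
import os

def pending_rows(all_rows: list[dict], out_jsonl: str) -> list[dict]:
    """Skip rows whose ids already appear in the shared output JSONL."""
    done = set()
    if os.path.exists(out_jsonl):
        with open(out_jsonl) as f:
            done = {json.loads(line)["id"] for line in f}
    return [r for r in all_rows if r["id"] not in done]
```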

## What's running on H100 right now

```
PID 121129  vLLM bench T2 zs_enriched (full 744k, ~3.5h in, ETA ~30 min)
            queued: T3 zs_raw, T3 zs_enriched (~5h each)
PID 137805  build_reasoning_traces.py T1 333-sample run (62/333 at 04:00 UTC)
PID 100544  watcher → post_bench_pipeline.sh (idle until orchestrator exits)
```

ETA for the full chain: ~36h after the bench grid finishes.

## Pipeline state – no jobs need killing

* Bench grid: vLLM zero-shot inference; the T3 zs eval reads only
  metadata (target_motif, edit_budget), so no v5-framework leakage.
  Safe.
* No fusion-SFT or RL job is currently training; Stages 1–4 fire only
  after the bench grid completes, at which point they pick up
  multi-turn RFT (commit `25504fd`) + Stage 3d post-RFT reasoning
  (commit `3e65c96`) automatically.

## Suggested coordination

Lab actions, in priority order:

1. **Pull `mllm-integrate-server2`** (or merge it into `mllm-integrate`).
2. **Stop any in-flight T3 SV-GSPO run** that predates `e133cf1`; its
   reward function was wrong. Restart on the new commit.
3. **Galaxy: T2 enhancer-scan regen** (background, ~8h sharded); it
   blocks the headline T2 numbers.
4. **TACO + HyenaDNA baselines** in parallel.
5. **RFT-from-joint ablation** + **Loop-SFT-on-RFT** as second-tier
   ablations once Stage 4 lands.

Reach out on the shared channel if any of these conflict with
in-flight work.

– H100 side
|