# Note to lab – H100-side update v2, 2026-04-27 ~04:00 UTC

## Branch state

* `mllm-integrate-server2` is **13 commits ahead** of
  `mllm-integrate` since the last merge (commit `43682fe`).
* Lab's `mllm-integrate` HEAD has not advanced since `43682fe` (the
  previous merge into main); please pull / merge to pick up the v5 work.

```
e133cf1 SV-GSPO T3 reward fix + post-v5 follow-ups
ffb0c5f T3 v5 propagated to paper_outline + minimal_publishable_suite
25504fd T3 multi-turn rejection sampling + clear metrics quickref
3e65c96 T3 solid: post-RFT JSONL → reasoning expansion handoff
bb6704e Global input sanitiser (label leaks, proxy scores, cell-type expand)
179903c Reasoning-trace generator (OpenRouter Ling-2.6-1T)
945dc55 adapter→eval bridge: predict_fusion.py + post-bench wiring
183e645 T3 RFT (rejection fine-tuning) – Stage B
46e29d7 H100 results snapshot @ 01:50 UTC
4b03b42 T3 reasoning-only SFT (mask_assistant_dna_span)
b5c9a86 docs: T3 evaluation design + PWM supplementary
af44fa4 T3 oracle-based eval (objective satisfaction)
b2a32be h100_progress: plan v4-final
```

To pull on the lab cluster:

```bash
git fetch origin mllm-integrate-server2
git merge origin/mllm-integrate-server2 -m "merge v5 from H100"
# or, if you want a clean history:
git rebase origin/mllm-integrate-server2
```

## Action items, ordered by urgency

### 1. SV-GSPO outcome reward – pull before next RL run (CRITICAL)

`regureasoner/rl/reward_shaper.py:outcome_enhancer_editing` was
**training the agent on the wrong T3 objective** (edit-distance
window in `[1, 60]`). Under v5, the headline T3 metric is *objective
satisfaction* (`within_budget` AND `length_preserved` AND
`target_motif_present`) – see `docs/t3_metrics_quickref.md`.

Fixed in commit `e133cf1`. New reward = average of three binary
checks aligned with `eval_t3_oracle.py`. **Any in-flight or
upcoming SV-GSPO run on T3 must be on a checkout that includes this
commit** – otherwise the headline number suffers.

If you have a T3 SV-GSPO run currently queued or training: stop it,
rebase, and restart. 56/56 unit tests pass on the new reward.
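
For quick reference, the fixed reward reduces to the mean of the three binary checks. A minimal sketch, assuming plain boolean inputs – the function name and signature are illustrative, not the actual `reward_shaper.py` API:

```python
# Illustrative sketch of the v5 T3 outcome reward: the mean of the three
# binary objective-satisfaction checks. Names are hypothetical, not the
# real reward_shaper.py signatures.
def t3_outcome_reward(within_budget: bool,
                      length_preserved: bool,
                      target_motif_present: bool) -> float:
    checks = (within_budget, length_preserved, target_motif_present)
    return sum(checks) / len(checks)
```

An edit satisfying all three checks scores 1.0; the old edit-distance window plays no role.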

### 2. T2 enriched dataset regen – needs galaxy CPU (BLOCKER on T2 quality)

The current prod T2 enriched JSONL only has a TFBS scan on the
**promoter**; the enhancer side gets only a GC% line. That defeats
T2's premise – the model can't reason about shared TFBS hits if only
one side is scanned.

The fix exists in `tools/pe_grounding_tools.py:_template_tfbs_sequence_names_for_example`,
which already returns `["input_promoter", "input_enhancer"]` for
T2 – it just wasn't run on the prod data.

A launcher is committed at
`regureasoner_loop/slurm/regen_t2_enriched_with_enhancer_scan.sh`.
It drives the parent's `PEDatasetReasoningPipeline` in
`template_tools` mode (no LLM, disk-cached FIMO).

The H100 can't run this (raw CSVs + compiled FIMO live on the lab
cluster only). Suggested galaxy invocation (CPU-rich, ~8 cores):

```bash
cd /home/pengchx3/text-dna/biomodel_reasoning_calling_study2
git checkout origin/mllm-integrate-server2
for i in $(seq 0 7); do
  SHARD_INDEX=$i NUM_SHARDS=8 \
    bash regureasoner_loop/slurm/regen_t2_enriched_with_enhancer_scan.sh &
done
wait

# Output:
# /dev/shm/dnathinker/data/t2_regen_enhancer_scan/jsonl/{train,test}.pair_prediction.jsonl
# Push to HF when done so the H100 can pick it up:
python regureasoner_loop/scripts/sync_checkpoints.py \
  --src /dev/shm/dnathinker/data/t2_regen_enhancer_scan/jsonl \
  --dest data/prod_full_test_v2_enhancer_scan/jsonl \
  --repo-id explcre/dnathinker-checkpoints
```

After that lands on HF, the H100 will rebench T2 with proper enhancer
TFBS context. ETA on galaxy: ~8h sharded (~744k rows / 8 shards × ~30s
per row average). Cached on the second pass.

### 3. T3 RFT-from-joint ablation – extra Table 3 row (NICE-TO-HAVE)

The current pipeline runs T3 RFT against the Stage-3 (T3-only)
adapter. A worthwhile ablation: run RFT against the Stage-4 joint
adapter – does the joint-trained generator produce candidates with
higher mean objective margin, or do format artefacts dominate? One
flag change:

```bash
STAGE_4=runs/exp_joint_multitask_${STAMP}/final/pytorch_model.bin
python regureasoner_loop/scripts/rft_t3.py \
  --adapter-state-dict $STAGE_4 \
  --train-jsonl data/prod_samples/train.enhancer_editing.strat7c.n35k.jsonl \
  --oracle-path runs/exp_oracle_ds_7cell_min/oracle.pt \
  --output-jsonl runs/exp_t3_rft_from_joint_${STAMP}/rft_filtered_train.jsonl \
  --candidates 4 --rounds 4 --temp-ramp 0.15
# Re-train T3 fusion-SFT on the result for the ablation row.
```
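
For intuition on the sampling flags: assuming `--temp-ramp` adds a fixed increment to the sampling temperature each rejection round (an assumption about `rft_t3.py`, not confirmed behaviour, and the 0.7 base is likewise hypothetical), the schedule for `--rounds 4 --temp-ramp 0.15` would be:

```python
# Hypothetical temperature schedule: assumed 0.7 base, with --temp-ramp
# added once per rejection round. Not the confirmed rft_t3.py behaviour.
def round_temperatures(rounds: int, ramp: float, base: float = 0.7) -> list[float]:
    return [round(base + r * ramp, 2) for r in range(rounds)]
```

Under those assumptions, `round_temperatures(4, 0.15)` gives `[0.7, 0.85, 1.0, 1.15]` – later rounds explore more aggressively once low-temperature candidates are exhausted.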

Cost: ~6h serial after Stage 4 (joint multitask) finishes on the H100.
Lab has a spare GPU? This is yours.

Detail in `docs/t3_post_v5_followups.md` §1.

### 4. Loop-SFT for T3 – swap data source (NICE-TO-HAVE)

No code change needed. The T3 trajectory dataset for Loop-SFT should
source from the post-RFT JSONL (oracle-validated candidates) instead
of the heuristic gold:

```bash
python regureasoner_loop/scripts/expand_loop_trajectories.py \
  --source runs/exp_t3_fusion_sft_${STAMP}/rft_filtered_train.jsonl \
  --out data/trajectories/train.enhancer_editing.rft.jsonl
TASK=enhancer_editing \
TRAIN_JSONL=data/trajectories/train.enhancer_editing.rft.jsonl \
... \
bash regureasoner_loop/slurm/run_train_loop_sft.sh
```

This is lab-side work, since the H100 doesn't have the OpenRouter
throughput for trajectory expansion at the 35k-row scale (the free
tier is 1000 req/day per key – fine for 333-row reasoning ablations,
not for full Loop-SFT data).

### 5. External baselines for paper headline – TACO + HyenaDNA (CRITICAL)

The paper currently has only internal baselines (zero-shot LLM,
fusion-SFT, our NTv3-direct). Reviewers will ask "where's the SOTA
comparison?". Two must-add baselines:

* **TACO** (Lin et al., NeurIPS 2024) – T3 paper precedent. Their repo
  is public; drop in our DeepSTARR-7cell oracle, run their trainer on
  our T3 train split, eval with `eval_t3_oracle.py`. ~1 day.
* **HyenaDNA** (Nguyen et al., NeurIPS 2023) – T2 fluency baseline.
  Already wired as an encoder in our stack; needs head training only.
  ~1 day.

Lab side because both need cluster GPUs.

Detail + concrete recipes in `docs/t3_post_v5_followups.md` §5.

### 6. Pull from HF – new artifacts available

H100 just pushed (04:00 UTC):

```
data/reasoning_traces/train.enhancer_generation.reasoning.jsonl  (in flight, ~62 rows so far, target 333)
data/reasoning_traces/smoke_5rows_{t1,t2,t3}_postsanitize.jsonl  (per-task quality samples for inspection)
data/reasoning_traces/post_rft_contract_fixture.jsonl            (synthetic post-RFT row used in unit test)
data/reasoning_traces/post_rft_smoke.jsonl                       (real OpenRouter rationale on synthetic post-RFT input)
```

Repo: `explcre/dnathinker-checkpoints`. Inspect the smoke files to
verify rationale quality + sanitiser correctness before wider rollout.

### 7. Reasoning-trace daily loop – coordinate API keys

The 1000 req/day OpenRouter free-tier cap means **one key drives
~333 rows/task/day** across the three tasks. With the user's primary
key on the H100 we'll build T1/T2/T3 reasoning at ~1k rows/day
combined.

If you have spare OpenRouter accounts, run:

```bash
OPENROUTER_API_KEY=<lab key> bash regureasoner_loop/slurm/build_reasoning_traces_loop.sh --daemon
```

on a CPU box (zero GPU). Each shard is a separate run; the script
auto-resumes by id, so multiple boxes running with different keys
won't overlap if they share an output JSONL.
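
The resume-by-id behaviour amounts to a set-difference against the shared output file. A sketch of that logic, assuming each trace row carries an `"id"` field (an assumption about the trace schema, not the loop script's actual code):

```python
import json

# Resume-by-id sketch: skip any example whose id already appears in the
# shared output JSONL, so boxes running with different keys don't
# duplicate work. The "id" field name is an assumed schema detail.
def pending_ids(all_ids, out_jsonl_path):
    done = set()
    try:
        with open(out_jsonl_path) as f:
            done = {json.loads(line)["id"] for line in f if line.strip()}
    except FileNotFoundError:
        pass  # first run: nothing written yet
    return [i for i in all_ids if i not in done]
```

Note this is only overlap-free if each box re-reads the shared JSONL before claiming work, which is what the loop script's auto-resume provides.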

## What's running on the H100 right now

```
PID 121129  vLLM bench T2 zs_enriched (full 744k, ~3.5h in, ETA ~30 min)
            queued: T3 zs_raw, T3 zs_enriched (~5h each)
PID 137805  build_reasoning_traces.py T1 333-sample run (62/333 at 04:00 UTC)
PID 100544  watcher – post_bench_pipeline.sh (idle until orchestrator exits)
```

ETA for the full chain: ~36h after the bench grid finishes.

## Pipeline state – no jobs need killing

* Bench grid: vLLM zero-shot inference; the T3 zs eval reads only
  metadata (target_motif, edit_budget) – no v5-framework leakage. Safe.
* No fusion-SFT or RL job is currently training; Stages 1–4 fire only
  after the bench grid completes, at which point they pick up multi-turn
  RFT (commit `25504fd`) + Stage 3d post-RFT reasoning (commit
  `3e65c96`) automatically.

## Suggested coordination

Lab actions, in priority order:

1. **Pull `mllm-integrate-server2`** (or merge it into `mllm-integrate`).
2. **Stop any in-flight T3 SV-GSPO run** if it predates `e133cf1` –
   the reward function was wrong; restart with the new commit.
3. **Galaxy: T2 enhancer-scan regen** (background, ~8h sharded) – this
   blocks the headline T2 numbers.
4. **TACO + HyenaDNA baselines** in parallel.
5. **RFT-from-joint ablation** + **Loop-SFT-on-RFT** as second-tier
   ablations once Stage 4 lands.

Reach out on the shared channel if any of these conflict with
in-flight work.

– H100 side