# Note to lab – H100-side update v2, 2026-04-27 ~04:00 UTC

## Branch state

* `mllm-integrate-server2` is **13 commits ahead** of `mllm-integrate`
  since the last merge (commit `43682fe`).
* Lab's `mllm-integrate` HEAD has not advanced since `43682fe` (the
  previous merge into main); please pull / merge to pick up the v5 work.

```
e133cf1 SV-GSPO T3 reward fix + post-v5 follow-ups
ffb0c5f T3 v5 propagated to paper_outline + minimal_publishable_suite
25504fd T3 multi-turn rejection sampling + clear metrics quickref
3e65c96 T3 solid: post-RFT JSONL → reasoning expansion handoff
bb6704e Global input sanitiser (label leaks, proxy scores, cell-type expand)
179903c Reasoning-trace generator (OpenRouter Ling-2.6-1T)
945dc55 adapter→eval bridge: predict_fusion.py + post-bench wiring
183e645 T3 RFT (rejection fine-tuning) – Stage B
46e29d7 H100 results snapshot @ 01:50 UTC
4b03b42 T3 reasoning-only SFT (mask_assistant_dna_span)
b5c9a86 docs: T3 evaluation design + PWM supplementary
af44fa4 T3 oracle-based eval (objective satisfaction)
b2a32be h100_progress: plan v4-final
```

To pull on the lab cluster:

```bash
git fetch origin mllm-integrate-server2
git merge origin/mllm-integrate-server2 -m "merge v5 from H100"
# or, if you want a clean history:
git rebase origin/mllm-integrate-server2
```

## Action items, ordered by urgency

### 1. SV-GSPO outcome reward – pull before next RL run (CRITICAL)

`regureasoner/rl/reward_shaper.py:outcome_enhancer_editing` was
**training the agent on the wrong T3 objective** (an edit-distance
window in `[1, 60]`). Under v5, the headline T3 metric is *objective
satisfaction* (`within_budget` AND `length_preserved` AND
`target_motif_present`) – see `docs/t3_metrics_quickref.md`.

Fixed in commit `e133cf1`. New reward = average of three binary
checks aligned with `eval_t3_oracle.py`. **Any in-flight or
upcoming SV-GSPO run on T3 must be on a checkout that includes this
commit** – otherwise the headline number suffers.

If you have a T3 SV-GSPO run currently queued or training: stop it,
rebase, restart. 56/56 unit tests pass on the new reward.
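
A minimal sketch of the shape of the fix (illustrative only – the real
signatures live in `regureasoner/rl/reward_shaper.py`; the argument
names just mirror the three checks above):

```python
def t3_objective_satisfied(within_budget: bool,
                           length_preserved: bool,
                           target_motif_present: bool) -> bool:
    # Headline v5 T3 metric: all three checks must hold.
    return within_budget and length_preserved and target_motif_present


def t3_outcome_reward(within_budget: bool,
                      length_preserved: bool,
                      target_motif_present: bool) -> float:
    # RL shaping reward: average of the same three binary checks that
    # eval_t3_oracle.py scores, so the reward lands in {0, 1/3, 2/3, 1}.
    checks = (within_budget, length_preserved, target_motif_present)
    return sum(checks) / len(checks)
```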
### 2. T2 enriched dataset regen – needs galaxy CPU (BLOCKER on T2 quality)

The current prod T2 enriched JSONL only has a TFBS scan on the
**promoter**; the enhancer side gets only a GC% line. That defeats
T2's premise – the model can't reason about shared TFBS hits if only
one side is scanned.

The fix already exists in
`tools/pe_grounding_tools.py:_template_tfbs_sequence_names_for_example`,
which returns `["input_promoter", "input_enhancer"]` for T2 – it just
wasn't run on the prod data.

Launcher committed at
`regureasoner_loop/slurm/regen_t2_enriched_with_enhancer_scan.sh`.
It drives the parent's `PEDatasetReasoningPipeline` in
`template_tools` mode (no LLM, disk-cached fimo).

H100 can't run this (the raw CSVs + compiled FIMO live on the lab
cluster only). Suggested galaxy invocation (CPU-rich, ~8 cores):

```bash
cd /home/pengchx3/text-dna/biomodel_reasoning_calling_study2
git checkout origin/mllm-integrate-server2
for i in $(seq 0 7); do
  SHARD_INDEX=$i NUM_SHARDS=8 \
    bash regureasoner_loop/slurm/regen_t2_enriched_with_enhancer_scan.sh &
done
wait

# Output:
#   /dev/shm/dnathinker/data/t2_regen_enhancer_scan/jsonl/{train,test}.pair_prediction.jsonl
# Push to HF when done so H100 can pick it up:
python regureasoner_loop/scripts/sync_checkpoints.py \
  --src /dev/shm/dnathinker/data/t2_regen_enhancer_scan/jsonl \
  --dest data/prod_full_test_v2_enhancer_scan/jsonl \
  --repo-id explcre/dnathinker-checkpoints
```

After that lands on HF, the H100 will rebench T2 with proper enhancer
TFBS context. ETA on galaxy: ~8 h sharded (744k rows / 8 shards ≈ 93k
rows per shard, at ~0.3 s per row on average). Cached on the second pass.
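
Once a shard finishes, a quick spot-check that the enhancer side now
carries TFBS hits (a sketch – it searches the raw serialized rows for
the `input_enhancer` scan name rather than assuming the exact enriched
schema):

```python
# Hypothetical path, taken from the launcher's output listing above.
path = "/dev/shm/dnathinker/data/t2_regen_enhancer_scan/jsonl/test.pair_prediction.jsonl"

with_enh = total = 0
with open(path) as f:
    for line in f:
        total += 1
        # Crude but schema-agnostic: look for the enhancer scan name anywhere in the row.
        with_enh += "input_enhancer" in line
print(f"{with_enh}/{total} rows mention input_enhancer in their enrichment")
```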
### 3. T3 RFT-from-joint ablation – extra Table 3 row (NICE-TO-HAVE)

The current pipeline runs T3 RFT against the Stage-3 (T3-only)
adapter. A worthwhile ablation: run RFT against the Stage-4 joint
adapter – does the joint-trained generator produce candidates with
higher mean objective margin, or do format artefacts dominate? One
flag change:

```bash
STAGE_4=runs/exp_joint_multitask_${STAMP}/final/pytorch_model.bin
python regureasoner_loop/scripts/rft_t3.py \
  --adapter-state-dict $STAGE_4 \
  --train-jsonl data/prod_samples/train.enhancer_editing.strat7c.n35k.jsonl \
  --oracle-path runs/exp_oracle_ds_7cell_min/oracle.pt \
  --output-jsonl runs/exp_t3_rft_from_joint_${STAMP}/rft_filtered_train.jsonl \
  --candidates 4 --rounds 4 --temp-ramp 0.15
# Re-train T3 fusion-SFT on the result for the ablation row.
```

Cost: ~6 h serial after Stage 4 (joint multitask) finishes on H100.
Lab has spare GPU? This is yours.

Detail in `docs/t3_post_v5_followups.md` §1.
### 4. Loop-SFT for T3 – swap data source (NICE-TO-HAVE)

No code change. The T3 trajectory dataset for Loop-SFT should source
from the post-RFT JSONL (oracle-validated candidates) instead of the
heuristic gold:

```bash
python regureasoner_loop/scripts/expand_loop_trajectories.py \
  --source runs/exp_t3_fusion_sft_${STAMP}/rft_filtered_train.jsonl \
  --out data/trajectories/train.enhancer_editing.rft.jsonl
TASK=enhancer_editing \
TRAIN_JSONL=data/trajectories/train.enhancer_editing.rft.jsonl \
... \
bash regureasoner_loop/slurm/run_train_loop_sft.sh
```

This one is lab-side, since H100 doesn't have the OpenRouter throughput
for trajectory expansion at the 35k-row scale (the free tier is 1000
requests/day per key – fine for 333-row reasoning ablations, not for
full Loop-SFT data).
### 5. External baselines for paper headline – TACO + HyenaDNA (CRITICAL)

The paper currently has only internal baselines (zero-shot LLM,
fusion-SFT, our NTv3-direct). Reviewers will ask "where's the SOTA
comparison?". Two must-add baselines:

* **TACO** (Lin et al., NeurIPS 2024) – T3 paper precedent. Their repo
  is public; drop in our DeepSTARR-7cell oracle, run their trainer on
  our T3 train split, eval with `eval_t3_oracle.py`. ~1 day.
* **HyenaDNA** (Nguyen et al., NeurIPS 2023) – T2 fluency baseline.
  Already wired as an encoder in our stack; needs head training only.
  ~1 day.

Lab side because both need cluster GPUs.

Detail + concrete recipes in `docs/t3_post_v5_followups.md` §5.
### 6. Pull from HF – new artifacts available

H100 just pushed (04:00 UTC):

```
data/reasoning_traces/train.enhancer_generation.reasoning.jsonl (in flight, ~62 rows so far, target 333)
data/reasoning_traces/smoke_5rows_{t1,t2,t3}_postsanitize.jsonl (per-task quality samples for inspection)
data/reasoning_traces/post_rft_contract_fixture.jsonl (synthetic post-RFT row used in a unit test)
data/reasoning_traces/post_rft_smoke.jsonl (real OpenRouter rationale on synthetic post-RFT input)
```

Repo: `explcre/dnathinker-checkpoints`. Inspect the smoke files to
verify rationale quality + sanitiser correctness before wider rollout.
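
One way to eyeball a smoke file without cloning the whole repo (a
sketch; `hf_hub_download` defaults to a model-type repo – pass
`repo_type="dataset"` if the artifacts live in a dataset repo instead):

```python
import json
from huggingface_hub import hf_hub_download

# Fetch one of the per-task smoke files listed above.
path = hf_hub_download(
    repo_id="explcre/dnathinker-checkpoints",
    filename="data/reasoning_traces/smoke_5rows_t3_postsanitize.jsonl",
)
with open(path) as f:
    for line in f:
        row = json.loads(line)
        print(json.dumps(row, indent=2)[:2000])  # truncate very long rows
```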
### 7. Reasoning-trace daily loop – coordinate API keys

The 1000 req/day OpenRouter free-tier cap means **one key drives
~333 rows/task/day** across the three tasks. With the user's primary
key on H100 we'll build T1/T2/T3 reasoning at ~1k rows/day combined.

If you have spare OpenRouter accounts, run

```bash
OPENROUTER_API_KEY=<lab key> bash regureasoner_loop/slurm/build_reasoning_traces_loop.sh --daemon
```

on a CPU box (zero GPU). Each shard is a separate run; the script
auto-resumes by id, so multiple boxes running with different keys
won't overlap as long as they share an output JSONL.
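
The resume contract amounts to something like this (a sketch of the
idea, not the script's actual code; `"id"` as the key field is an
assumption):

```python
import json
import os

def completed_ids(out_jsonl: str) -> set:
    """Ids already present in the shared output JSONL."""
    if not os.path.exists(out_jsonl):
        return set()
    with open(out_jsonl) as f:
        return {json.loads(line)["id"] for line in f if line.strip()}

# A new run, whatever its API key, generates only rows whose id is not
# yet in the file, so boxes sharing one output JSONL divide the
# remaining work instead of duplicating it.
```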

## What's running on H100 right now

```
PID 121129  vLLM bench T2 zs_enriched (full 744k, ~3.5h in, ETA ~30 min)
            queued: T3 zs_raw, T3 zs_enriched (~5h each)
PID 137805  build_reasoning_traces.py T1 333-sample run (62/333 at 04:00 UTC)
PID 100544  watcher → post_bench_pipeline.sh (idle until orchestrator exits)
```

ETA for the full chain: ~36 h after the bench grid finishes.

## Pipeline state – no jobs need killing

* Bench grid: vLLM zero-shot inference; the T3 zs eval reads only
  metadata (target_motif, edit_budget) – no v5-framework leakage. Safe.
* No fusion-SFT or RL job is currently training; Stages 1–4 fire only
  after the bench grid completes, at which point they pick up multi-turn
  RFT (commit `25504fd`) + Stage 3d post-RFT reasoning (commit
  `3e65c96`) automatically.

## Suggested coordination

Lab actions, in priority order:

1. **Pull `mllm-integrate-server2`** (or merge it into `mllm-integrate`).
2. **Stop any in-flight T3 SV-GSPO run** if it predates `e133cf1` –
   the reward function was wrong; restart with the new commit.
3. **Galaxy: T2 enhancer-scan regen** (background, ~8 h sharded) –
   blocks the headline T2 numbers.
4. **TACO + HyenaDNA baselines** in parallel.
5. **RFT-from-joint ablation** + **Loop-SFT-on-RFT** as second-tier
   ablations once Stage 4 lands.

Reach out on the shared channel if any of these conflict with
in-flight work.

– H100 side