# Note to lab – H100-side update v2, 2026-04-27 ~04:00 UTC

## Branch state

* `mllm-integrate-server2` is **13 commits ahead** of
  `mllm-integrate` since the last merge (commit `43682fe`).
* Lab's `mllm-integrate` HEAD has not advanced since `43682fe` (the
  previous merge into main); please pull / merge to pick up the v5 work.

```
e133cf1 SV-GSPO T3 reward fix + post-v5 follow-ups
ffb0c5f T3 v5 propagated to paper_outline + minimal_publishable_suite
25504fd T3 multi-turn rejection sampling + clear metrics quickref
3e65c96 T3 solid: post-RFT JSONL → reasoning expansion handoff
bb6704e Global input sanitiser (label leaks, proxy scores, cell-type expand)
179903c Reasoning-trace generator (OpenRouter Ling-2.6-1T)
945dc55 adapter→eval bridge: predict_fusion.py + post-bench wiring
183e645 T3 RFT (rejection fine-tuning) – Stage B
46e29d7 H100 results snapshot @ 01:50 UTC
4b03b42 T3 reasoning-only SFT (mask_assistant_dna_span)
b5c9a86 docs: T3 evaluation design + PWM supplementary
af44fa4 T3 oracle-based eval (objective satisfaction)
b2a32be h100_progress: plan v4-final
```

To pull on the lab cluster:

```bash
git fetch origin mllm-integrate-server2
git merge origin/mllm-integrate-server2 -m "merge v5 from H100"
# or, if you want a clean history:
git rebase origin/mllm-integrate-server2
```

## Action items, ordered by urgency

### 1. SV-GSPO outcome reward – pull before next RL run (CRITICAL)

`regureasoner/rl/reward_shaper.py:outcome_enhancer_editing` was
**training the agent on the wrong T3 objective** (edit-distance
window in `[1, 60]`). Under v5, the headline T3 metric is *objective
satisfaction* (`within_budget` AND `length_preserved` AND
`target_motif_present`) – see `docs/t3_metrics_quickref.md`.

Fixed in commit `e133cf1`. New reward = average of three binary
checks aligned with `eval_t3_oracle.py`. **Any in-flight or
upcoming SV-GSPO run on T3 must be on a checkout that includes this
commit** – otherwise the headline number suffers.

If you have a T3 SV-GSPO run currently queued or training: stop it,
rebase, and restart. 56/56 unit tests pass on the new reward.
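
For quick reference, the fixed reward reduces to the mean of the three binary checks. A minimal sketch, assuming plain boolean inputs – the function name and signature are illustrative, not the actual `reward_shaper.py` API:

```python
# Illustrative sketch of the v5 T3 outcome reward: the mean of the three
# binary objective-satisfaction checks. Names are hypothetical, not the
# real reward_shaper.py signatures.
def t3_outcome_reward(within_budget: bool,
                      length_preserved: bool,
                      target_motif_present: bool) -> float:
    checks = (within_budget, length_preserved, target_motif_present)
    return sum(checks) / len(checks)
```

An edit satisfying all three checks scores 1.0; the old edit-distance window plays no role.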

### 2. T2 enriched dataset regen – needs galaxy CPU (BLOCKER on T2 quality)

The current prod T2 enriched JSONL only has a TFBS scan on the
**promoter**; the enhancer side gets only a GC% line. That defeats
T2's premise – the model can't reason about shared TFBS hits if only
one side is scanned.

The fix exists in `tools/pe_grounding_tools.py:_template_tfbs_sequence_names_for_example`,
which already returns `["input_promoter", "input_enhancer"]` for
T2 – it just wasn't run on the prod data.

A launcher is committed at
`regureasoner_loop/slurm/regen_t2_enriched_with_enhancer_scan.sh`.
It drives the parent's `PEDatasetReasoningPipeline` in
`template_tools` mode (no LLM, disk-cached FIMO).

The H100 can't run this (raw CSVs + compiled FIMO live on the lab
cluster only). Suggested galaxy invocation (CPU-rich, ~8 cores):

```bash
cd /home/pengchx3/text-dna/biomodel_reasoning_calling_study2
git checkout origin/mllm-integrate-server2
for i in $(seq 0 7); do
  SHARD_INDEX=$i NUM_SHARDS=8 \
    bash regureasoner_loop/slurm/regen_t2_enriched_with_enhancer_scan.sh &
done
wait

# Output:
# /dev/shm/dnathinker/data/t2_regen_enhancer_scan/jsonl/{train,test}.pair_prediction.jsonl
# Push to HF when done so the H100 can pick it up:
python regureasoner_loop/scripts/sync_checkpoints.py \
  --src /dev/shm/dnathinker/data/t2_regen_enhancer_scan/jsonl \
  --dest data/prod_full_test_v2_enhancer_scan/jsonl \
  --repo-id explcre/dnathinker-checkpoints
```

After that lands on HF, the H100 will rebench T2 with proper enhancer
TFBS context. ETA on galaxy: ~8h sharded (~744k rows / 8 shards × ~30s
per row average). Cached on the second pass.

### 3. T3 RFT-from-joint ablation – extra Table 3 row (NICE-TO-HAVE)

The current pipeline runs T3 RFT against the Stage-3 (T3-only)
adapter. A worthwhile ablation: run RFT against the Stage-4 joint
adapter – does the joint-trained generator produce candidates with
higher mean objective margin, or do format artefacts dominate? One
flag change:

```bash
STAGE_4=runs/exp_joint_multitask_${STAMP}/final/pytorch_model.bin
python regureasoner_loop/scripts/rft_t3.py \
  --adapter-state-dict $STAGE_4 \
  --train-jsonl data/prod_samples/train.enhancer_editing.strat7c.n35k.jsonl \
  --oracle-path runs/exp_oracle_ds_7cell_min/oracle.pt \
  --output-jsonl runs/exp_t3_rft_from_joint_${STAMP}/rft_filtered_train.jsonl \
  --candidates 4 --rounds 4 --temp-ramp 0.15
# Re-train T3 fusion-SFT on the result for the ablation row.
```
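
For intuition on the sampling flags: assuming `--temp-ramp` adds a fixed increment to the sampling temperature each rejection round (an assumption about `rft_t3.py`, not confirmed behaviour, and the 0.7 base is likewise hypothetical), the schedule for `--rounds 4 --temp-ramp 0.15` would be:

```python
# Hypothetical temperature schedule: assumed 0.7 base, with --temp-ramp
# added once per rejection round. Not the confirmed rft_t3.py behaviour.
def round_temperatures(rounds: int, ramp: float, base: float = 0.7) -> list[float]:
    return [round(base + r * ramp, 2) for r in range(rounds)]
```

Under those assumptions, `round_temperatures(4, 0.15)` gives `[0.7, 0.85, 1.0, 1.15]` – later rounds explore more aggressively once low-temperature candidates are exhausted.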

Cost: ~6h serial after Stage 4 (joint multitask) finishes on the H100.
Lab has a spare GPU? This is yours.

Detail in `docs/t3_post_v5_followups.md` §1.

### 4. Loop-SFT for T3 – swap data source (NICE-TO-HAVE)

No code change needed. The T3 trajectory dataset for Loop-SFT should
source from the post-RFT JSONL (oracle-validated candidates) instead
of the heuristic gold:

```bash
python regureasoner_loop/scripts/expand_loop_trajectories.py \
  --source runs/exp_t3_fusion_sft_${STAMP}/rft_filtered_train.jsonl \
  --out data/trajectories/train.enhancer_editing.rft.jsonl
TASK=enhancer_editing \
TRAIN_JSONL=data/trajectories/train.enhancer_editing.rft.jsonl \
... \
bash regureasoner_loop/slurm/run_train_loop_sft.sh
```

This is lab-side work, since the H100 doesn't have the OpenRouter
throughput for trajectory expansion at the 35k-row scale (the free
tier is 1000 req/day per key – fine for 333-row reasoning ablations,
not for full Loop-SFT data).

### 5. External baselines for paper headline – TACO + HyenaDNA (CRITICAL)

The paper currently has only internal baselines (zero-shot LLM,
fusion-SFT, our NTv3-direct). Reviewers will ask "where's the SOTA
comparison?". Two must-add baselines:

* **TACO** (Lin et al., NeurIPS 2024) – T3 paper precedent. Their repo
  is public; drop in our DeepSTARR-7cell oracle, run their trainer on
  our T3 train split, eval with `eval_t3_oracle.py`. ~1 day.
* **HyenaDNA** (Nguyen et al., NeurIPS 2023) – T2 fluency baseline.
  Already wired as an encoder in our stack; needs head training only.
  ~1 day.

Lab side because both need cluster GPUs.

Detail + concrete recipes in `docs/t3_post_v5_followups.md` §5.

### 6. Pull from HF – new artifacts available

H100 just pushed (04:00 UTC):

```
data/reasoning_traces/train.enhancer_generation.reasoning.jsonl  (in flight, ~62 rows so far, target 333)
data/reasoning_traces/smoke_5rows_{t1,t2,t3}_postsanitize.jsonl  (per-task quality samples for inspection)
data/reasoning_traces/post_rft_contract_fixture.jsonl            (synthetic post-RFT row used in unit test)
data/reasoning_traces/post_rft_smoke.jsonl                       (real OpenRouter rationale on synthetic post-RFT input)
```

Repo: `explcre/dnathinker-checkpoints`. Inspect the smoke files to
verify rationale quality + sanitiser correctness before wider rollout.

### 7. Reasoning-trace daily loop – coordinate API keys

The 1000 req/day OpenRouter free-tier cap means **one key drives
~333 rows/task/day** across the three tasks. With the user's primary
key on the H100 we'll build T1/T2/T3 reasoning at ~1k rows/day
combined.

If you have spare OpenRouter accounts, run:

```bash
OPENROUTER_API_KEY=<lab key> bash regureasoner_loop/slurm/build_reasoning_traces_loop.sh --daemon
```

on a CPU box (zero GPU). Each shard is a separate run; the script
auto-resumes by id, so multiple boxes running with different keys
won't overlap if they share an output JSONL.
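
The resume-by-id behaviour amounts to a set-difference against the shared output file. A sketch of that logic, assuming each trace row carries an `"id"` field (an assumption about the trace schema, not the loop script's actual code):

```python
import json

# Resume-by-id sketch: skip any example whose id already appears in the
# shared output JSONL, so boxes running with different keys don't
# duplicate work. The "id" field name is an assumed schema detail.
def pending_ids(all_ids, out_jsonl_path):
    done = set()
    try:
        with open(out_jsonl_path) as f:
            done = {json.loads(line)["id"] for line in f if line.strip()}
    except FileNotFoundError:
        pass  # first run: nothing written yet
    return [i for i in all_ids if i not in done]
```

Note this is only overlap-free if each box re-reads the shared JSONL before claiming work, which is what the loop script's auto-resume provides.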

## What's running on the H100 right now

```
PID 121129  vLLM bench T2 zs_enriched (full 744k, ~3.5h in, ETA ~30 min)
            queued: T3 zs_raw, T3 zs_enriched (~5h each)
PID 137805  build_reasoning_traces.py T1 333-sample run (62/333 at 04:00 UTC)
PID 100544  watcher – post_bench_pipeline.sh (idle until orchestrator exits)
```

ETA for the full chain: ~36h after the bench grid finishes.

## Pipeline state – no jobs need killing

* Bench grid: vLLM zero-shot inference; the T3 zs eval reads only
  metadata (target_motif, edit_budget) – no v5-framework leakage. Safe.
* No fusion-SFT or RL job is currently training; Stages 1–4 fire only
  after the bench grid completes, at which point they pick up multi-turn
  RFT (commit `25504fd`) + Stage 3d post-RFT reasoning (commit
  `3e65c96`) automatically.

## Suggested coordination

Lab actions, in priority order:

1. **Pull `mllm-integrate-server2`** (or merge it into `mllm-integrate`).
2. **Stop any in-flight T3 SV-GSPO run** if it predates `e133cf1` –
   the reward function was wrong; restart with the new commit.
3. **Galaxy: T2 enhancer-scan regen** (background, ~8h sharded) – this
   blocks the headline T2 numbers.
4. **TACO + HyenaDNA baselines** in parallel.
5. **RFT-from-joint ablation** + **Loop-SFT-on-RFT** as second-tier
   ablations once Stage 4 lands.

Reach out on the shared channel if any of these conflict with
in-flight work.

– H100 side