# Note to lab: H100-side update v2, 2026-04-27 ~04:00 UTC
## Branch state
* `mllm-integrate-server2` is **13 commits ahead** of
`mllm-integrate` since the last merge (commit `43682fe`).
* Lab's `mllm-integrate` HEAD has not advanced since `43682fe` (the
previous merge into main); please pull / merge to pick up the v5 work.
```
e133cf1 SV-GSPO T3 reward fix + post-v5 follow-ups
ffb0c5f T3 v5 propagated to paper_outline + minimal_publishable_suite
25504fd T3 multi-turn rejection sampling + clear metrics quickref
3e65c96 T3 solid: post-RFT JSONL → reasoning expansion handoff
bb6704e Global input sanitiser (label leaks, proxy scores, cell-type expand)
179903c Reasoning-trace generator (OpenRouter Ling-2.6-1T)
945dc55 adapter→eval bridge: predict_fusion.py + post-bench wiring
183e645 T3 RFT (rejection fine-tuning) - Stage B
46e29d7 H100 results snapshot @ 01:50 UTC
4b03b42 T3 reasoning-only SFT (mask_assistant_dna_span)
b5c9a86 docs: T3 evaluation design + PWM supplementary
af44fa4 T3 oracle-based eval (objective satisfaction)
b2a32be h100_progress: plan v4-final
```
To pull on lab cluster:
```bash
git fetch origin mllm-integrate-server2
git merge origin/mllm-integrate-server2 -m "merge v5 from H100"
# or if you want a clean history:
git rebase origin/mllm-integrate-server2
```
## Action items, ordered by urgency
### 1. SV-GSPO outcome reward: pull before next RL run (CRITICAL)
`regureasoner/rl/reward_shaper.py:outcome_enhancer_editing` was
**training the agent on the wrong T3 objective** (edit-distance
window in `[1, 60]`). Under v5, the headline T3 metric is *objective
satisfaction* (`within_budget` AND `length_preserved` AND
`target_motif_present`); see `docs/t3_metrics_quickref.md`.
Fixed in commit `e133cf1`. The new reward is the average of three binary
checks aligned with `eval_t3_oracle.py`. **Any in-flight or
upcoming SV-GSPO run on T3 must be on a checkout that includes this
commit**; otherwise the policy is optimised for the old edit-distance
window rather than the headline metric.
If you have a T3 SV-GSPO run currently queued or training: stop it,
rebase, restart. 56/56 unit tests pass on the new reward.
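For orientation, the shape of the new reward can be sketched as below. This is a minimal illustration only: the `T3Checks` container and both function names are hypothetical and are not the code in `reward_shaper.py`; only the three check names and the "average of three binary checks" rule come from this note.

```python
from dataclasses import dataclass


@dataclass
class T3Checks:
    """Hypothetical container for the three binary T3 objective checks."""
    within_budget: bool
    length_preserved: bool
    target_motif_present: bool


def outcome_reward(checks: T3Checks) -> float:
    """Average of the three binary checks: one of 0, 1/3, 2/3 or 1."""
    flags = (
        checks.within_budget,
        checks.length_preserved,
        checks.target_motif_present,
    )
    return sum(flags) / len(flags)


def objective_satisfied(checks: T3Checks) -> bool:
    """Headline metric: all three checks must hold at the same time."""
    return (
        checks.within_budget
        and checks.length_preserved
        and checks.target_motif_present
    )
```

The averaged form gives partial credit (for example 2/3 when only the motif check fails), while the headline metric stays a strict conjunction.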
### 2. T2 enriched dataset regen: needs galaxy CPU (BLOCKER on T2 quality)
The current prod T2 enriched JSONL only has TFBS scan on the
**promoter**; the enhancer side gets only a GC% line. That defeats
T2's premise: the model can't reason about shared TFBS hits if only
one side is scanned.
Fix exists in `tools/pe_grounding_tools.py:_template_tfbs_sequence_names_for_example`
which already returns `["input_promoter", "input_enhancer"]` for
T2; it just wasn't run on the prod data.
The launcher is committed at
`regureasoner_loop/slurm/regen_t2_enriched_with_enhancer_scan.sh`.
It drives the parent's `PEDatasetReasoningPipeline` in
`template_tools` mode (no LLM, disk-cached FIMO).
H100 can't run this (raw CSVs + compiled FIMO live on lab cluster
only). Suggested galaxy invocation (CPU-rich, ~8 cores):
```bash
cd /home/pengchx3/text-dna/biomodel_reasoning_calling_study2
git checkout origin/mllm-integrate-server2
for i in $(seq 0 7); do
    SHARD_INDEX=$i NUM_SHARDS=8 \
        bash regureasoner_loop/slurm/regen_t2_enriched_with_enhancer_scan.sh &
done
wait
# Output:
# /dev/shm/dnathinker/data/t2_regen_enhancer_scan/jsonl/{train,test}.pair_prediction.jsonl
# Push to HF when done so H100 can pick it up:
python regureasoner_loop/scripts/sync_checkpoints.py \
    --src /dev/shm/dnathinker/data/t2_regen_enhancer_scan/jsonl \
    --dest data/prod_full_test_v2_enhancer_scan/jsonl \
    --repo-id explcre/dnathinker-checkpoints
```
After that lands on HF, the H100 will rebench T2 with proper enhancer
TFBS context. ETA on galaxy: 8h sharded (~744k rows / 8 shards × 30s
per row average). Cached on second pass.
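A quick sanity check on the ETA arithmetic, assuming rows split evenly across shards and each shard processes its rows serially (both assumptions are mine; the helper names are hypothetical). Taken literally, 30 s per row does not fit in 8 h, so the per-row figure is worth confirming before scheduling; it may, for instance, describe an uncached worst case.

```python
def shard_hours(total_rows: int, num_shards: int, seconds_per_row: float) -> float:
    """Wall-clock hours for one shard, assuming even shards and serial rows."""
    rows_per_shard = total_rows / num_shards
    return rows_per_shard * seconds_per_row / 3600


def seconds_per_row_budget(total_rows: int, num_shards: int, budget_hours: float) -> float:
    """Per-row time that would make the run fit inside budget_hours."""
    rows_per_shard = total_rows / num_shards
    return budget_hours * 3600 / rows_per_shard


if __name__ == "__main__":
    # 744k rows over 8 shards at 30 s/row
    print(shard_hours(744_000, 8, 30.0))
    # per-row budget implied by an 8 h target (roughly 0.31 s/row)
    print(round(seconds_per_row_budget(744_000, 8, 8.0), 2))
```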
### 3. T3 RFT-from-joint ablation: extra Table 3 row (NICE-TO-HAVE)
The current pipeline runs T3 RFT against the Stage-3 (T3-only)
adapter. A worthwhile ablation: run RFT against the Stage-4 joint
adapter. Does the joint-trained generator produce candidates with
higher mean objective margin, or do format artefacts dominate? One
flag change:
```bash
# STAMP must already be set to the run timestamp in the calling shell.
STAGE_4="runs/exp_joint_multitask_${STAMP}/final/pytorch_model.bin"
python regureasoner_loop/scripts/rft_t3.py \
    --adapter-state-dict "$STAGE_4" \
    --train-jsonl data/prod_samples/train.enhancer_editing.strat7c.n35k.jsonl \
    --oracle-path runs/exp_oracle_ds_7cell_min/oracle.pt \
    --output-jsonl "runs/exp_t3_rft_from_joint_${STAMP}/rft_filtered_train.jsonl" \
    --candidates 4 --rounds 4 --temp-ramp 0.15
# Re-train T3 fusion-SFT on the result for the ablation row.
```
Cost: ~6h serial after Stage 4 (joint multitask) finishes on H100.
Lab has spare GPU? This is yours.
Detail in `docs/t3_post_v5_followups.md` §1.
### 4. Loop-SFT for T3: swap data source (NICE-TO-HAVE)
No code change. The T3 trajectory dataset for Loop-SFT should source
from the post-RFT JSONL (oracle-validated candidates) instead of the
heuristic gold:
```bash
python regureasoner_loop/scripts/expand_loop_trajectories.py \
    --source runs/exp_t3_fusion_sft_${STAMP}/rft_filtered_train.jsonl \
    --out data/trajectories/train.enhancer_editing.rft.jsonl
TASK=enhancer_editing \
TRAIN_JSONL=data/trajectories/train.enhancer_editing.rft.jsonl \
... \
bash regureasoner_loop/slurm/run_train_loop_sft.sh
```
This belongs on the lab side: H100 doesn't have the OpenRouter
throughput for trajectory expansion at the 35k-row scale (the free
tier allows 1000 requests/day per key, which is fine for 333-row
reasoning ablations but not for full Loop-SFT data).
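To make the throughput gap concrete, here is a rough estimate of how long a run would take under the daily cap. It assumes one OpenRouter request per row, which is my assumption rather than something this note states for trajectory expansion; `days_to_finish` is a hypothetical helper.

```python
import math


def days_to_finish(
    rows: int,
    requests_per_day: int = 1000,  # free-tier cap per key, from this note
    keys: int = 1,
    requests_per_row: int = 1,     # assumption: one request per row
) -> int:
    """Whole days needed to process `rows` under the daily request cap."""
    rows_per_day = (requests_per_day * keys) // requests_per_row
    return math.ceil(rows / rows_per_day)


if __name__ == "__main__":
    print(days_to_finish(333))              # a 333-row ablation
    print(days_to_finish(35_000))           # full 35k-row set, one key
    print(days_to_finish(35_000, keys=5))   # same set spread over five keys
```

Under these assumptions a single key covers a 333-row ablation in a day, while the 35k-row set takes about five weeks.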
### 5. External baselines for paper headline: TACO + HyenaDNA (CRITICAL)
The paper currently has only internal baselines (zero-shot LLM,
fusion-SFT, our NTv3-direct). Reviewers will ask "where's the SOTA
comparison?". Two must-add baselines:
* **TACO** (Lin et al. NeurIPS 2024): T3 paper precedent. Their repo
is public; drop in our DeepSTARR-7cell oracle, run their trainer on
our T3 train split, eval with `eval_t3_oracle.py`. ~1 day.
* **HyenaDNA** (Nguyen et al. NeurIPS 2023): T2 fluency baseline.
Already wired as encoder in our stack; needs head training only.
~1 day.
Lab side because both need cluster GPUs.
Detail + concrete recipes in `docs/t3_post_v5_followups.md` §5.
### 6. Pull from HF: new artifacts available
H100 just pushed (04:00 UTC):
```
data/reasoning_traces/train.enhancer_generation.reasoning.jsonl (in flight, ~62 rows so far, target 333)
data/reasoning_traces/smoke_5rows_{t1,t2,t3}_postsanitize.jsonl (per-task quality samples for inspection)
data/reasoning_traces/post_rft_contract_fixture.jsonl (synthetic post-RFT row used in unit test)
data/reasoning_traces/post_rft_smoke.jsonl (real OpenRouter rationale on synthetic post-RFT input)
```
Repo: `explcre/dnathinker-checkpoints`. Inspect the smoke files to
verify rationale quality + sanitiser correctness before wider rollout.
### 7. Reasoning-trace daily-loop: coordinate API keys
The 1000 req/day OpenRouter free-tier cap means **one key drives
~333 rows/task/day**. With the user's primary key on H100 we'll
build T1/T2/T3 reasoning at ~1k rows/day combined.
If you have spare OpenRouter accounts, run:
```bash
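# Replace the placeholder with the lab's key; unquoted, the shell would parse < and > as redirections.
OPENROUTER_API_KEY="<lab key>" bash regureasoner_loop/slurm/build_reasoning_traces_loop.sh --daemon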
```
on a CPU box (zero GPU). Each shard is a separate run; the script
auto-resumes by id, so multiple boxes running with different keys
won't overlap if they share an output JSONL.
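The resume-by-id behaviour described above can be sketched roughly as follows. This illustrates the general technique and is not the actual logic of `build_reasoning_traces_loop.sh`; the `id` field name and both helpers are assumptions.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, Set


def load_done_ids(output_path: Path, id_field: str = "id") -> Set[str]:
    """Collect the ids already present in the output JSONL, if it exists."""
    done: Set[str] = set()
    if not output_path.exists():
        return done
    with output_path.open() as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                done.add(json.loads(line)[id_field])
            except (json.JSONDecodeError, KeyError, TypeError):
                # Skip truncated or malformed rows left by an interrupted write.
                continue
    return done


def pending_rows(
    source_rows: Iterable[dict], done_ids: Set[str], id_field: str = "id"
) -> Iterator[dict]:
    """Yield only the source rows whose id is not yet in the output."""
    for row in source_rows:
        if row[id_field] not in done_ids:
            yield row
```

In this sketch the skip set is only as fresh as the last read of the output file, so with several boxes appending concurrently the degree of overlap avoided depends on how often the real script refreshes it.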
## What's running on H100 right now
```
PID 121129 vLLM bench T2 zs_enriched (full 744k, ~3.5h in, ETA ~30 min)
queued: T3 zs_raw, T3 zs_enriched (~5h each)
PID 137805 build_reasoning_traces.py T1 333-sample run (62/333 at 04:00 UTC)
PID 100544 watcher → post_bench_pipeline.sh (idle until orchestrator exits)
```
ETA full chain: ~36h after bench grid finishes.
## Pipeline state: no jobs need killing
* Bench grid: vLLM zero-shot inference; T3 zs eval reads only metadata
(target_motif, edit_budget), so there is no v5-framework leakage. Safe.
* No fusion-SFT or RL job currently training; Stages 1-4 fire only
after bench grid completes, at which point they pick up multi-turn
RFT (commit `25504fd`) + Stage 3d post-RFT reasoning (commit
`3e65c96`) automatically.
## Suggested coordination
Lab actions, in priority order:
1. **Pull `mllm-integrate-server2`** (or merge into `mllm-integrate`).
2. **Stop any in-flight T3 SV-GSPO run** if it predates `e133cf1`:
the reward function was wrong; restart with the new commit.
3. **Galaxy: T2 enhancer-scan regen** (background, ~8h sharded);
blocks the headline T2 numbers.
4. **TACO + HyenaDNA baselines** in parallel.
5. **RFT-from-joint ablation** + **Loop-SFT-on-RFT** as second-tier
ablations once Stage 4 lands.
Reach out on the shared channel if any of these conflict with
in-flight work.
-- H100 side