Upload PROJECT_RESUME_MANIFEST.md with huggingface_hub
# DNAThinker Project Resume Manifest (2026-05-07)

This document maps every artifact you need to **resume the DNAThinker
paper work on a fresh machine**. Everything except local secrets is
mirrored to public HuggingFace repos.

---

## 1. Code (GitHub)

```bash
git clone git@github.com:explcre/biomodel_reasoning_calling_study2.git
cd biomodel_reasoning_calling_study2
git checkout mllm-integrate-server3   # active server3 branch
# or `mllm-integrate` for the lab-merged trunk
```

- Latest server3 commit at snapshot time: `bc3c9aa` (paper at 54 pages)
- Lab branch: `origin/mllm-integrate` (lab-cluster commits + my server3 merges)

## 2. Model checkpoints on HuggingFace

| Repo | What | Size |
|---|---|---|
| `explcre/dnathinker_t2_dual_xa` | T2 dual-XA Variant C (FT-100M+combined) | 992 MB |
| `explcre/dnathinker_t3_trunks` | T3 edit-tight 500-bp Path-A production trunk | 507 MB |
| `explcre/ntv3_rich_cond_mdlm_phase4_family` | T1 trunks (v3 paper-headline + v5full15m + others) | ~7 GB |
| `explcre/phase7_multitask` | Joint MoE LoRAs (T1+T2+T3 r=64) | ~150 MB |
| `explcre/phase8_rl` | All Phase-8 RL ckpts (incl. multi-seed reasoning RL, FIXED grid, KD/BIGLR) | ~5 GB |
| `explcre/phase5_stage_a` / `phase5_stage_b` | Phase-5 SFT bases | varies |
| `explcre/phase6_grpo` | Phase-6 GRPO RL | varies |

The HF auto-uploader (`scripts/innovations/hf_auto_uploader.py`) re-mirrors
all of the above every 30 minutes.

## 3. Paper-grade results (under `explcre/phase8_rl/_paper_results/`)

- `tab_t1_v7predictor_n4000_bootstrap_ci.md`: FID/embed-cos CI on n=4000
- `tab_t1_production_n200_bootstrap_ci.md`: FID 2.87 [1.65, 4.82] on n=200
- `tab_rqrl_t1_bootstrap_ci.md`: TFG 0.4384 [0.351, 0.527] B=5000
- `reasoning_rl_multiseed_summary.{md,json}`: T1/T2/T3 multi-seed aggregator
- `cycle_70zz44_*`: production-route MoE end-to-end eval evidence
- `cycle_70zz33_lraxis_bootstrap_ci.md`: T3 RL lr-axis bootstrap

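The bracketed ranges in these tables are bootstrap confidence intervals (with `B` presumably the number of resamples). As a reminder of how such an interval is formed, here is a minimal percentile-bootstrap sketch in Python. It is a generic illustration, not the project's evaluation code: the function name `bootstrap_ci`, the choice of the mean as the statistic, and the toy data are all assumptions, and the real tables may use a different statistic or bootstrap variant.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def bootstrap_ci(
    values: Sequence[float],
    statistic: Callable[[Sequence[float]], float] = statistics.mean,
    n_resamples: int = 5000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point_estimate, lower, upper) for a percentile bootstrap CI."""
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)
    n = len(values)
    # Recompute the statistic on n_resamples resamples drawn with replacement.
    replicates = sorted(
        statistic(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Percentile interval: cut alpha/2 of the replicates off each tail.
    lo_idx = int((alpha / 2) * (n_resamples - 1))
    hi_idx = int((1 - alpha / 2) * (n_resamples - 1))
    return statistic(values), replicates[lo_idx], replicates[hi_idx]


if __name__ == "__main__":
    toy_scores = [0.1 * i for i in range(200)]  # hypothetical per-sample metric
    point, lower, upper = bootstrap_ci(toy_scores)
    print(f"{point:.3f} [{lower:.3f}, {upper:.3f}]")
```

Fixing `seed` makes the interval reproducible across machines, which matters when a resumed run has to regenerate a table entry exactly.
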
## 4. Multi-seed reasoning RL (under `explcre/phase8_rl/_reasoning_rl_multiseed/`)

Each `exp_phase8_reasoning_grounded_rl_<task>_r128_alpha1_s<seed>_*`
directory has `best.pt`, `log.jsonl`, and `manifest.json`; the
matching `eval_reasoning_<task>_v7r128_postRL_alpha1_s<seed>_*/`
directory has `score.json` and `score.md`.

Tasks/seeds covered: T1 s=2,3 (cycle 70zz46); T3 s=2,3 (cycle 70zz48 parallel);
T2 s=2 (cycle 70zz47). T2 s=3 was in flight at snapshot time.

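After downloading this folder, it can be worth confirming that every run and eval directory actually contains the files listed above. The following is a minimal sketch, assuming the directories sit directly under one local root; the helper name `missing_artifacts` is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import Dict, List

# File names taken from the manifest text above.
RUN_FILES = ("best.pt", "log.jsonl", "manifest.json")
EVAL_FILES = ("score.json", "score.md")


def missing_artifacts(root: Path) -> Dict[str, List[str]]:
    """Map each incomplete run/eval directory under `root` to its missing files."""
    problems: Dict[str, List[str]] = {}
    for directory in sorted(p for p in root.iterdir() if p.is_dir()):
        if directory.name.startswith("exp_phase8_reasoning_grounded_rl_"):
            expected = RUN_FILES
        elif directory.name.startswith("eval_reasoning_"):
            expected = EVAL_FILES
        else:
            continue  # unrelated directory, ignore it
        missing = [name for name in expected if not (directory / name).is_file()]
        if missing:
            problems[directory.name] = missing
    return problems
```

Calling `missing_artifacts(Path("_reasoning_rl_multiseed"))` on a local copy returns an empty dict when every directory is complete; a run that was still in flight at snapshot time would show up with its missing files.
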
## 5. Claude memory (under `explcre/phase8_rl/_claude_memory/`)

Eight files covering user role, feedback rules, and project context. Place
them under `~/.claude/projects/-workspace/memory/` on the new machine.

## 6. Lab-cluster artifacts (lab side, under `explcre/phase7_multitask`)

The lab cluster (3090×6, A6000×8, H100×4) ran SLURM jobs for the
FULL_AUDIT 4-algo×3-seed grid + L7 edit-tight on 650M trunk.
Outputs are already merged into `mllm-integrate`, and the figures are
committed to `paper/figures/`.

## 7. Public reference data (NOT in our HF; download fresh)

- PsychENCODE source: `https://psychencode.synapse.org/`
- NTv3 100M-post / 650M snapshots: `https://huggingface.co/InstaDeep/NTv3-...`
- Qwen3.5-0.8B base: `https://huggingface.co/Qwen/Qwen2.5-0.5B` (or local)
- JASPAR motif PWMs: `https://jaspar.genereg.net/`

---

## 🔒 What you must save LOCALLY (do NOT upload)

Save these to your local machine; they are credentials and must not be
published:

| File | Where | Why |
|---|---|---|
| `/workspace/dnathinker/.env` | rsync to `~/dnathinker.env.bak` | OPENROUTER_API_KEY_{1..N} for reasoning expansion |
| `~/.cache/huggingface/token` (or legacy `~/.huggingface/token`, or `HF_TOKEN` env) | already on your machine | HF push permissions |
| `~/.ssh/id_*` | already on your machine | GitHub push |
| `~/.netrc` (if it exists) | already on your machine | git auth fallback |
| `~/.kaggle/` (if used) | already on your machine | Kaggle data |
| Any `*.env` under `/workspace/biomodel_reasoning_calling_study2/` | rsync | task-local secrets |

```bash
# Suggested local backup commands
rsync -av root@<server3-host>:/workspace/dnathinker/.env ~/dnathinker.env.bak
# (HF token is already on your local machine; see the table above)
```

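Before wiping or leaving the old machine, a quick presence check on the local-only secrets can catch a forgotten backup. The sketch below is an assumption-laden helper, not part of the repository: the name `credential_status` is hypothetical, and it only tests that files exist, never reading or printing their contents.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def credential_status(
    home: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, bool]:
    """Report which of the local-only secrets from the table are present."""
    home = Path.home() if home is None else home
    env = os.environ if env is None else env
    hf_token_files = [
        home / ".cache" / "huggingface" / "token",  # current default location
        home / ".huggingface" / "token",            # legacy location
    ]
    ssh_dir = home / ".ssh"
    return {
        "dnathinker .env backup": (home / "dnathinker.env.bak").is_file(),
        "HuggingFace token": bool(env.get("HF_TOKEN"))
        or any(p.is_file() for p in hf_token_files),
        "SSH key": ssh_dir.is_dir() and any(ssh_dir.glob("id_*")),
    }

# Example: print(credential_status()) lists each secret with True/False.
```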
---

## How to resume work on a fresh machine

```bash
# 1. Clone repo
git clone git@github.com:explcre/biomodel_reasoning_calling_study2.git
cd biomodel_reasoning_calling_study2/regureasoner_loop
git checkout mllm-integrate-server3

# 2. Install dependencies (skip if reusing an existing env)
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
# Required: torch, transformers, peft, huggingface_hub, datasets,
#           numpy, pandas, matplotlib, pyyaml, etc.

# 3. Restore Claude memory (if using Claude Code)
mkdir -p ~/.claude/projects/-workspace/memory/
# Select the folder with --include only; a positional filename would
# make the CLI ignore --include and look for a single file instead.
huggingface-cli download explcre/phase8_rl \
    --include "_claude_memory/*" \
    --local-dir ~/.claude/projects/-workspace/memory/
mv ~/.claude/projects/-workspace/memory/_claude_memory/* \
   ~/.claude/projects/-workspace/memory/

# 4. Restore .env (from your local backup)
mkdir -p /workspace/dnathinker/
cp ~/dnathinker.env.bak /workspace/dnathinker/.env

# 5. Pull critical model ckpts as needed
# T3 edit-tight production trunk
# (assumes the checkpoint lives in a repo subfolder of this name):
mkdir -p /workspace/dnathinker/runs/
huggingface-cli download explcre/dnathinker_t3_trunks \
    --include "exp_t3_edit_tight_20260505/*" \
    --local-dir /workspace/dnathinker/runs/

# 6. Compile paper to verify
cd paper && pdflatex main.tex && bibtex main && pdflatex main.tex && pdflatex main.tex
```

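Step 2 falls back to an existing environment when there is no `requirements.txt`, so it helps to confirm that the packages named in the comment are importable before launching anything. This is a small sketch using only the standard library; `missing_modules` and `REQUIRED_MODULES` are hypothetical names, and the list covers only the packages spelled out above, not whatever the trailing "etc." stands for.

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages listed under step 2; note that the
# `pyyaml` distribution is imported as `yaml`.
REQUIRED_MODULES = (
    "torch", "transformers", "peft", "huggingface_hub", "datasets",
    "numpy", "pandas", "matplotlib", "yaml",
)


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("all required modules found" if not missing else f"missing: {missing}")
```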
Done. Everything paper-grade is mirrored; apart from the credentials
listed above, nothing project-essential is local-only.