explcre committed on
Commit
d0f9069
·
verified ·
1 Parent(s): c1aabde

Upload PROJECT_RESUME_MANIFEST.md with huggingface_hub

Files changed (1)
  1. PROJECT_RESUME_MANIFEST.md +131 -0
PROJECT_RESUME_MANIFEST.md ADDED
@@ -0,0 +1,131 @@
# DNAThinker Project Resume Manifest (2026-05-07)

This document maps every artifact you need to **resume the DNAThinker
paper work on a fresh machine**. Everything except local secrets is
mirrored to public HuggingFace repos.

---

## 1. Code (GitHub)

```bash
git clone git@github.com:explcre/biomodel_reasoning_calling_study2.git
cd biomodel_reasoning_calling_study2
git checkout mllm-integrate-server3   # active server3 branch
# or `mllm-integrate` for the lab-merged trunk
```

- Latest server3 commit at snapshot time: `bc3c9aa` (paper at 54 pages)
- Lab branch: `origin/mllm-integrate` (lab-cluster commits + my server3 merges)

## 2. Model checkpoints on HuggingFace

| Repo | Contents | Size |
|---|---|---|
| `explcre/dnathinker_t2_dual_xa` | T2 dual-XA Variant C (FT-100M+combined) | 992 MB |
| `explcre/dnathinker_t3_trunks` | T3 edit-tight 500-bp Path-A production trunk | 507 MB |
| `explcre/ntv3_rich_cond_mdlm_phase4_family` | T1 trunks (v3 paper-headline + v5full15m + others) | ~7 GB |
| `explcre/phase7_multitask` | Joint MoE LoRAs (T1+T2+T3 r=64) | ~150 MB |
| `explcre/phase8_rl` | All Phase-8 RL ckpts (incl. multi-seed reasoning RL, FIXED grid, KD/BIGLR) | ~5 GB |
| `explcre/phase5_stage_a` / `phase5_stage_b` | Phase-5 SFT bases | varies |
| `explcre/phase6_grpo` | Phase-6 GRPO RL | varies |

The HF auto-uploader (`scripts/innovations/hf_auto_uploader.py`) re-syncs
all of the above repos every 30 minutes.

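The manifest does not show how `hf_auto_uploader.py` works internally. As a rough illustration only, a periodic mirror could be sketched as below, assuming it wraps `huggingface_hub.upload_folder`. The function names `mirror_once` and `mirror_forever` and the `targets` mapping are hypothetical and not taken from the repository.

```python
import time
from pathlib import Path
from typing import Callable, Dict, List


def mirror_once(targets: Dict[str, str], upload: Callable[..., object]) -> List[str]:
    """Push each existing local folder to its repo and return the repo ids pushed.

    `targets` maps a local folder to a Hub repo id (hypothetical layout).
    `upload` is injected so the loop can be exercised without network access.
    """
    pushed = []
    for local_dir, repo_id in targets.items():
        if not Path(local_dir).is_dir():
            continue  # folder not created yet, nothing to mirror
        upload(folder_path=local_dir, repo_id=repo_id, repo_type="model")
        pushed.append(repo_id)
    return pushed


def mirror_forever(targets: Dict[str, str], interval_s: int = 30 * 60) -> None:
    """Re-run the mirror every `interval_s` seconds (30 minutes by default)."""
    # Imported lazily; requires a cached login or the HF_TOKEN env var.
    from huggingface_hub import upload_folder

    while True:
        mirror_once(targets, upload_folder)
        time.sleep(interval_s)
```

Injecting the upload callable keeps the loop testable offline; in real use `mirror_forever({"/path/to/runs": "explcre/phase8_rl"})` would be the entry point, with the local path being an assumption.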
## 3. Paper-grade results (under `explcre/phase8_rl/_paper_results/`)

- `tab_t1_v7predictor_n4000_bootstrap_ci.md`: FID/embed-cos CI on n=4000
- `tab_t1_production_n200_bootstrap_ci.md`: FID 2.87 [1.65, 4.82] on n=200
- `tab_rqrl_t1_bootstrap_ci.md`: TFG 0.4384 [0.351, 0.527], B=5000
- `reasoning_rl_multiseed_summary.{md,json}`: T1/T2/T3 multi-seed aggregator
- `cycle_70zz44_*`: production-route MoE end-to-end eval evidence
- `cycle_70zz33_lraxis_bootstrap_ci.md`: T3 RL lr-axis bootstrap

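The bracketed intervals above are bootstrap confidence intervals. The exact procedure behind these tables is not given in the manifest, so the following is only a minimal sketch of a percentile bootstrap, assuming resampling with replacement and a 95% interval. `bootstrap_ci` is a hypothetical helper, and `n_boot=5000` mirrors the `B=5000` noted above.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def bootstrap_ci(
    values: Sequence[float],
    stat: Callable[[Sequence[float]], float] = statistics.fmean,
    n_boot: int = 5000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point estimate, lower bound, upper bound) via percentile bootstrap."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    point = stat(values)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    boots = sorted(stat(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lo, hi
```

For a set-level metric such as FID, `stat` would need to be a callable that recomputes the metric on each resampled set rather than a simple mean.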
## 4. Multi-seed reasoning RL (under `explcre/phase8_rl/_reasoning_rl_multiseed/`)

Each `exp_phase8_reasoning_grounded_rl_<task>_r128_alpha1_s<seed>_*`
directory contains `best.pt`, `log.jsonl`, and `manifest.json`. The
matching `eval_reasoning_<task>_v7r128_postRL_alpha1_s<seed>_*/`
directory contains `score.json` and `score.md`.

Tasks/seeds covered: T1 s=2,3 (cycle 70zz46); T3 s=2,3 (cycle 70zz48 parallel);
T2 s=2 (cycle 70zz47). T2 s=3 was in flight at snapshot time.

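After downloading this folder, it can help to confirm that each run directory is complete. The sketch below checks the three files listed above against the naming pattern. `missing_run_files` is a hypothetical helper, and the exact spelling of the `<task>` token (for example `t1` versus `T1`) is an assumption to adjust to the real directory names.

```python
from pathlib import Path
from typing import Dict, List, Union

# Files every RL run directory is expected to contain, per the manifest.
REQUIRED = ("best.pt", "log.jsonl", "manifest.json")


def missing_run_files(root: Union[str, Path], task: str, seed: int) -> Dict[str, List[str]]:
    """Map each matching run directory name to the required files it lacks."""
    pattern = f"exp_phase8_reasoning_grounded_rl_{task}_r128_alpha1_s{seed}_*"
    report = {}
    for run_dir in sorted(Path(root).glob(pattern)):
        if run_dir.is_dir():
            report[run_dir.name] = [f for f in REQUIRED if not (run_dir / f).is_file()]
    return report
```

An empty list for a directory means it is complete; an empty dictionary means no directory matched that task and seed, which is what one would expect for a run that was still in flight.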
## 5. Claude memory (under `explcre/phase8_rl/_claude_memory/`)

Eight files covering user role, feedback rules, and project context. Place
them under `~/.claude/projects/-workspace/memory/` on the new machine.

## 6. Lab-cluster artifacts (lab side, under `explcre/phase7_multitask`)

The lab cluster (3090×6, A6000×8, H100×4) ran SLURM jobs for the
FULL_AUDIT 4-algo×3-seed grid and for L7 edit-tight on the 650M trunk.
The outputs are already merged into `mllm-integrate`, and the figures
are committed to `paper/figures/`.

## 7. Public reference data (NOT in our HF; download fresh)

- PsychENCODE source: `https://psychencode.synapse.org/`
- NTv3 100M-post / 650M snapshots: `https://huggingface.co/InstaDeep/NTv3-...`
- Qwen3.5-0.8B base: `https://huggingface.co/Qwen/Qwen2.5-0.5B` (or local)
- JASPAR motif PWMs: `https://jaspar.genereg.net/`

---

## 🔒 What you must save LOCALLY (do NOT upload)

Save these to your local machine. They are credentials and must not
be published:

| File | Where | Why |
|---|---|---|
| `/workspace/dnathinker/.env` | rsync to `~/dnathinker.env.bak` | OPENROUTER_API_KEY_{1..N} for reasoning expansion |
| `~/.cache/huggingface/token` (`~/.huggingface/token` in older `huggingface_hub` releases), or `HF_TOKEN` env | already on your machine | HF push permissions |
| `~/.ssh/id_*` | already on your machine | GitHub push |
| `~/.netrc` (if it exists) | already on your machine | git auth fallback |
| `~/.kaggle/` (if used) | already on your machine | Kaggle data |
| Any `*.env` under `/workspace/biomodel_reasoning_calling_study2/` | rsync | task-local secrets |

```bash
# Suggested local backup commands
rsync -av root@<server3-host>:/workspace/dnathinker/.env ~/dnathinker.env.bak
# (HF token is already stored on your local machine)
```

---

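Before mirroring a folder, a quick scan for credential-like files reduces the risk of uploading one by accident. The following is a minimal sketch, assuming the file-name patterns from the table above are the ones that matter. `find_secret_like_files` and `SECRET_PATTERNS` are hypothetical names, not part of the repository.

```python
import fnmatch
import os
from pathlib import Path
from typing import List

# Name patterns taken from the table above; extend as needed.
SECRET_PATTERNS = ("*.env", "*.env.bak", ".netrc", "id_*")


def find_secret_like_files(root: str) -> List[str]:
    """Return sorted POSIX-style relative paths of files whose names look like secrets."""
    root_path = Path(root)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in filenames:
            # fnmatchcase lets "*" match a leading dot, so "*.env" also catches ".env".
            if any(fnmatch.fnmatchcase(name, pat) for pat in SECRET_PATTERNS):
                hits.append((Path(dirpath) / name).relative_to(root_path).as_posix())
    return sorted(hits)
```

If the uploader is built on `huggingface_hub.upload_folder`, the same patterns could also be passed through its `ignore_patterns` argument as a second safeguard.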
## How to resume work on a fresh machine

```bash
# 1. Clone repo
git clone git@github.com:explcre/biomodel_reasoning_calling_study2.git
cd biomodel_reasoning_calling_study2/regureasoner_loop
git checkout mllm-integrate-server3

# 2. Install dependencies (skip if reusing an existing env)
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
# Required: torch, transformers, peft, huggingface_hub, datasets,
#           numpy, pandas, matplotlib, pyyaml, etc.

# 3. Restore Claude memory (if using Claude Code)
mkdir -p ~/.claude/projects/-workspace/memory/
# Select the folder with --include; positional arguments are file names.
huggingface-cli download explcre/phase8_rl \
  --include "_claude_memory/*" --local-dir ~/.claude/projects/-workspace/memory/
mv ~/.claude/projects/-workspace/memory/_claude_memory/* \
   ~/.claude/projects/-workspace/memory/

# 4. Restore .env (from your local backup)
mkdir -p /workspace/dnathinker/
cp ~/dnathinker.env.bak /workspace/dnathinker/.env

# 5. Pull critical model ckpts as needed
# T3 edit-tight production trunk (lands in runs/exp_t3_edit_tight_20260505/):
mkdir -p /workspace/dnathinker/runs/
huggingface-cli download explcre/dnathinker_t3_trunks \
  --include "exp_t3_edit_tight_20260505/*" --local-dir /workspace/dnathinker/runs/

# 6. Compile paper to verify
cd paper && pdflatex main.tex && bibtex main && pdflatex main.tex && pdflatex main.tex
```

Done. Everything paper-grade is mirrored; apart from the credentials
listed above, nothing project-essential is local-only.
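
Once the steps above have run, a short check can confirm that the restored paths and required command-line tools are in place. This is only a sketch: `resume_report` is a hypothetical helper, and the paths and tool names to pass in are the ones used in the steps above.

```python
import shutil
from pathlib import Path
from typing import Dict, Iterable


def resume_report(paths: Iterable[str], tools: Iterable[str]) -> Dict[str, bool]:
    """Report which restored paths exist and which CLI tools are on PATH."""
    report = {}
    for p in paths:
        # expanduser resolves "~" so home-relative paths can be passed as written.
        report[f"path:{p}"] = Path(p).expanduser().exists()
    for t in tools:
        report[f"tool:{t}"] = shutil.which(t) is not None
    return report
```

For example, `resume_report(["~/.claude/projects/-workspace/memory/", "/workspace/dnathinker/.env", "/workspace/dnathinker/runs/exp_t3_edit_tight_20260505"], ["git", "huggingface-cli", "pdflatex", "bibtex"])` returns one boolean per item, and any `False` entry points at a step that needs repeating.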