---
name: HF before delete
description: Always upload valuable data (ckpts, logs, predictions) to Hugging Face BEFORE deleting from local disk. Don't lose data to disk pressure.
type: feedback
originSessionId: 4037f43b-2133-46c6-84bd-02f7d454ec8b
---
**Rule**: Before running `rm` / `git rm` / quarantine-delete on any artifact under `/workspace/dnathinker/` or `/shm/dnathinker_quarantine/`, **first upload to HF if it isn't already mirrored**.

**Why**: User direction (2026-05-05) after I deleted Phase-8 RL through-MDLM `log.jsonl` files from `/shm/dnathinker_quarantine/` during cycle 50 cleanup. Those logs were the ONLY source for the `F6_rl_training_curves.pdf` paper figure. They were not on HF; deletion was irreversible. Result: F6 panel can't be regenerated and we lost reproducibility.

**How to apply**:
1. Before any `rm` on `/workspace/dnathinker/runs/` or `/shm/dnathinker_quarantine/`, check `MANIFEST.tsv` for an `HF: <repo>/<path>` annotation.
2. If absent, run `scripts/innovations/hf_auto_uploader.py --once` (or `hf_upload_finished_models.py` for ckpts specifically) and verify that the upload succeeded BEFORE deleting.
3. For run-dir cleanup: keep `log.jsonl`, `manifest.json`, `train.log`, `eval_*.log`, and any `*_score*.json` / `*_score*.md`. They're tiny and are needed to regenerate paper figures. ckpts can be deleted once they are HF-mirrored.
4. Annotate the deletion in `/shm/dnathinker_quarantine/MANIFEST.tsv` with the HF reference.
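
The checks in steps 1 and 3 can be sketched as a small pre-delete guard. This is a minimal sketch, not the project's actual tooling: the helper names (`hf_reference`, `must_keep`, `may_delete`) are hypothetical, and it assumes each `MANIFEST.tsv` row starts with the artifact path in its first tab-separated column and carries the `HF: <repo>/<path>` annotation somewhere on the same row. The real manifest layout may differ.

```python
import fnmatch
import re
from pathlib import PurePosixPath

# Small files that step 3 says to keep during run-dir cleanup.
KEEP_PATTERNS = [
    "log.jsonl", "manifest.json", "train.log",
    "eval_*.log", "*_score*.json", "*_score*.md",
]

# Assumed annotation shape: "HF: <repo>/<path>" somewhere on the manifest row.
HF_ANNOTATION = re.compile(r"HF:\s*(\S+)")


def hf_reference(manifest_text, artifact_path):
    """Return the HF reference recorded for artifact_path, or None if absent."""
    for row in manifest_text.splitlines():
        fields = row.split("\t")
        # Assumption: the first column holds the artifact path.
        if fields[0] == artifact_path:
            match = HF_ANNOTATION.search(row)
            return match.group(1) if match else None
    return None


def must_keep(artifact_path):
    """True if the file name matches one of the keep-always patterns."""
    name = PurePosixPath(artifact_path).name
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in KEEP_PATTERNS)


def may_delete(manifest_text, artifact_path):
    """Allow deletion only for non-keep files that have an HF annotation."""
    if must_keep(artifact_path):
        return False
    return hf_reference(manifest_text, artifact_path) is not None
```

A guard like this only reports what the manifest claims. It does not confirm that the file actually exists on HF, so the upload verification in step 2 still has to happen separately.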

**Exceptions**: temporary or cheaply regenerable files don't need an HF round-trip: `/tmp/*`, `_bench_logs` rotations explicitly marked stale, and intermediate dataloader caches that are fast to regenerate.
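
The path scope of the rule can be checked mechanically. The sketch below uses a hypothetical helper, `in_protected_root`, and assumes absolute, already-normalized POSIX paths (no `..` segments or symlinks). It covers only the path-based part of the scope: whether a `_bench_logs` rotation is stale or a cache is fast to regenerate still needs a human decision.

```python
from pathlib import PurePosixPath

# Roots named in the rule; anything under them needs the HF check first.
PROTECTED_ROOTS = ("/workspace/dnathinker", "/shm/dnathinker_quarantine")


def in_protected_root(path):
    """True if path lies under one of the protected roots."""
    candidate = PurePosixPath(path)
    # PurePath.is_relative_to requires Python 3.9 or newer.
    return any(candidate.is_relative_to(root) for root in PROTECTED_ROOTS)
```

For example, `in_protected_root("/tmp/scratch.bin")` is false, while any path under `/workspace/dnathinker/runs/` is true.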