Upload runs/exp_oracle_v3_binary7_separate_fast_h100/README.md with huggingface_hub

runs/exp_oracle_v3_binary7_separate_fast_h100/README.md +44 -0

	@@ -0,0 +1,44 @@

+# Oracle v3: LEONINE-strict 7-binary-classifier (joint top-1 = 0.973)
+LEONINE-faithful enhancer cell-type classifier: seven separate DeepSTARR-XL
+networks (one per cell type), each trained as a binary classifier with
+1:1 pos:neg sampling. Joint inference takes the argmax over the 7 sigmoid outputs.
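A minimal sketch of the joint decision rule described above, written outside the repository code. The cell-type order and the example logits are assumptions for illustration; the real order comes from `CELL_TYPES` in `bundle_separate_oracle.py`. Because the sigmoid is monotonic, the argmax over raw logits selects the same cell type as the argmax over sigmoid outputs.

```python
import math

# Assumed order, copied from the per-cell checkpoint listing in the README;
# the loader's own CELL_TYPES may order these differently.
CELL_TYPES = ["Ex", "In", "OPC", "Ast", "Oli", "Mic", "End"]


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def joint_top1(logits):
    """Return the cell type whose binary classifier scores highest."""
    if len(logits) != len(CELL_TYPES):
        raise ValueError("expected one logit per cell type")
    probs = [sigmoid(z) for z in logits]
    best = max(range(len(probs)), key=probs.__getitem__)
    return CELL_TYPES[best], probs


# Made-up logits for a single sequence, one per binary network.
example_logits = [-2.1, -0.4, 3.2, 0.9, -1.5, -3.0, 0.1]
label, probs = joint_top1(example_logits)
print(label)
```

Note that the 7 probabilities come from independent networks and need not sum to 1; only their ranking matters for the joint prediction.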
+**Held-out test set (3,500 rows, balanced at 500 per cell type):**
+- Joint top-1: **0.973** (chance = 0.143)
+- Mean AUROC: **0.992**
+- Per-cell recall: Ex 0.97, In 0.98, OPC 0.98, Ast 0.98, Oli 0.95, Mic 0.97, End 0.98
+- Per-cell AUROC: all ≥ 0.986
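The reported numbers can be cross-checked with a little arithmetic. On a balanced test set, joint top-1 accuracy equals the unweighted mean of the per-cell recalls, and chance for 7 classes is 1/7. The sketch below uses only the rounded recalls listed above, so it matches the reported 0.973 only to rounding precision. The helper `top1_and_recall` and its toy inputs are hypothetical illustrations, not the evaluation code behind `metrics.json`.

```python
from collections import Counter

# Rounded per-cell recalls as reported in the README.
reported_recall = {
    "Ex": 0.97, "In": 0.98, "OPC": 0.98, "Ast": 0.98,
    "Oli": 0.95, "Mic": 0.97, "End": 0.98,
}

# With 500 rows per cell type, top-1 is the plain mean of the recalls.
mean_recall = sum(reported_recall.values()) / len(reported_recall)
chance = 1 / len(reported_recall)
print(round(mean_recall, 3), round(chance, 3))  # 0.973 0.143


def top1_and_recall(y_true, y_pred):
    """Compute joint top-1 accuracy and per-class recall from label lists."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recall = {c: correct[c] / total[c] for c in total}
    top1 = sum(correct.values()) / len(y_true)
    return top1, recall


# Tiny made-up balanced example: 2 rows per class.
y_true = ["Ex", "Ex", "In", "In"]
y_pred = ["Ex", "In", "In", "In"]
top1, recall = top1_and_recall(y_true, y_pred)
```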
+**Files:**
+- `oracle.pt`: bundled checkpoint; `ckpt["per_cell"]` maps each cell type to its `state_dict`
+- `{Ex,In,OPC,Ast,Oli,Mic,End}/oracle.pt`: individual cell checkpoints
+- `bundle_separate_oracle.py`: loader (`SeparateBinaryOracle` wrapper)
+- `metrics.json`: per-cell training + joint eval metrics
+**Loading:**
+```python
+import torch
+from bundle_separate_oracle import SeparateBinaryOracle, build_one_cell_model, CELL_TYPES
+# weights_only=False unpickles arbitrary Python objects; only load checkpoints you trust.
+ckpt = torch.load("oracle.pt", map_location="cpu", weights_only=False)
+nets = {}
+for c in CELL_TYPES:
+    m = build_one_cell_model(c, input_length=600)
+    m.load_state_dict(ckpt["per_cell"][c], strict=True)
+    nets[c] = m
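+# Fall back to CPU when no GPU is available.
+device = "cuda" if torch.cuda.is_available() else "cpu"
+oracle = SeparateBinaryOracle(nets, input_length=600).to(device).eval()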
+# Two forward modes:
+# (1) Differentiable: oracle(soft_dna_tensor)  → (B, 7) logits
+# (2) Standard:       oracle(["ACGT...", ...]) → (B, 7) logits
+# .embed(seqs) returns (B, fc_dim=1024) penultimate features for FID.
+```
+**Architecture per cell:** DeepSTARR-XL backbone (4 conv blocks 256/256/128/120,
+fc=1024, dropout=0.3) + 1-output binary head. Trained on
+`oracle_train.7cell.fdr_both` with WeightedRandomSampler 1:1 pos:neg.
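A sketch of the 1:1 pos:neg sampling idea, assuming the usual inverse-class-frequency weighting (the README does not show the weights actually used). It relies on the standard library as a stand-in so it runs without PyTorch; in PyTorch the same `weights` list would be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True)`. The labels are synthetic.

```python
import random


def balanced_weights(labels):
    """Per-sample weights that give positives and negatives equal total mass."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    return [1.0 / n_pos if y == 1 else 1.0 / n_neg for y in labels]


# Synthetic binary labels for one cell type: 10% positives, 90% negatives.
labels = [1] * 100 + [0] * 900
weights = balanced_weights(labels)

# Draw indices with replacement in proportion to the weights, which is the
# behaviour WeightedRandomSampler has when replacement=True.
rng = random.Random(0)
drawn = rng.choices(range(len(labels)), weights=weights, k=10_000)
pos_fraction = sum(labels[i] for i in drawn) / len(drawn)
print(f"positive fraction in sampled indices: {pos_fraction:.3f}")
```

Each class carries a total weight of 1, so roughly half of the drawn indices are positives even though positives are only 10% of the data.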
+**Training data:** the same `oracle_train.7cell.fdr_both.jsonl` used for the
+v2 regression oracle (kept for comparison). The differences are the loss
+formulation and training schedule, not the data.