explcre committed
Commit 5bc6d83 · verified · 1 Parent(s): 05264d1

Upload _claude_memory/feedback_t1_metrics_complete.md with huggingface_hub

_claude_memory/feedback_t1_metrics_complete.md ADDED
---
name: T1 evaluation metrics must include ALL categories
description: Whenever showing T1 (enhancer_generation) evaluation results, always report oracle + LEONINE motif + basic metrics together — never just one slice
type: feedback
originSessionId: 4037f43b-2133-46c6-84bd-02f7d454ec8b
---
When reporting T1 evaluation metrics, ALWAYS include all of these categories together in the same table/summary:

1. **Oracle metrics** (cell-type specificity via the v3 separate-7 classifier):
   - `gen_top1` (joint argmax over 7 sigmoid outputs)
   - `gen_mean_auroc` (average per-cell AUROC)
   - `gen_target_ce` (cross-entropy of softmax(logits) against target cell)
   - `gold_top1` (sanity baseline — should be ~0.97 with v3 separate-7)
   - per-cell recall + AUROC breakdown

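A minimal sketch of how the three headline oracle numbers relate to the classifier outputs. It assumes `logits` is an (N, 7) array of raw v3 separate-7 outputs for N generated sequences and `target` holds each sequence's target cell index; the variable names and the exact per-cell AUROC framing are assumptions, not the lab's actual script.

```python
import numpy as np
from scipy.special import expit, softmax
from sklearn.metrics import roc_auc_score

def oracle_summary(logits: np.ndarray, target: np.ndarray) -> dict:
    """Hypothetical recomputation of gen_top1 / gen_mean_auroc / gen_target_ce."""
    sig = expit(logits)                                   # 7 independent sigmoid outputs
    top1 = (sig.argmax(axis=1) == target).mean()          # joint argmax hits the target cell
    soft = softmax(logits, axis=1)
    target_ce = -np.log(soft[np.arange(len(target)), target] + 1e-12).mean()
    aurocs = []
    for c in range(logits.shape[1]):                      # per-cell AUROC, then averaged
        y_true = (target == c).astype(int)
        if 0 < y_true.sum() < len(y_true):                # AUROC needs both classes present
            aurocs.append(roc_auc_score(y_true, sig[:, c]))
    return {
        "gen_top1": float(top1),
        "gen_mean_auroc": float(np.mean(aurocs)) if aurocs else float("nan"),
        "gen_target_ce": float(target_ce),
    }
```

Running the same summary on the gold enhancers gives the `gold_top1` sanity baseline.
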
2. **LEONINE-style motif metrics** (cell-type-specific motif distribution):
   - JS divergence heatmap (matched cell vs unmatched cells)
   - diagonal-vs-offdiag specificity ratio
   - Frobenius norm of heatmap difference vs gold
   - per-cell motif distribution similarity

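A minimal sketch of the heatmap-level numbers, assuming `counts_gen` and `counts_gold` are (7 cells × M motifs) matrices of motif hit counts for generated and gold sequences. The matrix names, the normalization, and the use of squared Jensen-Shannon distance are assumptions; the real computation lives in `run_t1_eval.py`.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def leonine_summary(counts_gen: np.ndarray, counts_gold: np.ndarray) -> dict:
    """Hypothetical JS heatmap plus the specificity and Frobenius summaries."""
    p = counts_gen / counts_gen.sum(axis=1, keepdims=True)    # per-cell motif distributions (gen)
    q = counts_gold / counts_gold.sum(axis=1, keepdims=True)  # per-cell motif distributions (gold)
    n = len(p)
    heat = np.array([[jensenshannon(p[i], q[j]) ** 2 for j in range(n)] for i in range(n)])
    heat_gold = np.array([[jensenshannon(q[i], q[j]) ** 2 for j in range(n)] for i in range(n)])
    diag = np.diag(heat).mean()                               # matched cell (should be low)
    offdiag = heat[~np.eye(n, dtype=bool)].mean()             # unmatched cells (should be high)
    return {
        "js_heatmap": heat,
        "specificity_ratio": float(offdiag / diag),           # diagonal-vs-offdiag ratio
        "frobenius_vs_gold": float(np.linalg.norm(heat - heat_gold)),
        "per_cell_matched_js": np.diag(heat),                 # per-cell similarity to its own gold cell
    }
```
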
3. **CtrlDNA-style motif metrics**:
   - per-cell motif correlation R² (gen vs gold motif counts)
   - aggregate correlation across cells

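The CtrlDNA-style check can be read off the same count matrices; a sketch, with the Pearson-based R² and the pooled aggregate being assumptions about the exact definition:

```python
import numpy as np
from scipy.stats import pearsonr

def ctrldna_summary(counts_gen: np.ndarray, counts_gold: np.ndarray) -> dict:
    """Hypothetical per-cell and pooled gen-vs-gold motif-count correlations."""
    per_cell_r2 = []
    for c in range(counts_gen.shape[0]):
        r, _ = pearsonr(counts_gen[c], counts_gold[c])        # gen vs gold counts for one cell
        per_cell_r2.append(float(r) ** 2)
    pooled_r, _ = pearsonr(counts_gen.ravel(), counts_gold.ravel())  # aggregate across cells
    return {"per_cell_r2": per_cell_r2, "aggregate_r": float(pooled_r)}
```
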
4. **TF-program-filtered specificity** (lab's TF panel):
   - `tf_program_specificity` per cell

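The note does not define `tf_program_specificity`; one plausible reading, shown purely as a placeholder until checked against the lab's eval code, is the fraction of each cell's motif hits that land in that cell's own TF-panel program.

```python
import numpy as np

def tf_program_specificity(counts_gen: np.ndarray, panel_mask: np.ndarray) -> np.ndarray:
    """Placeholder definition only, not the lab's.

    counts_gen: (cells x motifs) motif hit counts for generated sequences.
    panel_mask: (cells x motifs) boolean matrix marking each cell's program TFs.
    Both shapes and the ratio itself are assumptions.
    """
    in_program = (counts_gen * panel_mask).sum(axis=1)
    return in_program / np.maximum(counts_gen.sum(axis=1), 1)
```
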
5. **Basic validity / fluency**:
   - `parse_rate` (ACGT-only, length-valid sequences)
   - `mean_length_ratio` (gen length / gold length)
   - `mean_gc_abs_err` (|GC_gen − GC_gold|)
   - `kmer_shannon_entropy`, `kmer_unique_frac` (diversity)
   - homopolymer + tandem repeat fractions

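A minimal sketch of the validity / fluency block over paired generated and gold sequences. The choice of k = 6, skipping the length-validity check, and counting homopolymer runs of length ≥ 6 are all assumptions, and the tandem-repeat fraction is omitted for brevity.

```python
import re
from collections import Counter
from math import log2

def basic_summary(gen: list[str], gold: list[str], k: int = 6) -> dict:
    """Hypothetical recomputation of the basic validity / fluency metrics."""
    is_valid = lambda s: bool(s) and re.fullmatch(r"[ACGT]+", s) is not None
    gc = lambda s: (s.count("G") + s.count("C")) / len(s)
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    pairs = [(g, r) for g, r in zip(gen, gold) if is_valid(g)]   # valid gen with its gold partner
    kmers = Counter(g[i:i + k] for g, _ in pairs for i in range(len(g) - k + 1))
    total = sum(kmers.values())
    entropy = -sum((c / total) * log2(c / total) for c in kmers.values()) if total else 0.0
    homopoly = [sum(len(m.group()) for m in re.finditer(r"(.)\1{5,}", g)) / len(g) for g, _ in pairs]
    return {
        "parse_rate": len(pairs) / len(gen),                     # ACGT-only (length check omitted here)
        "mean_length_ratio": mean([len(g) / len(r) for g, r in pairs]),
        "mean_gc_abs_err": mean([abs(gc(g) - gc(r)) for g, r in pairs]),
        "kmer_shannon_entropy": entropy,
        "kmer_unique_frac": len(kmers) / max(total, 1),          # distinct k-mers / total k-mers
        "mean_homopolymer_frac": mean(homopoly),                 # runs of length >= 6
    }
```
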
**Why:** the user holds this work to paper-grade standards and explicitly told me "next time anytime you show evaluation metrics for t1 you should include them all" with three exclamation marks. A partial metrics table (e.g., only oracle, or only motif) is a shortcut that hides failure modes the user needs to see.

**How to apply:** after running `score_with_classifier_oracle.py` on a prediction file, also run `run_t1_eval.py --scanner moods` (or `fimo`) to get the motif + basic metrics. Combine into one table per variant. If MOODS/FIMO is still running, say so explicitly and note which categories are pending — don't quietly omit them.
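
A sketch of that workflow, assuming both scripts accept the prediction file as a positional argument (that calling convention and the path are hypothetical; the `--scanner` flag is taken from this note):

```python
import subprocess

pred = "preds/variant_a.jsonl"  # hypothetical prediction file for one variant

# Oracle metrics (category 1)
subprocess.run(["python", "score_with_classifier_oracle.py", pred], check=True)

# Motif + basic metrics (categories 2-5); swap "moods" for "fimo" if preferred
subprocess.run(["python", "run_t1_eval.py", "--scanner", "moods", pred], check=True)

# Merge both outputs into a single table per variant. If the motif scan is still
# running, report the oracle numbers now but explicitly flag the motif/basic
# columns as pending rather than omitting them.
```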