Upload _claude_memory/feedback_t1_metrics_complete.md with huggingface_hub
_claude_memory/feedback_t1_metrics_complete.md

---
name: T1 evaluation metrics must include ALL categories
description: Whenever showing T1 (enhancer_generation) evaluation results, always report oracle + LEONINE motif + basic metrics together — never just one slice
type: feedback
originSessionId: 4037f43b-2133-46c6-84bd-02f7d454ec8b
---

When reporting T1 evaluation metrics, ALWAYS include all of these categories together in the same table/summary:

1. **Oracle metrics** (cell-type specificity via the v3 separate-7 classifier; sketch below):
   - `gen_top1` (joint argmax over 7 sigmoid outputs)
   - `gen_mean_auroc` (average per-cell AUROC)
   - `gen_target_ce` (cross-entropy of softmax(logits) against target cell)
   - `gold_top1` (sanity baseline — should be ~0.97 with v3 separate-7)
   - per-cell recall + AUROC breakdown

2. **LEONINE-style motif metrics** (cell-type-specific motif distribution; sketch below):
   - JS divergence heatmap (matched cell vs unmatched cells)
   - diagonal-vs-offdiag specificity ratio
   - Frobenius norm of heatmap difference vs gold
   - per-cell motif distribution similarity

3. **CtrlDNA-style motif metrics** (sketch below):
   - per-cell motif correlation R² (gen vs gold motif counts)
   - aggregate correlation across cells

4. **TF-program-filtered specificity** (lab's TF panel):
   - `tf_program_specificity` per cell

5. **Basic validity / fluency** (sketch below):
   - `parse_rate` (ACGT-only, length-valid sequences)
   - `mean_length_ratio` (gen length / gold length)
   - `mean_gc_abs_err` (|GC_gen − GC_gold|)
   - `kmer_shannon_entropy`, `kmer_unique_frac` (diversity)
   - homopolymer + tandem repeat fractions
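
To make category 1 concrete, here is a minimal sketch of the oracle computations. It assumes the oracle produces an (N, 7) logit matrix and integer target-cell labels; the function name and array layout are illustrative, not the actual interface of `score_with_classifier_oracle.py`.

```python
import numpy as np
from scipy.special import log_softmax
from sklearn.metrics import roc_auc_score

def oracle_metrics(logits: np.ndarray, targets: np.ndarray) -> dict:
    """logits: (N, 7) raw oracle logits; targets: (N,) target cell indices."""
    # gen_top1: joint argmax over the 7 outputs (sigmoid is monotonic,
    # so the argmax over raw logits matches the argmax over sigmoids).
    gen_top1 = float((logits.argmax(axis=1) == targets).mean())

    # gen_mean_auroc: one-vs-rest AUROC per cell, averaged over cells
    # that have both positive and negative examples.
    aurocs = []
    for cell in range(logits.shape[1]):
        labels = (targets == cell).astype(int)
        if 0 < labels.sum() < len(labels):
            aurocs.append(roc_auc_score(labels, logits[:, cell]))
    gen_mean_auroc = float(np.mean(aurocs))

    # gen_target_ce: cross-entropy of softmax(logits) against the target cell.
    logp = log_softmax(logits, axis=1)
    gen_target_ce = float(-logp[np.arange(len(targets)), targets].mean())

    return {"gen_top1": gen_top1,
            "gen_mean_auroc": gen_mean_auroc,
            "gen_target_ce": gen_target_ce}
```

Applying the same function to the gold sequences' logits gives the `gold_top1` sanity baseline.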
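
For category 2, a sketch of the JS-divergence heatmap and its summaries, assuming per-cell motif-count matrices over a shared motif vocabulary. The heatmap orientation (generated-for-cell-i vs gold-of-cell-j) is an assumption, not something confirmed from `run_t1_eval.py`.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_heatmap(row_counts: np.ndarray, col_counts: np.ndarray) -> np.ndarray:
    """Pairwise JS divergence between motif-count rows (shape (7, M));
    jensenshannon normalizes the raw counts to distributions itself."""
    n = row_counts.shape[0]
    heat = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # jensenshannon returns the JS *distance*; square it for divergence.
            heat[i, j] = jensenshannon(row_counts[i], col_counts[j]) ** 2
    return heat

def leonine_summaries(gen_counts: np.ndarray, gold_counts: np.ndarray) -> dict:
    heat_gen = js_heatmap(gen_counts, gold_counts)    # gen-for-cell-i vs gold-of-cell-j
    heat_gold = js_heatmap(gold_counts, gold_counts)  # gold-vs-gold reference pattern
    off = ~np.eye(len(heat_gen), dtype=bool)
    return {
        # ratio > 1 means generated sequences sit closest to their own gold cell
        "specificity_ratio": float(heat_gen[off].mean() / np.diag(heat_gen).mean()),
        "frobenius_vs_gold": float(np.linalg.norm(heat_gen - heat_gold)),
    }
```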
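
For category 3, a sketch of the per-cell and aggregate correlations, reading "R²" as the squared Pearson correlation between generated and gold motif counts; that reading is an assumption.

```python
import numpy as np
from scipy.stats import pearsonr

def ctrldna_r2(gen_counts: np.ndarray, gold_counts: np.ndarray) -> dict:
    """gen_counts, gold_counts: (n_cells, n_motifs) motif-count matrices."""
    per_cell = {cell: pearsonr(gen_counts[cell], gold_counts[cell])[0] ** 2
                for cell in range(gen_counts.shape[0])}
    # Aggregate: one correlation over all (cell, motif) pairs flattened together.
    aggregate = pearsonr(gen_counts.ravel(), gold_counts.ravel())[0] ** 2
    return {"per_cell_r2": per_cell, "aggregate_r2": aggregate}
```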
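
For category 5, a sketch of the validity and diversity computations. The length bounds and k-mer size are placeholder defaults rather than values taken from `run_t1_eval.py`, and the homopolymer/tandem-repeat fractions are omitted for brevity.

```python
import math
from collections import Counter

def basic_metrics(gen: list[str], gold: list[str], k: int = 6,
                  min_len: int = 10, max_len: int = 1000) -> dict:
    def gc(seq: str) -> float:
        return (seq.count("G") + seq.count("C")) / len(seq)

    # Keep gen/gold pairs whose generated sequence is ACGT-only and length-valid.
    keep = [(g, t) for g, t in zip(gen, gold)
            if g and set(g) <= set("ACGT") and min_len <= len(g) <= max_len]
    if not keep:
        return {"parse_rate": 0.0}

    out = {"parse_rate": len(keep) / len(gen)}
    out["mean_length_ratio"] = sum(len(g) / len(t) for g, t in keep) / len(keep)
    out["mean_gc_abs_err"] = sum(abs(gc(g) - gc(t)) for g, t in keep) / len(keep)

    # k-mer diversity over the valid generated sequences.
    kmers = Counter(g[i:i + k] for g, _ in keep for i in range(len(g) - k + 1))
    total = sum(kmers.values())
    out["kmer_shannon_entropy"] = -sum(
        (c / total) * math.log2(c / total) for c in kmers.values())
    out["kmer_unique_frac"] = len(kmers) / total
    return out
```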

**Why:** the user holds this work to a paper-grade standard and explicitly told me "next time anytime you show evaluation metrics for t1 you should include them all" with three exclamation marks. A partial metrics table (e.g., only oracle, or only motif) is a shortcut that hides failure modes the user needs to see.

**How to apply:** after running `score_with_classifier_oracle.py` on a prediction file, also run `run_t1_eval.py --scanner moods` (or `fimo`) to get the motif + basic metrics. Combine everything into one table per variant (see the sketch below). If MOODS/FIMO is still running, say so explicitly and note which categories are pending — don't quietly omit them.
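
A hedged sketch of the combine step, assuming each category's script writes one JSON of scalar metrics per variant; the directory layout and file names are hypothetical, not the actual outputs of either script.

```python
import json
from pathlib import Path
import pandas as pd

# Hypothetical per-variant layout: one JSON of scalar metrics per category.
CATEGORIES = ["oracle", "leonine_motif", "ctrldna_motif", "tf_program", "basic"]

def combined_table(variant_dir: Path) -> pd.DataFrame:
    row, pending = {}, []
    for cat in CATEGORIES:
        path = variant_dir / f"{cat}_metrics.json"  # hypothetical file name
        if path.exists():
            row.update(json.loads(path.read_text()))
        else:
            pending.append(cat)  # e.g., MOODS/FIMO still running
    if pending:
        # Surface the gap explicitly rather than quietly omitting categories.
        print("PENDING categories:", ", ".join(pending))
    return pd.DataFrame([row])
```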