jsanchoz committed on
Commit 2e6b614 · verified · 1 Parent(s): a6489a3

Update README.md

  1. README.md +16 -12
README.md CHANGED
@@ -35,11 +35,11 @@ DeepRBP is composed of two conceptual stages:
35
  - Gene expression
36
 
37
  2. **Explainability stage**
38
- Feature attribution methods (e.g. DeepLIFT) are applied on the trained predictor to compute:
39
  - Transcript × RBP (TxRBP) scores
40
  - Gene × RBP (GxRBP) scores
41
 
42
- This repository contains **only the pretrained predictor and its required preprocessing artifacts**.
43
 
44
  ---
45
 
@@ -52,9 +52,12 @@ This repository contains **only the pretrained predictor and its required prepro
52
  | `model.ckpt` | PyTorch Lightning checkpoint of the pretrained DeepRBP predictor |
53
  | `scaler.joblib` | Fitted input scaler used during model training |
54
  | `sigma.npy` | Scaling parameter required to reconstruct transcript abundance values |
 
55
 
56
  The scaler and sigma are **part of the trained model state** and must be used together with the checkpoint.
57
 
 
 
58
  ---
59
 
60
  ## Intended use
@@ -62,13 +65,13 @@ The scaler and sigma are **part of the trained model state** and must be used to
62
  This pretrained model is intended for:
63
 
64
  - Computing transcript abundance predictions
65
- - Running explainability analyses (e.g. DeepLIFT-based attribution)
66
  - Identifying candidate RBP–transcript and RBP–gene regulatory relationships
67
  - Downstream biological interpretation and hypothesis generation
68
 
69
  Typical applications include:
70
- - Cancer transcriptomics (e.g. TCGA)
71
- - Perturbation studies (e.g. RBP knockdowns)
72
  - Comparative regulatory analyses across conditions
73
 
74
  ---
@@ -97,17 +100,18 @@ The main repository contains:
97
 
98
  - The model was trained on public datasets (TCGA).
99
  - The provided scaler and sigma ensure:
100
- - Consistent input normalization
101
- - Comparable predictions and explainability scores across users
102
- - Using a different scaler or recomputing normalization **will break comparability**.
 
103
 
104
  ---
105
 
106
  ## Limitations
107
 
108
  - The model was trained on bulk RNA-seq data and may not generalize to:
109
- - Single-cell RNA-seq
110
- - Extremely low-coverage datasets
111
  - Predictions represent **associations**, not direct causal regulation.
112
  - Experimental validation is required before biological conclusions.
113
 
@@ -125,6 +129,6 @@ You are free to use, modify and redistribute it, provided that the license and c
125
 
126
  If you use DeepRBP in your work, please cite:
127
 
128
- DeepRBP: A deep neural network for inferring splicing regulation
129
- bioRxiv (2024)
130
  https://doi.org/10.1101/2024.04.11.589004
 
35
  - Gene expression
36
 
37
  2. **Explainability stage**
38
+ Feature attribution methods (e.g., DeepLIFT) are applied on the trained predictor to compute:
39
  - Transcript × RBP (TxRBP) scores
40
  - Gene × RBP (GxRBP) scores
41
 
42
+ This repository contains **only the pretrained predictor and the required preprocessing artifacts** needed to use it.
43
 
44
  ---
45
 
 
52
  | `model.ckpt` | PyTorch Lightning checkpoint of the pretrained DeepRBP predictor |
53
  | `scaler.joblib` | Fitted input scaler used during model training |
54
  | `sigma.npy` | Scaling parameter required to reconstruct transcript abundance values |
55
+ | `DeepRBP_feature_spec.xlsx` | Feature manifest defining the RBPs/genes/transcripts and their exact order |
56
 
57
  The scaler and sigma are **part of the trained model state** and must be used together with the checkpoint.
58
 
59
+ The feature specification file is part of the **model compatibility contract**: input matrices must be aligned to the same feature set **and order** used during training.
60
+
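The alignment requirement added in this hunk can be illustrated with a minimal sketch. It assumes the feature names have already been read from `DeepRBP_feature_spec.xlsx` into a plain list; the sheet layout of that file is not described in this diff, so the names `RBP_A`, `RBP_B`, `RBP_C` and the helper `align_to_spec` are hypothetical and not part of DeepRBP.

```python
import numpy as np


def align_to_spec(matrix, columns, spec_order):
    """Reorder the columns of `matrix` (samples x features) to match `spec_order`.

    `columns` names the current columns of `matrix`; `spec_order` lists the
    feature names in the order the model expects.
    """
    position = {name: i for i, name in enumerate(columns)}
    missing = [name for name in spec_order if name not in position]
    if missing:
        # Fail loudly instead of silently feeding a misaligned matrix.
        raise ValueError(
            f"{len(missing)} required features are missing, e.g. {missing[:5]}"
        )
    # Integer-array indexing picks the required columns in the required order
    # and drops any extra columns that are not in the spec.
    return matrix[:, [position[name] for name in spec_order]]


# Hypothetical feature names; the real ones come from the feature spec file.
spec_order = ["RBP_A", "RBP_B", "RBP_C"]
columns = ["RBP_C", "EXTRA", "RBP_A", "RBP_B"]
matrix = np.array([[3.0, 9.0, 1.0, 2.0],
                   [30.0, 90.0, 10.0, 20.0]])

aligned = align_to_spec(matrix, columns, spec_order)
print(aligned)
```

Raising on missing features, rather than filling them with zeros, is a deliberate choice in this sketch: a silently padded matrix would still have the right shape but would no longer honor the compatibility contract described above.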
61
  ---
62
 
63
  ## Intended use
 
65
  This pretrained model is intended for:
66
 
67
  - Computing transcript abundance predictions
68
+ - Running explainability analyses (e.g., DeepLIFT-based attribution)
69
  - Identifying candidate RBP–transcript and RBP–gene regulatory relationships
70
  - Downstream biological interpretation and hypothesis generation
71
 
72
  Typical applications include:
73
+ - Cancer transcriptomics (e.g., TCGA)
74
+ - Perturbation studies (e.g., RBP knockdowns)
75
  - Comparative regulatory analyses across conditions
76
 
77
  ---
 
100
 
101
  - The model was trained on public datasets (TCGA).
102
  - The provided scaler and sigma ensure:
103
+ - consistent input normalization,
104
+ - comparable predictions and attribution scores across users.
105
+ - The provided feature specification (`DeepRBP_feature_spec.xlsx`) defines the exact feature set and ordering used during training.
106
+ Using inputs that are not aligned to this specification will break compatibility and comparability.
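Why the provided scaler must be reused, rather than re-fitted, can be shown with a small numerical sketch. The diff does not say which scaler type is stored in `scaler.joblib`, so this example assumes a plain z-score normalization purely for illustration, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the training data and the statistics a fitted scaler would store.
train = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
train_mean, train_std = train.mean(axis=0), train.std(axis=0)

# A new cohort whose values are shifted relative to the training data.
cohort = rng.normal(loc=8.0, scale=2.0, size=(50, 3))

# Reusing the stored training statistics keeps the cohort on the training scale.
with_provided = (cohort - train_mean) / train_std

# Re-fitting the normalization on the cohort itself recentres it at zero.
with_refit = (cohort - cohort.mean(axis=0)) / cohort.std(axis=0)

print(with_provided.mean(axis=0))  # clearly above zero: the shift is preserved
print(with_refit.mean(axis=0))     # approximately zero: the shift is erased
```

Under this assumption, the re-fitted version removes exactly the between-cohort difference the model would need to see, which is one way recomputed normalization can break comparability across users.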
107
 
108
  ---
109
 
110
  ## Limitations
111
 
112
  - The model was trained on bulk RNA-seq data and may not generalize to:
113
+ - single-cell RNA-seq
114
+ - extremely low-coverage datasets
115
  - Predictions represent **associations**, not direct causal regulation.
116
  - Experimental validation is required before biological conclusions.
117
 
 
129
 
130
  If you use DeepRBP in your work, please cite:
131
 
132
+ DeepRBP: A deep neural network for inferring splicing regulation
133
+ bioRxiv (2024)
134
  https://doi.org/10.1101/2024.04.11.589004