Update README.md
README.md
```diff
@@ -35,11 +35,11 @@ DeepRBP is composed of two conceptual stages:
 - Gene expression

 2. **Explainability stage**
-Feature attribution methods (e.g
 - Transcript × RBP (TxRBP) scores
 - Gene × RBP (GxRBP) scores

-This repository contains **only the pretrained predictor and

 ---
```
```diff
@@ -52,9 +52,12 @@ This repository contains **only the pretrained predictor and its required prepro
 | `model.ckpt` | PyTorch Lightning checkpoint of the pretrained DeepRBP predictor |
 | `scaler.joblib` | Fitted input scaler used during model training |
 | `sigma.npy` | Scaling parameter required to reconstruct transcript abundance values |

 The scaler and sigma are **part of the trained model state** and must be used together with the checkpoint.

 ---

 ## Intended use
```
```diff
@@ -62,13 +65,13 @@ The scaler and sigma are **part of the trained model state** and must be used to
 This pretrained model is intended for:

 - Computing transcript abundance predictions
-- Running explainability analyses (e.g
 - Identifying candidate RBP–transcript and RBP–gene regulatory relationships
 - Downstream biological interpretation and hypothesis generation

 Typical applications include:
-- Cancer transcriptomics (e.g
-- Perturbation studies (e.g
 - Comparative regulatory analyses across conditions

 ---
```
```diff
@@ -97,17 +100,18 @@ The main repository contains:

 - The model was trained on public datasets (TCGA).
 - The provided scaler and sigma ensure:
--
--
--

 ---

 ## Limitations

 - The model was trained on bulk RNA-seq data and may not generalize to:
--
--
 - Predictions represent **associations**, not direct causal regulation.
 - Experimental validation is required before biological conclusions.
```
```diff
@@ -125,6 +129,6 @@ You are free to use, modify and redistribute it, provided that the license and c

 If you use DeepRBP in your work, please cite:

-DeepRBP: A deep neural network for inferring splicing regulation
-bioRxiv (2024)
 https://doi.org/10.1101/2024.04.11.589004
```
```diff
 - Gene expression

 2. **Explainability stage**
+Feature attribution methods (e.g., DeepLIFT) are applied on the trained predictor to compute:
 - Transcript × RBP (TxRBP) scores
 - Gene × RBP (GxRBP) scores

+This repository contains **only the pretrained predictor and the required preprocessing artifacts** needed to use it.

 ---
```
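The README names DeepLIFT as one attribution method but does not spell out how TxRBP scores are computed. For intuition only, the sketch below uses a toy *linear* predictor with hypothetical weights (not the DeepRBP network): for a linear model, DeepLIFT's linear rule gives the contribution of RBP `r` to transcript `t` as `W[t][r] * (x[r] - ref[r])`, and each transcript's contributions sum to its output change relative to the reference. For the real network one would typically reach for an attribution library such as Captum's `DeepLift`; that choice is an assumption, not something stated here. All names and values below are hypothetical.

```python
"""Toy TxRBP-style attribution on a hypothetical linear predictor.

Assumption: transcript t's output is y_t = sum_r W[t][r] * x[r] + b[t].
For such a model the DeepLIFT linear rule gives the contribution of RBP r
to transcript t as W[t][r] * (x[r] - ref[r]).
"""
from typing import List

Matrix = List[List[float]]


def predict(weights: Matrix, bias: List[float], x: List[float]) -> List[float]:
    """Linear predictor: one output per transcript (row of weights)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def linear_deeplift_scores(weights: Matrix, x: List[float], ref: List[float]) -> Matrix:
    """Return a transcripts x RBPs matrix of contributions W[t][r] * (x[r] - ref[r])."""
    delta = [xi - ri for xi, ri in zip(x, ref)]
    return [[w * d for w, d in zip(row, delta)] for row in weights]


# Hypothetical toy values: 2 transcripts (rows), 3 RBPs (columns).
W = [[0.5, -1.0, 0.0],
     [2.0, 0.25, -0.5]]
b = [0.1, -0.2]
x = [1.0, 2.0, 4.0]    # one sample's RBP inputs (assumed already scaled)
ref = [0.0, 0.0, 0.0]  # reference (baseline) input

scores = linear_deeplift_scores(W, x, ref)

# Summation-to-delta check: each transcript's scores add up to its output change.
for t, row in enumerate(scores):
    out_delta = predict(W, b, x)[t] - predict(W, b, ref)[t]
    assert abs(sum(row) - out_delta) < 1e-9
```

How transcript-level scores are aggregated into GxRBP scores is not described in this diff, so the sketch deliberately stops at the transcript level.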
```diff
 | `model.ckpt` | PyTorch Lightning checkpoint of the pretrained DeepRBP predictor |
 | `scaler.joblib` | Fitted input scaler used during model training |
 | `sigma.npy` | Scaling parameter required to reconstruct transcript abundance values |
+| `DeepRBP_feature_spec.xlsx` | Feature manifest defining the RBPs/genes/transcripts and their exact order |

 The scaler and sigma are **part of the trained model state** and must be used together with the checkpoint.

+The feature specification file is part of the **model compatibility contract**: input matrices must be aligned to the same feature set **and order** used during training.
+

 ---

 ## Intended use
```
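Because input matrices must match the training feature set and order, a preprocessing step usually reorders each sample to the manifest and fails loudly on missing features. The minimal sketch below assumes the manifest's feature IDs have already been read into a Python list (for example with `pandas.read_excel`; the sheet and column layout of `DeepRBP_feature_spec.xlsx` is not documented in this diff) and that each sample is a dict from feature ID to value. The helper `align_to_spec` and the feature IDs are hypothetical.

```python
from typing import Dict, List, Sequence


def align_to_spec(samples: Sequence[Dict[str, float]], spec_order: List[str]) -> List[List[float]]:
    """Return one row per sample with values in exactly the spec's feature order.

    Hypothetical helper: raises if any spec feature is missing from a sample,
    and ignores features that are not listed in the spec.
    """
    if len(set(spec_order)) != len(spec_order):
        raise ValueError("feature specification contains duplicate IDs")
    rows = []
    for i, sample in enumerate(samples):
        missing = [f for f in spec_order if f not in sample]
        if missing:
            raise KeyError(f"sample {i} is missing {len(missing)} spec features, e.g. {missing[:3]}")
        rows.append([sample[f] for f in spec_order])
    return rows


# Hypothetical feature IDs and values, for illustration only.
spec = ["RBP_A", "RBP_B", "GENE_1"]
samples = [
    {"GENE_1": 3.0, "RBP_B": 2.0, "RBP_A": 1.0, "EXTRA": 9.9},  # shuffled + extra column
    {"RBP_A": 4.0, "GENE_1": 6.0, "RBP_B": 5.0},
]
matrix = align_to_spec(samples, spec)
```

Raising on missing features, rather than zero-filling them, is a choice made in this sketch: silently imputing absent features would change the inputs in exactly the way the compatibility contract is meant to rule out.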
```diff
 This pretrained model is intended for:

 - Computing transcript abundance predictions
+- Running explainability analyses (e.g., DeepLIFT-based attribution)
 - Identifying candidate RBP–transcript and RBP–gene regulatory relationships
 - Downstream biological interpretation and hypothesis generation

 Typical applications include:
+- Cancer transcriptomics (e.g., TCGA)
+- Perturbation studies (e.g., RBP knockdowns)
 - Comparative regulatory analyses across conditions

 ---
```
```diff

 - The model was trained on public datasets (TCGA).
 - The provided scaler and sigma ensure:
+- consistent input normalization,
+- comparable predictions and attribution scores across users.
+- The provided feature specification (`DeepRBP_feature_spec.xlsx`) defines the exact feature set and ordering used during training.
+Using inputs that are not aligned to this specification will break compatibility and comparability.

 ---

 ## Limitations

 - The model was trained on bulk RNA-seq data and may not generalize to:
+- single-cell RNA-seq
+- extremely low-coverage datasets
 - Predictions represent **associations**, not direct causal regulation.
 - Experimental validation is required before biological conclusions.
```
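"Consistent input normalization" implies that new data should be transformed with the shipped, already-fitted scaler rather than with a scaler re-fitted on the new cohort. The diff does not say which scaler class `scaler.joblib` contains; if it follows the scikit-learn API (an assumption), it would be loaded with `joblib.load` and applied through its `transform` method. To stay self-contained, the sketch below uses a hypothetical `FittedStandardizer` with frozen per-feature means and standard deviations, and shows how refitting changes the values the model would receive.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass(frozen=True)
class FittedStandardizer:
    """Hypothetical stand-in for a scaler fitted once on the training data."""
    means: List[float]
    stds: List[float]

    @classmethod
    def fit(cls, rows: List[List[float]]) -> "FittedStandardizer":
        cols = list(zip(*rows))
        # Fall back to 1.0 for constant features to avoid division by zero.
        return cls([mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols])

    def transform(self, rows: List[List[float]]) -> List[List[float]]:
        return [[(v - m) / s for v, m, s in zip(row, self.means, self.stds)] for row in rows]


train = [[0.0, 10.0], [2.0, 14.0], [4.0, 18.0]]
new_cohort = [[4.0, 18.0], [6.0, 22.0]]

shipped = FittedStandardizer.fit(train)  # plays the role of the shipped scaler
reused = shipped.transform(new_cohort)   # consistent with the training scale
refit = FittedStandardizer.fit(new_cohort).transform(new_cohort)  # inconsistent
```

Refitting re-centres every cohort around zero, so identical raw values map to different model inputs for different users, which is the cross-user inconsistency the shipped scaler is meant to prevent.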
```diff

 If you use DeepRBP in your work, please cite:

+DeepRBP: A deep neural network for inferring splicing regulation
+bioRxiv (2024)
 https://doi.org/10.1101/2024.04.11.589004
```