ML4BM-Lab/DeepRBP

@@ -35,11 +35,11 @@ DeepRBP is composed of two conceptual stages:
    - Gene expression
 2. **Explainability stage**
-   Feature attribution methods (e.g. DeepLIFT) are applied on the trained predictor to compute:
+   Feature attribution methods (e.g., DeepLIFT) are applied on the trained predictor to compute:
    - Transcript × RBP (TxRBP) scores
    - Gene × RBP (GxRBP) scores
-This repository contains **only the pretrained predictor and its required preprocessing artifacts**.
+This repository contains **only the pretrained predictor and the required preprocessing artifacts** needed to use it.
 ---
@@ -52,9 +52,12 @@ This repository contains **only the pretrained predictor and its required prepro
 | `model.ckpt` | PyTorch Lightning checkpoint of the pretrained DeepRBP predictor |
 | `scaler.joblib` | Fitted input scaler used during model training |
 | `sigma.npy` | Scaling parameter required to reconstruct transcript abundance values |
+| `DeepRBP_feature_spec.xlsx` | Feature manifest defining the RBPs/genes/transcripts and their exact order |
 The scaler and sigma are **part of the trained model state** and must be used together with the checkpoint.
+The feature specification file is part of the **model compatibility contract**: input matrices must be aligned to the same feature set **and order** used during training.
 ---
 ## Intended use
@@ -62,13 +65,13 @@ The scaler and sigma are **part of the trained model state** and must be used to
 This pretrained model is intended for:
 - Computing transcript abundance predictions
-- Running explainability analyses (e.g. DeepLIFT-based attribution)
+- Running explainability analyses (e.g., DeepLIFT-based attribution)
 - Identifying candidate RBP–transcript and RBP–gene regulatory relationships
 - Downstream biological interpretation and hypothesis generation
 Typical applications include:
-- Cancer transcriptomics (e.g. TCGA)
-- Perturbation studies (e.g. RBP knockdowns)
+- Cancer transcriptomics (e.g., TCGA)
+- Perturbation studies (e.g., RBP knockdowns)
 - Comparative regulatory analyses across conditions
 ---
@@ -97,17 +100,18 @@ The main repository contains:
 - The model was trained on public datasets (TCGA).
 - The provided scaler and sigma ensure:
-  - Consistent input normalization
-  - Comparable predictions and explainability scores across users
-- Using a different scaler or recomputing normalization **will break comparability**.
+  - consistent input normalization,
+  - comparable predictions and attribution scores across users.
+- The provided feature specification (`DeepRBP_feature_spec.xlsx`) defines the exact feature set and ordering used during training.
+  Using inputs that are not aligned to this specification will break compatibility and comparability.
 ---
 ## Limitations
 - The model was trained on bulk RNA-seq data and may not generalize to:
-  - Single-cell RNA-seq
-  - Extremely low-coverage datasets
+  - single-cell RNA-seq
+  - extremely low-coverage datasets
 - Predictions represent **associations**, not direct causal regulation.
 - Experimental validation is required before biological conclusions.
@@ -125,6 +129,6 @@ You are free to use, modify and redistribute it, provided that the license and c
 If you use DeepRBP in your work, please cite:
-DeepRBP: A deep neural network for inferring splicing regulation
-bioRxiv (2024)
+DeepRBP: A deep neural network for inferring splicing regulation
+bioRxiv (2024)
 https://doi.org/10.1101/2024.04.11.589004
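
The compatibility contract introduced in this change (inputs aligned to the feature set and order in `DeepRBP_feature_spec.xlsx`, used together with `scaler.joblib` and `sigma.npy`) could be sketched roughly as below. This is an illustration under assumptions, not code from the DeepRBP repository. The names `align_to_feature_spec` and `load_preprocessing_artifacts` are hypothetical. The spreadsheet layout is not described here, so the ordered feature names are passed in directly instead of being parsed from the file. The scaler is assumed to be an object that `joblib.load` can restore.

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def align_to_feature_spec(
    columns: Sequence[str],
    rows: Sequence[Sequence[float]],
    spec_order: Sequence[str],
) -> List[List[float]]:
    """Reorder a samples x features matrix so its columns follow spec_order.

    Hypothetical helper: spec features absent from the input raise an error,
    and input columns that are not in the spec are dropped.
    """
    if len(set(columns)) != len(columns):
        raise ValueError("input contains duplicate feature names")
    position = {name: i for i, name in enumerate(columns)}
    missing = [name for name in spec_order if name not in position]
    if missing:
        raise ValueError(f"{len(missing)} spec features missing, e.g. {missing[:5]}")
    picks = [position[name] for name in spec_order]
    return [[row[i] for i in picks] for row in rows]


def load_preprocessing_artifacts(model_dir: str) -> Tuple[object, object]:
    """Load the fitted scaler and sigma that ship with the checkpoint.

    Assumes scaler.joblib and sigma.npy sit in model_dir. joblib and numpy
    are imported here so the alignment helper works without them.
    """
    import joblib
    import numpy as np

    root = Path(model_dir)
    scaler = joblib.load(root / "scaler.joblib")
    sigma = np.load(root / "sigma.npy")
    return scaler, sigma


# Toy example with made-up feature names (not taken from the real spec).
spec = ["RBP_A", "RBP_B"]
matrix = align_to_feature_spec(["RBP_B", "EXTRA", "RBP_A"], [[2.0, 9.0, 1.0]], spec)
print(matrix)  # [[1.0, 2.0]]
```

Raising on missing features, instead of filling them silently, is a deliberate choice in this sketch: a shifted or padded column would still produce predictions, but they would no longer be comparable across users.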