---
library_name: pytorch
framework: pytorch
tags:
- pytorch
- pytorch-lightning
- bioinformatics
- rna-binding-proteins
- explainability
- alternative-splicing
- deep-learning
license: mit
---

# DeepRBP Predictor (pretrained)

This repository provides a **pretrained DeepRBP predictor model**, a deep learning framework designed to infer **RNA-binding protein (RBP)–transcript and RBP–gene regulatory relationships** from expression data.

DeepRBP was introduced in the following preprint:

> **DeepRBP: A deep neural network for inferring splicing regulation**
> https://doi.org/10.1101/2024.04.11.589004

The model is intended to be used **directly for inference and explainability**, without retraining.

---

## Model overview

DeepRBP is composed of two conceptual stages:

1. **Prediction stage**
   A neural network predicts transcript abundances from:
   - RBP expression
   - Gene expression

2. **Explainability stage**
   Feature attribution methods (e.g., DeepLIFT) are applied to the trained predictor to compute:
   - Transcript × RBP (TxRBP) scores
   - Gene × RBP (GxRBP) scores

This repository contains **only the pretrained predictor and the required preprocessing artifacts** needed to use it.
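
As a toy illustration of what a TxRBP score matrix looks like, the sketch below applies DeepLIFT's linear rule to a stand-in linear predictor. Everything in it is an assumption made for illustration: the linear model, the toy sizes, the all-zero baseline, and the variable names are not taken from DeepRBP, whose actual architecture and explainability pipeline live in the main code repository.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rbps, n_transcripts = 4, 3  # toy sizes, not the real feature counts

# Hypothetical linear stand-in for the predictor: transcripts = W @ rbps + b
W = rng.normal(size=(n_transcripts, n_rbps))
b = rng.normal(size=n_transcripts)


def predict(x):
    """Toy predictor mapping RBP expression to transcript abundances."""
    return W @ x + b


x = rng.random(n_rbps)       # RBP expression of one sample
baseline = np.zeros(n_rbps)  # reference input (assumed all-zero here)

# For a linear model, the DeepLIFT contribution of RBP j to transcript i
# is W[i, j] * (x[j] - baseline[j]).
txrbp_scores = W * (x - baseline)  # shape: (n_transcripts, n_rbps)

# Summation-to-delta: each transcript's contributions add up to the
# change in its prediction relative to the baseline.
delta = predict(x) - predict(baseline)
assert np.allclose(txrbp_scores.sum(axis=1), delta)
```

In a real workflow an attribution library would compute these scores on the pretrained network; the linear case is shown only because its contributions can be written in closed form.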

---

## Files in this repository

⚠️ **All files are required for correct inference and explainability.**

| File | Description |
|-----|-------------|
| `model.ckpt` | PyTorch Lightning checkpoint of the pretrained DeepRBP predictor |
| `scaler.joblib` | Fitted input scaler used during model training |
| `sigma.npy` | Scaling parameter required to reconstruct transcript abundance values |
| `DeepRBP_feature_spec.xlsx` | Feature manifest defining the RBPs/genes/transcripts and their exact order |

The scaler and sigma are **part of the trained model state** and must be used together with the checkpoint.
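
A minimal sketch of loading the two preprocessing artifacts from a local copy of this repository is shown below. The function name `load_preprocessing_artifacts` and the `repo_dir` argument are hypothetical; only the file names come from the table above. It uses `joblib.load` and `numpy.load`, and deliberately leaves the checkpoint itself to the loading code in the main DeepRBP repository.

```python
from pathlib import Path

import joblib
import numpy as np

# File names as listed in this repository.
REQUIRED_FILES = [
    "model.ckpt",
    "scaler.joblib",
    "sigma.npy",
    "DeepRBP_feature_spec.xlsx",
]


def load_preprocessing_artifacts(repo_dir):
    """Return (scaler, sigma) from a local folder holding the repository files.

    Hypothetical helper: it checks that all four files are present, then
    loads only the scaler and sigma.
    """
    repo_dir = Path(repo_dir)
    missing = [name for name in REQUIRED_FILES if not (repo_dir / name).exists()]
    if missing:
        raise FileNotFoundError(f"Missing required files: {missing}")

    scaler = joblib.load(repo_dir / "scaler.joblib")  # fitted input scaler
    sigma = np.load(repo_dir / "sigma.npy")           # scaling parameter
    return scaler, sigma
```

Failing early when any file is absent reflects the requirement that the checkpoint, scaler, and sigma be used together.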

The feature specification file is part of the **model compatibility contract**: input matrices must be aligned to the same feature set **and order** used during training.
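
The alignment requirement can be sketched as follows, assuming the expression matrix is a pandas `DataFrame` with samples in rows and features in columns, and that the expected feature order has already been read from `DeepRBP_feature_spec.xlsx` into a plain list. The helper name `align_to_feature_order` and the toy feature names are hypothetical; the layout of the spec file is not described here, so reading it is left out.

```python
import pandas as pd


def align_to_feature_order(expr, feature_order):
    """Reorder the columns of `expr` (samples x features) to `feature_order`.

    Raises if an expected feature is absent instead of silently filling it.
    Columns that are not part of `feature_order` are dropped.
    """
    missing = [f for f in feature_order if f not in expr.columns]
    if missing:
        raise KeyError(
            f"{len(missing)} expected features are missing, e.g. {missing[:5]}"
        )
    return expr.loc[:, list(feature_order)]


# Toy example with made-up feature names
expr = pd.DataFrame(
    {"RBP_B": [1.0, 2.0], "EXTRA": [9.0, 9.0], "RBP_A": [3.0, 4.0]}
)
aligned = align_to_feature_order(expr, ["RBP_A", "RBP_B"])
```

Raising on missing features is a design choice for this sketch: a silently filled column would still produce predictions, but they would no longer be comparable with those from correctly aligned inputs.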

---

## Intended use

This pretrained model is intended for:

- Computing transcript abundance predictions
- Running explainability analyses (e.g., DeepLIFT-based attribution)
- Identifying candidate RBP–transcript and RBP–gene regulatory relationships
- Downstream biological interpretation and hypothesis generation

Typical applications include:
- Cancer transcriptomics (e.g., TCGA)
- Perturbation studies (e.g., RBP knockdowns)
- Comparative regulatory analyses across conditions

---

## Usage

This repository **does not provide a standalone inference script**.

Please refer to the **main DeepRBP code repository** for:
- Data preprocessing
- Model loading
- Running prediction and explainability pipelines

👉 **Main repository:**
https://github.com/ML4BM-Lab/DeepRBP

The main repository contains:
- End-to-end examples
- Command-line interfaces
- Explainability workflows
- Validation pipelines

---

## Reproducibility notes

- The model was trained on public datasets (TCGA).
- The provided scaler and sigma ensure:
  - consistent input normalization,
  - comparable predictions and attribution scores across users.
- The provided feature specification (`DeepRBP_feature_spec.xlsx`) defines the exact feature set and ordering used during training.
  Using inputs that are not aligned to this specification will break compatibility and comparability.

---

## Limitations

- The model was trained on bulk RNA-seq data and may not generalize to:
  - single-cell RNA-seq
  - extremely low-coverage datasets
- Predictions represent **associations**, not direct causal regulation.
- Experimental validation is required before drawing biological conclusions.

---

## License

This model is released under the **MIT License**.

You are free to use, modify and redistribute it, provided that the license and copyright notice are preserved.

---

## Citation

If you use DeepRBP in your work, please cite:

DeepRBP: A deep neural network for inferring splicing regulation
bioRxiv (2024)
https://doi.org/10.1101/2024.04.11.589004