---
library_name: pytorch
framework: pytorch
tags:
- pytorch
- pytorch-lightning
- bioinformatics
- rna-binding-proteins
- explainability
- alternative-splicing
- deep-learning
license: mit
---
# DeepRBP Predictor (pretrained)
This repository provides the **pretrained predictor model of DeepRBP**, a deep learning framework designed to infer **RNA-binding protein (RBP)–transcript and RBP–gene regulatory relationships** from expression data.
DeepRBP was introduced in the following preprint:
> **DeepRBP: A deep neural network for inferring splicing regulation**
> https://doi.org/10.1101/2024.04.11.589004
The model is intended to be used **directly for inference and explainability**, without retraining.
---
## Model overview
DeepRBP is composed of two conceptual stages:
1. **Prediction stage**
   A neural network predicts transcript abundances from:
   - RBP expression
   - Gene expression
2. **Explainability stage**
   Feature attribution methods (e.g., DeepLIFT) are applied to the trained predictor to compute:
   - Transcript × RBP (TxRBP) scores
   - Gene × RBP (GxRBP) scores
This repository contains **only the pretrained predictor and the required preprocessing artifacts** needed to use it.
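
To make the idea of a Transcript × RBP score matrix concrete, the sketch below uses a toy *linear* predictor, for which DeepLIFT contributions reduce to `weight * (input - reference)`. It is an illustration only: the real DeepRBP predictor is a neural network that also takes gene expression, and the weights, sizes, and all-zero reference used here are assumptions rather than values from the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rbps, n_transcripts = 4, 3  # toy sizes, not the real feature counts

# Toy stand-in for the predictor: transcripts = W @ rbp_expression + b.
W = rng.normal(size=(n_transcripts, n_rbps))
b = rng.normal(size=n_transcripts)


def predict(rbp_expr):
    """Linear toy model mapping RBP expression to transcript abundances."""
    return W @ rbp_expr + b


x = rng.uniform(0.0, 5.0, size=n_rbps)  # RBP expression for one sample
x_ref = np.zeros(n_rbps)                # assumed reference input

# For a linear model, the DeepLIFT contribution of RBP j to transcript i
# is W[i, j] * (x[j] - x_ref[j]); broadcasting builds the full matrix.
tx_rbp_scores = W * (x - x_ref)         # shape: (n_transcripts, n_rbps)

# Summation-to-delta: each row sums to the change in that transcript's output.
delta = predict(x) - predict(x_ref)
assert np.allclose(tx_rbp_scores.sum(axis=1), delta)
```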
---
## Files in this repository
⚠️ **All files are required for correct inference and explainability.**
| File | Description |
|-----|-------------|
| `model.ckpt` | PyTorch Lightning checkpoint of the pretrained DeepRBP predictor |
| `scaler.joblib` | Fitted input scaler used during model training |
| `sigma.npy` | Scaling parameter required to reconstruct transcript abundance values |
| `DeepRBP_feature_spec.xlsx` | Feature manifest defining the RBPs/genes/transcripts and their exact order |
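
Because every file in the table is required, a quick pre-flight check on a downloaded copy can catch an incomplete download early. The helper below is a minimal sketch; `missing_artifacts` and the directory path in the usage comment are hypothetical names, not part of the DeepRBP code base.

```python
from pathlib import Path

# File names as listed in the table above.
REQUIRED_FILES = (
    "model.ckpt",
    "scaler.joblib",
    "sigma.npy",
    "DeepRBP_feature_spec.xlsx",
)


def missing_artifacts(model_dir):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Example usage (the path is a placeholder):
# missing = missing_artifacts("path/to/downloaded/files")
# if missing:
#     raise FileNotFoundError(f"Missing DeepRBP artifacts: {missing}")
```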
The scaler and sigma are **part of the trained model state** and must be used together with the checkpoint.
The feature specification file is part of the **model compatibility contract**: input matrices must be aligned to the same feature set **and order** used during training.
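
The alignment requirement can be sketched as a column reordering step. The example assumes the ordered feature names have already been read from `DeepRBP_feature_spec.xlsx` into a Python list (the sheet and column layout of that file is not described here, so reading it is left out). `align_features` and the feature names are hypothetical.

```python
import numpy as np


def align_features(matrix, feature_names, spec_order):
    """Reorder the columns of a samples x features matrix to match spec_order.

    Features not listed in spec_order are dropped; a feature that the
    specification requires but the input lacks raises KeyError.
    """
    position = {name: i for i, name in enumerate(feature_names)}
    if len(position) != len(feature_names):
        raise ValueError("duplicate feature names in input")
    missing = [name for name in spec_order if name not in position]
    if missing:
        raise KeyError(f"{len(missing)} required feature(s) missing, e.g. {missing[:5]}")
    columns = [position[name] for name in spec_order]
    return np.asarray(matrix)[:, columns]


# Two samples whose columns arrive in a different order, plus one extra column.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
aligned = align_features(X, ["RBP_B", "RBP_A", "EXTRA"], ["RBP_A", "RBP_B"])
# aligned now holds the RBP_A column first, then RBP_B; EXTRA is dropped.
```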
---
## Intended use
This pretrained model is intended for:
- Computing transcript abundance predictions
- Running explainability analyses (e.g., DeepLIFT-based attribution)
- Identifying candidate RBP–transcript and RBP–gene regulatory relationships
- Downstream biological interpretation and hypothesis generation
Typical applications include:
- Cancer transcriptomics (e.g., TCGA)
- Perturbation studies (e.g., RBP knockdowns)
- Comparative regulatory analyses across conditions
---
## Usage
This repository **does not provide a standalone inference script**.
Please refer to the **main DeepRBP code repository** for:
- Data preprocessing
- Model loading
- Running prediction and explainability pipelines
👉 **Main repository:**
https://github.com/ML4BM-Lab/DeepRBP
The main repository contains:
- End-to-end examples
- Command-line interfaces
- Explainability workflows
- Validation pipelines
---
## Reproducibility notes
- The model was trained on public datasets (TCGA).
- The provided scaler and sigma ensure:
  - consistent input normalization,
  - comparable predictions and attribution scores across users.
- The provided feature specification (`DeepRBP_feature_spec.xlsx`) defines the exact feature set and ordering used during training.
  Using inputs that are not aligned to this specification will break compatibility and comparability.
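
The point about reusing the provided scaler rather than fitting a new one can be illustrated with synthetic numbers. The sketch assumes plain standardization (subtract mean, divide by standard deviation) purely for illustration; the kind of scaler actually stored in `scaler.joblib` is not stated here, and all data below are randomly generated.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=10.0, scale=2.0, size=(200, 3))  # synthetic "training" data
cohort = rng.normal(loc=12.0, scale=1.0, size=(50, 3))  # synthetic new cohort

# Statistics fitted once on the training data and then stored.
train_mean, train_std = train.mean(axis=0), train.std(axis=0)


def standardize(x, mean, std):
    """Apply standardization with the given (stored) statistics."""
    return (x - mean) / std


# Reusing the stored statistics keeps the cohort on the training scale,
# so its shift relative to the training data remains visible.
shared = standardize(cohort, train_mean, train_std)

# Re-fitting on the cohort recenters it at zero and hides that shift,
# so the model would see inputs on a different scale than in training.
refit = standardize(cohort, cohort.mean(axis=0), cohort.std(axis=0))

print("mean with stored statistics:", shared.mean(axis=0).round(2))
print("mean after re-fitting:      ", refit.mean(axis=0).round(2))
```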
---
## Limitations
- The model was trained on bulk RNA-seq data and may not generalize to:
  - single-cell RNA-seq
  - extremely low-coverage datasets
- Predictions represent **associations**, not direct causal regulation.
- Experimental validation is required before drawing biological conclusions.
---
## License
This model is released under the **MIT License**.
You are free to use, modify and redistribute it, provided that the license and copyright notice are preserved.
---
## Citation
If you use DeepRBP in your work, please cite:
> **DeepRBP: A deep neural network for inferring splicing regulation**
> bioRxiv (2024)
> https://doi.org/10.1101/2024.04.11.589004