	---
	library_name: pytorch
	framework: pytorch
	tags:
	- pytorch
	- pytorch-lightning
	- bioinformatics
	- rna-binding-proteins
	- explainability
	- alternative-splicing
	- deep-learning
	license: mit
	---

	# DeepRBP Predictor (pretrained)

	This repository provides the pretrained predictor model of DeepRBP, a deep learning framework designed to infer RNA-binding protein (RBP)–transcript and RBP–gene regulatory relationships from expression data.

	DeepRBP was introduced in the following preprint:

	> DeepRBP: A deep neural network for inferring splicing regulation
	> https://doi.org/10.1101/2024.04.11.589004

	The model is intended to be used directly for inference and explainability, without retraining.

	---

	## Model overview

	DeepRBP is composed of two conceptual stages:

	1. Prediction stage
	   A neural network predicts transcript abundances from:
	   - RBP expression
	   - Gene expression

	2. Explainability stage
	   Feature attribution methods (e.g., DeepLIFT) are applied to the trained predictor to compute:
	   - Transcript × RBP (TxRBP) scores
	   - Gene × RBP (GxRBP) scores
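
To make the shape of a Transcript × RBP score matrix concrete, the toy sketch below uses a purely linear stand-in for the predictor. For a linear model, DeepLIFT attributions reduce to `weight * (input - reference)`, so each score can be computed by hand. Everything here is an assumption made for illustration: the weights, the expression values, and the function names are hypothetical, and the real DeepRBP predictor is nonlinear and also takes gene expression as input.

```python
# Toy illustration only: a linear "predictor" with made-up weights.
# The real DeepRBP predictor is a nonlinear network; this is NOT its code.

def predict(weights, rbp_expr):
    """Return one value per transcript: y[t] = sum over r of W[t][r] * x[r]."""
    return [sum(w * x for w, x in zip(row, rbp_expr)) for row in weights]


def linear_attributions(weights, rbp_expr, reference):
    """Transcript x RBP scores for a linear model: W[t][r] * (x[r] - ref[r])."""
    deltas = [x - ref for x, ref in zip(rbp_expr, reference)]
    return [[w * d for w, d in zip(row, deltas)] for row in weights]


# Hypothetical numbers: 2 transcripts (rows) x 3 RBPs (columns).
weights = [[0.5, -1.0, 2.0],
           [1.5, 0.0, -0.5]]
sample = [4.0, 2.0, 1.0]      # RBP expression in the sample of interest
reference = [2.0, 1.0, 0.0]   # reference (baseline) RBP expression

tx_rbp = linear_attributions(weights, sample, reference)
for transcript_scores in tx_rbp:
    print(transcript_scores)
```

Each row holds the scores of one transcript and each column corresponds to one RBP. In this linear case the scores in a row sum to the difference between the prediction for the sample and the prediction for the reference, which is the property DeepLIFT is designed to preserve for nonlinear networks as well.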

	This repository contains only the pretrained predictor and the required preprocessing artifacts needed to use it.

	---

	## Files in this repository

	⚠️ All files are required for correct inference and explainability.

	| File | Description |
	|------|-------------|
	| `model.ckpt` | PyTorch Lightning checkpoint of the pretrained DeepRBP predictor |
	| `scaler.joblib` | Fitted input scaler used during model training |
	| `sigma.npy` | Scaling parameter required to reconstruct transcript abundance values |
	| `DeepRBP_feature_spec.xlsx` | Feature manifest defining the RBPs/genes/transcripts and their exact order |

	The scaler and sigma are part of the trained model state and must be used together with the checkpoint.
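
A minimal sketch of loading the four files from a local copy of this repository. It assumes that `scaler.joblib` was written with joblib, that `sigma.npy` is a plain NumPy array, and that the checkpoint is a standard PyTorch Lightning dictionary. The names `REQUIRED_FILES`, `missing_artifacts` and `load_artifacts` are hypothetical helpers, not part of DeepRBP. Rebuilding the network itself still requires the model class from the main code repository.

```python
from pathlib import Path

# File names as listed in the table above.
REQUIRED_FILES = (
    "model.ckpt",
    "scaler.joblib",
    "sigma.npy",
    "DeepRBP_feature_spec.xlsx",
)


def missing_artifacts(model_dir):
    """Return the required file names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


def load_artifacts(model_dir):
    """Load all four artifacts and return them in a dict (hypothetical helper)."""
    missing = missing_artifacts(model_dir)
    if missing:
        raise FileNotFoundError(f"Missing required files: {missing}")

    # Third-party imports are local so the file check above works without them.
    import joblib
    import numpy as np
    import pandas as pd  # reading .xlsx also needs an Excel engine such as openpyxl
    import torch

    model_dir = Path(model_dir)
    # weights_only=False unpickles arbitrary objects, as does joblib.load:
    # only use these calls on files obtained from a trusted source.
    checkpoint = torch.load(
        model_dir / "model.ckpt", map_location="cpu", weights_only=False
    )
    return {
        "checkpoint": checkpoint,  # Lightning checkpoints are dicts of saved state
        "scaler": joblib.load(model_dir / "scaler.joblib"),
        "sigma": np.load(model_dir / "sigma.npy"),
        # sheet_name=None reads every sheet; the sheet layout is not assumed here.
        "feature_spec": pd.read_excel(
            model_dir / "DeepRBP_feature_spec.xlsx", sheet_name=None
        ),
    }
```

Calling `load_artifacts("path/to/local/copy")` fails early with a `FileNotFoundError` if any of the four files is absent, which reflects the requirement that they be used together.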

	The feature specification file is part of the model compatibility contract: input matrices must be aligned to the same feature set and order used during training.
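
The alignment requirement can be sketched as follows. This is an illustration in plain Python, not the preprocessing code of the main repository: the function `align_to_spec` and the feature names are hypothetical, and in practice the ordered feature names would be read from `DeepRBP_feature_spec.xlsx`.

```python
def align_to_spec(samples, feature_order):
    """Reorder each sample's features to match feature_order.

    samples: list of dicts mapping feature name -> expression value.
    feature_order: feature names in the order used during training.
    Returns one row (list of values) per sample; extra features are dropped.
    """
    if len(set(feature_order)) != len(feature_order):
        raise ValueError("feature_order contains duplicate names")
    rows = []
    for i, sample in enumerate(samples):
        missing = [name for name in feature_order if name not in sample]
        if missing:
            # Fail loudly rather than silently filling in values.
            raise KeyError(f"sample {i} is missing features: {missing[:5]}")
        rows.append([sample[name] for name in feature_order])
    return rows


# Hypothetical feature names and values, for illustration only.
feature_order = ["RBP_A", "RBP_B", "RBP_C"]
samples = [
    {"RBP_C": 3.0, "RBP_A": 1.0, "RBP_B": 2.0, "EXTRA": 9.0},
    {"RBP_B": 5.0, "RBP_C": 6.0, "RBP_A": 4.0},
]
aligned = align_to_spec(samples, feature_order)
print(aligned)
```

With a pandas DataFrame whose columns are feature names, `df[feature_order]` performs the same reordering and raises a `KeyError` when a column is missing.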

	---

	## Intended use

	This pretrained model is intended for:

	- Computing transcript abundance predictions
	- Running explainability analyses (e.g., DeepLIFT-based attribution)
	- Identifying candidate RBP–transcript and RBP–gene regulatory relationships
	- Downstream biological interpretation and hypothesis generation

	Typical applications include:
	- Cancer transcriptomics (e.g., TCGA)
	- Perturbation studies (e.g., RBP knockdowns)
	- Comparative regulatory analyses across conditions

	---

	## Usage

	This repository does not provide a standalone inference script.

	Please refer to the main DeepRBP code repository for:
	- Data preprocessing
	- Model loading
	- Running prediction and explainability pipelines

	👉 Main repository:
	https://github.com/ML4BM-Lab/DeepRBP

	The main repository contains:
	- End-to-end examples
	- Command-line interfaces
	- Explainability workflows
	- Validation pipelines

	---

	## Reproducibility notes

	- The model was trained on public datasets (TCGA).
	- The provided scaler and sigma ensure:
	  - consistent input normalization,
	  - comparable predictions and attribution scores across users.
	- The provided feature specification (`DeepRBP_feature_spec.xlsx`) defines the exact feature set and ordering used during training.
	  Using inputs that are not aligned to this specification will break compatibility and comparability.

	---

	## Limitations

	- The model was trained on bulk RNA-seq data and may not generalize to:
	  - single-cell RNA-seq
	  - extremely low-coverage datasets
	- Predictions represent associations, not direct causal regulation.
	- Experimental validation is required before drawing biological conclusions.

	---

	## License

	This model is released under the MIT License.

	You are free to use, modify and redistribute it, provided that the license and copyright notice are preserved.

	---

	## Citation

	If you use DeepRBP in your work, please cite:

	DeepRBP: A deep neural network for inferring splicing regulation
	bioRxiv (2024)
	https://doi.org/10.1101/2024.04.11.589004