---
library_name: pytorch
framework: pytorch
tags:
- pytorch
- pytorch-lightning
- bioinformatics
- rna-binding-proteins
- explainability
- alternative-splicing
- deep-learning
license: mit
---

# DeepRBP Predictor (pretrained)

This repository provides a **pretrained DeepRBP predictor model**, a deep learning framework designed to infer **RNA-binding protein (RBP)–transcript and RBP–gene regulatory relationships** from expression data.

DeepRBP was introduced in the following preprint:

> **DeepRBP: A deep neural network for inferring splicing regulation**
> https://doi.org/10.1101/2024.04.11.589004

The model is intended to be used **directly for inference and explainability**, without retraining.

---

## Model overview

DeepRBP is composed of two conceptual stages:

1. **Prediction stage**
   A neural network predicts transcript abundances from:
   - RBP expression
   - Gene expression

2. **Explainability stage**
   Feature attribution methods (e.g., DeepLIFT) are applied on the trained predictor to compute:
   - Transcript × RBP (TxRBP) scores
   - Gene × RBP (GxRBP) scores

This repository contains **only the pretrained predictor and the required preprocessing artifacts** needed to use it.

---

## Files in this repository

⚠️ **All files are required for correct inference and explainability.**

| File | Description |
|-----|-------------|
| `model.ckpt` | PyTorch Lightning checkpoint of the pretrained DeepRBP predictor |
| `scaler.joblib` | Fitted input scaler used during model training |
| `sigma.npy` | Scaling parameter required to reconstruct transcript abundance values |
| `DeepRBP_feature_spec.xlsx` | Feature manifest defining the RBPs/genes/transcripts and their exact order |

The scaler and sigma are **part of the trained model state** and must be used together with the checkpoint.

The feature specification file is part of the **model compatibility contract**: input matrices must be aligned to the same feature set **and order** used during training.

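
The alignment requirement can be illustrated with a minimal, standard-library-only sketch. It is not the DeepRBP preprocessing code (that lives in the main repository); the helper name `column_permutation` and the toy feature names are hypothetical, and how the ordered feature list is read from `DeepRBP_feature_spec.xlsx` is left out because the sheet layout is not described here. The sketch assumes samples are rows and features are columns. The resulting index list could equally be used to index a NumPy array along its column axis.

```python
from typing import List, Sequence


def column_permutation(
    input_features: Sequence[str], required_features: Sequence[str]
) -> List[int]:
    """Return indices into `input_features` that yield `required_features` in order."""
    position = {}
    for i, name in enumerate(input_features):
        if name in position:
            raise ValueError(f"duplicate feature in input: {name!r}")
        position[name] = i

    # Fail loudly instead of silently padding or skipping features.
    missing = [name for name in required_features if name not in position]
    if missing:
        raise ValueError(
            f"{len(missing)} required feature(s) missing, e.g. {missing[:5]}"
        )
    return [position[name] for name in required_features]


# Hypothetical toy data; real identifiers and their order come from the feature spec.
required = ["RBP_A", "RBP_B", "RBP_C"]
input_features = ["RBP_C", "EXTRA", "RBP_A", "RBP_B"]
rows = [
    [3.0, 9.0, 1.0, 2.0],  # sample 1
    [6.0, 9.0, 4.0, 5.0],  # sample 2
]

perm = column_permutation(input_features, required)
# Reorder every sample; columns not in `required` (here "EXTRA") are dropped.
aligned = [[row[j] for j in perm] for row in rows]

print(perm)     # [2, 3, 0]
print(aligned)  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```
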

---

## Intended use

This pretrained model is intended for:

- Computing transcript abundance predictions
- Running explainability analyses (e.g., DeepLIFT-based attribution)
- Identifying candidate RBP–transcript and RBP–gene regulatory relationships
- Downstream biological interpretation and hypothesis generation

Typical applications include:

- Cancer transcriptomics (e.g., TCGA)
- Perturbation studies (e.g., RBP knockdowns)
- Comparative regulatory analyses across conditions

---

## Usage

This repository **does not provide a standalone inference script**.

Please refer to the **main DeepRBP code repository** for:

- Data preprocessing
- Model loading
- Running prediction and explainability pipelines

👉 **Main repository:**
https://github.com/ML4BM-Lab/DeepRBP

The main repository contains:

- End-to-end examples
- Command-line interfaces
- Explainability workflows
- Validation pipelines

---

## Reproducibility notes

- The model was trained on public datasets (TCGA).
- The provided scaler and sigma ensure:
  - consistent input normalization,
  - comparable predictions and attribution scores across users.
- The provided feature specification (`DeepRBP_feature_spec.xlsx`) defines the exact feature set and ordering used during training. Using inputs that are not aligned to this specification will break compatibility and comparability.

---

## Limitations

- The model was trained on bulk RNA-seq data and may not generalize to:
  - single-cell RNA-seq
  - extremely low-coverage datasets
- Predictions represent **associations**, not direct causal regulation.
- Experimental validation is required before biological conclusions.

---

## License

This model is released under the **MIT License**.

You are free to use, modify and redistribute it, provided that the license and copyright notice are preserved.

---

## Citation

If you use DeepRBP in your work, please cite:

> DeepRBP: A deep neural network for inferring splicing regulation
> bioRxiv (2024)
> https://doi.org/10.1101/2024.04.11.589004