---
library_name: pytorch
framework: pytorch
tags:
  - pytorch
  - pytorch-lightning
  - bioinformatics
  - rna-binding-proteins
  - explainability
  - alternative-splicing
  - deep-learning
license: mit
---

# DeepRBP Predictor (pretrained)

This repository provides a **pretrained predictor model for DeepRBP**, a deep learning framework designed to infer **RNA-binding protein (RBP)–transcript and RBP–gene regulatory relationships** from expression data.

DeepRBP was introduced in the following preprint:

> **DeepRBP: A deep neural network for inferring splicing regulation**  
> https://doi.org/10.1101/2024.04.11.589004

The model is intended to be used **directly for inference and explainability**, without retraining.

---

## Model overview

DeepRBP is composed of two conceptual stages:

1. **Prediction stage**  
   A neural network predicts transcript abundances from:
   - RBP expression
   - Gene expression

2. **Explainability stage**  
   Feature attribution methods (e.g., DeepLIFT) are applied to the trained predictor to compute:
   - Transcript × RBP (TxRBP) scores
   - Gene × RBP (GxRBP) scores

This repository contains **only the pretrained predictor and the preprocessing artifacts** required to use it.
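
To make the shape of the explainability output concrete, the sketch below computes a TxRBP-style score matrix for a toy linear predictor. It is an illustration only: the real predictor is a nonlinear network that also takes gene expression, and the actual scores come from the attribution workflow in the main repository. For a purely linear model, the DeepLIFT contribution of an input to an output reduces to weight × (input − reference), which is what the hypothetical `linear_attributions` function computes. The function name, the toy sizes, and the all-zero reference are assumptions.

```python
import numpy as np

def linear_attributions(weights, x, reference):
    """Contribution of each RBP to each transcript for a linear predictor.

    weights:   (n_transcripts, n_rbps) weight matrix of the toy model
    x:         (n_rbps,) RBP expression of one sample
    reference: (n_rbps,) baseline expression the sample is compared against
    Returns a (n_transcripts, n_rbps) score matrix.
    """
    return weights * (x - reference)  # broadcasts the difference over transcripts

rng = np.random.default_rng(0)
n_transcripts, n_rbps = 5, 3          # toy sizes, not the real feature counts
weights = rng.normal(size=(n_transcripts, n_rbps))
x = rng.random(n_rbps)
reference = np.zeros(n_rbps)          # assumed all-zero baseline

tx_rbp = linear_attributions(weights, x, reference)

# Summation-to-delta: the scores for a transcript add up to the change in its
# prediction between the sample and the reference.
delta = weights @ x - weights @ reference
assert np.allclose(tx_rbp.sum(axis=1), delta)
```

Each row of `tx_rbp` corresponds to one transcript and each column to one RBP, which is the layout implied by the "Transcript × RBP" naming above.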

---

## Files in this repository

⚠️ **All files are required for correct inference and explainability.**

| File | Description |
|-----|-------------|
| `model.ckpt` | PyTorch Lightning checkpoint of the pretrained DeepRBP predictor |
| `scaler.joblib` | Fitted input scaler used during model training |
| `sigma.npy` | Scaling parameter required to reconstruct transcript abundance values |
| `DeepRBP_feature_spec.xlsx` | Feature manifest defining the RBPs/genes/transcripts and their exact order |

The scaler and sigma are **part of the trained model state** and must be used together with the checkpoint.

The feature specification file is part of the **model compatibility contract**: input matrices must be aligned to the same feature set **and order** used during training.
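
As a minimal sketch of what "aligned to the same feature set and order" can mean in practice, the example below reorders the columns of an expression table to match a given feature list and fails loudly if a feature is missing. The helper `align_to_spec` and the feature IDs are hypothetical. How the ordered list is read from `DeepRBP_feature_spec.xlsx` is deliberately left out, because the sheet and column layout of that file is not described here.

```python
import pandas as pd

def align_to_spec(expr, feature_order):
    """Return `expr` restricted to, and ordered as, `feature_order`.

    expr:          samples x features DataFrame (columns are feature IDs)
    feature_order: list of feature IDs in training order
    Raises ValueError if any required feature is missing from `expr`.
    """
    missing = [f for f in feature_order if f not in expr.columns]
    if missing:
        raise ValueError(f"{len(missing)} feature(s) missing, e.g. {missing[:5]}")
    return expr.loc[:, feature_order]

# Toy data with made-up feature IDs, in the wrong order and with one extra column.
feature_order = ["RBP_A", "RBP_B", "RBP_C"]
expr = pd.DataFrame(
    {"RBP_C": [3.0, 6.0], "EXTRA": [0.0, 0.0], "RBP_A": [1.0, 4.0], "RBP_B": [2.0, 5.0]},
    index=["sample_1", "sample_2"],
)

aligned = align_to_spec(expr, feature_order)
print(list(aligned.columns))  # ['RBP_A', 'RBP_B', 'RBP_C']
```

Raising on missing features, rather than silently filling them with zeros, is a design choice in this sketch: a silently padded input would still run through the model but would no longer be comparable to the training data.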

---

## Intended use

This pretrained model is intended for:

- Computing transcript abundance predictions
- Running explainability analyses (e.g., DeepLIFT-based attribution)
- Identifying candidate RBP–transcript and RBP–gene regulatory relationships
- Downstream biological interpretation and hypothesis generation

Typical applications include:
- Cancer transcriptomics (e.g., TCGA)
- Perturbation studies (e.g., RBP knockdowns)
- Comparative regulatory analyses across conditions

---

## Usage

This repository **does not provide a standalone inference script**.

Please refer to the **main DeepRBP code repository** for:
- Data preprocessing
- Model loading
- Running prediction and explainability pipelines

👉 **Main repository:**  
https://github.com/ML4BM-Lab/DeepRBP

The main repository contains:
- End-to-end examples
- Command-line interfaces
- Explainability workflows
- Validation pipelines
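
Before handing the downloaded artifacts to the main repository's pipelines, it can help to confirm that all four files listed above are present in one directory. The following is a small standard-library sketch; `missing_artifacts` is a hypothetical helper and is not part of the DeepRBP codebase.

```python
from pathlib import Path

# File names as listed in the "Files in this repository" table.
REQUIRED_FILES = (
    "model.ckpt",
    "scaler.joblib",
    "sigma.npy",
    "DeepRBP_feature_spec.xlsx",
)

def missing_artifacts(model_dir):
    """Return the required file names that are absent from `model_dir`."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]

if __name__ == "__main__":
    # "." is a placeholder; point this at the directory holding the download.
    missing = missing_artifacts(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are present.")
```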

---

## Reproducibility notes

- The model was trained on public datasets (TCGA).
- The provided scaler and sigma ensure:
  - consistent input normalization,
  - comparable predictions and attribution scores across users.
- The provided feature specification (`DeepRBP_feature_spec.xlsx`) defines the exact feature set and ordering used during training.
  Using inputs that are not aligned to this specification will break compatibility and comparability.
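
Because comparability depends on every user holding identical artifacts, one optional safeguard is to record a checksum of each file alongside any reported results. The sketch below uses only the standard library; `sha256_of` and `artifact_checksums` are hypothetical helpers. This README does not list reference checksums, so the digests are only useful for comparing copies between users or runs.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def artifact_checksums(model_dir, names):
    """Map each artifact name that exists in `model_dir` to its SHA-256 digest."""
    model_dir = Path(model_dir)
    return {n: sha256_of(model_dir / n) for n in names if (model_dir / n).is_file()}
```

A call such as `artifact_checksums(".", ["model.ckpt", "scaler.joblib", "sigma.npy", "DeepRBP_feature_spec.xlsx"])` returns a dictionary that can be stored next to prediction or attribution outputs.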

---

## Limitations

- The model was trained on bulk RNA-seq data and may not generalize to:
  - single-cell RNA-seq
  - extremely low-coverage datasets
- Predictions represent **associations**, not direct causal regulation.
- Experimental validation is required before drawing biological conclusions.

---

## License

This model is released under the **MIT License**.

You are free to use, modify and redistribute it, provided that the license and copyright notice are preserved.

---

## Citation

If you use DeepRBP in your work, please cite:

DeepRBP: A deep neural network for inferring splicing regulation  
bioRxiv (2024)  
https://doi.org/10.1101/2024.04.11.589004