---
library_name: pytorch
framework: pytorch
tags:
  - pytorch
  - pytorch-lightning
  - bioinformatics
  - rna-binding-proteins
  - explainability
  - alternative-splicing
  - deep-learning
license: mit
---

# DeepRBP Predictor (pretrained)

This repository provides the **pretrained predictor model for DeepRBP**, a deep learning framework designed to infer **RNA-binding protein (RBP)–transcript and RBP–gene regulatory relationships** from expression data.

DeepRBP was introduced in the following preprint:

> **DeepRBP: A deep neural network for inferring splicing regulation**  
> https://doi.org/10.1101/2024.04.11.589004

The model is intended to be used **directly for inference and explainability**, without retraining.

---

## Model overview

DeepRBP is composed of two conceptual stages:

1. **Prediction stage**  
   A neural network predicts transcript abundances from:
   - RBP expression
   - Gene expression

2. **Explainability stage**  
   Feature attribution methods (e.g., DeepLIFT) are applied to the trained predictor to compute:
   - Transcript × RBP (TxRBP) scores
   - Gene × RBP (GxRBP) scores

This repository contains **only the pretrained predictor and the preprocessing artifacts** required to use it.

---

## Files in this repository

⚠️ **All files are required for correct inference and explainability.**

| File | Description |
|-----|-------------|
| `model.ckpt` | PyTorch Lightning checkpoint of the pretrained DeepRBP predictor |
| `scaler.joblib` | Fitted input scaler used during model training |
| `sigma.npy` | Scaling parameter required to reconstruct transcript abundance values |
| `DeepRBP_feature_spec.xlsx` | Feature manifest defining the RBPs/genes/transcripts and their exact order |

The scaler and sigma are **part of the trained model state** and must be used together with the checkpoint.

The feature specification file is part of the **model compatibility contract**: input matrices must be aligned to the same feature set **and order** used during training.
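Because all four files must be present together, a quick pre-flight check can catch an incomplete download before any model code runs. The following is a minimal sketch, not part of the DeepRBP codebase: the helper name `missing_artifacts` is hypothetical, and it only checks that the files listed in the table above exist in a local directory. It does not validate their contents.

```python
from pathlib import Path

# File names as listed in the table above.
REQUIRED_FILES = (
    "model.ckpt",
    "scaler.joblib",
    "sigma.npy",
    "DeepRBP_feature_spec.xlsx",
)


def missing_artifacts(model_dir):
    """Return the required file names that are absent from model_dir.

    Hypothetical helper for illustration; an empty list means every
    file named in the table was found.
    """
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_artifacts` on the directory that holds the downloaded files and raising an error when the returned list is non-empty makes a partial download fail early, rather than partway through an inference or attribution run.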

---

## Intended use

This pretrained model is intended for:

- Computing transcript abundance predictions
- Running explainability analyses (e.g., DeepLIFT-based attribution)
- Identifying candidate RBP–transcript and RBP–gene regulatory relationships
- Downstream biological interpretation and hypothesis generation

Typical applications include:
- Cancer transcriptomics (e.g., TCGA)
- Perturbation studies (e.g., RBP knockdowns)
- Comparative regulatory analyses across conditions

---

## Usage

This repository **does not provide a standalone inference script**.

Please refer to the **main DeepRBP code repository** for:
- Data preprocessing
- Model loading
- Running prediction and explainability pipelines

👉 **Main repository:**  
https://github.com/ML4BM-Lab/DeepRBP

The main repository contains:
- End-to-end examples
- Command-line interfaces
- Explainability workflows
- Validation pipelines

---

## Reproducibility notes

- The model was trained on public datasets (TCGA).
- The provided scaler and sigma ensure:
  - consistent input normalization,
  - comparable predictions and attribution scores across users.
- The provided feature specification (`DeepRBP_feature_spec.xlsx`) defines the exact feature set and ordering used during training.
  Using inputs that are not aligned to this specification will break compatibility and comparability.
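
The alignment requirement can be illustrated with a small, dependency-free sketch. It assumes the ordered feature names have already been read from `DeepRBP_feature_spec.xlsx` into a Python list. The layout of that spreadsheet is not described here, so reading it is left out, and the function name `align_columns` is hypothetical rather than part of the DeepRBP API.

```python
def align_columns(header, rows, spec_order):
    """Reorder the columns of `rows` to follow `spec_order`.

    header     : feature names for the columns of `rows`, in input order
    rows       : list of samples, each a list of values (one per feature)
    spec_order : feature names in the order used during training
                 (assumed to be loaded from the feature specification)

    Columns not named in `spec_order` are dropped. A feature that the
    specification requires but the input lacks raises KeyError.
    """
    position = {name: i for i, name in enumerate(header)}
    if len(position) != len(header):
        raise ValueError("duplicate feature names in input header")

    missing = [name for name in spec_order if name not in position]
    if missing:
        raise KeyError(f"features missing from input: {missing}")

    # Column indices in the order required by the specification.
    idx = [position[name] for name in spec_order]
    return [[row[i] for i in idx] for row in rows]
```

Failing loudly on missing features, instead of silently filling them with zeros, is a deliberate choice in this sketch: a misaligned input would still produce numbers, but they would not be comparable with predictions or attribution scores from correctly aligned data.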

---

## Limitations

- The model was trained on bulk RNA-seq data and may not generalize to:
  - single-cell RNA-seq
  - extremely low-coverage datasets
- Predictions represent **associations**, not direct causal regulation.
- Experimental validation is required before drawing biological conclusions.

---

## License

This model is released under the **MIT License**.

You are free to use, modify and redistribute it, provided that the license and copyright notice are preserved.

---

## Citation

If you use DeepRBP in your work, please cite:

DeepRBP: A deep neural network for inferring splicing regulation  
bioRxiv (2024)  
https://doi.org/10.1101/2024.04.11.589004