---
language: dna
tags:
  - Biology
  - DNA
license: agpl-3.0
library_name: multimolecule
---

# MTSplice

Tissue-specific modeling of the effects of genetic variants on splicing.

## Disclaimer

This is an UNOFFICIAL implementation of [MTSplice predicts effects of genetic variants on tissue-specific splicing](https://doi.org/10.1186/s13059-021-02273-7) by Jun Cheng, Muhammed Hasan Çelik, Anshul Kundaje, and Julien Gagneur.

The OFFICIAL repository of MTSplice is at [gagneurlab/MMSplice_MTSplice](https://github.com/gagneurlab/MMSplice_MTSplice).

> [!TIP]
> The MultiMolecule team has confirmed that the provided model and checkpoints produce the same intermediate representations as the original implementation.

**The team that released MTSplice did not write a model card for it; this model card was written by the MultiMolecule team.**

## Model Details

MTSplice is the tissue-specific second generation of MMSplice. It predicts the effect of genetic variants on cassette-exon splicing across 56 GTEx tissues. The cassette exon together with its flanking introns is fed into two parallel sequence towers:

- `acceptor`: a tower over the upstream region (intron overhang plus exon flank) around the 3' splice site.
- `donor`: a tower over the downstream region (exon flank plus intron overhang) around the 5' splice site.

Each tower applies a stem convolution followed by a stack of residual dilated-convolution blocks with an exponentially growing receptive field, then re-weights the per-position features with a positional B-spline transformation. The two towers are concatenated along the length axis, average-pooled, and combined by a small dense head into a per-tissue delta-logit-PSI splicing-effect vector. Please refer to the [Training Details](#training-details) section for more information on the training process.
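To illustrate what "exponentially growing receptive field" means, here is a minimal sketch that computes the receptive field of a stack of stride-1 dilated convolutions. It assumes one convolution per block, a kernel size of 3, and a dilation of `2**i` in block `i`. These values are illustrative assumptions and are not read from the MTSplice checkpoint or configuration.

```python
def receptive_field(num_blocks: int, kernel_size: int = 3, base: int = 2) -> int:
    """Receptive field (in positions) of stacked stride-1 dilated convolutions.

    Assumption: block i uses dilation base**i and a single convolution.
    Neither the kernel size nor the schedule is taken from the real model.
    """
    field = 1  # a single position sees only itself
    for block in range(num_blocks):
        dilation = base ** block
        # Each convolution widens the field by (kernel_size - 1) * dilation.
        field += (kernel_size - 1) * dilation
    return field


# The field roughly doubles with every added block under these assumptions.
for blocks in (1, 2, 4, 8):
    print(blocks, receptive_field(blocks))
```

Under these assumptions the receptive field grows geometrically with depth, which is why a shallow stack can cover a long stretch of intron and exon sequence with few parameters.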

Upstream MTSplice is distributed as a deep four-member ensemble (`mtsplice_deep0..3`) and an earlier eight-member ensemble (`mtsplice0..7`). MultiMolecule exposes the default deep-family architecture and converts one ensemble member (`mtsplice_deep0`) into a single deterministic checkpoint.

### Variant Effect Interface

MTSplice exposes variant effects as an input-schema concern, not a separate output type:

- Reference-only call (`input_ids` / `inputs_embeds`): returns the per-tissue score vector `logits` of shape `(batch_size, 56)`.
- Reference + alternative call (also pass `alternative_input_ids` / `alternative_inputs_embeds`): additionally returns `alternative_logits` and the per-tissue deltas `delta_logits` (`alternative_logits - logits`).
- `MTSpliceForSequencePrediction` returns the per-tissue deltas (or the per-tissue scores when no alternative is supplied) and applies the standard regression loss when labels are provided.
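The delta described above is a plain element-wise subtraction over tissues. The following sketch reproduces it on ordinary Python lists and picks the tissue with the largest absolute effect. The helper names (`delta_logits`, `most_affected`), the tissue labels, and the score values are hypothetical and are not part of the MultiMolecule API.

```python
from typing import List, Sequence, Tuple


def delta_logits(reference: Sequence[float], alternative: Sequence[float]) -> List[float]:
    """Per-tissue delta, defined as alternative_logits - logits."""
    if len(reference) != len(alternative):
        raise ValueError("reference and alternative must cover the same tissues")
    return [alt - ref for ref, alt in zip(reference, alternative)]


def most_affected(deltas: Sequence[float], tissues: Sequence[str]) -> Tuple[str, float]:
    """Return the tissue whose delta has the largest magnitude."""
    index = max(range(len(deltas)), key=lambda i: abs(deltas[i]))
    return tissues[index], deltas[index]


# Hypothetical scores for three tissues (the real model returns 56 values).
tissues = ["tissue_a", "tissue_b", "tissue_c"]
reference = [0.10, -0.20, 0.05]
alternative = [0.40, -0.90, 0.05]

deltas = delta_logits(reference, alternative)
name, effect = most_affected(deltas, tissues)
print(name, round(effect, 2))
```

A negative delta here means the alternative allele scores lower than the reference in that tissue.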

### Model Specification

| Num Blocks | Hidden Size | Num Tissues | Num Parameters | FLOPs (M) | MACs (M) |
| ---------- | ----------- | ----------- | -------------- | --------- | -------- |
| 8          | 64          | 56          | 210,840        | 164.36    | 80.90    |

(Num Blocks is per tower; FLOPs and MACs measured on an 800 bp cassette-exon-with-flanks input.)

### Links

- **Code**: [multimolecule.mtsplice](https://github.com/DLS5-Omics/multimolecule/tree/master/multimolecule/models/mtsplice)
- **Paper**: [MTSplice predicts effects of genetic variants on tissue-specific splicing](https://doi.org/10.1186/s13059-021-02273-7)
- **Developed by**: Jun Cheng, Muhammed Hasan Çelik, Anshul Kundaje, Julien Gagneur
- **Original Repository**: [gagneurlab/MMSplice_MTSplice](https://github.com/gagneurlab/MMSplice_MTSplice)

## Usage

The model file depends on the [`multimolecule`](https://multimolecule.danling.org) library. You can install it using pip:

```bash
pip install multimolecule
```

### Direct Use

#### Tissue Scores

```python
>>> import torch
>>> from multimolecule import DnaTokenizer, MTSpliceModel

>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/mtsplice")
>>> model = MTSpliceModel.from_pretrained("multimolecule/mtsplice")
>>> reference = tokenizer("agcagtcattatggcgaatctggcaagta", return_tensors="pt")
>>> output = model(**reference)
>>> output["logits"].shape
torch.Size([1, 56])
```

#### Variant Effect

```python
>>> import torch
>>> from multimolecule import DnaTokenizer, MTSpliceForSequencePrediction

>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/mtsplice")
>>> model = MTSpliceForSequencePrediction.from_pretrained("multimolecule/mtsplice")
>>> reference = tokenizer("agcagtcattatggcgaatctggcaagta", return_tensors="pt")
>>> alternative = tokenizer("agcagtcattatggctaatctggcaagta", return_tensors="pt")
>>> output = model(
...     reference["input_ids"],
...     alternative_input_ids=alternative["input_ids"],
... )
>>> output["logits"].shape
torch.Size([1, 56])
```

## Training Details

MTSplice was trained to predict tissue-specific percent-spliced-in (PSI) of cassette exons across GTEx tissues, building on the MMSplice modular splicing model with an added tissue-specific neural module.

### Training Data

MTSplice was trained on cassette-exon PSI quantifications across 56 GTEx tissues, together with the human reference splice-site and exon sequence context. The variant-effect predictions were validated against tissue-specific splicing quantitative trait loci (sQTL) and MPRA exon-skipping data.

### Training Procedure

#### Pre-training

The two sequence towers consume one-hot encoded DNA. A dilated-convolution stack with positional B-spline re-weighting extracts splicing features, which a dense head maps to per-tissue delta-logit-PSI. The tissue-resolved predictions are formed from the reference/alternative score deltas.
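Because the head predicts a change on the logit scale, turning a delta-logit-PSI into a PSI value requires a reference PSI for the same exon and tissue. The sketch below applies the standard logit and sigmoid definitions. The function names and the clipping constant `eps` are assumptions for illustration, and the reference PSI must come from your own measurements, since the model does not supply it.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability in (0, 1)."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def alternative_psi(reference_psi: float, delta_logit_psi: float, eps: float = 1e-5) -> float:
    """PSI implied by shifting the reference PSI by a delta on the logit scale.

    eps clips the reference PSI away from 0 and 1 so the logit stays finite;
    its value is an illustrative choice.
    """
    clipped = min(max(reference_psi, eps), 1.0 - eps)
    return sigmoid(logit(clipped) + delta_logit_psi)


# Hypothetical exon with 50% inclusion and a predicted delta of -1.0.
psi_alt = alternative_psi(0.5, -1.0)
print(round(psi_alt, 4), round(psi_alt - 0.5, 4))
```

The same delta yields a smaller PSI change when the reference PSI is close to 0 or 1, which is a property of the logit scale rather than of the model.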

## Citation

```bibtex
@article{cheng2021mtsplice,
  title     = {MTSplice predicts effects of genetic variants on tissue-specific splicing},
  author    = {Cheng, Jun and {\c{C}}elik, Muhammed Hasan and Kundaje, Anshul and Gagneur, Julien},
  journal   = {Genome Biology},
  volume    = 22,
  number    = 1,
  pages     = {94},
  year      = 2021,
  publisher = {Springer},
  doi       = {10.1186/s13059-021-02273-7}
}
```

> [!NOTE]
> The artifacts distributed in this repository are part of the MultiMolecule project.
> If you use MultiMolecule in your research, you must cite the MultiMolecule project as follows:

```bibtex
@software{chen_2024_12638419,
  author    = {Chen, Zhiyuan and Zhu, Sophia Y.},
  title     = {MultiMolecule},
  doi       = {10.5281/zenodo.12638419},
  publisher = {Zenodo},
  url       = {https://doi.org/10.5281/zenodo.12638419},
  year      = 2024,
  month     = may,
  day       = 4
}
```

## Contact

Please use GitHub issues of [MultiMolecule](https://github.com/DLS5-Omics/multimolecule/issues) for any questions or comments on the model card.

Please contact the authors of the [MTSplice paper](https://doi.org/10.1186/s13059-021-02273-7) for questions or comments on the paper/model.

## License

This model implementation is licensed under the [GNU Affero General Public License](license.md).

For additional terms and clarifications, please refer to our [License FAQ](license-faq.md).

```spdx
SPDX-License-Identifier: AGPL-3.0-or-later
```