---
language: dna
tags:
- Biology
- DNA
license: agpl-3.0
library_name: multimolecule
---
# MTSplice
Tissue-specific modeling of the effects of genetic variants on splicing.
## Disclaimer
This is an UNOFFICIAL implementation of [MTSplice predicts effects of genetic variants on tissue-specific splicing](https://doi.org/10.1186/s13059-021-02273-7) by Jun Cheng, Muhammed Hasan Çelik, Anshul Kundaje and Julien Gagneur.
The OFFICIAL repository of MTSplice is at [gagneurlab/MMSplice_MTSplice](https://github.com/gagneurlab/MMSplice_MTSplice).
> [!TIP]
> The MultiMolecule team has confirmed that the provided model and checkpoints produce the same intermediate representations as the original implementation.
**The team releasing MTSplice did not write a model card for this model, so this model card has been written by the MultiMolecule team.**
## Model Details
MTSplice is the tissue-specific second generation of MMSplice. It predicts the effect of genetic variants on cassette-exon splicing across 56 GTEx tissues. The cassette exon together with its flanking introns is fed into two parallel sequence towers:
- `acceptor`: a tower over the upstream region (intron overhang plus exon flank) around the 3' splice site.
- `donor`: a tower over the downstream region (exon flank plus intron overhang) around the 5' splice site.
Each tower applies a stem convolution followed by a stack of residual dilated-convolution blocks with an exponentially growing receptive field, then re-weights the per-position features with a positional B-spline transformation. The two towers are concatenated along the length axis, average-pooled, and combined by a small dense head into a per-tissue delta-logit-PSI splicing-effect vector. Please refer to the [Training Details](#training-details) section for more information on the training process.
Upstream MTSplice is distributed as a deep four-member ensemble (`mtsplice_deep0..3`) and an earlier eight-member ensemble (`mtsplice0..7`). MultiMolecule exposes the default deep-family architecture and converts one ensemble member (`mtsplice_deep0`) into a single deterministic checkpoint.
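To make the "exponentially growing receptive field" of the dilated-convolution stack concrete, the sketch below counts how many input positions one output position can see. It is an illustration only: the kernel size of 3, the dilation schedule `base ** block`, and one convolution per block are assumptions chosen for the example, not values stated in this model card, and `receptive_field` is a hypothetical helper rather than part of `multimolecule`.

```python
def receptive_field(
    num_blocks: int,
    kernel_size: int = 3,      # assumed, not taken from the model card
    convs_per_block: int = 1,  # assumed, not taken from the model card
    base: int = 2,             # assumed dilation growth factor
) -> int:
    """Receptive field (in positions) of a stack of stride-1 dilated 1-D convolutions."""
    field = 1  # a single output position always sees its own input position
    for block in range(num_blocks):
        dilation = base ** block
        # Each stride-1 convolution widens the field by (kernel_size - 1) * dilation.
        field += convs_per_block * (kernel_size - 1) * dilation
    return field


for blocks in (1, 2, 4, 8):
    print(blocks, receptive_field(blocks))
```

Under these assumptions the field follows `1 + (kernel_size - 1) * (2 ** num_blocks - 1)`, so it roughly doubles with every added block, whereas undilated convolutions of the same width would only grow it linearly.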
### Variant Effect Interface
MTSplice handles variant effects through its inputs rather than through a separate output type, so what the model returns depends on which inputs are passed:
- Reference-only call (`input_ids` / `inputs_embeds`): returns the per-tissue score vector `logits` of shape `(batch_size, 56)`.
- Reference + alternative call (also pass `alternative_input_ids` / `alternative_inputs_embeds`): additionally returns `alternative_logits` and the per-tissue deltas `delta_logits` (`alternative_logits - logits`).
- `MTSpliceForSequencePrediction` returns the per-tissue deltas (or the per-tissue scores when no alternative is supplied) and applies the standard regression loss when labels are provided.
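A per-tissue delta is easier to interpret once it is turned back into a PSI value. The sketch below assumes the delta is read on the logit-PSI scale, as the Model Details section describes the head's output, and that a reference PSI for the tissue is available from elsewhere (for example a measured value). The helper names are hypothetical and are not part of `multimolecule`.

```python
import math


def logit(p: float) -> float:
    """Map a probability in (0, 1) to the logit scale."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Map a logit back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def alternative_psi(reference_psi: float, delta_logit: float, eps: float = 1e-5) -> float:
    """Shift a reference PSI by a delta expressed on the logit scale."""
    # Clip so that a PSI of exactly 0 or 1 does not produce an infinite logit.
    clipped = min(max(reference_psi, eps), 1.0 - eps)
    return sigmoid(logit(clipped) + delta_logit)


# Hypothetical numbers: an exon included half the time, with a delta of -1.0.
print(round(alternative_psi(0.5, -1.0), 4))
```

Because the shift happens on the logit scale, the same delta changes PSI most for exons near 50% inclusion and barely moves exons that are already almost always included or skipped.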
### Model Specification
| Num Blocks | Hidden Size | Num Tissues | Num Parameters | FLOPs (M) | MACs (M) |
| ---------- | ----------- | ----------- | -------------- | --------- | -------- |
| 8 | 64 | 56 | 210,840 | 164.36 | 80.90 |
(Num Blocks is per tower; FLOPs and MACs are measured on an 800 bp cassette-exon-with-flanks input.)
### Links
- **Code**: [multimolecule.mtsplice](https://github.com/DLS5-Omics/multimolecule/tree/master/multimolecule/models/mtsplice)
- **Paper**: [MTSplice predicts effects of genetic variants on tissue-specific splicing](https://doi.org/10.1186/s13059-021-02273-7)
- **Developed by**: Jun Cheng, Muhammed Hasan Çelik, Anshul Kundaje, Julien Gagneur
- **Original Repository**: [gagneurlab/MMSplice_MTSplice](https://github.com/gagneurlab/MMSplice_MTSplice)
## Usage
The model file depends on the [`multimolecule`](https://multimolecule.danling.org) library. You can install it using pip:
```bash
pip install multimolecule
```
### Direct Use
#### Tissue Scores
```python
>>> import torch
>>> from multimolecule import DnaTokenizer, MTSpliceModel
>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/mtsplice")
>>> model = MTSpliceModel.from_pretrained("multimolecule/mtsplice")
>>> reference = tokenizer("agcagtcattatggcgaatctggcaagta", return_tensors="pt")
>>> output = model(**reference)
>>> output["logits"].shape
torch.Size([1, 56])
```
#### Variant Effect
```python
>>> import torch
>>> from multimolecule import DnaTokenizer, MTSpliceForSequencePrediction
>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/mtsplice")
>>> model = MTSpliceForSequencePrediction.from_pretrained("multimolecule/mtsplice")
>>> reference = tokenizer("agcagtcattatggcgaatctggcaagta", return_tensors="pt")
>>> alternative = tokenizer("agcagtcattatggctaatctggcaagta", return_tensors="pt")
>>> output = model(
... reference["input_ids"],
... alternative_input_ids=alternative["input_ids"],
... )
>>> output["logits"].shape
torch.Size([1, 56])
```
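With 56 values per sequence, it is often useful to see which tissues carry the strongest predicted effect. The following is a minimal sketch that ranks scores by absolute value. `strongest_effects` is a hypothetical helper, the numbers are made up, and the mapping from tissue index to tissue name is not given in this model card, so only indices are reported.

```python
from typing import List, Tuple


def strongest_effects(scores: List[float], k: int = 5) -> List[Tuple[int, float]]:
    """Return (tissue_index, score) pairs for the k largest absolute scores."""
    ranked = sorted(enumerate(scores), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Made-up values standing in for one row of the model output.
example = [0.02, -0.41, 0.10, 0.33, -0.05]
print(strongest_effects(example, k=2))  # [(1, -0.41), (3, 0.33)]
```

For a real prediction, one row of the output can be converted to a plain list first, for example with `output["logits"][0].tolist()`.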
## Training Details
MTSplice was trained to predict tissue-specific percent-spliced-in (PSI) of cassette exons across GTEx tissues, building on the MMSplice modular splicing model with an added tissue-specific neural module.
### Training Data
MTSplice was trained on cassette-exon PSI quantifications across 56 GTEx tissues, together with the human reference splice-site and exon sequence context. The variant-effect predictions were validated against tissue-specific splicing quantitative trait loci (sQTL) and MPRA exon-skipping data.
### Training Procedure
#### Pre-training
The two sequence towers consume one-hot encoded DNA. A dilated-convolution stack with positional B-spline re-weighting extracts splicing features, which a dense head maps to per-tissue delta-logit-PSI. The tissue-resolved predictions are formed from the reference/alternative score deltas.
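One-hot encoding of DNA, as consumed by the two towers, can be sketched in a few lines. The channel order `ACGT` and the choice to encode any other symbol (such as `N`) as an all-zero row are assumptions made for this example. The model card does not specify them, and in practice the tokenizer and model handle encoding internally.

```python
from typing import List

ALPHABET = "ACGT"  # assumed channel order, not taken from the model card


def one_hot(sequence: str) -> List[List[float]]:
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    rows = []
    for base in sequence.upper():
        row = [0.0] * len(ALPHABET)
        index = ALPHABET.find(base)
        if index >= 0:  # unknown symbols such as N stay all-zero (assumption)
            row[index] = 1.0
        rows.append(row)
    return rows


for base, row in zip("acgn", one_hot("acgn")):
    print(base, row)
```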
## Citation
```bibtex
@article{cheng2021mtsplice,
title = {MTSplice predicts effects of genetic variants on tissue-specific splicing},
author = {Cheng, Jun and {\c{C}}elik, Muhammed Hasan and Kundaje, Anshul and Gagneur, Julien},
journal = {Genome Biology},
volume = 22,
number = 1,
pages = {94},
year = 2021,
publisher = {Springer},
doi = {10.1186/s13059-021-02273-7}
}
```
> [!NOTE]
> The artifacts distributed in this repository are part of the MultiMolecule project.
> If you use MultiMolecule in your research, you must cite the MultiMolecule project as follows:
```bibtex
@software{chen_2024_12638419,
author = {Chen, Zhiyuan and Zhu, Sophia Y.},
title = {MultiMolecule},
doi = {10.5281/zenodo.12638419},
publisher = {Zenodo},
url = {https://doi.org/10.5281/zenodo.12638419},
year = 2024,
month = may,
day = 4
}
```
## Contact
Please use GitHub issues of [MultiMolecule](https://github.com/DLS5-Omics/multimolecule/issues) for any questions or comments on the model card.
Please contact the authors of the [MTSplice paper](https://doi.org/10.1186/s13059-021-02273-7) for questions or comments on the paper/model.
## License
This model implementation is licensed under the [GNU Affero General Public License](license.md).
For additional terms and clarifications, please refer to our [License FAQ](license-faq.md).
```spdx
SPDX-License-Identifier: AGPL-3.0-or-later
```