MTSplice
Tissue-specific modeling of the effects of genetic variants on splicing.
Disclaimer
This is an UNOFFICIAL implementation of MTSplice, as described in the paper "MTSplice predicts effects of genetic variants on tissue-specific splicing" by Jun Cheng, Muhammed Hasan Çelik, Anshul Kundaje and Julien Gagneur.
The OFFICIAL repository of MTSplice is at gagneurlab/MMSplice_MTSplice.
The MultiMolecule team has confirmed that the provided model and checkpoints produce the same intermediate representations as the original implementation.
The team releasing MTSplice did not write a model card for this model, so this model card was written by the MultiMolecule team.
Model Details
MTSplice is the tissue-specific second generation of MMSplice. It predicts the effect of genetic variants on cassette-exon splicing across 56 GTEx tissues. The cassette exon together with its flanking introns is fed into two parallel sequence towers:
- acceptor: a tower over the upstream region (intron overhang plus exon flank) around the 3' splice site.
- donor: a tower over the downstream region (exon flank plus intron overhang) around the 5' splice site.
Each tower applies a stem convolution followed by a stack of residual dilated-convolution blocks with an exponentially growing receptive field, then re-weights the per-position features with a positional B-spline transformation. The two towers are concatenated along the length axis, average-pooled, and combined by a small dense head into a per-tissue delta-logit-PSI splicing-effect vector. Please refer to the Training Details section for more information on the training process.
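To make the "exponentially growing receptive field" concrete, the sketch below computes the receptive field of a stack of stride-1 dilated convolutions. Only the count of 8 blocks per tower comes from the Model Specification table. The kernel size of 3, the doubling dilation schedule (1, 2, 4, ...), and the simplification of one convolution per block are assumptions made for illustration, not values confirmed by this model card.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in positions) of stacked stride-1 dilated convolutions.

    Each layer widens the field by (kernel_size - 1) * dilation positions.
    """
    field = 1
    for dilation in dilations:
        field += (kernel_size - 1) * dilation
    return field


# Assumed schedule: dilation doubles with every block (not confirmed by the card).
doubling = [2 ** i for i in range(8)]
# Baseline for comparison: the same 8 layers without dilation.
undilated = [1] * 8

rf_doubling = receptive_field(3, doubling)
rf_undilated = receptive_field(3, undilated)
print(rf_doubling, rf_undilated)
```

Under these assumptions the dilated stack covers several hundred positions with the same number of layers that would otherwise cover fewer than twenty, which is the motivation for using dilation in a compact sequence tower.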
Upstream MTSplice is distributed as a deep four-member ensemble (mtsplice_deep0..3) and an earlier eight-member ensemble (mtsplice0..7). MultiMolecule exposes the default deep-family architecture and converts one ensemble member (mtsplice_deep0) into a single deterministic checkpoint.
Variant Effect Interface
MTSplice exposes variant effects as an input-schema concern, not a separate output type:
- Reference-only call (input_ids / inputs_embeds): returns the per-tissue score vector logits of shape (batch_size, 56).
- Reference + alternative call (also pass alternative_input_ids / alternative_inputs_embeds): additionally returns alternative_logits and the per-tissue deltas delta_logits (alternative_logits - logits).

MTSpliceForSequencePrediction returns the per-tissue deltas (or the per-tissue scores when no alternative is supplied) and applies the standard regression loss when labels are provided.
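Because the deltas are expressed on the logit-PSI scale, one way to read them is to shift a measured reference PSI in logit space and map the result back to a PSI value. The sketch below illustrates that arithmetic. The helper name apply_delta_logit_psi, the reference_psi input (a PSI value you would measure yourself), and the clipping constant eps are hypothetical and are not part of the multimolecule API.

```python
import math


def logit(p):
    """Log-odds of a probability-like value in (0, 1)."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_delta_logit_psi(reference_psi, delta_logit_psi, eps=1e-5):
    """Shift a reference PSI by a delta on the logit scale (hypothetical helper).

    reference_psi is clipped to [eps, 1 - eps] so the logit stays finite.
    """
    clipped = min(max(reference_psi, eps), 1.0 - eps)
    return sigmoid(logit(clipped) + delta_logit_psi)


# A reference PSI of 0.5 with a delta of -1.0 on the logit scale.
alternative_psi = apply_delta_logit_psi(0.5, -1.0)
print(round(alternative_psi, 4))
```

The same delta moves PSI less when the reference PSI is close to 0 or 1 than when it is near 0.5, which is why effects are compared on the logit scale rather than as raw PSI differences.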
Model Specification
| Num Blocks | Hidden Size | Num Tissues | Num Parameters | FLOPs (M) | MACs (M) |
|---|---|---|---|---|---|
| 8 | 64 | 56 | 210,840 | 164.36 | 80.90 |
(Num Blocks is per tower; FLOPs and MACs measured on an 800 bp cassette-exon-with-flanks input.)
Links
- Code: multimolecule.mtsplice
- Paper: MTSplice predicts effects of genetic variants on tissue-specific splicing
- Developed by: Jun Cheng, Muhammed Hasan Çelik, Anshul Kundaje, Julien Gagneur
- Original Repository: gagneurlab/MMSplice_MTSplice
Usage
The model file depends on the multimolecule library. You can install it using pip:
pip install multimolecule
Direct Use
Tissue Scores
>>> import torch
>>> from multimolecule import DnaTokenizer, MTSpliceModel
>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/mtsplice")
>>> model = MTSpliceModel.from_pretrained("multimolecule/mtsplice")
>>> reference = tokenizer("agcagtcattatggcgaatctggcaagta", return_tensors="pt")
>>> output = model(**reference)
>>> output["logits"].shape
torch.Size([1, 56])
Variant Effect
>>> import torch
>>> from multimolecule import DnaTokenizer, MTSpliceForSequencePrediction
>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/mtsplice")
>>> model = MTSpliceForSequencePrediction.from_pretrained("multimolecule/mtsplice")
>>> reference = tokenizer("agcagtcattatggcgaatctggcaagta", return_tensors="pt")
>>> alternative = tokenizer("agcagtcattatggctaatctggcaagta", return_tensors="pt")
>>> output = model(
... reference["input_ids"],
... alternative_input_ids=alternative["input_ids"],
... )
>>> output["logits"].shape
torch.Size([1, 56])
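A 56-value vector is easier to inspect once it is ranked. The following sketch orders tissue indices by the magnitude of their predicted effect. The helper top_affected_tissues is hypothetical and not part of multimolecule. It works on a plain list of floats, and it returns indices only, since this model card does not give the mapping from index to GTEx tissue name.

```python
def top_affected_tissues(deltas, k=5):
    """Return the k (tissue_index, delta) pairs with the largest absolute delta.

    Hypothetical helper: `deltas` is a flat sequence of per-tissue values.
    """
    ranked = sorted(enumerate(deltas), key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:k]


# Toy values standing in for a per-tissue delta vector.
example = [0.1, -0.9, 0.3, 0.0]
print(top_affected_tissues(example, k=2))
```

With the variant-effect example above, the input could be obtained with output["logits"][0].tolist(), which converts the first batch row into a Python list.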
Training Details
MTSplice was trained to predict tissue-specific percent-spliced-in (PSI) of cassette exons across GTEx tissues, building on the MMSplice modular splicing model with an added tissue-specific neural module.
Training Data
MTSplice was trained on cassette-exon PSI quantifications across 56 GTEx tissues, together with the human reference splice-site and exon sequence context. The variant-effect predictions were validated against tissue-specific splicing quantitative trait loci (sQTL) and MPRA exon-skipping data.
Training Procedure
Pre-training
The two sequence towers consume one-hot encoded DNA. A dilated-convolution stack with positional B-spline re-weighting extracts splicing features, which a dense head maps to per-tissue delta-logit-PSI. The tissue-resolved predictions are formed from the reference/alternative score deltas.
Citation
@article{cheng2021mtsplice,
title = {MTSplice predicts effects of genetic variants on tissue-specific splicing},
author = {Cheng, Jun and {\c{C}}elik, Muhammed Hasan and Kundaje, Anshul and Gagneur, Julien},
journal = {Genome Biology},
volume = 22,
number = 1,
pages = {94},
year = 2021,
publisher = {Springer},
doi = {10.1186/s13059-021-02273-7}
}
The artifacts distributed in this repository are part of the MultiMolecule project. If you use MultiMolecule in your research, you must cite the MultiMolecule project as follows:
@software{chen_2024_12638419,
author = {Chen, Zhiyuan and Zhu, Sophia Y.},
title = {MultiMolecule},
doi = {10.5281/zenodo.12638419},
publisher = {Zenodo},
url = {https://doi.org/10.5281/zenodo.12638419},
year = 2024,
month = may,
day = 4
}
Contact
Please use GitHub issues of MultiMolecule for any questions or comments on the model card.
Please contact the authors of the MTSplice paper for questions or comments on the paper/model.
License
This model implementation is licensed under the GNU Affero General Public License.
For additional terms and clarifications, please refer to our License FAQ.
SPDX-License-Identifier: AGPL-3.0-or-later