--- datasets: - multimolecule/gencode library_name: multimolecule license: agpl-3.0 pipeline: splice-site pipeline_tag: other tags: - Biology - Genomics - RNA - Splicing - rna widget: - example_title: microRNA 21 pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: UAGCUUAUCAGACUGAUGUUGA - example_title: microRNA 146a pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: UGAGAACUGAAUUCCAUGGGUU - example_title: microRNA 155 pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: UUAAUGCUAAUCGUGAUAGGGGUU - example_title: RNA component of mitochondrial RNA processing endoribonuclease pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: GGUUCGUGCUGAAGGCCUGUAUCCUAGGCUACACACUGAGGACUCUGUUCCUCCCCUUUCCGCCUAGGGGAAAGUCCCCGGACCUCGGGCAGAGAGUGCCACGUGCAUACGCACGUAGACAUUCCCCGCUUCCCACUCCAAAGUCCGCCAAGAAGCGUAUCCCGCUGAGCGGCGUGGCGCGGGGGCGUCAUCCGUCAGCUCCCUCUAGUUACGCAGGCAGUGCGUGUCCGCGCACCAACCACACGGGGCUCAUUCUCAGCGCGGCUGUAAAAAAAAA - example_title: 7SK small nuclear RNA pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: GGAUGUGAGGGCGAUCUGGCUGCGACAUCUGUCACCCCAUUGAUCGCCAGGGUUGAUUCGGCUGAUCUGGCUGGCUAGGCGGGUGUCCCCUUCCUCCCUCACCGCUCCAUGUGCGUCCCUCCCGAAGCUGCGCGCUCGGUCGAAGAGGACGACCAUCCCCGAUAGAGGAGGACCGGUCUUCGGUCAAGGGUAUACGAGUAGCUGCGCUCCCCUGCUAGAACCUCCAAACAAGCUCUCAAGGUCCAUUUGUAGGAGAACGUAGGGUAGUCAAGCUUCCAAGACUCCAGACACAUCCAAAUGAGGCGCUGCAUGUGGCAGUCUGCCUUUCUUUU - example_title: telomerase RNA component pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: GGGUUGCGGAGGGUGGGCCUGGGAGGGGUGGUGGCCAUUUUUUGUCUAACCCUAACUGAGAAGGGCGUAGGCGCCGUGCUUUUGCUCCCCGCGCGCUGUUUUUCUCGCUGACUUUCAGCGGGCGGAAAAGCCUCGGCCUGCCGCCUUCCACCGUUCAUUCUAGAGCAAACAAAAAAUGUCAGCUGCUGGCCCGUUCGCCCCUCCCGGGGACCUGCGGCGGGUCGCCUGCCCAGCCCCCGAACCCCGCCUGGAGGCCGCGGUCGGCCCGGGGCUUCUCCGGAGGCACCCACUGCCACCGCGAAGAGUUGGGCUCUGUCAGCCGCGGGUCUCUCGGGGGCGAGGGCGAGGUUCAGGCCUUUCAGGCCGCAGGAAGAGGAACGGAGCGAGUCCCCGCGCGCGGCGCGAUUCCCUGAGCUGUGGGACGUGCACCCAGGACUCGGCUCACACAUGC - example_title: vault RNA 2-1 pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: CGGGUCGGAGUUAGCUCAAGCGGUUACCUCCUCAUGCCGGACUUUCUAUCUGUCCAUCUCUGUGCUGGGGUUCGAGACCCGCGGGUGCUUACUGACCCUUUUAUGCAA - example_title: brain cytoplasmic RNA 1 pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: GGCCGGGCGCGGUGGCUCACGCCUGUAAUCCCAGCUCUCAGGGAGGCUAAGAGGCGGGAGGAUAGCUUGAGCCCAGGAGUUCGAGACCUGCCUGGGCAAUAUAGCGAGACCCCGUUCUCCAGAAAAAGGAAAAAAAAAAACAAAAGACAAAAAAAAAAUAAGCGUAACUUCCCUCAAAGCAACAACCCCCCCCCCCCUUU - example_title: HIV-1 TAR-WT pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: GGUCUCUCUGGUUAGACCAGAUCUGAGCCUGGGAGCUCUCUGGCUAACUAGGGAACC - example_title: prion protein (Kanno blood group) pipeline_tag: splice-site sequence_type: mRNA task: splice-site text: AUGGCGAACCUUGGCUGCUGGAUGCUGGUUCUCUUUGUGGCCACAUGGAGUGACCUGGGCCUCUGC - example_title: interleukin 10 pipeline_tag: splice-site sequence_type: mRNA task: splice-site text: AUGCACAGCUCAGCACUGCUCUGUUGCCUGGUCCUCCUGACUGGGGUGAGGGCC - example_title: Zaire ebolavirus pipeline_tag: splice-site sequence_type: mRNA task: splice-site text: AAUGUUCAAACACUUUGUGAAGCUCUGUUAGCUGAUGGUCUUGCUAAAGCAUUUCCUAGCAAUAUGAUGGUAGUCACAGAGCGUGAGCAAAAAGAAAGCUUAUUGCAUCAAGCAUCAUGGCACCACACAAGUGAUGAUUUUGGUGAGCAUGCCACAGUUAGAGGGAGUAGCUUUGUAACUGAUUUAGAGAAAUACAAUCUUGCAUUUAGAUAUGAGUUUACAGCACCUUUUAUAGAAUAUUGUAACCGUUGCUAUGGUGUUAAGAAUGUUUUUAAUUGGAUGCAUUAUACAAUCCCACAGUGUUAU - example_title: SARS coronavirus pipeline_tag: splice-site sequence_type: mRNA task: splice-site text: AUGUUUAUUUUCUUAUUAUUUCUUACUCUCACUAGUGGUAGUGACCUUGACCGGUGCACCACUUUUGAUGAUGUUCAAGCUCCUAAUUACACUCAACAUACUUCAUCUAUGAGGGGGGUUUACUAUCCUGAUGAAAUUUUUAGAUCAGACACUCUUUAUUUAACUCAGGAUUUAUUUCUUCCAUUUUAUUCUAAUGUUACAGGGUUUCAUACUAUUAAUCAUACGUUUGACAACCCUGUCAUACCUUUUAAGGAUGGUAUUUAUUUUGCUGCCACAGAGAAAUCAAAUGUUGUCCGUGGUUGGGUUUUUGGUUCUACCAUGAACAACAAGUCACAGUCGGUGAUUAUUAUUAACAAUUCUACUAAUGUUGUUAUACGAGCAUGUAACUUUGAAUUGUGUGACAACCCUUUCUUUGCUGUUUCUAAACCCAUGGGUACACAGACACAUACUAUGAUAUUCGAUAAUGCAUUUAAAUGCACUUUCGAGUACAUAUCU - example_title: insulin pipeline_tag: splice-site sequence_type: mRNA task: splice-site text: AUGGCCCUGUGGAUGCGCCUCCUGCCCCUGCUGGCGCUGCUGGCCCUCUGGGGACCUGACCCAGCCGCAGCCUUUGUGAACCAACACCUGUGCGGCUCACACCUGGUGGAAGCUCUCUACCUAGUGUGCGGGGAACGAGGCUUCUUCUACACACCCAAGACCCGCCGGGAGGCAGAGGACCUGCAGGUGGGGCAGGUGGAGCUGGGCGGGGGCCCUGGUGCAGGCAGCCUGCAGCCCUUGGCCCUGGAGGGGUCCCUGCAGAAGCGUGGCAUUGUGGAACAAUGCUGUACCAGCAUCUGCUCCCUCUACCAGCUGGAGAACUACUGCAACUAG - example_title: cyclin dependent kinase inhibitor 2A pipeline_tag: splice-site sequence_type: mRNA task: splice-site text: AUGGAGCCGGCGGCGGGGAGCAGCAUGGAGCCUUCGGCUGACUGGCUGGCCACGGCCGCGGCCCGGGGUCGGGUAGAGGAGGUGCGGGCGCUGCUGGAGGCGGGGGCGCUGCCCAACGCACCGAAUAGUUACGGUCGGAGGCCGAUCCAGGUCAUGAUGAUGGGCAGCGCCCGAGUGGCGGAGCUGCUGCUGCUCCACGGCGCGGAGCCCAACUGCGCCGACCCCGCCACUCUCACCCGACCCGUGCACGACGCUGCCCGGGAGGGCUUCCUGGACACGCUGGUGGUGCUGCACCGGGCCGGGGCGCGGCUGGACGUGCGCGAUGCCUGGGGCCGUCUGCCCGUGGACCUGGCUGAGGAGCUGGGCCAUCGCGAUGUCGCACGGUACCUGCGCGCGGCUGCGGGGGGCACCAGAGGCAGUAACCAUGCCCGCAUAGAUGCCGCGGAAGGUCCCUCAGACAUCCCCGAUUGA - example_title: human papillomavirus type 16 E6 pipeline_tag: splice-site sequence_type: mRNA task: splice-site text: AUGCACCAAAAGAGAACUGCAAUGUUUCAGGACCCACAGGAGCGACCCAGAAAGUUACCACAGUUAUGCACAGAGCUGCAAACAACUAUACAUGAUAUAAUAUUAGAAUGUGUGUACUGCAAGCAACAGUUACUGCGACGUGAGGUAUAUGACUUUGCUUUUCGGGAUUUAUGCAUAGUAUAUAGAGAUGGGAAUCCAUAUGCUGUAUGUGAUAAAUGUUUAAAGUUUUAUUCUAAAAUUAGUGAGUAUAGACAUUAUUGUUAUAGUUUGUAUGGAACAACAUUAGAACAGCAAUACAACAAACCGUUGUGUGAUUUGUUAAUUAGGUGUAUUAACUGUCAAAAGCCACUGUGUCCUGAAGAAAAGCAAAGACAUCUGGACAAAAAGCAAAGAUUCCAUAAUAUAAGGGGUCGGUGGACCGGUCGAUGUAUGUCUUGUUGCAGAUCAUCAAGAACACGUAGAGAAACCCAGCUGUAA - example_title: NRAS proto-oncogene pipeline_tag: splice-site sequence_type: 5' UTR task: splice-site text: GGGGCCGGAAGUGCCGCUCCUUGGUGGGGGCUGUUCAUGGCGGUUCCGGGGUCUCCAACAUUUUUCCCGGCUGUGGUCCUAAAUCUGUCCAAAGCAGAGGCAGUGGAGCUUGAGGUUCUUGCUGGUGUGAA - example_title: amyloid beta precursor protein pipeline_tag: splice-site sequence_type: 5' UTR task: splice-site text: GUCAGUUUCCUCGGCAGCGGUAGGCGAGAGCACGCGGAGGAGCGUGCGCGGGGGCCCCGGGAGACGGCGGCGGUGGCGGCGCGGGCAGAGCAAGGACGCGGCGGAUCCCACUCGCACAGCAGCGCACUCGGUGCCCCGCGCAGGGUCGCG - example_title: RUNX family transcription factor 1 pipeline_tag: splice-site sequence_type: 5' UTR task: splice-site text: ACUUCUUUGGGCCUCAUAAACAACCACAGAACCACAAGUUGGGUAGCCUGGCAGUGUCAGAAGUCUGAACCCAGCAUAGUGGUCAGCAGGCAGGACGAAUCACACUGAAUGCAAACCACAGGGUUUCGCAGCGUGGUAAAAGAAAUCAUUGAGUCCCCCGCCUUCAGAAGAGGGUGCAUUUUCAGGAGGAAGCG - example_title: fragile X messenger ribonucleoprotein 1 pipeline_tag: splice-site sequence_type: 5' UTR task: splice-site text: CUCAGUCAGGCGCUCAGCUCCGUUUCGGUUUCACUUCCGGUGGAGGGCCGCCUCUGAGCGGGCGGCGGGCCGACGGCGAGCGCGGGCGGCGGCGGUGACGGAGGCGCCGCUGCCAGGGGGCGUGCGGCAGCGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGAGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCUGGGCCUCGAGCGCCCGCAGCCCACCUCUCGGGGGCGGGCUCCCGGCGCUAGCAGGGCUGAAGAGAAG - example_title: MYC proto-oncogene pipeline_tag: splice-site sequence_type: 5' UTR task: splice-site text: AACUCGCUGUAGUAAUUCCAGCGAGAGGCAGAGGGAGCGAGCGGGCGGCCGGCUAGGGUGGAAGAGCCGGGCGAGCAGAGCUGCGCUGCGGGCGUCCUGGGAAGGGAGAUCCGGAGCGAAUAGGGGGCUUCGCCUCUGGCCCAGCCCUCCCGCUGAUCCCCCAGCCAGCGGUCCGCAACCCUUGCCGCAUCCACGAAACUUUGCCCAUAGCAGCGGGCGGGCACUUUGCACUGGAACUUACAACACCCGAGCAAGGACGCGACUCUCCCGACGCGGGGAGGCUAUUCUGCCCAUUUGGGGACACUUCCCCGCCGCUGCCAGGACCCGCUUCUCUGAAAGGCUCUCCUUGCAGCUGCUUAGACG - example_title: activating transcription factor 4 pipeline_tag: splice-site sequence_type: 5' UTR task: splice-site text: CAUUUCUACUUUGCCCGCCCACAGAUGUAGUUUUCUCUGCGCGUGUGCGUUUUCCCUCCUCCCCGCCCUCAGGGUCCACGGCCACCAUGGCGUAUUAGGGGCAGCAGUGCCUGCGGCAGCAUUGGCCUUUGCAGCGGCGGCAGCAGCACCAGGCUCUGCAGCGGCAACCCCCAGCGGCUUAAGCCAUGGCGCUUCUCACGGCAUUCAGCAGCAGCGUUGCUGUAACCGACAAAGACACCUUCGAAUUAAGCACAUUCCUCGAUUCCAGCAAAGCACCGCAAC - example_title: Human GPI protein p137 pipeline_tag: splice-site sequence_type: 3' UTR task: splice-site text: UUUUUAAAAGGAAAAGAUACCAAAUGCCUGCUGCUACCACCCUUUUCAAUUGCUAUGUUUUGAAAGGCACCAGUAUGUGUUUUAGAUUGAUUUAAAUGUUUCAUUUAAAUCACGGACAGUAGUUUCAGUUCUGAUGGUAUAAGCAAAACAAAUAAAACGUUUAUAAAAGUUGUAUCUUGAAACACUGGUGUUCAACAGCUAGCAGCUUAUGUGAUUCACCCCAUGCCACGUUAGUGUCACAAAUUUUAUGGUUUAUCUCCAGCAACAUUUCUCUAGUACUUGCACUUAUUAUCUGAAUUC - example_title: nucleophosmin 1 pipeline_tag: splice-site sequence_type: 3' UTR task: splice-site text: GAAAAUAGUUUAAACAAUUUGUUAAAAAAUUUUCCGUCUUAUUUCAUUUCUGUAACAGUUGAUAUCUGGCUGUCCUUUUUAUAAUGCAGAGUGAGAACUUUCCCUACCGUGUUUGAUAAAUGUUGUCCAGGUUCUAUUGCCAAGAAUGUGUUGUCCAAAAUGCCUGUUUAGUUUUUAAAGAUGGAACUCCACCCUUUGCUUGGUUUUAAGUAUGUAUGGAAUGUUAUGAUAGGACAUAGUAGUAGCGGUGGUCAGACAUGGAAAUGGUGGGGAGACAAAAAUAUACAUGUGAAAUAAAACUCAGUAUUUUAAUAAAGUAGCACGGUUUCUAUUGA - example_title: superoxide dismutase 1 pipeline_tag: splice-site sequence_type: 3' UTR task: splice-site text: ACAUUCCCUUGGAUGUAGUCUGAGGCCCCUUAACUCAUCUGUUAUCCUGCUAGCUGUAGAAAUGUAUCCUGAUAAACAUUAAACACUGUAAUCUUAAAAGUGUAAUUGUGUGACUUUUUCAGAGUUGCUUUAAAGUACCUGUAGUGAGAAACUGAUUUAUGAUCACUUGGAAGAUUUGUAUAGUUUUAUAAAACUCAGUUAAAAUGUCUGUUUCAAUGACCUGUAUUUUGCCAGACUUAAAUCACAGAUGGGUAUUAAACUUGUCAGAAUUUCUUUGUCAUUCAAGCCUGUGAAUAAAAACCCUGUAUGGCACUUAUUAUGAGGCUAUUAAAAGAAUCCAAAUUCAAACUAAA - example_title: hemoglobin subunit alpha 2 pipeline_tag: splice-site sequence_type: 3' UTR task: splice-site text: CUGGAGCCUCGGUAGCCGUUCCUCCUGCCCGCUGGGCCUCCCAACGGGCCCUCCUCCCCUCCUUGCACCGGCCCUUCCUGGUCUUUGAAUAAAGUCUGAGUGGGCAGCA - example_title: BRAF proto-oncogene pipeline_tag: splice-site sequence_type: 3' UTR task: splice-site text: AACAAAUGAGUGAGAGAGUUCAGGAGAGUAGCAACAAAAGGAAAAUAAAUGAACAUAUGUUUGCUUAUAUGUUAAAUUGAAUAAAAUACUCUCUUUUUUUUUAAGGUGAACCAAAGAACACUUGUGUGGUUAAAGACUAGAUAUAAUUUUUCCCCAAACUAAAAUUUAUACUUAACAUUGGAUUUUUAACAUCCAAGGGUUAAAAUACAUAGACAUUGCUAAAAAUUGGCAGAGCCUCUUCUAGAGGCUUUACUUUCUGUUCCGGGUUUGUAUCAUUCACUUGGUUAUUUUAAGUAGUAAACUUCAGUUUCUCAUGCAACUUUUGUUGCCAGCUAUCACAUGUCCACUAGGGACUCCAGAAGAAGACCCUACCUAUGCCUGUGUUUGCAGGUGAGAAGUUGGCAGUCGGUUAGCCUGGG - example_title: H3 clustered histone 1 pipeline_tag: splice-site sequence_type: 3' UTR task: splice-site text: UUACUGUGGUCUCUCUGACGGUCCAAGCAAAGGCUCUUUUCAGAGCCACCACCUUUUC --- # OpenSpliceAI Modular native-PyTorch reimplementation of SpliceAI for predicting pre-mRNA splice sites from nucleotide sequence. ## Disclaimer This is an UNOFFICIAL implementation of [OpenSpliceAI: An efficient, modular implementation of SpliceAI enabling easy retraining on non-human species](https://doi.org/10.7554/eLife.107454.3) by Kuan-Hao Chao, Alan Mao, et al. The OFFICIAL repository of OpenSpliceAI is at [Kuanhao-Chao/OpenSpliceAI](https://github.com/Kuanhao-Chao/OpenSpliceAI). > [!TIP] > The MultiMolecule team has confirmed that the provided model and checkpoints are producing the same intermediate representations as the original implementation. **The team releasing OpenSpliceAI did not write this model card for this model so this model card has been written by the MultiMolecule team.** ## Model Details OpenSpliceAI is a deep dilated residual convolutional neural network that reimplements the SpliceAI architecture in native PyTorch. It predicts, for each nucleotide of a pre-mRNA transcript, whether the position is a splice acceptor, a splice donor, or neither. The model stacks dilated residual units with increasing kernel size and atrous rate so that a wide genomic context window contributes to each per-nucleotide prediction, while skip connections aggregate multi-scale features. OpenSpliceAI reproduces the predictive behavior of SpliceAI while providing an efficient, modular training pipeline that can be retrained on non-human species. ### Variants OpenSpliceAI ships trained model families for human MANE and four non-human species. Each family provides four flanking-context sizes. | Family | 80 nt | 400 nt | 2,000 nt | 10,000 nt | | ------------- | ------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- | | MANE / human | [`openspliceai-mane.80`](https://huggingface.co/multimolecule/openspliceai-mane.80) | [`openspliceai-mane.400`](https://huggingface.co/multimolecule/openspliceai-mane.400) | [`openspliceai-mane.2000`](https://huggingface.co/multimolecule/openspliceai-mane.2000) | [`openspliceai-mane.10000`](https://huggingface.co/multimolecule/openspliceai-mane.10000) | | Mouse | [`openspliceai-mouse.80`](https://huggingface.co/multimolecule/openspliceai-mouse.80) | [`openspliceai-mouse.400`](https://huggingface.co/multimolecule/openspliceai-mouse.400) | [`openspliceai-mouse.2000`](https://huggingface.co/multimolecule/openspliceai-mouse.2000) | [`openspliceai-mouse.10000`](https://huggingface.co/multimolecule/openspliceai-mouse.10000) | | Zebrafish | [`openspliceai-zebrafish.80`](https://huggingface.co/multimolecule/openspliceai-zebrafish.80) | [`openspliceai-zebrafish.400`](https://huggingface.co/multimolecule/openspliceai-zebrafish.400) | [`openspliceai-zebrafish.2000`](https://huggingface.co/multimolecule/openspliceai-zebrafish.2000) | [`openspliceai-zebrafish.10000`](https://huggingface.co/multimolecule/openspliceai-zebrafish.10000) | | Honeybee | [`openspliceai-honeybee.80`](https://huggingface.co/multimolecule/openspliceai-honeybee.80) | [`openspliceai-honeybee.400`](https://huggingface.co/multimolecule/openspliceai-honeybee.400) | [`openspliceai-honeybee.2000`](https://huggingface.co/multimolecule/openspliceai-honeybee.2000) | [`openspliceai-honeybee.10000`](https://huggingface.co/multimolecule/openspliceai-honeybee.10000) | | _Arabidopsis_ | [`openspliceai-arabidopsis.80`](https://huggingface.co/multimolecule/openspliceai-arabidopsis.80) | [`openspliceai-arabidopsis.400`](https://huggingface.co/multimolecule/openspliceai-arabidopsis.400) | [`openspliceai-arabidopsis.2000`](https://huggingface.co/multimolecule/openspliceai-arabidopsis.2000) | [`openspliceai-arabidopsis.10000`](https://huggingface.co/multimolecule/openspliceai-arabidopsis.10000) | ### Model Specification
Flanking Context Residual Blocks Hidden Size Num Parameters (M) FLOPs (G) MACs (G)
80 nt 4 32 0.09 0.95 0.47
400 nt 8 0.19 2.00 0.99
2,000 nt 12 0.36 5.03 2.50
10,000 nt 16 0.70 20.90 10.40
Model size is determined by flanking context and is shared across species for the same context. FLOPs and MACs are reported for a single 5,000-nucleotide output sequence. ### Links - **Code**: [multimolecule.openspliceai](https://github.com/DLS5-Omics/multimolecule/tree/master/multimolecule/models/openspliceai) - **Data**: Human MANE/GENCODE for the MANE variants; species annotations follow the original OpenSpliceAI release. - **Paper**: [OpenSpliceAI: An efficient, modular implementation of SpliceAI enabling easy retraining on non-human species](https://doi.org/10.7554/eLife.107454.3) - **Developed by**: Kuan-Hao Chao, Alan Mao, Anqi Liu, Steven L. Salzberg, Mihaela Pertea - **Model type**: Dilated residual 1D CNN over pre-mRNA sequence for per-nucleotide three-class splice-site classification - **Original Repository**: [Kuanhao-Chao/OpenSpliceAI](https://github.com/Kuanhao-Chao/OpenSpliceAI) ## Usage The model file depends on the [`multimolecule`](https://multimolecule.danling.org) library. You can install it using pip: ```bash pip install multimolecule ``` ### Direct Use #### RNA Splicing Site Prediction You can use this model directly to predict the splice sites of a pre-mRNA sequence: ```python >>> from multimolecule import RnaTokenizer, OpenSpliceAiForTokenPrediction >>> model_id = "multimolecule/openspliceai-mouse.2000" >>> tokenizer = RnaTokenizer.from_pretrained(model_id) >>> model = OpenSpliceAiForTokenPrediction.from_pretrained(model_id) >>> output = model(tokenizer("AGCAGUCAUUAUGGCGAA", return_tensors="pt")["input_ids"]) >>> output.keys() odict_keys(['logits']) ``` Each output position carries three logits corresponding to _neither_, _acceptor_, and _donor_. ### Interface - **Input length**: variable pre-mRNA sequence - **Flanking context**: 80 / 400 / 2,000 / 10,000 nt per variant family, split evenly on both sides of every predicted position - **Padding**: sequence ends padded with `N` - **Output**: per-position 3-class logits (`neither`, `acceptor`, `donor`) ## Training Details OpenSpliceAI was trained to predict the location of splice donor and acceptor sites from nucleotide sequence, following the SpliceAI training methodology. ### Training Data The MANE variants were trained on transcripts from the [GENCODE](https://multimolecule.danling.org/datasets/gencode)/MANE human reference annotation. The non-human variants use the species annotations released by OpenSpliceAI for mouse, zebrafish, honeybee, and _Arabidopsis_. For each predicted nucleotide, the model receives a flanking context of 80, 400, 2,000, or 10,000 nucleotides, split evenly across the two sides of the output sequence, with sequence ends padded with `N`. Annotated splice donor and acceptor sites serve as positive labels; all other positions are negative. ### Training Procedure #### Pre-training The model was trained to minimize a cross-entropy loss between predicted splice-site probabilities and the reference annotation. - Optimizer: Adam - Loss: cross-entropy Please refer to the [OpenSpliceAI paper](https://doi.org/10.7554/eLife.107454.3) for the full training protocol and hardware details. ## Citation ```bibtex @article{chao2025openspliceai, author = {Chao, Kuan-Hao and Mao, Alan and Liu, Anqi and Salzberg, Steven L and Pertea, Mihaela}, title = {OpenSpliceAI: An efficient, modular implementation of SpliceAI enabling easy retraining on non-human species}, journal = {eLife}, volume = 14, pages = {RP107454}, year = 2025, doi = {10.7554/eLife.107454.3}, publisher = {eLife Sciences Publications, Ltd} } ``` > [!NOTE] > The artifacts distributed in this repository are part of the MultiMolecule project. > If MultiMolecule supports your research, please cite the MultiMolecule project as follows: ```bibtex @software{chen_2024_12638419, author = {Chen, Zhiyuan and Zhu, Sophia Y.}, title = {MultiMolecule}, doi = {10.5281/zenodo.12638419}, publisher = {Zenodo}, url = {https://doi.org/10.5281/zenodo.12638419}, year = 2024, month = may, day = 4 } ``` ## Contact Please use GitHub issues of [MultiMolecule](https://github.com/DLS5-Omics/multimolecule/issues) for any questions or comments on the model card. Please contact the authors of the [OpenSpliceAI paper](https://doi.org/10.7554/eLife.107454.3) for questions or comments on the paper/model. ## License This model implementation is licensed under the [GNU Affero General Public License](license.md). For additional terms and clarifications, please refer to our [License FAQ](license-faq.md). ```spdx SPDX-License-Identifier: AGPL-3.0-or-later ```