--- datasets: - multimolecule/gencode library_name: multimolecule license: agpl-3.0 pipeline: splice-site pipeline_tag: other tags: - Biology - Genomics - RNA - Splicing - rna widget: - example_title: microRNA 21 pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: UAGCUUAUCAGACUGAUGUUGA - example_title: microRNA 146a pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: UGAGAACUGAAUUCCAUGGGUU - example_title: microRNA 155 pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: UUAAUGCUAAUCGUGAUAGGGGUU - example_title: RNA component of mitochondrial RNA processing endoribonuclease pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: GGUUCGUGCUGAAGGCCUGUAUCCUAGGCUACACACUGAGGACUCUGUUCCUCCCCUUUCCGCCUAGGGGAAAGUCCCCGGACCUCGGGCAGAGAGUGCCACGUGCAUACGCACGUAGACAUUCCCCGCUUCCCACUCCAAAGUCCGCCAAGAAGCGUAUCCCGCUGAGCGGCGUGGCGCGGGGGCGUCAUCCGUCAGCUCCCUCUAGUUACGCAGGCAGUGCGUGUCCGCGCACCAACCACACGGGGCUCAUUCUCAGCGCGGCUGUAAAAAAAAA - example_title: 7SK small nuclear RNA pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: GGAUGUGAGGGCGAUCUGGCUGCGACAUCUGUCACCCCAUUGAUCGCCAGGGUUGAUUCGGCUGAUCUGGCUGGCUAGGCGGGUGUCCCCUUCCUCCCUCACCGCUCCAUGUGCGUCCCUCCCGAAGCUGCGCGCUCGGUCGAAGAGGACGACCAUCCCCGAUAGAGGAGGACCGGUCUUCGGUCAAGGGUAUACGAGUAGCUGCGCUCCCCUGCUAGAACCUCCAAACAAGCUCUCAAGGUCCAUUUGUAGGAGAACGUAGGGUAGUCAAGCUUCCAAGACUCCAGACACAUCCAAAUGAGGCGCUGCAUGUGGCAGUCUGCCUUUCUUUU - example_title: telomerase RNA component pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: GGGUUGCGGAGGGUGGGCCUGGGAGGGGUGGUGGCCAUUUUUUGUCUAACCCUAACUGAGAAGGGCGUAGGCGCCGUGCUUUUGCUCCCCGCGCGCUGUUUUUCUCGCUGACUUUCAGCGGGCGGAAAAGCCUCGGCCUGCCGCCUUCCACCGUUCAUUCUAGAGCAAACAAAAAAUGUCAGCUGCUGGCCCGUUCGCCCCUCCCGGGGACCUGCGGCGGGUCGCCUGCCCAGCCCCCGAACCCCGCCUGGAGGCCGCGGUCGGCCCGGGGCUUCUCCGGAGGCACCCACUGCCACCGCGAAGAGUUGGGCUCUGUCAGCCGCGGGUCUCUCGGGGGCGAGGGCGAGGUUCAGGCCUUUCAGGCCGCAGGAAGAGGAACGGAGCGAGUCCCCGCGCGCGGCGCGAUUCCCUGAGCUGUGGGACGUGCACCCAGGACUCGGCUCACACAUGC - 
example_title: vault RNA 2-1 pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: CGGGUCGGAGUUAGCUCAAGCGGUUACCUCCUCAUGCCGGACUUUCUAUCUGUCCAUCUCUGUGCUGGGGUUCGAGACCCGCGGGUGCUUACUGACCCUUUUAUGCAA - example_title: brain cytoplasmic RNA 1 pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: GGCCGGGCGCGGUGGCUCACGCCUGUAAUCCCAGCUCUCAGGGAGGCUAAGAGGCGGGAGGAUAGCUUGAGCCCAGGAGUUCGAGACCUGCCUGGGCAAUAUAGCGAGACCCCGUUCUCCAGAAAAAGGAAAAAAAAAAACAAAAGACAAAAAAAAAAUAAGCGUAACUUCCCUCAAAGCAACAACCCCCCCCCCCCUUU - example_title: HIV-1 TAR-WT pipeline_tag: splice-site sequence_type: ncRNA task: splice-site text: GGUCUCUCUGGUUAGACCAGAUCUGAGCCUGGGAGCUCUCUGGCUAACUAGGGAACC - example_title: prion protein (Kanno blood group) pipeline_tag: splice-site sequence_type: mRNA task: splice-site text: AUGGCGAACCUUGGCUGCUGGAUGCUGGUUCUCUUUGUGGCCACAUGGAGUGACCUGGGCCUCUGC - example_title: interleukin 10 pipeline_tag: splice-site sequence_type: mRNA task: splice-site text: AUGCACAGCUCAGCACUGCUCUGUUGCCUGGUCCUCCUGACUGGGGUGAGGGCC - example_title: Zaire ebolavirus pipeline_tag: splice-site sequence_type: mRNA task: splice-site text: AAUGUUCAAACACUUUGUGAAGCUCUGUUAGCUGAUGGUCUUGCUAAAGCAUUUCCUAGCAAUAUGAUGGUAGUCACAGAGCGUGAGCAAAAAGAAAGCUUAUUGCAUCAAGCAUCAUGGCACCACACAAGUGAUGAUUUUGGUGAGCAUGCCACAGUUAGAGGGAGUAGCUUUGUAACUGAUUUAGAGAAAUACAAUCUUGCAUUUAGAUAUGAGUUUACAGCACCUUUUAUAGAAUAUUGUAACCGUUGCUAUGGUGUUAAGAAUGUUUUUAAUUGGAUGCAUUAUACAAUCCCACAGUGUUAU - example_title: SARS coronavirus pipeline_tag: splice-site sequence_type: mRNA task: splice-site text: 
AUGUUUAUUUUCUUAUUAUUUCUUACUCUCACUAGUGGUAGUGACCUUGACCGGUGCACCACUUUUGAUGAUGUUCAAGCUCCUAAUUACACUCAACAUACUUCAUCUAUGAGGGGGGUUUACUAUCCUGAUGAAAUUUUUAGAUCAGACACUCUUUAUUUAACUCAGGAUUUAUUUCUUCCAUUUUAUUCUAAUGUUACAGGGUUUCAUACUAUUAAUCAUACGUUUGACAACCCUGUCAUACCUUUUAAGGAUGGUAUUUAUUUUGCUGCCACAGAGAAAUCAAAUGUUGUCCGUGGUUGGGUUUUUGGUUCUACCAUGAACAACAAGUCACAGUCGGUGAUUAUUAUUAACAAUUCUACUAAUGUUGUUAUACGAGCAUGUAACUUUGAAUUGUGUGACAACCCUUUCUUUGCUGUUUCUAAACCCAUGGGUACACAGACACAUACUAUGAUAUUCGAUAAUGCAUUUAAAUGCACUUUCGAGUACAUAUCU - example_title: insulin pipeline_tag: splice-site sequence_type: mRNA task: splice-site text: AUGGCCCUGUGGAUGCGCCUCCUGCCCCUGCUGGCGCUGCUGGCCCUCUGGGGACCUGACCCAGCCGCAGCCUUUGUGAACCAACACCUGUGCGGCUCACACCUGGUGGAAGCUCUCUACCUAGUGUGCGGGGAACGAGGCUUCUUCUACACACCCAAGACCCGCCGGGAGGCAGAGGACCUGCAGGUGGGGCAGGUGGAGCUGGGCGGGGGCCCUGGUGCAGGCAGCCUGCAGCCCUUGGCCCUGGAGGGGUCCCUGCAGAAGCGUGGCAUUGUGGAACAAUGCUGUACCAGCAUCUGCUCCCUCUACCAGCUGGAGAACUACUGCAACUAG - example_title: cyclin dependent kinase inhibitor 2A pipeline_tag: splice-site sequence_type: mRNA task: splice-site text: AUGGAGCCGGCGGCGGGGAGCAGCAUGGAGCCUUCGGCUGACUGGCUGGCCACGGCCGCGGCCCGGGGUCGGGUAGAGGAGGUGCGGGCGCUGCUGGAGGCGGGGGCGCUGCCCAACGCACCGAAUAGUUACGGUCGGAGGCCGAUCCAGGUCAUGAUGAUGGGCAGCGCCCGAGUGGCGGAGCUGCUGCUGCUCCACGGCGCGGAGCCCAACUGCGCCGACCCCGCCACUCUCACCCGACCCGUGCACGACGCUGCCCGGGAGGGCUUCCUGGACACGCUGGUGGUGCUGCACCGGGCCGGGGCGCGGCUGGACGUGCGCGAUGCCUGGGGCCGUCUGCCCGUGGACCUGGCUGAGGAGCUGGGCCAUCGCGAUGUCGCACGGUACCUGCGCGCGGCUGCGGGGGGCACCAGAGGCAGUAACCAUGCCCGCAUAGAUGCCGCGGAAGGUCCCUCAGACAUCCCCGAUUGA - example_title: human papillomavirus type 16 E6 pipeline_tag: splice-site sequence_type: mRNA task: splice-site text: 
AUGCACCAAAAGAGAACUGCAAUGUUUCAGGACCCACAGGAGCGACCCAGAAAGUUACCACAGUUAUGCACAGAGCUGCAAACAACUAUACAUGAUAUAAUAUUAGAAUGUGUGUACUGCAAGCAACAGUUACUGCGACGUGAGGUAUAUGACUUUGCUUUUCGGGAUUUAUGCAUAGUAUAUAGAGAUGGGAAUCCAUAUGCUGUAUGUGAUAAAUGUUUAAAGUUUUAUUCUAAAAUUAGUGAGUAUAGACAUUAUUGUUAUAGUUUGUAUGGAACAACAUUAGAACAGCAAUACAACAAACCGUUGUGUGAUUUGUUAAUUAGGUGUAUUAACUGUCAAAAGCCACUGUGUCCUGAAGAAAAGCAAAGACAUCUGGACAAAAAGCAAAGAUUCCAUAAUAUAAGGGGUCGGUGGACCGGUCGAUGUAUGUCUUGUUGCAGAUCAUCAAGAACACGUAGAGAAACCCAGCUGUAA - example_title: NRAS proto-oncogene pipeline_tag: splice-site sequence_type: 5' UTR task: splice-site text: GGGGCCGGAAGUGCCGCUCCUUGGUGGGGGCUGUUCAUGGCGGUUCCGGGGUCUCCAACAUUUUUCCCGGCUGUGGUCCUAAAUCUGUCCAAAGCAGAGGCAGUGGAGCUUGAGGUUCUUGCUGGUGUGAA - example_title: amyloid beta precursor protein pipeline_tag: splice-site sequence_type: 5' UTR task: splice-site text: GUCAGUUUCCUCGGCAGCGGUAGGCGAGAGCACGCGGAGGAGCGUGCGCGGGGGCCCCGGGAGACGGCGGCGGUGGCGGCGCGGGCAGAGCAAGGACGCGGCGGAUCCCACUCGCACAGCAGCGCACUCGGUGCCCCGCGCAGGGUCGCG - example_title: RUNX family transcription factor 1 pipeline_tag: splice-site sequence_type: 5' UTR task: splice-site text: ACUUCUUUGGGCCUCAUAAACAACCACAGAACCACAAGUUGGGUAGCCUGGCAGUGUCAGAAGUCUGAACCCAGCAUAGUGGUCAGCAGGCAGGACGAAUCACACUGAAUGCAAACCACAGGGUUUCGCAGCGUGGUAAAAGAAAUCAUUGAGUCCCCCGCCUUCAGAAGAGGGUGCAUUUUCAGGAGGAAGCG - example_title: fragile X messenger ribonucleoprotein 1 pipeline_tag: splice-site sequence_type: 5' UTR task: splice-site text: CUCAGUCAGGCGCUCAGCUCCGUUUCGGUUUCACUUCCGGUGGAGGGCCGCCUCUGAGCGGGCGGCGGGCCGACGGCGAGCGCGGGCGGCGGCGGUGACGGAGGCGCCGCUGCCAGGGGGCGUGCGGCAGCGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGAGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCUGGGCCUCGAGCGCCCGCAGCCCACCUCUCGGGGGCGGGCUCCCGGCGCUAGCAGGGCUGAAGAGAAG - example_title: MYC proto-oncogene pipeline_tag: splice-site sequence_type: 5' UTR task: splice-site text: 
AACUCGCUGUAGUAAUUCCAGCGAGAGGCAGAGGGAGCGAGCGGGCGGCCGGCUAGGGUGGAAGAGCCGGGCGAGCAGAGCUGCGCUGCGGGCGUCCUGGGAAGGGAGAUCCGGAGCGAAUAGGGGGCUUCGCCUCUGGCCCAGCCCUCCCGCUGAUCCCCCAGCCAGCGGUCCGCAACCCUUGCCGCAUCCACGAAACUUUGCCCAUAGCAGCGGGCGGGCACUUUGCACUGGAACUUACAACACCCGAGCAAGGACGCGACUCUCCCGACGCGGGGAGGCUAUUCUGCCCAUUUGGGGACACUUCCCCGCCGCUGCCAGGACCCGCUUCUCUGAAAGGCUCUCCUUGCAGCUGCUUAGACG - example_title: activating transcription factor 4 pipeline_tag: splice-site sequence_type: 5' UTR task: splice-site text: CAUUUCUACUUUGCCCGCCCACAGAUGUAGUUUUCUCUGCGCGUGUGCGUUUUCCCUCCUCCCCGCCCUCAGGGUCCACGGCCACCAUGGCGUAUUAGGGGCAGCAGUGCCUGCGGCAGCAUUGGCCUUUGCAGCGGCGGCAGCAGCACCAGGCUCUGCAGCGGCAACCCCCAGCGGCUUAAGCCAUGGCGCUUCUCACGGCAUUCAGCAGCAGCGUUGCUGUAACCGACAAAGACACCUUCGAAUUAAGCACAUUCCUCGAUUCCAGCAAAGCACCGCAAC - example_title: Human GPI protein p137 pipeline_tag: splice-site sequence_type: 3' UTR task: splice-site text: UUUUUAAAAGGAAAAGAUACCAAAUGCCUGCUGCUACCACCCUUUUCAAUUGCUAUGUUUUGAAAGGCACCAGUAUGUGUUUUAGAUUGAUUUAAAUGUUUCAUUUAAAUCACGGACAGUAGUUUCAGUUCUGAUGGUAUAAGCAAAACAAAUAAAACGUUUAUAAAAGUUGUAUCUUGAAACACUGGUGUUCAACAGCUAGCAGCUUAUGUGAUUCACCCCAUGCCACGUUAGUGUCACAAAUUUUAUGGUUUAUCUCCAGCAACAUUUCUCUAGUACUUGCACUUAUUAUCUGAAUUC - example_title: nucleophosmin 1 pipeline_tag: splice-site sequence_type: 3' UTR task: splice-site text: GAAAAUAGUUUAAACAAUUUGUUAAAAAAUUUUCCGUCUUAUUUCAUUUCUGUAACAGUUGAUAUCUGGCUGUCCUUUUUAUAAUGCAGAGUGAGAACUUUCCCUACCGUGUUUGAUAAAUGUUGUCCAGGUUCUAUUGCCAAGAAUGUGUUGUCCAAAAUGCCUGUUUAGUUUUUAAAGAUGGAACUCCACCCUUUGCUUGGUUUUAAGUAUGUAUGGAAUGUUAUGAUAGGACAUAGUAGUAGCGGUGGUCAGACAUGGAAAUGGUGGGGAGACAAAAAUAUACAUGUGAAAUAAAACUCAGUAUUUUAAUAAAGUAGCACGGUUUCUAUUGA - example_title: superoxide dismutase 1 pipeline_tag: splice-site sequence_type: 3' UTR task: splice-site text: 
ACAUUCCCUUGGAUGUAGUCUGAGGCCCCUUAACUCAUCUGUUAUCCUGCUAGCUGUAGAAAUGUAUCCUGAUAAACAUUAAACACUGUAAUCUUAAAAGUGUAAUUGUGUGACUUUUUCAGAGUUGCUUUAAAGUACCUGUAGUGAGAAACUGAUUUAUGAUCACUUGGAAGAUUUGUAUAGUUUUAUAAAACUCAGUUAAAAUGUCUGUUUCAAUGACCUGUAUUUUGCCAGACUUAAAUCACAGAUGGGUAUUAAACUUGUCAGAAUUUCUUUGUCAUUCAAGCCUGUGAAUAAAAACCCUGUAUGGCACUUAUUAUGAGGCUAUUAAAAGAAUCCAAAUUCAAACUAAA - example_title: hemoglobin subunit alpha 2 pipeline_tag: splice-site sequence_type: 3' UTR task: splice-site text: CUGGAGCCUCGGUAGCCGUUCCUCCUGCCCGCUGGGCCUCCCAACGGGCCCUCCUCCCCUCCUUGCACCGGCCCUUCCUGGUCUUUGAAUAAAGUCUGAGUGGGCAGCA - example_title: BRAF proto-oncogene pipeline_tag: splice-site sequence_type: 3' UTR task: splice-site text: AACAAAUGAGUGAGAGAGUUCAGGAGAGUAGCAACAAAAGGAAAAUAAAUGAACAUAUGUUUGCUUAUAUGUUAAAUUGAAUAAAAUACUCUCUUUUUUUUUAAGGUGAACCAAAGAACACUUGUGUGGUUAAAGACUAGAUAUAAUUUUUCCCCAAACUAAAAUUUAUACUUAACAUUGGAUUUUUAACAUCCAAGGGUUAAAAUACAUAGACAUUGCUAAAAAUUGGCAGAGCCUCUUCUAGAGGCUUUACUUUCUGUUCCGGGUUUGUAUCAUUCACUUGGUUAUUUUAAGUAGUAAACUUCAGUUUCUCAUGCAACUUUUGUUGCCAGCUAUCACAUGUCCACUAGGGACUCCAGAAGAAGACCCUACCUAUGCCUGUGUUUGCAGGUGAGAAGUUGGCAGUCGGUUAGCCUGGG - example_title: H3 clustered histone 1 pipeline_tag: splice-site sequence_type: 3' UTR task: splice-site text: UUACUGUGGUCUCUCUGACGGUCCAAGCAAAGGCUCUUUUCAGAGCCACCACCUUUUC
---

# OpenSpliceAI

Modular native-PyTorch reimplementation of SpliceAI for predicting pre-mRNA splice sites from nucleotide sequence.

## Disclaimer

This is an UNOFFICIAL implementation of [OpenSpliceAI: An efficient, modular implementation of SpliceAI enabling easy retraining on non-human species](https://doi.org/10.7554/eLife.107454.3) by Kuan-Hao Chao, Alan Mao, et al.

The OFFICIAL repository of OpenSpliceAI is at [Kuanhao-Chao/OpenSpliceAI](https://github.com/Kuanhao-Chao/OpenSpliceAI).

> [!TIP]
> The MultiMolecule team has confirmed that the provided model and checkpoints are producing the same intermediate representations as the original implementation.

**The team releasing OpenSpliceAI did not write a model card for this model, so this model card has been written by the MultiMolecule team.**

## Model Details

OpenSpliceAI is a deep dilated residual convolutional neural network that reimplements the SpliceAI architecture in native PyTorch. It predicts, for each nucleotide of a pre-mRNA transcript, whether the position is a splice acceptor, a splice donor, or neither. The model stacks dilated residual units with increasing kernel size and atrous (dilation) rate so that a wide genomic context window contributes to each per-nucleotide prediction, while skip connections aggregate multi-scale features. OpenSpliceAI reproduces the predictive behavior of SpliceAI while providing an efficient, modular training pipeline that supports retraining on non-human species.

### Variants

OpenSpliceAI ships trained model families for human MANE and four non-human species. Each family provides four flanking-context sizes.

| Family        | 80 nt                                                                                             | 400 nt                                                                                              | 2,000 nt                                                                                              | 10,000 nt                                                                                               |
| ------------- | ------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| MANE / human  | [`openspliceai-mane.80`](https://huggingface.co/multimolecule/openspliceai-mane.80)               | [`openspliceai-mane.400`](https://huggingface.co/multimolecule/openspliceai-mane.400)               | [`openspliceai-mane.2000`](https://huggingface.co/multimolecule/openspliceai-mane.2000)               | [`openspliceai-mane.10000`](https://huggingface.co/multimolecule/openspliceai-mane.10000)               |
| Mouse         | [`openspliceai-mouse.80`](https://huggingface.co/multimolecule/openspliceai-mouse.80)             | [`openspliceai-mouse.400`](https://huggingface.co/multimolecule/openspliceai-mouse.400)             | [`openspliceai-mouse.2000`](https://huggingface.co/multimolecule/openspliceai-mouse.2000)             | [`openspliceai-mouse.10000`](https://huggingface.co/multimolecule/openspliceai-mouse.10000)             |
| Zebrafish     | [`openspliceai-zebrafish.80`](https://huggingface.co/multimolecule/openspliceai-zebrafish.80)     | [`openspliceai-zebrafish.400`](https://huggingface.co/multimolecule/openspliceai-zebrafish.400)     | [`openspliceai-zebrafish.2000`](https://huggingface.co/multimolecule/openspliceai-zebrafish.2000)     | [`openspliceai-zebrafish.10000`](https://huggingface.co/multimolecule/openspliceai-zebrafish.10000)     |
| Honeybee      | [`openspliceai-honeybee.80`](https://huggingface.co/multimolecule/openspliceai-honeybee.80)       | [`openspliceai-honeybee.400`](https://huggingface.co/multimolecule/openspliceai-honeybee.400)       | [`openspliceai-honeybee.2000`](https://huggingface.co/multimolecule/openspliceai-honeybee.2000)       | [`openspliceai-honeybee.10000`](https://huggingface.co/multimolecule/openspliceai-honeybee.10000)       |
| _Arabidopsis_ | [`openspliceai-arabidopsis.80`](https://huggingface.co/multimolecule/openspliceai-arabidopsis.80) | [`openspliceai-arabidopsis.400`](https://huggingface.co/multimolecule/openspliceai-arabidopsis.400) | [`openspliceai-arabidopsis.2000`](https://huggingface.co/multimolecule/openspliceai-arabidopsis.2000) | [`openspliceai-arabidopsis.10000`](https://huggingface.co/multimolecule/openspliceai-arabidopsis.10000) |

### Model Specification

| Flanking Context | Residual Blocks | Hidden Size | Num Parameters (M) | FLOPs (G) | MACs (G) |
|---|---|---|---|---|---|
| 80 nt | 4 | 32 | 0.09 | 0.95 | 0.47 |
| 400 nt | 8 | 32 | 0.19 | 2.00 | 0.99 |
| 2,000 nt | 12 | 32 | 0.36 | 5.03 | 2.50 |
| 10,000 nt | 16 | 32 | 0.70 | 20.90 | 10.40 |
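
The flanking-context sizes in the table can be related to the number of residual blocks through the receptive field of stacked dilated convolutions. The sketch below is illustrative only: the kernel sizes and dilation (atrous) rates are assumed from the original SpliceAI reference configuration and are not stated in this model card, the count of two convolutions per residual block is likewise an assumption, and `flanking_context` is a hypothetical helper rather than part of OpenSpliceAI or MultiMolecule.

```python
# Hypothetical helper for illustration; not an OpenSpliceAI or MultiMolecule API.
# (kernel_size, dilation) for each group of four residual blocks,
# ASSUMED from the original SpliceAI reference configuration.
BLOCK_GROUPS = [
    (11, 1),   # residual blocks 1-4
    (11, 4),   # residual blocks 5-8
    (21, 10),  # residual blocks 9-12
    (41, 25),  # residual blocks 13-16
]
BLOCKS_PER_GROUP = 4
CONVS_PER_BLOCK = 2  # ASSUMED: each residual block applies two dilated convolutions


def flanking_context(num_blocks: int) -> int:
    """Return the total flanking context (both sides) seen by one output position."""
    if num_blocks % BLOCKS_PER_GROUP or not 0 < num_blocks <= 16:
        raise ValueError("num_blocks must be 4, 8, 12, or 16")
    context = 0
    for kernel_size, dilation in BLOCK_GROUPS[: num_blocks // BLOCKS_PER_GROUP]:
        # One dilated convolution widens the receptive field by (k - 1) * d.
        context += BLOCKS_PER_GROUP * CONVS_PER_BLOCK * (kernel_size - 1) * dilation
    return context


for blocks in (4, 8, 12, 16):
    print(f"{blocks} residual blocks -> {flanking_context(blocks)} nt flanking context")
```

Under these assumptions, 4, 8, 12, and 16 residual blocks give 80, 400, 2,000, and 10,000 nt of flanking context, which agrees with the rows of the table.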