--- language: dna library_name: multimolecule license: agpl-3.0 pipeline: methylation pipeline_tag: other tags: - Biology - DNA widget: - example_title: tumor protein p53 pipeline_tag: methylation sequence_type: DNA task: methylation text: ACTCCCCTGCCCTCAACAAGATGTTTTGCCAACTGGCCAAGACCTGCCCTGTGCAGCTGTGGGTTGATTCCACACCCCCGCCCGGCACCCGCGTCCGCGCCATGGCCATCTACAAGCAGTCACAGCACATGACGGAGGTTGTGAGGCGCTGCCCCCACCATGAGCGCTGCTCAGATAGCGATGG - example_title: BRCA1 DNA repair associated pipeline_tag: methylation sequence_type: DNA task: methylation text: TCATTGGAACAGAAAGAAATGGATTTATCTGCTCTTCGCGTTGAAGAAGTACAAAATGTCATTAATGCTATGCAGAAAATCTTAGAGTGTCCCATCTGG - example_title: hemoglobin subunit beta pipeline_tag: methylation sequence_type: DNA task: methylation text: CATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGG - example_title: CF transmembrane conductance regulator pipeline_tag: methylation sequence_type: DNA task: methylation text: ACTTCACTTCTAATGGTGATTATGGGAGAACTGGAGCCTTCAGAGGGTAAAATTAAGCACAGTGGAAGAATTTCATTCTGTTCTCAGTTTTCCTGGATTATGCCTGGCACCATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGAATATAGATACAGAAGCGTCATCAAAGCATGCCAACTAGAAGAG - example_title: telomerase reverse transcriptase pipeline_tag: methylation sequence_type: DNA task: methylation text: CGCGGGGGTGGCCGGGGCCAGGGCTTCCCACGTGCGCAGCAGGACGCAGCGCTGCCTGAAACTCGCGCCGCGAGGAGAGGGCGGGGCCGCGGAAAGGAAGGGGAGGGGCTGGGAGGGCCCGGAGGGGGCTGGGCCGGGGACCCGGGAGGGGTCGGGACGGGGCGGGGTCCGCGCGGAGGAGGCGGAGCTGGAAGGTGAAGGGGCAGGACGGGTGCCCGGGTCCCCAGTCCCTCCGCCACGTGGGAAGCGCGGTCCTGGGCGTCTGTGCCCGCGAATCCACTGGGAGCCCGGCCTGGCCCCGACAGCGCAGCTGCTCCGGGCGGACCCGGGG - example_title: KRAS proto-oncogene pipeline_tag: methylation sequence_type: DNA task: methylation text: GCCTGCTGAAAATGACTGAATATAAACTTGTGGTAGTTGGAGCTGGTGGCGTAGGCAAGAGTGCCTTGACGATACAGCTAATTCAGAATCATTTTGTGGACGAATATGATCCAACAATAGAG - example_title: prion protein (Kanno blood group) pipeline_tag: methylation sequence_type: cDNA task: 
methylation text: ATGGCGAACCTTGGCTGCTGGATGCTGGTTCTCTTTGTGGCCACATGGAGTGACCTGGGCCTCTGC - example_title: interleukin 10 pipeline_tag: methylation sequence_type: cDNA task: methylation text: ATGCACAGCTCAGCACTGCTCTGTTGCCTGGTCCTCCTGACTGGGGTGAGGGCC - example_title: Zaire ebolavirus pipeline_tag: methylation sequence_type: cDNA task: methylation text: AATGTTCAAACACTTTGTGAAGCTCTGTTAGCTGATGGTCTTGCTAAAGCATTTCCTAGCAATATGATGGTAGTCACAGAGCGTGAGCAAAAAGAAAGCTTATTGCATCAAGCATCATGGCACCACACAAGTGATGATTTTGGTGAGCATGCCACAGTTAGAGGGAGTAGCTTTGTAACTGATTTAGAGAAATACAATCTTGCATTTAGATATGAGTTTACAGCACCTTTTATAGAATATTGTAACCGTTGCTATGGTGTTAAGAATGTTTTTAATTGGATGCATTATACAATCCCACAGTGTTAT - example_title: SARS coronavirus pipeline_tag: methylation sequence_type: cDNA task: methylation text: ATGTTTATTTTCTTATTATTTCTTACTCTCACTAGTGGTAGTGACCTTGACCGGTGCACCACTTTTGATGATGTTCAAGCTCCTAATTACACTCAACATACTTCATCTATGAGGGGGGTTTACTATCCTGATGAAATTTTTAGATCAGACACTCTTTATTTAACTCAGGATTTATTTCTTCCATTTTATTCTAATGTTACAGGGTTTCATACTATTAATCATACGTTTGACAACCCTGTCATACCTTTTAAGGATGGTATTTATTTTGCTGCCACAGAGAAATCAAATGTTGTCCGTGGTTGGGTTTTTGGTTCTACCATGAACAACAAGTCACAGTCGGTGATTATTATTAACAATTCTACTAATGTTGTTATACGAGCATGTAACTTTGAATTGTGTGACAACCCTTTCTTTGCTGTTTCTAAACCCATGGGTACACAGACACATACTATGATATTCGATAATGCATTTAAATGCACTTTCGAGTACATATCT - example_title: insulin pipeline_tag: methylation sequence_type: cDNA task: methylation text: ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAACTACTGCAACTAG - example_title: cyclin dependent kinase inhibitor 2A pipeline_tag: methylation sequence_type: cDNA task: methylation text: 
ATGGAGCCGGCGGCGGGGAGCAGCATGGAGCCTTCGGCTGACTGGCTGGCCACGGCCGCGGCCCGGGGTCGGGTAGAGGAGGTGCGGGCGCTGCTGGAGGCGGGGGCGCTGCCCAACGCACCGAATAGTTACGGTCGGAGGCCGATCCAGGTCATGATGATGGGCAGCGCCCGAGTGGCGGAGCTGCTGCTGCTCCACGGCGCGGAGCCCAACTGCGCCGACCCCGCCACTCTCACCCGACCCGTGCACGACGCTGCCCGGGAGGGCTTCCTGGACACGCTGGTGGTGCTGCACCGGGCCGGGGCGCGGCTGGACGTGCGCGATGCCTGGGGCCGTCTGCCCGTGGACCTGGCTGAGGAGCTGGGCCATCGCGATGTCGCACGGTACCTGCGCGCGGCTGCGGGGGGCACCAGAGGCAGTAACCATGCCCGCATAGATGCCGCGGAAGGTCCCTCAGACATCCCCGATTGA - example_title: human papillomavirus type 16 E6 pipeline_tag: methylation sequence_type: cDNA task: methylation text: ATGCACCAAAAGAGAACTGCAATGTTTCAGGACCCACAGGAGCGACCCAGAAAGTTACCACAGTTATGCACAGAGCTGCAAACAACTATACATGATATAATATTAGAATGTGTGTACTGCAAGCAACAGTTACTGCGACGTGAGGTATATGACTTTGCTTTTCGGGATTTATGCATAGTATATAGAGATGGGAATCCATATGCTGTATGTGATAAATGTTTAAAGTTTTATTCTAAAATTAGTGAGTATAGACATTATTGTTATAGTTTGTATGGAACAACATTAGAACAGCAATACAACAAACCGTTGTGTGATTTGTTAATTAGGTGTATTAACTGTCAAAAGCCACTGTGTCCTGAAGAAAAGCAAAGACATCTGGACAAAAAGCAAAGATTCCATAATATAAGGGGTCGGTGGACCGGTCGATGTATGTCTTGTTGCAGATCATCAAGAACACGTAGAGAAACCCAGCTGTAA ---

# DeepCpG-DNA

DNA-only convolutional neural network from DeepCpG for predicting per-cell single-cell DNA methylation states from a CpG-centered sequence window.

## Disclaimer

This is an UNOFFICIAL implementation of [DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning](https://doi.org/10.1186/s13059-017-1189-z) by Christof Angermueller, et al.

The OFFICIAL repository of DeepCpG is at [cangermueller/deepcpg](https://github.com/cangermueller/deepcpg).

> [!TIP]
> The MultiMolecule team has confirmed that the provided model and checkpoints are producing the same intermediate representations as the original implementation.

**The team releasing DeepCpG-DNA did not write a model card for this model, so this model card has been written by the MultiMolecule team.**

## Model Details

DeepCpG-DNA is the DNA submodule of the DeepCpG joint model.
It is a 1D convolutional neural network that predicts the per-cell methylation state of a CpG site from a fixed-length 1001 bp DNA window centered on the site. The model consumes a one-hot encoded sequence and applies `valid`-padded convolutional blocks (Conv1D + ReLU + MaxPool) followed by a dense bottleneck and one binary classification head per single cell in the training dataset. Please refer to the [Training Details](#training-details) section for more information on the training process.

The full DeepCpG model combines this DNA submodule with a recurrent CpG-context submodule and a joint head; this model card covers the DNA submodule only.

### Variants

The DeepCpG-DNA module is trained per single-cell dataset, so each variant predicts a different number of output cells.

| Dataset                   | Architecture | Cells | Hub repository                                                                                          |
| ------------------------- | ------------ | ----- | ------------------------------------------------------------------------------------------------------- |
| Smallwood 2014 serum mESC | CnnL2h128    | 18    | [`deepcpgdna-smallwood2014-serum`](https://huggingface.co/multimolecule/deepcpgdna-smallwood2014-serum) |
| Smallwood 2014 2i mESC    | CnnL3h128    | 12    | [`deepcpgdna-smallwood2014-2i`](https://huggingface.co/multimolecule/deepcpgdna-smallwood2014-2i)       |
| Hou 2016 HCC              | CnnL2h128    | 25    | [`deepcpgdna-hou2016-hcc`](https://huggingface.co/multimolecule/deepcpgdna-hou2016-hcc)                 |
| Hou 2016 HepG2            | CnnL3h128    | 6     | [`deepcpgdna-hou2016-hepg2`](https://huggingface.co/multimolecule/deepcpgdna-hou2016-hepg2)             |
| Hou 2016 mESC             | CnnL2h128    | 6     | [`deepcpgdna-hou2016-mesc`](https://huggingface.co/multimolecule/deepcpgdna-hou2016-mesc)               |

### Model Specification

| Architecture | Num Conv Layers | Hidden Size | Num Cells | Num Parameters (M) | FLOPs (M) | MACs (M) | Max Num Tokens |
|---|---|---|---|---|---|---|---|
| CnnL2h128 | 2 | 128 | 18 | 4.11 | 70.63 | 35.06 | 1001 |
| CnnL3h128 | 3 | 128 | 12 | 4.43 | 165.02 | 82.18 | 1001 |
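
The model input described under Model Details, a one-hot encoded 1001 bp window centered on a CpG site, can be illustrated with a minimal sketch. This is not the MultiMolecule or DeepCpG preprocessing code: the helper names `cpg_window` and `one_hot`, the `ACGT` channel order, and the use of `N` padding (encoded as all zeros) at sequence edges are assumptions made for illustration only.

```python
WINDOW = 1001      # window length stated in this model card
ALPHABET = "ACGT"  # channel order is an assumption for illustration


def cpg_window(sequence: str, cpg_pos: int, length: int = WINDOW) -> str:
    """Return `length` bases centered on `cpg_pos`, padding with N at the edges."""
    half = length // 2
    start, end = cpg_pos - half, cpg_pos + half + 1
    left_pad = "N" * max(0, -start)
    right_pad = "N" * max(0, end - len(sequence))
    return left_pad + sequence[max(0, start):min(len(sequence), end)] + right_pad


def one_hot(window: str) -> list[list[int]]:
    """Encode each base as a 4-element indicator row; unknown bases become all zeros."""
    return [[int(base == letter) for letter in ALPHABET] for base in window.upper()]


# Toy 1200 bp sequence; real inputs would come from a reference genome.
seq = "ACGT" * 300
pos = seq.index("CG", 600)  # index of the C in a CpG dinucleotide
window = cpg_window(seq, pos)
encoded = one_hot(window)

# The C of the CpG sits at the center of the window (index 500 of 1001).
print(len(window), window[500:502], encoded[500])
```

With an odd window length, the cytosine of the CpG lands exactly at index 500, with 500 bases of context on each side. How the released checkpoints handle ambiguous bases or windows that run past a chromosome end is not stated in this card, so the padding behavior above should be checked against the actual tokenizer before use.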