---
license: apache-2.0
tags:
- medical
- biology
---

# eccDNAMamba
**A Pre-Trained Model for Ultra-Long eccDNA Sequence Analysis**

---

### Model Overview
**eccDNAMamba** is a **bidirectional state-space model (SSM)** designed for efficient and topology-aware modeling of **extrachromosomal circular DNA (eccDNA)**.
By combining **forward and reverse Mamba-2 encoders**, **motif-level Byte Pair Encoding (BPE)**, and a lightweight **head–tail circular augmentation**, it captures wrap-around dependencies in ultra-long (10–200 kbp) genomic sequences while maintaining linear-time scalability.
The model provides strong performance across cancer-associated eccDNA prediction, copy-number level estimation, and real vs. pseudo-eccDNA discrimination tasks.

---

### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("eccdna/eccDNAMamba-1M")
model = AutoModelForMaskedLM.from_pretrained("eccdna/eccDNAMamba-1M")

sequence = "ATGCGTACGTTAGCGTACGT"
inputs = tokenizer(sequence, return_tensors="pt")
outputs = model(**inputs)

# Access logits or reconstruct masked spans
logits = outputs.logits
```

---

### Citation
```bibtex
@inproceedings{
liu2025eccdnamamba,
title={ecc{DNAM}amba: A Pre-Trained Model for Ultra-Long ecc{DNA} Sequence Analysis},
author={Zhenke Liu and Jien Li and Ziqi Zhang},
booktitle={ICML 2025 Generative AI and Biology (GenBio) Workshop},
year={2025},
url={https://openreview.net/forum?id=56xKN7KJjy}
}
```
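
---

### Circular Augmentation Sketch
The overview mentions a head–tail circular augmentation but does not spell out how it works. The snippet below is only a rough illustration of the general idea, assuming the augmentation amounts to appending a copy of the sequence's first bases to its end so that a linear model can see motifs spanning the circle's junction. The function name `circular_augment` and the `overlap` parameter are hypothetical and are not part of the released model's API; the actual preprocessing may differ.

```python
def circular_augment(sequence: str, overlap: int) -> str:
    """Append the first `overlap` bases of a circular sequence to its end.

    Hypothetical helper for illustration only; not the model's actual
    preprocessing code.
    """
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    # Never copy more than one full turn of the circle.
    overlap = min(overlap, len(sequence))
    return sequence + sequence[:overlap]


# A toy circular sequence, linearized at an arbitrary position.
sequence = "GTACCCCCAT"
# This motif crosses the junction: it ends the string ("AT") and restarts it ("GTA").
motif = "ATGTA"

augmented = circular_augment(sequence, overlap=4)

print(motif in sequence)   # False: the linearized string splits the motif
print(motif in augmented)  # True: the appended head restores the junction
```

In this toy case the motif is invisible in the linearized string and becomes visible once the head is appended, which is the kind of wrap-around dependency the overview refers to.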