---
license: apache-2.0
---

# eccDNAMamba
**A Pre-Trained Model for Ultra-Long eccDNA Sequence Analysis**

---

### Model Overview
**eccDNAMamba** is a **bidirectional state-space model (SSM)** designed for efficient and topology-aware modeling of **extrachromosomal circular DNA (eccDNA)**.

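
The idea behind a bidirectional SSM can be sketched with a toy recurrence. The snippet below is a simplified illustration, not eccDNAMamba's implementation. It assumes a scalar, non-selective linear recurrence `h_t = a * h_{t-1} + b * x_t` in place of a Mamba-2 block, and it merges the two directions by summation, which is a hypothetical choice. Each position ends up with context from both its left and its right, and each pass still runs in linear time.

```python
from typing import List


def linear_scan(xs: List[float], a: float, b: float) -> List[float]:
    """Toy non-selective SSM: h_t = a * h_{t-1} + b * x_t, with output y_t = h_t."""
    h = 0.0
    ys: List[float] = []
    for x in xs:
        h = a * h + b * x
        ys.append(h)
    return ys


def bidirectional_scan(xs: List[float], a: float = 0.5, b: float = 1.0) -> List[float]:
    """Scan left-to-right and right-to-left, then merge the two outputs per position.

    Summation as the merge step is an assumption made for illustration only.
    """
    forward = linear_scan(xs, a, b)  # position t summarizes x_0..x_t
    # Scan the reversed input, then flip the result back so that
    # position t summarizes x_t..x_{T-1} (its right-hand context).
    backward = linear_scan(xs[::-1], a, b)[::-1]
    return [f + r for f, r in zip(forward, backward)]


if __name__ == "__main__":
    print(bidirectional_scan([1.0, 0.0, 0.0, 1.0]))  # [2.125, 0.75, 0.75, 2.125]
```

In Mamba-2 the recurrence coefficients are input-dependent (selective) and the state is vector-valued; the toy version keeps only the forward-plus-reverse scan structure.
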
By combining **forward and reverse Mamba-2 encoders**, **motif-level Byte Pair Encoding (BPE)**, and a lightweight **head–tail circular augmentation**, the model captures wrap-around dependencies in ultra-long (10–200 kbp) genomic sequences while scaling linearly with sequence length.

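
Motif-level BPE builds a vocabulary of recurring multi-base units instead of fixed-length k-mers. As a rough sketch that assumes textbook BPE (start from single nucleotides and repeatedly merge the most frequent adjacent pair), the hypothetical `learn_bpe` below learns merges from a single DNA string. The corpus, vocabulary size, and tie-breaking used by eccDNAMamba's actual tokenizer are not described in this README.

```python
from collections import Counter
from typing import List, Tuple


def learn_bpe(seq: str, num_merges: int) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Hypothetical textbook BPE learner applied to one DNA string.

    Starts from single nucleotides and repeatedly merges the most frequent
    adjacent pair; ties go to the pair that occurs first in the sequence.
    """
    tokens = list(seq)
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:  # a pair seen only once is not a recurring motif
            break
        merges.append(best)
        # Rewrite the token list, replacing each non-overlapping occurrence of `best`.
        merged: List[str] = []
        i = 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges


if __name__ == "__main__":
    print(learn_bpe("ACGTACGTACGT", 3))
    # (['ACGT', 'ACGT', 'ACGT'], [('A', 'C'), ('AC', 'G'), ('ACG', 'T')])
```

The example only shows a repeated 4-base unit collapsing into a single token after three merges.
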
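
This README does not specify how the head–tail circular augmentation works. One simple way to expose wrap-around context to a model that reads a linear string, shown here purely as an assumption, is to append a copy of the first `k` bases to the end, so that bases adjacent across the circular junction also appear adjacent in the input. `circular_augment`, `junction_window`, and `k` are hypothetical names.

```python
def circular_augment(seq: str, k: int) -> str:
    """Append the first k bases of a circular sequence to its end.

    Hypothetical helper (an assumed scheme, not necessarily eccDNAMamba's):
    after augmentation, the tail-to-head junction of the circle is contiguous.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(seq))  # never borrow more bases than the molecule has
    return seq + seq[:k]


def junction_window(seq: str, k: int) -> str:
    """Return the last k bases followed by the first k bases (the circular junction)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(seq))
    return seq[len(seq) - k:] + seq[:k]


if __name__ == "__main__":
    ecc = "AACCGGTT"  # toy circular sequence, linearized at an arbitrary point
    print(circular_augment(ecc, 3))  # AACCGGTTAAC
    print(junction_window(ecc, 3))   # GTTAAC
```

Appending `k` bases adds only O(k) length, so a scheme like this keeps the overall cost linear in sequence length.
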
The model performs strongly on three downstream tasks: cancer-associated eccDNA prediction, copy-number level estimation, and discrimination of real vs. pseudo-eccDNA.

---

### Quick Start
```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the pre-trained tokenizer and masked-LM model from the Hugging Face Hub.
# If the checkpoint relies on custom modeling code, you may also need trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained("eccdna/eccDNAMamba-1M")
model = AutoModelForMaskedLM.from_pretrained("eccdna/eccDNAMamba-1M")

sequence = "ATGCGTACGTTAGCGTACGT"
inputs = tokenizer(sequence, return_tensors="pt")

# Inference only: disable gradient tracking to save memory.
with torch.no_grad():
    outputs = model(**inputs)

# Access logits or reconstruct masked spans
logits = outputs.logits  # (batch_size, num_tokens, vocab_size)
```