---
language: 
- en
license: apache-2.0
library_name: transformers
tags:
- genomics
- nucleotide
- dna
- sequence-modeling
- biology
- bioinformatics
datasets:
- genome
pipeline_tag: feature-extraction
---

# NucEL: Single-Nucleotide ELECTRA-Style Genomic Pre-training for Efficient and Interpretable Representations

NucEL is an ELECTRA-style language model pre-trained on genomic data at single-nucleotide resolution. It produces embeddings for DNA sequences and can be fine-tuned for a range of downstream genomic tasks.

## Model Details

- **Model Type**: Transformer-based sequence model
- **Domain**: Genomics and Nucleotide Sequences
- **Architecture**: Based on the ModernBERT architecture, adapted for nucleotide sequences

## Features

- Nucleotide-level tokenization and embedding
- Pre-trained on the human genome
- Optimized for biological sequence understanding

## Usage

### Basic Usage

```python
from transformers import AutoModel

# NucEL_Tokenizer is assumed to be provided by the tokenizer.py file shipped
# with the model repository; run this script with that file on your PYTHONPATH
from tokenizer import NucEL_Tokenizer

# Load model and tokenizer
model = AutoModel.from_pretrained("FreakingPotato/NucEL", trust_remote_code=True)
tokenizer = NucEL_Tokenizer.from_pretrained("FreakingPotato/NucEL", trust_remote_code=True)

# Example DNA sequence
sequence = "ATCGATCGATCGATCG"

# Tokenize and encode
inputs = tokenizer(sequence, return_tensors="pt")
outputs = model(**inputs)

# Get sequence embeddings
embeddings = outputs.last_hidden_state
print(f"Sequence embeddings shape: {embeddings.shape}")
```
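The `last_hidden_state` tensor holds one embedding per token. To reduce it to a single fixed-size vector per sequence, a common approach is attention-mask-weighted mean pooling, so padding positions do not dilute the result. A minimal sketch of the pooling step only, shown on dummy tensors (the hidden size of 8 is illustrative, not the model's actual configuration):

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over the sequence, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()     # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)  # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)        # guard against div-by-zero
    return summed / counts


# Dummy tensors standing in for real model outputs
hidden = torch.randn(2, 16, 8)
mask = torch.ones(2, 16, dtype=torch.long)
mask[1, 10:] = 0  # second sequence is padded after 10 tokens

pooled = mean_pool(hidden, mask)
print(pooled.shape)  # torch.Size([2, 8])
```

With real outputs, pass `outputs.last_hidden_state` and `inputs["attention_mask"]` to `mean_pool` to obtain one embedding per input sequence.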

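### Downstream Classification (Sketch)

For sequence-level tasks such as promoter or enhancer prediction, one option is to attach a small classification head on top of pooled NucEL embeddings. The sketch below is a hypothetical head, not part of the released model; the hidden size of 8 and the two-class output are illustrative, and a random tensor stands in for real pooled embeddings:

```python
import torch
import torch.nn as nn


class SequenceClassifier(nn.Module):
    """Hypothetical linear head over pooled sequence embeddings."""

    def __init__(self, hidden_size: int, num_classes: int):
        super().__init__()
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, pooled_embeddings: torch.Tensor) -> torch.Tensor:
        # Returns unnormalized logits of shape (batch, num_classes)
        return self.head(pooled_embeddings)


classifier = SequenceClassifier(hidden_size=8, num_classes=2)
pooled = torch.randn(4, 8)  # stand-in for mean-pooled model embeddings
logits = classifier(pooled)
print(logits.shape)  # torch.Size([4, 2])
```

In practice you would train this head (and optionally fine-tune the backbone) with a standard cross-entropy loss on labeled sequences.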
## Installation

```bash
pip install transformers torch
# Install any additional dependencies for your specific use case
```

## Requirements

- transformers >= 4.21.0
- torch >= 1.9.0
- Python >= 3.7

## Citation

If you use NucEL in your research, please cite:

```bibtex
@misc{nucel2025,
  title={NucEL: Single-Nucleotide ELECTRA-Style Genomic Pre-training for Efficient and Interpretable Representations},
  author={Ding, Ke and Parker, Brian and Wen, Jiayu},
  year={2025},
  howpublished={\url{https://huggingface.co/FreakingPotato/NucEL}}
}
```

## License

This model is released under the Apache 2.0 License.