---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- genomics
- nucleotide
- dna
- sequence-modeling
- biology
- bioinformatics
datasets:
- genome
pipeline_tag: feature-extraction
---
# NucEL: Single-Nucleotide ELECTRA-Style Genomic Pre-training for Efficient and Interpretable Representations
NucEL is a nucleotide language model for genomic sequence analysis, pre-trained with an ELECTRA-style objective at single-nucleotide resolution. It provides per-nucleotide embeddings for DNA sequences and can be fine-tuned for a range of downstream genomic tasks.
## Model Details
- **Model Type**: Transformer-based sequence model
- **Domain**: Genomics and Nucleotide Sequences
- **Architecture**: Based on the ModernBERT architecture, adapted for nucleotide sequences
## Features
- Single-nucleotide (character-level) tokenization and embedding (illustrated below)
- Pre-trained on the human genome
- Optimized for biological sequence understanding
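To make single-nucleotide tokenization concrete, the toy sketch below shows the general idea of character-level DNA tokenization. The vocabulary and special tokens here are hypothetical; the actual mapping is defined by `NucEL_Tokenizer` in the model repository (see Usage).
```python
# Hypothetical illustration of single-nucleotide (character-level) tokenization.
# The real vocabulary and special tokens come from NucEL_Tokenizer, not this sketch.
toy_vocab = {"[CLS]": 0, "[SEP]": 1, "[PAD]": 2, "A": 3, "C": 4, "G": 5, "T": 6, "N": 7}

def toy_tokenize(sequence):
    """Map each base to its own token id and wrap the sequence in special tokens."""
    ids = [toy_vocab[base] for base in sequence.upper()]
    return [toy_vocab["[CLS]"]] + ids + [toy_vocab["[SEP]"]]

print(toy_tokenize("ATCG"))  # [0, 3, 6, 4, 5, 1]
```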
## Usage
### Basic Usage
```python
from transformers import AutoModel
from tokenizer import NucEL_Tokenizer  # NucEL_Tokenizer is defined in the repository's tokenizer.py

# Load the pre-trained model and the single-nucleotide tokenizer
model = AutoModel.from_pretrained("FreakingPotato/NucEL", trust_remote_code=True)
tokenizer = NucEL_Tokenizer.from_pretrained("FreakingPotato/NucEL", trust_remote_code=True)

# Example DNA sequence
sequence = "ATCGATCGATCGATCG"

# Tokenize (one token per nucleotide) and run the model
inputs = tokenizer(sequence, return_tensors="pt")
outputs = model(**inputs)

# Per-nucleotide embeddings: (batch_size, sequence_length, hidden_size)
embeddings = outputs.last_hidden_state
print(f"Sequence embeddings shape: {embeddings.shape}")
```
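### Sequence-Level Embeddings
NucEL returns one embedding per nucleotide token. For tasks that need a single fixed-length vector per sequence (e.g. similarity search or clustering), a common approach is to mean-pool the token embeddings while masking out padding. The sketch below continues from the Basic Usage example and assumes the tokenizer returns an `attention_mask`; if it does not, average over all positions instead.
```python
# Mean-pool per-nucleotide embeddings into one vector per sequence (sketch only).
# Padding positions are excluded via the attention mask.
mask = inputs["attention_mask"].unsqueeze(-1).float()         # (batch, seq_len, 1)
summed = (outputs.last_hidden_state * mask).sum(dim=1)        # (batch, hidden_size)
sequence_embedding = summed / mask.sum(dim=1).clamp(min=1.0)  # (batch, hidden_size)
print(f"Per-sequence embedding shape: {sequence_embedding.shape}")
```
### Fine-Tuning Sketch
For downstream classification, one simple option is to train a small task-specific head on top of the pooled embeddings (optionally fine-tuning the backbone as well). The head below is a hypothetical sketch, not part of the NucEL repository, and assumes the model config exposes `hidden_size`.
```python
import torch.nn as nn

# Hypothetical classification head over pooled NucEL embeddings.
num_classes = 2  # task-specific
classifier = nn.Linear(model.config.hidden_size, num_classes)
logits = classifier(sequence_embedding)  # (batch, num_classes)
```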
## Installation
```bash
pip install transformers torch
# Install any additional dependencies for your specific use case
```
## Requirements
- transformers >= 4.21.0
- torch >= 1.9.0
- Python >= 3.7
## Citation
If you use NucEL in your research, please cite:
```bibtex
@misc{nucel2025,
  title={NucEL: Single-Nucleotide ELECTRA-Style Genomic Pre-training for Efficient and Interpretable Representations},
  author={Ke Ding and Brian Parker and Jiayu Wen},
  year={2025},
  howpublished={\url{https://huggingface.co/FreakingPotato/NucEL}}
}
```
## License
This model is released under the Apache 2.0 License.