---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- genomics
- nucleotide
- dna
- sequence-modeling
- biology
- bioinformatics
datasets:
- genome
pipeline_tag: feature-extraction
---
# NucEL: Single-Nucleotide ELECTRA-Style Genomic Pre-training for Efficient and Interpretable Representations
NucEL is a nucleotide language model for genomic sequence analysis, pre-trained with an ELECTRA-style objective at single-nucleotide resolution. It provides per-nucleotide embeddings for DNA sequences and can be fine-tuned for a range of downstream genomic tasks.
## Model Details
- **Model Type**: Transformer-based sequence model
- **Domain**: Genomics and Nucleotide Sequences
- **Architecture**: Based on the ModernBERT architecture, adapted for nucleotide sequences
## Features
- Single-nucleotide (character-level) tokenization and embedding (illustrated below)
- Pre-trained on the human genome
- Optimized for biological sequence understanding
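To make single-nucleotide tokenization concrete, the toy sketch below shows the general idea of character-level DNA tokenization. The vocabulary and special tokens here are hypothetical; the actual mapping is defined by `NucEL_Tokenizer` in the model repository (see Usage).
```python
# Hypothetical illustration of single-nucleotide (character-level) tokenization.
# The real vocabulary and special tokens come from NucEL_Tokenizer, not this sketch.
toy_vocab = {"[CLS]": 0, "[SEP]": 1, "[PAD]": 2, "A": 3, "C": 4, "G": 5, "T": 6, "N": 7}

def toy_tokenize(sequence):
    """Map each base to its own token id and wrap the sequence in special tokens."""
    ids = [toy_vocab[base] for base in sequence.upper()]
    return [toy_vocab["[CLS]"]] + ids + [toy_vocab["[SEP]"]]

print(toy_tokenize("ATCG"))  # [0, 3, 6, 4, 5, 1]
```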
## Usage
### Basic Usage
```python
from transformers import AutoModel
from tokenizer import NucEL_Tokenizer  # NucEL_Tokenizer is defined in the repository's tokenizer.py

# Load the pre-trained model and the single-nucleotide tokenizer
model = AutoModel.from_pretrained("FreakingPotato/NucEL", trust_remote_code=True)
tokenizer = NucEL_Tokenizer.from_pretrained("FreakingPotato/NucEL", trust_remote_code=True)

# Example DNA sequence
sequence = "ATCGATCGATCGATCG"

# Tokenize (one token per nucleotide) and run the model
inputs = tokenizer(sequence, return_tensors="pt")
outputs = model(**inputs)

# Per-nucleotide embeddings: (batch_size, sequence_length, hidden_size)
embeddings = outputs.last_hidden_state
print(f"Sequence embeddings shape: {embeddings.shape}")
```
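### Sequence-Level Embeddings
NucEL returns one embedding per nucleotide token. For tasks that need a single fixed-length vector per sequence (e.g. similarity search or clustering), a common approach is to mean-pool the token embeddings while masking out padding. The sketch below continues from the Basic Usage example and assumes the tokenizer returns an `attention_mask`; if it does not, average over all positions instead.
```python
# Mean-pool per-nucleotide embeddings into one vector per sequence (sketch only).
# Padding positions are excluded via the attention mask.
mask = inputs["attention_mask"].unsqueeze(-1).float()         # (batch, seq_len, 1)
summed = (outputs.last_hidden_state * mask).sum(dim=1)        # (batch, hidden_size)
sequence_embedding = summed / mask.sum(dim=1).clamp(min=1.0)  # (batch, hidden_size)
print(f"Per-sequence embedding shape: {sequence_embedding.shape}")
```
### Fine-Tuning Sketch
For downstream classification, one simple option is to train a small task-specific head on top of the pooled embeddings (optionally fine-tuning the backbone as well). The head below is a hypothetical sketch, not part of the NucEL repository, and assumes the model config exposes `hidden_size`.
```python
import torch.nn as nn

# Hypothetical classification head over pooled NucEL embeddings.
num_classes = 2  # task-specific
classifier = nn.Linear(model.config.hidden_size, num_classes)
logits = classifier(sequence_embedding)  # (batch, num_classes)
```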
## Installation
```bash
pip install transformers torch
# Install any additional dependencies for your specific use case
```
## Requirements
- transformers >= 4.21.0
- torch >= 1.9.0
- Python >= 3.7
## Citation
If you use NucEL in your research, please cite:
```bibtex
@misc{nucel2025,
  title={NucEL: Single-Nucleotide ELECTRA-Style Genomic Pre-training for Efficient and Interpretable Representations},
  author={Ke Ding and Brian Parker and Jiayu Wen},
  year={2025},
  howpublished={\url{https://huggingface.co/FreakingPotato/NucEL}}
}
```
## License
This model is released under the Apache 2.0 License.