---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- genomics
- nucleotide
- dna
- sequence-modeling
- biology
- bioinformatics
datasets:
- genome
pipeline_tag: feature-extraction
---

# NucEL: Single-Nucleotide ELECTRA-Style Genomic Pre-training for Efficient and Interpretable Representations

NucEL is a language model for nucleotide sequence analysis and genomic applications. It produces embeddings for DNA sequences and can be fine-tuned for a range of downstream genomic tasks.

## Model Details

- **Model Type**: Transformer-based sequence model
- **Domain**: Genomics and nucleotide sequences
- **Architecture**: Based on the ModernBERT architecture, adapted for nucleotide sequences

## Features

- Single-nucleotide tokenization and embedding
- Pre-trained on the human genome
- Optimized for biological sequence understanding

## Usage

### Basic Usage

```python
from transformers import AutoModel
from tokenizer import NucEL_Tokenizer

# Load the model and the custom tokenizer shipped with this repository
model = AutoModel.from_pretrained("FreakingPotato/NucEL", trust_remote_code=True)
tokenizer = NucEL_Tokenizer.from_pretrained("FreakingPotato/NucEL", trust_remote_code=True)

# Example DNA sequence
sequence = "ATCGATCGATCGATCG"

# Tokenize and encode
inputs = tokenizer(sequence, return_tensors="pt")
outputs = model(**inputs)

# Per-nucleotide (token-level) embeddings
embeddings = outputs.last_hidden_state
print(f"Sequence embeddings shape: {embeddings.shape}")
```

For pooling these token-level embeddings into a single vector per sequence, see the example sketch at the end of this card.

## Installation

```bash
pip install transformers torch
# Install any additional dependencies for your specific use case
```

## Requirements

- transformers >= 4.21.0
- torch >= 1.9.0
- Python >= 3.7

## Citation

If you use NucEL in your research, please cite:

```bibtex
@misc{nucel2025,
  title={NucEL: Single-Nucleotide ELECTRA-Style Genomic Pre-training for Efficient and Interpretable Representations},
  author={Ke Ding and Brian Parker and Jiayu Wen},
  year={2025},
  howpublished={\url{https://huggingface.co/FreakingPotato/NucEL}}
}
```

## License

This model is released under the Apache 2.0 License.
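
## Example: Pooling Token Embeddings into Sequence Embeddings

`outputs.last_hidden_state` contains one embedding per nucleotide token. For tasks that need a single fixed-length vector per sequence (e.g. clustering, or features for a downstream classifier), masked mean pooling is a common choice. The sketch below is illustrative only and assumes the custom `NucEL_Tokenizer` behaves like a standard Hugging Face tokenizer for batched inputs (accepts a list of strings, supports `padding=True`, and returns an `attention_mask`); adapt it if the tokenizer differs.

```python
import torch
from transformers import AutoModel
from tokenizer import NucEL_Tokenizer

model = AutoModel.from_pretrained("FreakingPotato/NucEL", trust_remote_code=True)
tokenizer = NucEL_Tokenizer.from_pretrained("FreakingPotato/NucEL", trust_remote_code=True)
model.eval()

sequences = ["ATCGATCGATCGATCG", "GGGCCCAAATTTGGGCCC"]

# Assumption: the tokenizer pads batched inputs and returns an attention_mask
inputs = tokenizer(sequences, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool token embeddings, ignoring padding positions
mask = inputs["attention_mask"].unsqueeze(-1).float()   # (batch, seq_len, 1)
summed = (outputs.last_hidden_state * mask).sum(dim=1)  # (batch, hidden)
pooled = summed / mask.sum(dim=1).clamp(min=1e-9)       # (batch, hidden)

print(pooled.shape)  # (batch_size, hidden_size)
```

The resulting `pooled` tensor can be used directly as per-sequence features, for example with scikit-learn classifiers or nearest-neighbour search.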