metadata
language:
  - en
license: apache-2.0
library_name: transformers
tags:
  - genomics
  - nucleotide
  - dna
  - sequence-modeling
  - biology
  - bioinformatics
datasets:
  - genome
pipeline_tag: feature-extraction

NucEL: Single-Nucleotide ELECTRA-Style Genomic Pre-training for Efficient and Interpretable Representations

NucEL is a language model for nucleotide sequence analysis and genomic applications, pre-trained ELECTRA-style at single-nucleotide resolution. It produces embeddings for DNA sequences and can be fine-tuned for downstream genomic tasks.

Model Details

  • Model Type: Transformer-based sequence model
  • Domain: Genomics and Nucleotide Sequences
  • Architecture: Based on the ModernBERT architecture, optimized for nucleotide sequences

Features

  • Nucleotide-level tokenization and embedding
  • Pre-trained on the human genome
  • Optimized for biological sequence understanding
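
To make "nucleotide-level tokenization" concrete, the following minimal sketch maps each base to exactly one token, so that token i always corresponds to base i of the input. The vocabulary and the function name `tokenize_single_nucleotide` are hypothetical and chosen for illustration only; NucEL's actual tokenizer and token IDs may differ.

```python
from typing import List

# Hypothetical vocabulary for illustration; not NucEL's real token IDs.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "A": 2, "C": 3, "G": 4, "T": 5, "N": 6}


def tokenize_single_nucleotide(sequence: str) -> List[int]:
    """Map every base to one token ID; unknown characters become [UNK]."""
    return [VOCAB.get(base, VOCAB["[UNK]"]) for base in sequence.upper()]


ids = tokenize_single_nucleotide("ATCGATCGATCGATCG")
print(len(ids))  # one token per base
print(ids[:4])
```

Because tokens and bases line up one-to-one, a per-token embedding or attribution score can be read directly as a per-nucleotide value, which is not the case with k-mer or byte-pair vocabularies.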

Usage

Basic Usage

import torch
from transformers import AutoModel
# Assumes tokenizer.py from the model repository is on the import path
from tokenizer import NucEL_Tokenizer

# Load model and tokenizer
model = AutoModel.from_pretrained("FreakingPotato/NucEL", trust_remote_code=True)
tokenizer = NucEL_Tokenizer.from_pretrained("FreakingPotato/NucEL", trust_remote_code=True)
model.eval()  # disable dropout for inference

# Example DNA sequence
sequence = "ATCGATCGATCGATCG"

# Tokenize and encode
inputs = tokenizer(sequence, return_tensors="pt")

# Run the model without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Per-token embeddings, shape (batch, sequence_length, hidden_size)
embeddings = outputs.last_hidden_state
print(f"Sequence embeddings shape: {embeddings.shape}")
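
The `last_hidden_state` tensor holds one vector per token. When a single vector per sequence is needed, one common option is masked mean pooling, sketched below. This is a generic technique, not necessarily the pooling NucEL's authors recommend; the helper name `mean_pool` is hypothetical, and the sketch assumes the tokenizer returns an `attention_mask` alongside the input IDs. Random stand-in tensors with made-up sizes are used so the snippet runs without downloading the model.

```python
import torch


def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over non-padding positions.

    hidden_states:  (batch, seq_len, hidden_size)
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(dim=1)                   # (batch, hidden_size)
    counts = mask.sum(dim=1).clamp(min=1.0)                      # avoid division by zero
    return summed / counts


# Stand-in tensors; in practice pass outputs.last_hidden_state
# and inputs["attention_mask"] (assumed to exist).
hidden = torch.randn(2, 16, 8)
mask = torch.ones(2, 16, dtype=torch.long)
mask[1, 10:] = 0  # pretend the second sequence is padded after 10 tokens

pooled = mean_pool(hidden, mask)
print(pooled.shape)  # torch.Size([2, 8])
```

Masking matters when sequences of different lengths are batched together: without it, padding positions would pull the average toward whatever the model emits for pad tokens.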

Installation

pip install transformers torch
# Install any additional dependencies for your specific use case

Requirements

  • transformers >= 4.21.0
  • torch >= 1.9.0
  • Python >= 3.7

Citation

If you use NucEL in your research, please cite:

@misc{nucel2024,
  title={NucEL: Single-Nucleotide ELECTRA-Style Genomic Pre-training for Efficient and Interpretable Representations},
  author={Ke Ding and Brian Parker and Jiayu Wen},
  year={2025},
  howpublished={\url{https://huggingface.co/FreakingPotato/NucEL}}
}

License

This model is released under the Apache 2.0 License.