---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- genomics
- rna
- nucleotide
- sequence-modeling
- biology
- bioinformatics
- electra
pipeline_tag: feature-extraction
---

# RNAElectra: Single-Nucleotide ELECTRA-Style Pre-training for RNA Representation Learning

RNAElectra is a nucleotide-resolution RNA language model trained with an ELECTRA-style objective for efficient, discriminative representation learning. The model produces contextualized embeddings for RNA sequences and is designed for downstream transcriptomic and regulatory modeling tasks.

## Model Details

- **Model Type**: Transformer-based discriminator model
- **Training Objective**: ELECTRA-style replaced-token detection
- **Resolution**: Single-nucleotide
- **Domain**: RNA and transcriptomic sequences
- **Architecture**: ModernBERT-style backbone adapted for nucleotide sequences

RNAElectra focuses on efficient pre-training: rather than reconstructing corrupted tokens, it learns to discriminate them, yielding strong representations with improved training efficiency.
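As a toy illustration of the replaced-token-detection idea (this is not the actual training code; the vocabulary, corruption scheme, and stand-in discriminator below are all hypothetical simplifications), a corrupted sequence and per-position binary labels can be built like this:

```python
import torch
import torch.nn as nn

# Minimal single-nucleotide vocabulary (hypothetical; the real tokenizer
# also has special tokens).
vocab = {"A": 0, "U": 1, "G": 2, "C": 3}

def corrupt(token_ids, replace_prob=0.15, seed=0):
    """Randomly replace a fraction of tokens; return corrupted ids and labels."""
    g = torch.Generator().manual_seed(seed)
    ids = token_ids.clone()
    replace = torch.rand(ids.shape, generator=g) < replace_prob
    random_ids = torch.randint(0, len(vocab), ids.shape, generator=g)
    ids[replace] = random_ids[replace]
    # Label 1 only where the token actually changed (a sampled replacement
    # that equals the original counts as "original", as in ELECTRA).
    labels = (ids != token_ids).float()
    return ids, labels

seq = torch.tensor([vocab[c] for c in "AUGCAUGCAUGC"])
corrupted, labels = corrupt(seq)

# Stand-in discriminator: embedding plus a linear head with one logit per position.
emb = nn.Embedding(len(vocab), 16)
head = nn.Linear(16, 1)
logits = head(emb(corrupted)).squeeze(-1)

# Per-position binary cross-entropy, the shape of the discriminator loss.
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
print(loss.item())
```

In the full framework the random replacements come from a small generator network rather than uniform sampling, which makes the discrimination task harder and the learned representations stronger.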
## Key Features

- Single-nucleotide tokenization
- Contextual RNA sequence embeddings
- ELECTRA-style discriminative pre-training
- Suitable for RNA function prediction, RBP binding modeling, stability prediction, regulatory element analysis, and downstream fine-tuning

## Usage

### Basic Feature Extraction

```python
import torch
from transformers import AutoModel
from tokenizer import NucEL_Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModel.from_pretrained(
    "FreakingPotato/RNAElectra",
    trust_remote_code=True,
).to(device)
model.eval()

tokenizer = NucEL_Tokenizer.from_pretrained(
    "FreakingPotato/RNAElectra",
    trust_remote_code=True,
)

sequence = "AUGCAUGCAUGCAUGC"
inputs = tokenizer(sequence, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

embeddings = outputs.last_hidden_state
print(f"Sequence embeddings shape: {embeddings.shape}")
```

## Installation

```bash
pip install transformers torch
```

## Requirements

- transformers >= 5.0.0
- torch >= 2.10.0
- Python >= 3.12.3

A GPU is recommended for large-scale inference.

## Pre-training Overview

RNAElectra was trained with an ELECTRA-style generator–discriminator framework: a generator fills in corrupted positions, and the discriminator learns to detect which tokens were replaced. Only the discriminator weights are released in this repository. This objective improves training efficiency over masked language modeling while preserving strong contextual representations.

## Intended Use

RNAElectra is intended for feature extraction, downstream fine-tuning, and representation learning in RNA and transcriptomic modeling tasks. It is not intended for clinical decision-making or medical diagnostics.

## License

This model is released under the Apache 2.0 License.