---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- genomics
- rna
- nucleotide
- sequence-modeling
- biology
- bioinformatics
- electra
pipeline_tag: feature-extraction
---
# RNAElectra: Single-Nucleotide ELECTRA-Style Pre-training for RNA Representation Learning
RNAElectra is a nucleotide-resolution RNA language model trained using an ELECTRA-style objective for efficient and discriminative representation learning. The model produces contextualized embeddings for RNA sequences and is designed for downstream transcriptomic and regulatory modeling tasks.
## Model Details
- Model Type: Transformer-based discriminator model
- Training Objective: ELECTRA-style replaced-token detection
- Resolution: Single-nucleotide
- Domain: RNA and transcriptomic sequences
- Architecture: ModernBERT-style backbone adapted for nucleotide sequences
RNAElectra focuses on efficient pre-training by learning to discriminate corrupted tokens rather than reconstruct them. Because the discriminator receives a learning signal at every position, not only at masked positions, this yields strong representations at a lower training cost than masked language modeling.
## Key Features
- Single-nucleotide tokenization
- Contextual RNA sequence embeddings
- ELECTRA-style discriminative pre-training
- Suitable for RNA function prediction, RBP binding modeling, stability prediction, regulatory element analysis, and downstream fine-tuning tasks
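Single-nucleotide tokenization means each base becomes its own token, so embeddings align one-to-one with sequence positions. The toy sketch below illustrates the idea only; the actual `NucEL_Tokenizer` vocabulary, token IDs, and special tokens may differ:

```python
# Toy illustration of single-nucleotide tokenization.
# Assumption: this vocabulary is invented for illustration and does not
# reflect the real NucEL_Tokenizer vocabulary or special tokens.
TOY_VOCAB = {"A": 0, "U": 1, "G": 2, "C": 3, "N": 4}

def tokenize_single_nucleotide(sequence: str) -> list[int]:
    """Map each nucleotide to one token ID (unknown bases fall back to 'N')."""
    return [TOY_VOCAB.get(base, TOY_VOCAB["N"]) for base in sequence.upper()]

ids = tokenize_single_nucleotide("AUGC")
print(ids)  # [0, 1, 2, 3]
```

Because tokens map one-to-one onto nucleotides, per-position outputs (e.g. for RBP binding-site modeling) require no alignment step between tokens and bases.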
## Usage

### Basic Feature Extraction
```python
import torch
from transformers import AutoModel
from tokenizer import NucEL_Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the released discriminator backbone; custom model code requires
# trust_remote_code=True
model = AutoModel.from_pretrained(
    "FreakingPotato/RNAElectra",
    trust_remote_code=True,
).to(device)
model.eval()

tokenizer = NucEL_Tokenizer.from_pretrained(
    "FreakingPotato/RNAElectra",
    trust_remote_code=True,
)

# Tokenize a single RNA sequence at nucleotide resolution
sequence = "AUGCAUGCAUGCAUGC"
inputs = tokenizer(sequence, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

# Shape: (batch_size, sequence_length, hidden_size)
embeddings = outputs.last_hidden_state
print(f"Sequence embeddings shape: {embeddings.shape}")
```
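For sequence-level downstream tasks, the per-nucleotide embeddings are commonly reduced to a single vector, for example by attention-mask-aware mean pooling. A minimal sketch with dummy tensors standing in for real model outputs (the `mean_pool` helper is not part of the released model):

```python
import torch

def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over the sequence, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # avoid division by zero
    return summed / counts

# Dummy shapes standing in for outputs.last_hidden_state and the attention mask
hidden = torch.randn(2, 16, 256)
mask = torch.ones(2, 16, dtype=torch.long)
pooled = mean_pool(hidden, mask)
print(pooled.shape)  # torch.Size([2, 256])
```

Mean pooling is one reasonable default; CLS-token or max pooling are common alternatives, and the best choice is task-dependent.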
## Installation

```bash
pip install transformers torch
```
## Requirements
- transformers >= 5.0.0
- torch >= 2.10.0
- Python >= 3.12.3
GPU is recommended for large-scale inference.
## Pre-training Overview
RNAElectra was trained using an ELECTRA-style generator–discriminator framework. A generator predicts corrupted tokens, and a discriminator learns to detect replaced tokens. Only the discriminator weights are released in this repository. This objective improves training efficiency compared to masked language modeling while preserving strong contextual representations.
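The corruption step of replaced-token detection can be sketched as follows: sample positions to corrupt, let a generator propose replacement nucleotides, and label each position as original or replaced; the discriminator is then trained to predict these labels. The sketch below uses a uniform random stand-in for the generator, whereas the actual pre-training generator is a learned model, and the masking rate of 15% is an assumption:

```python
import random

NUCLEOTIDES = ["A", "U", "G", "C"]

def corrupt_sequence(sequence: str, corrupt_prob: float = 0.15, seed: int = 0):
    """Replace a fraction of positions with generator samples and return the
    corrupted sequence plus per-position labels (1 = replaced, 0 = original)."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for base in sequence:
        if rng.random() < corrupt_prob:
            proposal = rng.choice(NUCLEOTIDES)  # toy generator: uniform sampling
            corrupted.append(proposal)
            # If the generator happens to sample the original base,
            # the position still counts as "original" (as in ELECTRA).
            labels.append(1 if proposal != base else 0)
        else:
            corrupted.append(base)
            labels.append(0)
    return "".join(corrupted), labels

seq = "AUGCAUGCAUGC"
corrupted, labels = corrupt_sequence(seq)
```

The discriminator sees `corrupted` as input and is supervised with `labels` at every position, which is what gives the objective its sample efficiency relative to masked language modeling.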
## Intended Use
RNAElectra is intended for feature extraction, downstream fine-tuning, and representation learning in RNA and transcriptomic modeling tasks. It is not intended for clinical decision-making or medical diagnostics.
## License
This model is released under the Apache 2.0 License.