---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- genomics
- rna
- nucleotide
- sequence-modeling
- biology
- bioinformatics
- electra
pipeline_tag: feature-extraction
---

# RNAElectra: Single-Nucleotide ELECTRA-Style Pre-training for RNA Representation Learning

RNAElectra is a nucleotide-resolution RNA language model trained with an ELECTRA-style objective for efficient, discriminative representation learning. The model produces contextualized embeddings for RNA sequences and is designed for downstream transcriptomic and regulatory modeling tasks.

## Model Details

- **Model Type**: Transformer-based discriminator model
- **Training Objective**: ELECTRA-style replaced-token detection
- **Resolution**: Single-nucleotide
- **Domain**: RNA and transcriptomic sequences
- **Architecture**: ModernBERT-style backbone adapted for nucleotide sequences

RNAElectra focuses on efficient pre-training: rather than reconstructing corrupted tokens, it learns to discriminate them, yielding strong representations with improved training efficiency.
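As a toy illustration of the replaced-token-detection idea (this is not the actual training code; the vocabulary, corruption scheme, and stand-in discriminator below are all hypothetical simplifications), a corrupted sequence and per-position binary labels can be built like this:

```python
import torch
import torch.nn as nn

# Minimal single-nucleotide vocabulary (hypothetical; the real tokenizer
# also has special tokens).
vocab = {"A": 0, "U": 1, "G": 2, "C": 3}

def corrupt(token_ids, replace_prob=0.15, seed=0):
    """Randomly replace a fraction of tokens; return corrupted ids and labels."""
    g = torch.Generator().manual_seed(seed)
    ids = token_ids.clone()
    replace = torch.rand(ids.shape, generator=g) < replace_prob
    random_ids = torch.randint(0, len(vocab), ids.shape, generator=g)
    ids[replace] = random_ids[replace]
    # Label 1 only where the token actually changed (a sampled replacement
    # that equals the original counts as "original", as in ELECTRA).
    labels = (ids != token_ids).float()
    return ids, labels

seq = torch.tensor([vocab[c] for c in "AUGCAUGCAUGC"])
corrupted, labels = corrupt(seq)

# Stand-in discriminator: embedding plus a linear head with one logit per position.
emb = nn.Embedding(len(vocab), 16)
head = nn.Linear(16, 1)
logits = head(emb(corrupted)).squeeze(-1)

# Per-position binary cross-entropy, the shape of the discriminator loss.
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
print(loss.item())
```

In the full framework the random replacements come from a small generator network rather than uniform sampling, which makes the discrimination task harder and the learned representations stronger.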
## Key Features

- Single-nucleotide tokenization
- Contextual RNA sequence embeddings
- ELECTRA-style discriminative pre-training
- Suitable for RNA function prediction, RBP binding modeling, stability prediction, regulatory element analysis, and downstream fine-tuning

## Usage

### Basic Feature Extraction

```python
import torch
from transformers import AutoModel
from tokenizer import NucEL_Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModel.from_pretrained(
    "FreakingPotato/RNAElectra",
    trust_remote_code=True,
).to(device)
model.eval()

tokenizer = NucEL_Tokenizer.from_pretrained(
    "FreakingPotato/RNAElectra",
    trust_remote_code=True,
)

sequence = "AUGCAUGCAUGCAUGC"
inputs = tokenizer(sequence, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

embeddings = outputs.last_hidden_state
print(f"Sequence embeddings shape: {embeddings.shape}")
```

## Installation

```bash
pip install transformers torch
```

## Requirements

- transformers >= 5.0.0
- torch >= 2.10.0
- Python >= 3.12.3

A GPU is recommended for large-scale inference.

## Pre-training Overview

RNAElectra was trained with an ELECTRA-style generator–discriminator framework: a generator fills in corrupted positions, and the discriminator learns to detect which tokens were replaced. Only the discriminator weights are released in this repository. This objective improves training efficiency over masked language modeling while preserving strong contextual representations.

## Intended Use

RNAElectra is intended for feature extraction, downstream fine-tuning, and representation learning in RNA and transcriptomic modeling tasks. It is not intended for clinical decision-making or medical diagnostics.

## License

This model is released under the Apache 2.0 License.