---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- genomics
- rna
- nucleotide
- sequence-modeling
- biology
- bioinformatics
- electra
pipeline_tag: feature-extraction
---
# RNAElectra: Single-Nucleotide ELECTRA-Style Pre-training for RNA Representation Learning
RNAElectra is a nucleotide-resolution RNA language model trained using an ELECTRA-style objective for efficient and discriminative representation learning. The model produces contextualized embeddings for RNA sequences and is designed for downstream transcriptomic and regulatory modeling tasks.
## Model Details
- **Model Type**: Transformer-based discriminator model
- **Training Objective**: ELECTRA-style replaced-token detection
- **Resolution**: Single-nucleotide
- **Domain**: RNA and transcriptomic sequences
- **Architecture**: ModernBERT-style backbone adapted for nucleotide sequences
RNAElectra learns to discriminate corrupted tokens rather than reconstruct them, which yields strong contextual representations at a lower pre-training cost than masked language modeling.
## Key Features
- Single-nucleotide tokenization
- Contextual RNA sequence embeddings
- ELECTRA-style discriminative pre-training
- Suitable for RNA function prediction, RBP binding modeling, stability prediction, regulatory element analysis, and downstream fine-tuning tasks
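Single-nucleotide tokenization means each base is its own token, rather than a k-mer or BPE unit. A minimal sketch of the idea, using a hypothetical vocabulary with BERT-style special tokens (the actual `NucEL_Tokenizer` vocabulary and special tokens may differ):

```python
# Illustrative single-nucleotide tokenization; the real NucEL_Tokenizer
# vocabulary, IDs, and special tokens may differ.
VOCAB = {"[CLS]": 0, "[SEP]": 1, "[PAD]": 2, "A": 3, "U": 4, "G": 5, "C": 6}

def tokenize(sequence: str) -> list[str]:
    # Each nucleotide becomes its own token.
    return list(sequence)

def encode(sequence: str) -> list[int]:
    # Wrap the sequence with [CLS]/[SEP], as in BERT-style encoders.
    tokens = ["[CLS]", *tokenize(sequence), "[SEP]"]
    return [VOCAB[t] for t in tokens]

print(tokenize("AUGC"))  # ['A', 'U', 'G', 'C']
print(encode("AUGC"))    # [0, 3, 4, 5, 6, 1]
```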
## Usage
### Basic Feature Extraction
```python
import torch
from transformers import AutoModel

# NucEL_Tokenizer ships with the repository files.
from tokenizer import NucEL_Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the released discriminator weights.
model = AutoModel.from_pretrained(
    "FreakingPotato/RNAElectra",
    trust_remote_code=True,
).to(device)
model.eval()

tokenizer = NucEL_Tokenizer.from_pretrained(
    "FreakingPotato/RNAElectra",
    trust_remote_code=True,
)

sequence = "AUGCAUGCAUGCAUGC"
inputs = tokenizer(sequence, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

# One contextual embedding per nucleotide token.
embeddings = outputs.last_hidden_state
print(f"Sequence embeddings shape: {embeddings.shape}")
```
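For downstream tasks that need one vector per sequence, the per-nucleotide embeddings can be pooled; attention-mask-weighted mean pooling is a common choice (not prescribed by this repository). A self-contained sketch with dummy tensors standing in for the model outputs:

```python
import torch

def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Zero out padding positions, then average over real tokens only.
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)   # (batch, hidden_size)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # (batch, 1)
    return summed / counts

# Dummy shapes standing in for outputs.last_hidden_state and the attention mask.
hidden = torch.randn(2, 18, 256)       # (batch, seq_len, hidden_size)
mask = torch.ones(2, 18, dtype=torch.long)
mask[1, 10:] = 0                       # second sequence is shorter than the first
pooled = mean_pool(hidden, mask)
print(pooled.shape)                    # torch.Size([2, 256])
```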
## Installation
```bash
pip install transformers torch
```
## Requirements
- transformers >= 5.0.0
- torch >= 2.10.0
- Python >= 3.12.3
A GPU is recommended for large-scale inference.
## Pre-training Overview
RNAElectra was trained using an ELECTRA-style generator–discriminator framework. A generator predicts corrupted tokens, and a discriminator learns to detect replaced tokens. Only the discriminator weights are released in this repository. This objective improves training efficiency compared to masked language modeling while preserving strong contextual representations.
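The replaced-token-detection objective can be illustrated with a toy example: given per-token discriminator logits and binary labels (1 = token was replaced by the generator, 0 = original), the loss is token-level binary cross-entropy, typically averaged over non-padding positions. A minimal sketch (shapes and names are illustrative, not the released training code):

```python
import torch
import torch.nn.functional as F

batch, seq_len = 2, 8
logits = torch.randn(batch, seq_len)                     # one logit per token from the discriminator
labels = torch.randint(0, 2, (batch, seq_len)).float()   # 1 = replaced, 0 = original
attention_mask = torch.ones(batch, seq_len)

# Token-level binary cross-entropy, averaged over real (non-padding) tokens.
per_token = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
loss = (per_token * attention_mask).sum() / attention_mask.sum()
print(loss.item())
```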
## Intended Use
RNAElectra is intended for feature extraction, downstream fine-tuning, and representation learning in RNA and transcriptomic modeling tasks. It is not intended for clinical decision-making or medical diagnostics.
## License
This model is released under the Apache 2.0 License.