Commit 1735c76 · Updated README
Parent(s): 10234c4

README.md CHANGED
```diff
@@ -1,11 +1,43 @@
 ---
 license: apache-2.0
 ---
-# RNAElectra

-RNAElectra

-

 ```python
 import torch
@@ -18,6 +50,7 @@ model = AutoModel.from_pretrained(
     "FreakingPotato/RNAElectra",
     trust_remote_code=True
 ).to(device)

 tokenizer = NucEL_Tokenizer.from_pretrained(
     "FreakingPotato/RNAElectra",
@@ -25,6 +58,7 @@ tokenizer = NucEL_Tokenizer.from_pretrained(
 )

 sequence = "AUGCAUGCAUGCAUGC"
 inputs = tokenizer(sequence, return_tensors="pt")
 inputs = {k: v.to(device) for k, v in inputs.items()}

@@ -32,4 +66,31 @@ with torch.no_grad():
     outputs = model(**inputs)

 embeddings = outputs.last_hidden_state
-print(embeddings.shape)
```
---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- genomics
- rna
- nucleotide
- sequence-modeling
- biology
- bioinformatics
- electra
pipeline_tag: feature-extraction
---

# RNAElectra: Single-Nucleotide ELECTRA-Style Pre-training for RNA Representation Learning

RNAElectra is a nucleotide-resolution RNA language model trained using an ELECTRA-style objective for efficient and discriminative representation learning. The model produces contextualized embeddings for RNA sequences and is designed for downstream transcriptomic and regulatory modeling tasks.

## Model Details

- **Model Type**: Transformer-based discriminator
- **Training Objective**: ELECTRA-style replaced-token detection
- **Resolution**: Single nucleotide
- **Domain**: RNA and transcriptomic sequences
- **Architecture**: ModernBERT-style backbone adapted for nucleotide sequences

Rather than reconstructing masked tokens, RNAElectra learns to discriminate corrupted tokens from original ones, which yields strong contextual representations at a lower pre-training cost.

## Key Features

- Single-nucleotide tokenization
- Contextual RNA sequence embeddings
- ELECTRA-style discriminative pre-training
- Suitable for RNA function prediction, RBP binding modeling, stability prediction, regulatory element analysis, and downstream fine-tuning tasks

## Usage

### Basic Feature Extraction

```python
import torch
from transformers import AutoModel

# NucEL_Tokenizer is the custom tokenizer class distributed with the
# FreakingPotato/RNAElectra repository; it is made available through
# trust_remote_code=True.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = AutoModel.from_pretrained(
    "FreakingPotato/RNAElectra",
    trust_remote_code=True
).to(device)
model.eval()

tokenizer = NucEL_Tokenizer.from_pretrained(
    "FreakingPotato/RNAElectra",
    trust_remote_code=True
)

sequence = "AUGCAUGCAUGCAUGC"

inputs = tokenizer(sequence, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

embeddings = outputs.last_hidden_state
print(f"Sequence embeddings shape: {embeddings.shape}")
```
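
The example above returns one embedding per nucleotide token. For sequence-level tasks, a common follow-up step (not part of the original example) is attention-mask-aware mean pooling over the token dimension. The sketch below assumes the tokenizer returns a standard `attention_mask`, as Hugging Face tokenizers typically do:

```python
# Hedged sketch: collapse per-nucleotide embeddings into one vector per
# sequence via attention-mask-aware mean pooling. Reuses `inputs` and
# `embeddings` from the example above.
mask = inputs["attention_mask"].unsqueeze(-1).float()   # (batch, seq_len, 1)
summed = (embeddings * mask).sum(dim=1)                 # ignore padding positions
counts = mask.sum(dim=1).clamp(min=1)                   # avoid division by zero
sequence_embedding = summed / counts                    # (batch, hidden_size)
print(sequence_embedding.shape)
```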

## Installation

```bash
pip install transformers torch
```

## Requirements

- transformers >= 5.0.0
- torch >= 2.10.0
- Python >= 3.12.3

A GPU is recommended for large-scale inference.
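
For embedding many sequences at once, a simple batched loop keeps GPU memory bounded. The sketch below is not from the original README: it reuses `model`, `tokenizer`, and `device` from the usage example and assumes the tokenizer accepts a list of sequences with `padding=True`, as standard Hugging Face tokenizers do.

```python
# Hedged sketch: batched embedding extraction for many RNA sequences.
sequences = ["AUGCAUGCAUGC", "GGGCCCUUUAAA", "AUGGCUAGCUAG"]
batch_size = 2                     # adjust to your hardware
all_embeddings = []

for start in range(0, len(sequences), batch_size):
    batch = sequences[start:start + batch_size]
    enc = tokenizer(batch, return_tensors="pt", padding=True)
    enc = {k: v.to(device) for k, v in enc.items()}
    with torch.no_grad():
        out = model(**enc)
    all_embeddings.append(out.last_hidden_state.cpu())
```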

## Pre-training Overview

RNAElectra was trained with an ELECTRA-style generator–discriminator framework: a generator proposes replacement tokens at corrupted positions, and the discriminator learns to detect which tokens were replaced. Only the discriminator weights are released in this repository. The replaced-token detection objective improves training efficiency compared to masked language modeling while preserving strong contextual representations.
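
To make the objective concrete, the toy sketch below illustrates replaced-token detection on a four-letter nucleotide vocabulary. It is illustrative only: the random "generator" samples and discriminator logits are stand-ins, not RNAElectra components.

```python
import torch
import torch.nn.functional as F

# Toy ELECTRA-style replaced-token detection on a nucleotide vocabulary.
vocab = {"A": 0, "U": 1, "G": 2, "C": 3}
tokens = torch.tensor([vocab[b] for b in "AUGCAUGC"])      # original sequence

# 1) Corrupt: mask ~15% of positions and let a stand-in "generator" fill them.
mask = torch.rand(len(tokens)) < 0.15
generator_samples = torch.randint(0, 4, tokens.shape)
corrupted = torch.where(mask, generator_samples, tokens)

# 2) Discriminator target: 1 where the visible token differs from the original.
is_replaced = (corrupted != tokens).float()

# 3) Stand-in discriminator output: one logit per position.
logits = torch.randn(len(tokens))

# 4) Per-token binary cross-entropy is the replaced-token detection loss.
loss = F.binary_cross_entropy_with_logits(logits, is_replaced)
print(loss.item())
```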

## Intended Use

RNAElectra is intended for feature extraction, downstream fine-tuning, and representation learning in RNA and transcriptomic modeling tasks. It is not intended for clinical decision-making or medical diagnostics.
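
For the fine-tuning route, one common pattern (a sketch only, not code from the RNAElectra repository; the mean-pooling strategy, linear head, and two-class setup are placeholder choices) is to wrap the encoder with a small classification head trained on pooled hidden states:

```python
import torch
import torch.nn as nn

class RNASequenceClassifier(nn.Module):
    """Illustrative head for sequence-level fine-tuning.

    `encoder` is assumed to be the model loaded above with
    AutoModel.from_pretrained(..., trust_remote_code=True); `hidden_size`
    is assumed to be readable from encoder.config.hidden_size.
    """

    def __init__(self, encoder, hidden_size: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                               # (batch, seq_len, hidden)
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
        return self.head(pooled)                          # (batch, num_classes)

# Placeholder usage, assuming `model`, `inputs`, and `device` from the usage example:
# clf = RNASequenceClassifier(model, model.config.hidden_size, num_classes=2).to(device)
# logits = clf(inputs["input_ids"], inputs["attention_mask"])
# loss = nn.functional.cross_entropy(logits, labels)      # `labels`: (batch,) class ids
```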

## License

This model is released under the Apache 2.0 License.