Instructions to use Taykhoom/ERNIE-RNA with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Taykhoom/ERNIE-RNA with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("fill-mask", model="Taykhoom/ERNIE-RNA", trust_remote_code=True)# Load model directly from transformers import AutoModelForMaskedLM model = AutoModelForMaskedLM.from_pretrained("Taykhoom/ERNIE-RNA", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
| language: | |
| - rna | |
| library_name: transformers | |
| tags: | |
| - RNA | |
| - language-model | |
| license: apache-2.0 | |
| # ERNIE-RNA | |
| ERNIE-RNA is an RNA-specific large language model that incorporates RNA base-pairing potential as a recurrent 2D structural bias into each attention layer, enabling the model to capture secondary structure information during pretraining. | |
| ## Architecture | |
| | Parameter | Value | | |
| |---|---| | |
| | Layers | 12 | | |
| | Attention heads | 12 | | |
| | Embedding dimension | 768 | | |
| | FFN dimension | 3072 | | |
| | Vocabulary size | 25 | | |
| | Positional encoding | Sinusoidal (fairseq-style) | | |
| | Architecture | Post-LN Transformer with recurrent 2D RNA pairing bias | | |
| | Max sequence length | 1024 | | |
| ### Vocabulary | |
| | Token | ID | Notes | | |
| |---|---|---| | |
| | `<cls>` | 0 | Prepended to every sequence | | |
| | `<pad>` | 1 | Padding token | | |
| | `<eos>` | 2 | Appended to every sequence | | |
| | `<unk>` | 3 | Unknown token | | |
| | G | 4 | | | |
| | A | 5 | | | |
| | U | 6 | T is silently mapped to U during tokenization | | |
| | C | 7 | | | |
| | N | 8 | Ambiguous nucleotide | | |
| | Y-I | 9-20 | IUPAC ambiguity codes | | |
| | madeupword0-2 | 21-23 | Padding tokens from original vocab | | |
| | `<mask>` | 24 | MLM mask token | | |
| ### 2D RNA Pairing Bias | |
| ERNIE-RNA computes a pairwise RNA base-pairing potential matrix from the input sequence at the start of each forward pass. This matrix (shape `[B, T, T, 1]`) is projected to `[B, H, T, T]` via a 2-layer MLP (1 -> 6 -> H, with GELU) and added to the attention logits in the first layer. The pre-softmax attention scores then become the updated 2D bias for the next layer, creating a recurrent structural information pathway across all 12 transformer layers. | |
| Base-pairing scores: A-U = 2.0, G-C = 3.0, G-U wobble = 0.8. | |
| ## Pretraining | |
| - **Objective:** Masked language modeling (MLM) on RNA sequences | |
| - **Data:** RNAcentral (non-redundant RNA sequences) | |
| - **Source checkpoint:** `ERNIE-RNA_pretrain.pt` | |
| ### Checkpoint selection | |
| Single pretrained checkpoint from the original repository. Used as-is; no fine-tuned variants are included in this release. | |
| ## Parity Verification | |
| Hidden-state representations verified identical (max abs diff = 1.82e-06) to the original | |
| implementation at all 13 representation levels (embedding + 12 transformer layers). | |
| Verified on GPU with PyTorch 2.7 / CUDA 12. | |
| Only `attn_implementation="eager"` is supported (see Implementation Notes). | |
| ## Related Models | |
| See the full [ERNIE-RNA collection](https://huggingface.co/collections/Taykhoom/ernie-rna-6a20c1a8ea56c00a74e2dd93). | |
| | Model | Notes | | |
| |---|---| | |
| | **[Taykhoom/ERNIE-RNA](https://huggingface.co/Taykhoom/ERNIE-RNA)** | **Pretrained model (this model)** | | |
| | [Taykhoom/ERNIE-RNA-SS](https://huggingface.co/Taykhoom/ERNIE-RNA-SS) | SS fine-tuned (bpRNA-new), backbone only | | |
| | [Taykhoom/ERNIE-RNA-MRL](https://huggingface.co/Taykhoom/ERNIE-RNA-MRL) | UTR MRL fine-tuned, backbone only | | |
| ## Usage | |
| ### Embedding generation | |
| ```python | |
| import torch | |
| from transformers import AutoTokenizer, AutoModel | |
| tokenizer = AutoTokenizer.from_pretrained("Taykhoom/ERNIE-RNA", trust_remote_code=True) | |
| model = AutoModel.from_pretrained("Taykhoom/ERNIE-RNA", trust_remote_code=True) | |
| model.eval() | |
| sequences = ["AUGCAUGCAUGC", "GGGGCCCCGGGG"] | |
| enc = tokenizer(sequences, return_tensors="pt", padding=True) | |
| with torch.no_grad(): | |
| out = model(**enc) | |
| cls_emb = out.last_hidden_state[:, 0, :] # (batch, 768) -- CLS token | |
| token_emb = out.last_hidden_state # (batch, seq_len, 768) | |
| # Intermediate layers | |
| out_all = model(**enc, output_hidden_states=True) | |
| layer6_emb = out_all.hidden_states[6] # (batch, seq_len, 768) | |
| ``` | |
| ### MLM logits | |
| ```python | |
| import torch | |
| from transformers import AutoTokenizer, AutoModelForMaskedLM | |
| tokenizer = AutoTokenizer.from_pretrained("Taykhoom/ERNIE-RNA", trust_remote_code=True) | |
| model = AutoModelForMaskedLM.from_pretrained("Taykhoom/ERNIE-RNA", trust_remote_code=True) | |
| model.eval() | |
| enc = tokenizer(["AUG<mask>AUG"], return_tensors="pt") | |
| with torch.no_grad(): | |
| logits = model(**enc).logits # (1, seq_len, 25) | |
| ``` | |
| ### Fine-tuning | |
| Use the CLS token embedding (`last_hidden_state[:, 0, :]`) as input to a prediction head for sequence-level tasks. For token-level tasks, use `last_hidden_state` directly. | |
| ## Implementation Notes | |
| ERNIE-RNA's recurrent 2D bias is updated from the pre-softmax attention scores at every layer (the raw QK logits become the bias input for the next layer). Fused attention kernels (SDPA, FlashAttention) do not expose pre-softmax scores, so they cannot maintain this recurrent pathway. Only `attn_implementation="eager"` is supported; requesting `sdpa` or `flash_attention_2` raises a `ValueError`. | |
| The `twod_proj` MLP is always run in float32 (matching the original) regardless of the model's compute dtype. | |
| ## Citation | |
| ```bibtex | |
| @article{yin2025_ernierna, | |
| title = {{ERNIE-RNA}: an {RNA} language model with structure-enhanced representations}, | |
| author = {Yin, Weijie and Zhang, Zhaoyu and He, Liang and Jiang, Rui and Zhang, Shuo and Liu, Gan and Zeng, Xuezhi and Zhao, Wen and Gao, Xiaowo}, | |
| journal = {Nature Communications}, | |
| volume = {16}, | |
| number = {1}, | |
| pages = {8407}, | |
| year = {2025}, | |
| doi = {10.1038/s41467-025-64972-0} | |
| } | |
| ``` | |
| ## Credits | |
| Original model and code by Yin et al. Source: [GitHub](https://github.com/Bruce-ywj/ERNIE-RNA). | |
| The HF conversion code was authored primarily by [Claude Code](https://claude.ai/code) | |
| and reviewed manually by Taykhoom Dalal. | |
| ## License | |
| Apache 2.0, following the original repository. | |