language:
  - rna
library_name: transformers
tags:
  - RNA
  - language-model
license: apache-2.0

ERNIE-RNA-MRL

The ERNIE-RNA encoder backbone, fine-tuned for UTR mean ribosome load (MRL) prediction. The CNN prediction head used during fine-tuning has been discarded; only the encoder weights are included.

Architecture

| Parameter | Value |
| --- | --- |
| Layers | 12 |
| Attention heads | 12 |
| Embedding dimension | 768 |
| FFN dimension | 3072 |
| Vocabulary size | 25 |
| Positional encoding | Sinusoidal (fairseq-style) |
| Architecture | Post-LN Transformer with recurrent 2D RNA pairing bias |
| Max sequence length | 1024 |

See Taykhoom/ERNIE-RNA for the vocabulary table and full architecture description.
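
As a rough sanity check on the table above, the encoder's weight count can be estimated from the listed dimensions. This is a back-of-the-envelope sketch, not an official figure: it assumes standard attention with four d x d projections (Q, K, V, output) and a two-matrix FFN, and it ignores biases, layer norms, and the 2D pairing-bias projection.

```python
# Values taken from the architecture table.
layers, d_model, d_ffn, vocab = 12, 768, 3072, 25

# ASSUMPTION: standard Transformer layer shapes; biases, layer norms and the
# 2D pairing-bias projection are left out of this estimate.
attention_weights = 4 * d_model * d_model   # Q, K, V and output projections
ffn_weights = 2 * d_model * d_ffn           # up- and down-projection
per_layer = attention_weights + ffn_weights

# Sinusoidal positions are computed, not learned, so only tokens are embedded.
embedding_weights = vocab * d_model

total = layers * per_layer + embedding_weights
print(f"{total:,}")  # prints 84,953,856
```

The estimate lands at roughly 85M weights, dominated by the 12 Transformer layers; the 25-token vocabulary contributes almost nothing.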

Pretraining + Fine-tuning

  • Pretraining objective: Masked language modeling on RNAcentral
  • Fine-tuning task: UTR mean ribosome load (MRL) prediction
  • Source checkpoint: ERNIE-RNA-UTR_ML_CNN.pt

Checkpoint selection

This model is converted from the single MRL fine-tuned checkpoint in the original repository. The original model uses a CNN head on top of the ERNIE-RNA encoder; only the encoder backbone is included here.
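
Dropping the head while keeping the encoder amounts to filtering a checkpoint's state dict by key prefix. The sketch below illustrates the idea on a toy dictionary; the prefixes `encoder.` and `cnn_head.` and the key names are hypothetical, since the real checkpoint's key layout is not documented here.

```python
def extract_backbone(state_dict, encoder_prefix="encoder.", head_prefixes=("cnn_head.",)):
    """Keep encoder entries, drop head entries, strip the encoder prefix.

    ASSUMPTION: the prefixes are illustrative placeholders, not the names
    used in the original ERNIE-RNA checkpoint.
    """
    backbone = {}
    for key, value in state_dict.items():
        if key.startswith(head_prefixes):       # str.startswith accepts a tuple
            continue                            # discard prediction-head weights
        if key.startswith(encoder_prefix):
            key = key[len(encoder_prefix):]     # remap to backbone-local names
        backbone[key] = value
    return backbone


# Toy checkpoint with made-up keys; values stand in for tensors.
toy_checkpoint = {
    "encoder.embed_tokens.weight": "emb",
    "encoder.layers.0.fc1.weight": "w0",
    "cnn_head.conv1.weight": "c1",
}
backbone = extract_backbone(toy_checkpoint)
print(sorted(backbone))
```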

Parity Verification

Backbone weights are extracted directly from the fine-tuned checkpoint using the same key mapping and architecture as the verified pretrained model. The underlying architecture is identical to Taykhoom/ERNIE-RNA, which was verified at a maximum absolute difference of 1.82e-06 across all 13 representation levels (consistent with the embedding output plus the 12 layer outputs).
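
A parity figure of this kind is typically the largest element-wise difference between the two models' hidden states, taken over every level. The sketch below shows that computation on synthetic tensors; the helper name `max_abs_diff`, the tensor shapes, and the injected 1e-6 offset are all illustrative and do not reproduce the actual verification run.

```python
import torch


def max_abs_diff(reference_states, converted_states):
    """Largest absolute element-wise difference across all representation levels."""
    if len(reference_states) != len(converted_states):
        raise ValueError("both models must expose the same number of levels")
    return max(
        (ref - new).abs().max().item()
        for ref, new in zip(reference_states, converted_states)
    )


torch.manual_seed(0)
# Synthetic stand-ins: 13 levels of (batch, seq_len, hidden) activations.
reference = [torch.randn(2, 14, 768) for _ in range(13)]
converted = [state + 1e-6 for state in reference]   # made-up numerical drift

diff = max_abs_diff(reference, converted)
print(f"max abs diff = {diff:.2e}")
```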

Only attn_implementation="eager" is supported (see Implementation Notes).
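
To make the requirement explicit at load time, the attention implementation can be passed to `from_pretrained`. The following is a minimal sketch: `attn_implementation` is a standard keyword in recent transformers releases, while the helper name `load_ernie_rna_mrl` is hypothetical. The call downloads weights, so it is wrapped in a function rather than executed on import.

```python
def load_ernie_rna_mrl(repo_id="Taykhoom/ERNIE-RNA-MRL"):
    """Hypothetical helper: load tokenizer and backbone with eager attention."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        repo_id,
        trust_remote_code=True,
        attn_implementation="eager",  # the only value this model card lists as supported
    )
    return tokenizer, model.eval()
```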

Related Models

See the full ERNIE-RNA collection.

| Model | Notes |
| --- | --- |
| Taykhoom/ERNIE-RNA | Pretrained model |
| Taykhoom/ERNIE-RNA-SS | SS fine-tuned |
| Taykhoom/ERNIE-RNA-MRL | This model -- UTR MRL fine-tuned |

Usage

Embedding generation

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Taykhoom/ERNIE-RNA-MRL", trust_remote_code=True)
model = AutoModel.from_pretrained("Taykhoom/ERNIE-RNA-MRL", trust_remote_code=True)
model.eval()

sequences = ["AUGCAUGCAUGC", "GGGGCCCCGGGG"]
enc = tokenizer(sequences, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**enc)

cls_emb   = out.last_hidden_state[:, 0, :]   # (batch, 768) -- CLS token
token_emb = out.last_hidden_state             # (batch, seq_len, 768)

# Intermediate layers (also under no_grad, so no autograd graph is kept)
with torch.no_grad():
    out_all = model(**enc, output_hidden_states=True)
layer6_emb = out_all.hidden_states[6]         # (batch, seq_len, 768)

Fine-tuning

Use the CLS token embedding (last_hidden_state[:, 0, :]) as input to a prediction head for sequence-level tasks.
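
One possible shape for such a head is a small MLP that maps the 768-dimensional CLS embedding to a scalar. This is a sketch under assumptions, not the original setup: the original MRL model used a CNN head, which is not reproduced here, and the class name `MRLRegressionHead`, the layer sizes, and the dropout rate are all hypothetical. Random tensors stand in for real embeddings and targets.

```python
import torch
from torch import nn


class MRLRegressionHead(nn.Module):
    """Hypothetical MLP head: CLS embedding (768-d) -> one scalar per sequence."""

    def __init__(self, hidden_size=768, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.Linear(hidden_size, 1),
        )

    def forward(self, cls_emb):
        return self.net(cls_emb).squeeze(-1)   # (batch, 1) -> (batch,)


torch.manual_seed(0)
head = MRLRegressionHead()
cls_emb = torch.randn(4, 768)   # stand-in for out.last_hidden_state[:, 0, :]
targets = torch.randn(4)        # made-up regression targets

pred = head(cls_emb)
loss = nn.functional.mse_loss(pred, targets)
loss.backward()                 # gradients reach the head's parameters
```

In a real fine-tuning loop the CLS embedding would come from the backbone without `torch.no_grad()`, so that gradients can also flow into the encoder if it is not frozen.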

Implementation Notes

ERNIE-RNA's recurrent 2D bias is updated from the pre-softmax attention scores at every layer (the raw QK logits become the bias input for the next layer). Fused attention kernels (SDPA, FlashAttention) do not expose pre-softmax scores, so they cannot maintain this recurrent pathway. Only attn_implementation="eager" is supported; requesting sdpa or flash_attention_2 raises a ValueError.
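
The recurrence described above can be illustrated with a toy eager attention loop that returns the pre-softmax logits alongside the output and feeds them forward as the next layer's bias. This sketch is heavily simplified and makes several assumptions: it uses no learned projections, starts from a zero bias as a stand-in for the pairing-derived bias, omits the `twod_proj` MLP, and assumes the carried logits include the incoming bias.

```python
import math

import torch


def eager_attention(q, k, v, bias):
    """Plain attention that exposes its pre-softmax logits.

    q, k, v: (batch, heads, seq, head_dim); bias: (batch, heads, seq, seq).
    """
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)) + bias
    out = torch.softmax(scores, dim=-1) @ v
    return out, scores   # a fused kernel would return only `out`


torch.manual_seed(0)
batch, heads, seq, head_dim = 2, 12, 5, 64
x = torch.randn(batch, heads, seq, head_dim)

# ASSUMPTION: zero initial bias, standing in for the pairing-derived bias.
bias = torch.zeros(batch, heads, seq, seq)

for _ in range(3):                                # three toy "layers"
    x, bias = eager_attention(x, x, x, bias)      # logits become the next bias
```

Because the loop needs `scores` from every layer, a kernel that computes softmax and the weighted sum in one fused step leaves nothing to carry forward.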

The twod_proj MLP is always run in float32 (matching the original) regardless of the model's compute dtype.
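
A common way to keep one submodule in float32 inside a lower-precision model is to upcast its input, compute, and cast the result back. The wrapper below is a sketch of that pattern only: the class name `Float32Island` and the layer sizes are hypothetical, and it does not show how the actual `twod_proj` is implemented.

```python
import torch
from torch import nn


class Float32Island(nn.Module):
    """Hypothetical wrapper: run `inner` in float32, return the caller's dtype."""

    def __init__(self, inner):
        super().__init__()
        self.inner = inner.float()   # keep the wrapped weights in float32

    def forward(self, x):
        y = self.inner(x.to(torch.float32))   # upcast input, compute in float32
        return y.to(x.dtype)                  # hand back the original dtype


# ASSUMPTION: the dimensions below are illustrative, not the model's real sizes.
proj = Float32Island(nn.Sequential(nn.Linear(12, 32), nn.GELU(), nn.Linear(32, 12)))
pair_bias = torch.randn(2, 5, 5, 12, dtype=torch.bfloat16)
out = proj(pair_bias)
```

Note that this sketch does not guard against a later `model.to(torch.bfloat16)` on a parent module, which would recast the wrapped weights as well; a full implementation has to handle that case separately.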

Citation

@article{yin2025_ernierna,
  title   = {{ERNIE-RNA}: an {RNA} language model with structure-enhanced representations},
  author  = {Yin, Weijie and Zhang, Zhaoyu and He, Liang and Jiang, Rui and Zhang, Shuo and Liu, Gan and Zeng, Xuezhi and Zhao, Wen and Gao, Xiaowo},
  journal = {Nature Communications},
  volume  = {16},
  number  = {1},
  pages   = {8407},
  year    = {2025},
  doi     = {10.1038/s41467-025-64972-0}
}

Credits

Original model and code by Yin et al. Source: GitHub. The HF conversion code was authored primarily by Claude Code and reviewed manually by Taykhoom Dalal.

License

Apache 2.0, following the original repository.