---
language:
- rna
library_name: transformers
tags:
- RNA
- language-model
license: apache-2.0
---
# ERNIE-RNA-MRL

ERNIE-RNA fine-tuned on UTR mean ribosome load (MRL) prediction, released as a backbone only: the CNN prediction head has been discarded and only the encoder weights are included.
## Architecture

| Parameter | Value |
|---|---|
| Layers | 12 |
| Attention heads | 12 |
| Embedding dimension | 768 |
| FFN dimension | 3072 |
| Vocabulary size | 25 |
| Positional encoding | Sinusoidal (fairseq-style) |
| Architecture | Post-LN Transformer with recurrent 2D RNA pairing bias |
| Max sequence length | 1024 |
See [Taykhoom/ERNIE-RNA](https://huggingface.co/Taykhoom/ERNIE-RNA) for the vocabulary table and the full architecture description.
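The "Sinusoidal (fairseq-style)" entry refers to fixed sine/cosine position tables as built in fairseq. The sketch below is modeled on fairseq's `SinusoidalPositionalEmbedding.get_embedding` rather than taken from this repository's code: sines fill the first half of each row and cosines the second half (concatenated, not interleaved), and the row at `padding_idx` is zeroed. `fairseq_sinusoidal_table` is a hypothetical helper, and `padding_idx=1` (fairseq's default pad index) plus the table size are assumptions to check against the remote code.

```python
import math
from typing import Optional

import torch


def fairseq_sinusoidal_table(num_positions: int, dim: int, padding_idx: Optional[int] = None) -> torch.Tensor:
    """Return a (num_positions, dim) table: sines in the first half, cosines in the second."""
    half_dim = dim // 2
    # Inverse frequencies form a geometric progression from 1 down to 1/10000.
    scale = math.log(10000) / (half_dim - 1)
    inv_freq = torch.exp(torch.arange(half_dim, dtype=torch.float) * -scale)
    angles = torch.arange(num_positions, dtype=torch.float).unsqueeze(1) * inv_freq.unsqueeze(0)
    table = torch.cat([torch.sin(angles), torch.cos(angles)], dim=1)
    if dim % 2 == 1:
        # Odd widths get one extra zero column.
        table = torch.cat([table, torch.zeros(num_positions, 1)], dim=1)
    if padding_idx is not None:
        # The padding position carries no positional signal.
        table[padding_idx, :] = 0
    return table


PAD = 1  # assumption: fairseq's default pad index
# fairseq numbers real tokens from PAD + 1, so 1024 positions need 1024 + PAD + 1 rows.
table = fairseq_sinusoidal_table(num_positions=1024 + PAD + 1, dim=768, padding_idx=PAD)
```

Because the table is fixed rather than learned, it adds no trainable parameters; only the row offset (positions starting at `padding_idx + 1`) has to match the original for the converted weights to behave identically.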
## Pretraining + Fine-tuning

- **Pretraining objective:** Masked language modeling on RNAcentral
- **Fine-tuning task:** UTR mean ribosome load (MRL) prediction
- **Source checkpoint:** `ERNIE-RNA-UTR_ML_CNN.pt`
### Checkpoint selection

The original repository provides a single MRL fine-tuned checkpoint, which is the one used here. The original model places a CNN head on top of the ERNIE-RNA encoder; only the encoder backbone is included here.
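Keeping only the encoder amounts to filtering the fine-tuned state dict. A minimal sketch, assuming the checkpoint's weights form a flat name-to-tensor mapping in which all encoder parameters share one prefix. `extract_backbone` is a hypothetical helper, the prefixes `encoder.` and `cnn_head.` below are placeholders rather than the actual key names in `ERNIE-RNA-UTR_ML_CNN.pt`, and the real conversion also applies a key mapping that is not reproduced here.

```python
from typing import Dict, Mapping, TypeVar

T = TypeVar("T")


def extract_backbone(state_dict: Mapping[str, T], backbone_prefix: str, new_prefix: str = "") -> Dict[str, T]:
    """Keep only keys under `backbone_prefix`, re-rooted under `new_prefix`.

    Everything outside the prefix (for example a task-specific head) is dropped.
    """
    backbone = {}
    for key, value in state_dict.items():
        if key.startswith(backbone_prefix):
            backbone[new_prefix + key[len(backbone_prefix):]] = value
    return backbone


# Hypothetical key names; the string values stand in for tensors.
checkpoint = {
    "encoder.layers.0.self_attn.q_proj.weight": "tensor_a",
    "encoder.embed_tokens.weight": "tensor_b",
    "cnn_head.conv1.weight": "tensor_c",
}
backbone_only = extract_backbone(checkpoint, backbone_prefix="encoder.")
```

Filtering by prefix rather than listing head keys explicitly means any head parameter, whatever its name, is dropped as long as it lives outside the encoder namespace.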
## Parity Verification

Backbone weights are extracted directly from the fine-tuned checkpoint using the same key mapping and architecture as the verified pretrained model. The underlying architecture is identical to [Taykhoom/ERNIE-RNA](https://huggingface.co/Taykhoom/ERNIE-RNA), which matched the original implementation to within a maximum absolute difference of 1.82e-06 across all 13 representation levels (the embedding output plus the 12 encoder layers).

Only `attn_implementation="eager"` is supported (see Implementation Notes).
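The reported figure is a maximum absolute difference taken per representation level. A minimal sketch of that metric, assuming both implementations have already produced a sequence of hidden states (for example via `output_hidden_states=True`) on identical inputs; `max_abs_diff_per_level` is a hypothetical helper, not part of the conversion code, and the random tensors only stand in for real model outputs.

```python
from typing import List, Sequence

import torch


def max_abs_diff_per_level(reference: Sequence[torch.Tensor], candidate: Sequence[torch.Tensor]) -> List[float]:
    """Return max |reference - candidate| for each representation level."""
    if len(reference) != len(candidate):
        raise ValueError(f"level count mismatch: {len(reference)} vs {len(candidate)}")
    diffs = []
    for ref, cand in zip(reference, candidate):
        if ref.shape != cand.shape:
            raise ValueError(f"shape mismatch: {tuple(ref.shape)} vs {tuple(cand.shape)}")
        # Compare in float64 to keep the metric's own rounding error negligible.
        diffs.append((ref.double() - cand.double()).abs().max().item())
    return diffs


# Random stand-ins for 13 levels (embedding output + 12 layers).
torch.manual_seed(0)
ref_states = tuple(torch.randn(2, 14, 768) for _ in range(13))
cand_states = tuple(h + 1e-7 * torch.randn_like(h) for h in ref_states)
diffs = max_abs_diff_per_level(ref_states, cand_states)
print(f"max abs diff across levels: {max(diffs):.2e}")
```

Reporting the worst level, rather than an average, catches drift that appears only in deep layers.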
## Related Models

See the full [ERNIE-RNA collection](https://huggingface.co/collections/Taykhoom/ernie-rna-6a20c1a8ea56c00a74e2dd93).

| Model | Notes |
|---|---|
| [Taykhoom/ERNIE-RNA](https://huggingface.co/Taykhoom/ERNIE-RNA) | Pretrained model |
| [Taykhoom/ERNIE-RNA-SS](https://huggingface.co/Taykhoom/ERNIE-RNA-SS) | Secondary structure (SS) fine-tuned |
| **[Taykhoom/ERNIE-RNA-MRL](https://huggingface.co/Taykhoom/ERNIE-RNA-MRL)** | **This model: UTR MRL fine-tuned** |
## Usage

### Embedding generation

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Taykhoom/ERNIE-RNA-MRL", trust_remote_code=True)
# Only eager attention is supported (see Implementation Notes).
model = AutoModel.from_pretrained(
    "Taykhoom/ERNIE-RNA-MRL",
    trust_remote_code=True,
    attn_implementation="eager",
)
model.eval()

sequences = ["AUGCAUGCAUGC", "GGGGCCCCGGGG"]
enc = tokenizer(sequences, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**enc)
    cls_emb = out.last_hidden_state[:, 0, :]  # (batch, 768): CLS token embedding
    token_emb = out.last_hidden_state  # (batch, seq_len, 768), includes padding positions

    # Intermediate layers, following the usual Transformers convention:
    # hidden_states[0] is the embedding output, hidden_states[i] is encoder layer i.
    out_all = model(**enc, output_hidden_states=True)
    layer6_emb = out_all.hidden_states[6]  # (batch, seq_len, 768)
```
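With `padding=True`, `token_emb` also contains rows for padding positions whenever the sequences differ in length. A minimal sketch for recovering per-sequence token embeddings, assuming the tokenizer returns an `attention_mask` with 1 for real tokens (including special tokens such as CLS) and 0 for padding; `unpad_token_embeddings` is a hypothetical helper.

```python
from typing import List

import torch


def unpad_token_embeddings(hidden: torch.Tensor, attention_mask: torch.Tensor) -> List[torch.Tensor]:
    """Split (batch, seq_len, dim) embeddings into one (n_i, dim) tensor per sequence.

    Rows where attention_mask == 0 (padding) are dropped; special tokens are kept.
    Works for both left and right padding because it masks rather than slices.
    """
    keep = attention_mask.bool()
    return [hidden[i][keep[i]] for i in range(hidden.size(0))]
```

For example, `per_seq = unpad_token_embeddings(out.last_hidden_state, enc["attention_mask"])` yields one tensor per input sequence; drop the first row of each if the CLS embedding should be excluded.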
### Fine-tuning

Use the CLS token embedding (`last_hidden_state[:, 0, :]`) as the input to a prediction head for sequence-level tasks.
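A minimal sketch of such a setup for a scalar target like MRL, assuming a single linear layer on the CLS embedding. `CLSRegressor` is hypothetical and is not the CNN head used in the original MRL model (which is not included here). It also assumes the backbone accepts `input_ids` and `attention_mask` keyword arguments and returns an output with `last_hidden_state`, as the embedding example implies.

```python
import torch
from torch import nn


class CLSRegressor(nn.Module):
    """Hypothetical sequence-level regressor: backbone -> CLS embedding -> linear layer."""

    def __init__(self, backbone: nn.Module, hidden_size: int = 768, dropout: float = 0.1):
        super().__init__()
        self.backbone = backbone
        self.dropout = nn.Dropout(dropout)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        cls_emb = hidden[:, 0, :]  # (batch, hidden_size)
        return self.head(self.dropout(cls_emb)).squeeze(-1)  # (batch,)
```

For example, wrap the backbone loaded above as `CLSRegressor(model)` and train it with `nn.MSELoss()` against measured MRL values. A linear head is the simplest choice; a deeper head can be swapped in without changing the backbone interface.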
## Implementation Notes

ERNIE-RNA's recurrent 2D bias is updated from the pre-softmax attention scores at every layer: the raw QK logits of one layer become the bias input for the next. Fused attention kernels (SDPA, FlashAttention) do not expose pre-softmax scores, so they cannot maintain this recurrent pathway. Only `attn_implementation="eager"` is supported; requesting `sdpa` or `flash_attention_2` raises a `ValueError`.
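The difference comes down to what each kernel returns. A minimal single-layer sketch, assuming an additive pairwise bias of shape (batch, heads, L, L): the eager path returns the pre-softmax scores alongside the output so they can be carried into the next layer, whereas `torch.nn.functional.scaled_dot_product_attention` returns only the attended output. `eager_biased_attention` is hypothetical; whether ERNIE-RNA's forwarded scores include the incoming bias, and how they are transformed between layers, is not specified here, so this sketch simply forwards the biased scores and reuses one set of q, k, v for brevity.

```python
import math
from typing import Tuple

import torch
import torch.nn.functional as F


def eager_biased_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, bias: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    """Scaled dot-product attention with an additive pair bias.

    q, k, v: (batch, heads, L, head_dim); bias: (batch, heads, L, L).
    Returns (output, pre-softmax scores).
    """
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)) + bias
    probs = torch.softmax(scores, dim=-1)
    return probs @ v, scores


torch.manual_seed(0)
B, H, L, D = 1, 12, 6, 64  # 12 heads x 64 = 768, matching the architecture table
q, k, v = (torch.randn(B, H, L, D) for _ in range(3))
bias = torch.zeros(B, H, L, L)  # stand-in for the initial 2D pairing bias
for _ in range(3):  # toy "layers": each layer's scores become the next layer's bias
    _, bias = eager_biased_attention(q, k, v, bias)

# One more step computed both ways: the outputs agree, but only the eager
# path also hands back `scores` to use as the next layer's bias.
out_eager, scores = eager_biased_attention(q, k, v, bias)
out_fused = F.scaled_dot_product_attention(q, k, v, attn_mask=bias)
```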
The `twod_proj` MLP is always run in float32 (matching the original), regardless of the model's compute dtype.
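A minimal sketch of running one submodule in float32 inside a reduced-precision model, assuming a small stack of `nn.Linear` layers with GELU in between; `Float32MLP`, its layer sizes and its activation are hypothetical and not taken from `twod_proj`. It upcasts activations and weights at call time and disables autocast, so the matmuls run in float32 even after the stored parameters were cast to fp16/bf16. In Transformers, a model class can also list modules in `_keep_in_fp32_modules` so their weights stay float32 when loaded with a reduced `dtype`; whether this port relies on that mechanism is not stated.

```python
from typing import Sequence

import torch
import torch.nn.functional as F
from torch import nn


class Float32MLP(nn.Module):
    """Hypothetical MLP whose forward pass always computes in float32."""

    def __init__(self, dims: Sequence[int]):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(a, b) for a, b in zip(dims[:-1], dims[1:]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Disable autocast so it cannot push the matmuls back to lower precision.
        with torch.autocast(device_type=x.device.type, enabled=False):
            h = x.float()
            for i, layer in enumerate(self.layers):
                # Upcast parameters too, in case the module was cast to fp16/bf16.
                h = F.linear(h, layer.weight.float(), layer.bias.float())
                if i < len(self.layers) - 1:
                    h = F.gelu(h)
        return h  # float32; cast with .to(x.dtype) if the caller needs the input dtype
```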
## Citation

```bibtex
@article{yin2025_ernierna,
  title   = {{ERNIE-RNA}: an {RNA} language model with structure-enhanced representations},
  author  = {Yin, Weijie and Zhang, Zhaoyu and He, Liang and Jiang, Rui and Zhang, Shuo and Liu, Gan and Zeng, Xuezhi and Zhao, Wen and Gao, Xiaowo},
  journal = {Nature Communications},
  volume  = {16},
  number  = {1},
  pages   = {8407},
  year    = {2025},
  doi     = {10.1038/s41467-025-64972-0}
}
```
## Credits

Original model and code by Yin et al. Source: [GitHub](https://github.com/Bruce-ywj/ERNIE-RNA).

The HF conversion code was written primarily by [Claude Code](https://claude.ai/code) and manually reviewed by Taykhoom Dalal.
## License

Apache 2.0, following the original repository.