Instructions to use Taykhoom/ERNIE-RNA-SS with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Taykhoom/ERNIE-RNA-SS with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("fill-mask", model="Taykhoom/ERNIE-RNA-SS", trust_remote_code=True)# Load model directly from transformers import AutoModelForMaskedLM model = AutoModelForMaskedLM.from_pretrained("Taykhoom/ERNIE-RNA-SS", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
File size: 5,079 Bytes
9910b2d a96a4f9 9910b2d 5ee6167 a96a4f9 7128930 9910b2d | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 | ---
language:
- rna
library_name: transformers
tags:
- RNA
- language-model
license: apache-2.0
---
# ERNIE-RNA-SS
ERNIE-RNA fine-tuned on RNA secondary structure (SS) prediction, backbone only.
The prediction head has been discarded; only the encoder weights are included.
## Architecture
| Parameter | Value |
|---|---|
| Layers | 12 |
| Attention heads | 12 |
| Embedding dimension | 768 |
| FFN dimension | 3072 |
| Vocabulary size | 25 |
| Positional encoding | Sinusoidal (fairseq-style) |
| Architecture | Post-LN Transformer with recurrent 2D RNA pairing bias |
| Max sequence length | 1024 |
See [Taykhoom/ERNIE-RNA](https://huggingface.co/Taykhoom/ERNIE-RNA) for the vocabulary table and
full architecture description.
## Pretraining + Fine-tuning
- **Pretraining objective:** Masked language modeling on RNAcentral
- **Fine-tuning task:** RNA secondary structure prediction (base-pair prediction)
- **Fine-tuning data:** RNA3DB
- **Source checkpoint:** `RNA3DB.pt`
### Checkpoint selection
Six SS fine-tuned checkpoints were available (bpRNA-1m, bpRNA-new, RIVAS, RNA3DB,
RNAStralign, bpRNA-1m_RNAStralign). All six were evaluated via linear probing on three
tasks from [mRNABench](https://huggingface.co/collections/morrislab/mrnabench): RNA
subcellular localization (`rna-loc-fazal`), mRNA half-life prediction (`rnahl-human`),
and variant effect prediction (`vep-traitgym-mendelian`).
`bpRNA-new` was excluded: its backbone is identical to the pretrained ERNIE-RNA checkpoint
(the backbone was frozen during SS fine-tuning), making it equivalent to `Taykhoom/ERNIE-RNA`.
The remaining five were evaluated via linear probing on three tasks from
[mRNABench](https://huggingface.co/collections/morrislab/mrnabench): RNA subcellular
localization, mRNA half-life prediction, and variant effect prediction. `RNA3DB` was the
best-performing checkpoint across all three tasks and was selected for this release.
## Parity Verification
Backbone weights are extracted directly from the fine-tuned checkpoint using the same
key mapping and architecture as the verified pretrained model. The underlying architecture
is identical to [Taykhoom/ERNIE-RNA](https://huggingface.co/Taykhoom/ERNIE-RNA), which was
verified at max abs diff = 1.82e-06 across all 13 representation levels.
Only `attn_implementation="eager"` is supported (see Implementation Notes).
## Related Models
See the full [ERNIE-RNA collection](https://huggingface.co/collections/Taykhoom/ernie-rna-6a20c1a8ea56c00a74e2dd93).
| Model | Notes |
|---|---|
| [Taykhoom/ERNIE-RNA](https://huggingface.co/Taykhoom/ERNIE-RNA) | Pretrained model |
| **[Taykhoom/ERNIE-RNA-SS](https://huggingface.co/Taykhoom/ERNIE-RNA-SS)** | **This model — SS fine-tuned** |
| [Taykhoom/ERNIE-RNA-MRL](https://huggingface.co/Taykhoom/ERNIE-RNA-MRL) | UTR MRL fine-tuned |
## Usage
### Embedding generation
```python
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("Taykhoom/ERNIE-RNA-SS", trust_remote_code=True)
model = AutoModel.from_pretrained("Taykhoom/ERNIE-RNA-SS", trust_remote_code=True)
model.eval()
sequences = ["AUGCAUGCAUGC", "GGGGCCCCGGGG"]
enc = tokenizer(sequences, return_tensors="pt", padding=True)
with torch.no_grad():
out = model(**enc)
cls_emb = out.last_hidden_state[:, 0, :] # (batch, 768) -- CLS token
token_emb = out.last_hidden_state # (batch, seq_len, 768)
# Intermediate layers
out_all = model(**enc, output_hidden_states=True)
layer6_emb = out_all.hidden_states[6] # (batch, seq_len, 768)
```
### Fine-tuning
Use the CLS token embedding (`last_hidden_state[:, 0, :]`) as input to a prediction head
for sequence-level tasks. For token-level tasks (e.g. base-pair prediction), use
`last_hidden_state` directly.
## Implementation Notes
ERNIE-RNA's recurrent 2D bias is updated from the pre-softmax attention scores at every
layer (the raw QK logits become the bias input for the next layer). Fused attention kernels
(SDPA, FlashAttention) do not expose pre-softmax scores, so they cannot maintain this
recurrent pathway. Only `attn_implementation="eager"` is supported; requesting `sdpa` or
`flash_attention_2` raises a `ValueError`.
The `twod_proj` MLP is always run in float32 (matching the original) regardless of the
model's compute dtype.
## Citation
```bibtex
@article{yin2025_ernierna,
title = {{ERNIE-RNA}: an {RNA} language model with structure-enhanced representations},
author = {Yin, Weijie and Zhang, Zhaoyu and He, Liang and Jiang, Rui and Zhang, Shuo and Liu, Gan and Zeng, Xuezhi and Zhao, Wen and Gao, Xiaowo},
journal = {Nature Communications},
volume = {16},
number = {1},
pages = {8407},
year = {2025},
doi = {10.1038/s41467-025-64972-0}
}
```
## Credits
Original model and code by Yin et al. Source: [GitHub](https://github.com/Bruce-ywj/ERNIE-RNA).
The HF conversion code was authored primarily by [Claude Code](https://claude.ai/code)
and reviewed manually by Taykhoom Dalal.
## License
Apache 2.0, following the original repository.
|