RiNALMo-micro

Minimal HuggingFace port of the micro (35M parameter) variant of RiNALMo — a general-purpose RNA language model pre-trained on 36 million non-coding RNA sequences.

Architecture

| Parameter | Value |
|---|---|
| Layers | 12 |
| Attention heads | 20 |
| Embedding dimension | 480 |
| FFN hidden dimension | 1280 (SwiGLU, 2/3 × 4 × 480) |
| Vocabulary size | 22 |
| Positional encoding | RoPE (base=10000, non-interleaved) |
| Architecture | Pre-LN Transformer with SwiGLU FFN |
| Max sequence length | ~8192 (practical; RoPE has no hard limit) |
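Two numbers follow directly from the architecture rows:

- The per-head dimension is 480 / 20 = 24.
- "Non-interleaved" RoPE pairs dimension i with dimension i + d/2 (the "rotate-half" layout), not with its neighbour i + 1.

The pure-Python sketch below illustrates that layout for one head vector. It assumes the standard RoPE frequency schedule (angle = position × base^(-2i/d)). It is an illustration, not code taken from this port, and `rope_rotate` is a hypothetical helper name.

```python
import math

HEAD_DIM = 480 // 20  # 24 dimensions per attention head


def rope_rotate(vec, pos, base=10000.0):
    """Apply non-interleaved (rotate-half) RoPE to one head vector at position `pos`.

    Assumption: standard RoPE schedule, pair i rotates by pos * base**(-2i/d).
    Dimension i is paired with dimension i + d/2.
    """
    d = len(vec)
    if d % 2:
        raise ValueError("head dimension must be even")
    half = d // 2
    out = list(vec)
    for i in range(half):
        theta = pos * base ** (-2 * i / d)  # rotation angle for pair i
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x1 * s + x2 * c
    return out
```

Because each pair is only rotated, the vector norm is preserved. The dot product between a rotated query and a rotated key also depends only on their position difference, which is why RoPE imposes no hard length limit.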

Vocabulary (index order): <cls> (0), <pad> (1), <eos> (2), <unk> (3), <mask> (4), A (5), C (6), G (7), T (8), I (9), R (10), Y (11), K (12), M (13), S (14), W (15), B (16), D (17), H (18), V (19), N (20), - (21).

Note: the tokenizer converts U -> T before encoding (the model was trained with T in place of U).
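A minimal sketch of character-level encoding against the vocabulary above may help when checking token IDs by hand. Several details are assumptions and are not confirmed by this card:

- `<cls>` is prepended and `<eos>` appended. Prepending `<cls>` is consistent with the usage example, which reads the CLS token at position 0.
- Unknown characters map to `<unk>`.

The real tokenizer also recognizes special tokens such as `<mask>` inside the input string, which this hypothetical `encode` does not.

```python
VOCAB = ["<cls>", "<pad>", "<eos>", "<unk>", "<mask>",
         "A", "C", "G", "T", "I", "R", "Y", "K", "M", "S", "W",
         "B", "D", "H", "V", "N", "-"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def encode(seq):
    """Hypothetical encoder: U -> T, map characters to IDs, wrap in <cls> ... <eos> (assumed)."""
    seq = seq.replace("U", "T")  # the model was trained with T, not U
    ids = [TOKEN_TO_ID.get(ch, TOKEN_TO_ID["<unk>"]) for ch in seq]
    return [TOKEN_TO_ID["<cls>"]] + ids + [TOKEN_TO_ID["<eos>"]]


print(encode("ACUG"))
```

Because of the U -> T step, "ACUG" and "ACTG" produce identical IDs.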

Pretraining

  • Objective: Masked language modeling (BERT-style, 15% mask rate)
  • Data: 36 million non-coding RNA sequences from multiple databases
  • Source checkpoint: rinalmo_micro_pretrained.pt from Zenodo 15043668

Checkpoint selection

The micro variant is the smallest (35M params) and fastest to run. For stronger representations on challenging tasks, consider the mega or giga variants.

Parity Verification

All 13 representation levels (embedding + 12 transformer layers) verified to be bit-exact (max abs diff = 0.00) against a pure-PyTorch reference that loads the original weights. Weight mapping verified for all 156 per-block tensors. Eager and SDPA implementations agree within 4e-6 on padded batches.

Related Models

See the full RiNALMo collection.

| Model | Parameters | Notes |
|---|---|---|
| RiNALMo-micro | 35M | This model |
| RiNALMo-mega | 150M | Medium variant |
| RiNALMo-giga | 650M | Full model |

Usage

Embedding generation

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Taykhoom/RiNALMo-micro", trust_remote_code=True)
model = AutoModel.from_pretrained("Taykhoom/RiNALMo-micro", trust_remote_code=True)
model.eval()

sequences = ["ACUUUGGCCA", "CCCGGU"]
enc = tokenizer(sequences, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**enc)

cls_emb   = out.last_hidden_state[:, 0, :]   # (batch, 480) -- CLS token
token_emb = out.last_hidden_state             # (batch, seq_len, 480)

# Intermediate layers: hidden_states holds 13 tensors (embeddings + one per block)
with torch.no_grad():
    out_all = model(**enc, output_hidden_states=True)
layer6_emb = out_all.hidden_states[6]         # after block 6

MLM logits

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Taykhoom/RiNALMo-micro", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("Taykhoom/RiNALMo-micro", trust_remote_code=True)
model.eval()

enc = tokenizer(["ACU<mask>UGGCCA"], return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits   # (1, seq_len, 22)

Faster attention backends

import torch
from transformers import AutoModel

# SDPA (PyTorch 2.0+)
model = AutoModel.from_pretrained("Taykhoom/RiNALMo-micro", trust_remote_code=True,
                                   attn_implementation="sdpa")

# Flash Attention 2 (requires the flash-attn package and a CUDA GPU; fp16/bf16 only)
model = AutoModel.from_pretrained("Taykhoom/RiNALMo-micro", trust_remote_code=True,
                                   attn_implementation="flash_attention_2",
                                   torch_dtype=torch.bfloat16).to("cuda")

Fine-tuning

Fine-tuning follows standard HF conventions. For sequence-level tasks, pool over non-padding positions or use the CLS token embedding as input to a prediction head.
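Pooling over non-padding positions is a masked mean: average each sequence's token embeddings only where the attention mask is 1. The sketch below shows the arithmetic in plain Python on nested lists. `masked_mean_pool` is a hypothetical helper, not part of this port.

```python
def masked_mean_pool(hidden, attention_mask):
    """Average token embeddings over positions where attention_mask == 1.

    hidden: list (batch) of lists (seq_len) of embedding vectors.
    attention_mask: list (batch) of lists (seq_len) of 0/1 flags.
    """
    pooled = []
    for seq_vecs, seq_mask in zip(hidden, attention_mask):
        kept = [vec for vec, m in zip(seq_vecs, seq_mask) if m]
        if not kept:
            raise ValueError("sequence has no non-padding positions")
        dim = len(kept[0])
        pooled.append([sum(v[j] for v in kept) / len(kept) for j in range(dim)])
    return pooled


# The third position is padding, so it is ignored.
print(masked_mean_pool([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]], [[1, 1, 0]]))
```

With PyTorch tensors (`h = out.last_hidden_state`, `m = enc["attention_mask"].float()`), the same result is `(h * m.unsqueeze(-1)).sum(1) / m.sum(1, keepdim=True)`. Note that `<cls>` and `<eos>` count as non-padding here. Excluding them is a design choice left to the user.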

Implementation Notes

The original RiNALMo uses flash_attn for attention during training. This HF port implements eager (standard PyTorch), SDPA, and flash_attention_2 as separate backends selectable via attn_implementation. The SDPA and flash_attention_2 backends were not part of the original codebase.

The model uses a non-standard Pre-LN residual for attention: the residual connection is taken from the normalized input (x = attn_ln(x); x = x + attn(x)) rather than from the un-normalized input as in standard Pre-LN (x = x + attn(attn_ln(x))). The FFN sub-layer uses standard Pre-LN.
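The difference in residual wiring is easiest to see side by side. The sketch below treats each sub-layer as an arbitrary callable, so it runs on plain floats. The functions `rinalmo_block` and `standard_pre_ln_block` are hypothetical names, and the toy sub-layers in the example are arbitrary stand-ins rather than real attention or LayerNorm.

```python
def rinalmo_block(x, attn, attn_ln, ffn, ffn_ln):
    """Residual wiring as described for this port (hypothetical helper)."""
    x = attn_ln(x)          # attention residual starts from the normalized input
    x = x + attn(x)
    x = x + ffn(ffn_ln(x))  # FFN sub-layer: standard Pre-LN
    return x


def standard_pre_ln_block(x, attn, attn_ln, ffn, ffn_ln):
    """Textbook Pre-LN block, for comparison."""
    x = x + attn(attn_ln(x))  # residual from the un-normalized input
    x = x + ffn(ffn_ln(x))
    return x


# Toy stand-ins for the sub-layers (not real attention or LayerNorm).
layers = dict(attn=lambda v: v + 1, attn_ln=lambda v: v / 2,
              ffn=lambda v: 3 * v, ffn_ln=lambda v: v - 1)
print(rinalmo_block(4.0, **layers), standard_pre_ln_block(4.0, **layers))
```

The two blocks agree only when `attn_ln` is the identity. That is why weights ported from the original checkpoint must be run with the original wiring to reproduce its outputs.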

TokenDropout rescales embeddings by (1 - mask_ratio_train) / (1 - mask_ratio_observed) even at inference, consistent with the original training code.
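A minimal sketch of this rescaling for a single sequence follows. It zeroes embeddings at `<mask>` positions (ID 4), then multiplies every position by (1 - mask_ratio_train) / (1 - mask_ratio_observed), where the observed ratio is computed over non-padding tokens.

The default `mask_ratio_train = 0.15 * 0.8` is an assumption borrowed from the ESM-style convention (80% of the 15% selected tokens become `<mask>`); this card does not state the value. `token_dropout` is a hypothetical helper.

```python
def token_dropout(embeddings, token_ids, mask_id=4, pad_id=1, mask_ratio_train=0.15 * 0.8):
    """ESM-style TokenDropout for one sequence (sketch; mask_ratio_train is assumed).

    Assumes at least one non-padding token that is not <mask>.
    """
    n_real = sum(1 for t in token_ids if t != pad_id)
    n_mask = sum(1 for t in token_ids if t == mask_id)
    observed = n_mask / n_real
    scale = (1 - mask_ratio_train) / (1 - observed)
    return [
        [0.0] * len(vec) if t == mask_id else [v * scale for v in vec]
        for vec, t in zip(embeddings, token_ids)
    ]
```

At inference the input typically contains no `<mask>` tokens. The observed ratio is then 0, so every embedding is multiplied by the constant (1 - mask_ratio_train), which is 0.88 under the assumed default.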

Citation

@article{penic2025_rinalmo,
  title={RiNALMo: general-purpose RNA language models can generalize well on structure prediction tasks},
  author={Penić, Rafael Josip and Vlašić, Tin and Huber, Roland G. and Wan, Yue and Šikić, Mile},
  journal={Nature Communications},
  volume={16},
  pages={5671},
  year={2025},
  doi={10.1038/s41467-025-60872-5}
}

Credits

Original model and code by Penić et al. Source: GitHub lbcb-sci/RiNALMo. The HF conversion code was authored primarily by Claude Code and reviewed manually by Taykhoom Dalal.

License

Apache 2.0 (code) / CC BY 4.0 (model weights), following the original repository.

Safetensors model size: 33.5M params (tensor type: F32)