RiNALMo-micro
Minimal HuggingFace port of the micro (35M parameter) variant of RiNALMo — a general-purpose RNA language model pre-trained on 36 million non-coding RNA sequences.
Architecture
| Parameter | Value |
|---|---|
| Layers | 12 |
| Attention heads | 20 |
| Embedding dimension | 480 |
| FFN hidden dimension | 1280 (SwiGLU, 2/3 x 4 x embed) |
| Vocabulary size | 22 |
| Positional encoding | RoPE (base=10000, non-interleaved) |
| Architecture | Pre-LN Transformer with SwiGLU FFN |
| Max sequence length | ~8192 (practical; RoPE has no hard limit) |
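The derived values in the table follow from the embedding dimension and head count. A quick arithmetic check, with variable names chosen here for illustration (they are not taken from the model config):

```python
# Values copied from the architecture table.
embed_dim = 480
num_heads = 20

# Per-head dimension: the embedding is split evenly across the heads.
head_dim = embed_dim // num_heads

# SwiGLU FFN width: 2/3 of the conventional 4 x embed_dim expansion.
ffn_hidden = (2 * 4 * embed_dim) // 3

print(head_dim, ffn_hidden)
```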
Vocabulary (index order): <cls> (0), <pad> (1), <eos> (2), <unk> (3),
<mask> (4), A (5), C (6), G (7), T (8), I (9), R (10), Y (11), K (12), M (13),
S (14), W (15), B (16), D (17), H (18), V (19), N (20), - (21).
Note: the tokenizer converts U -> T before encoding (the model was trained on T).
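As a minimal sketch of what the index order and the U -> T conversion imply, the mapping can be reproduced in plain Python. `encode_residues` is a hypothetical helper, not the tokenizer's API; it skips special tokens entirely, and sending unknown characters to `<unk>` is an assumption:

```python
# Token order copied from the vocabulary listing.
VOCAB = ["<cls>", "<pad>", "<eos>", "<unk>", "<mask>",
         "A", "C", "G", "T", "I", "R", "Y", "K", "M",
         "S", "W", "B", "D", "H", "V", "N", "-"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def encode_residues(seq: str) -> list[int]:
    """Hypothetical helper: map residues to ids after U -> T conversion.

    Special tokens (<cls>, <eos>, padding) are not added here; the real
    tokenizer is responsible for those.
    """
    seq = seq.replace("U", "T")
    unk = TOKEN_TO_ID["<unk>"]  # assumption: unknown characters map to <unk>
    return [TOKEN_TO_ID.get(ch, unk) for ch in seq]


print(encode_residues("ACUG"))  # [5, 6, 8, 7]
```

Because of the conversion, `"ACUG"` and `"ACTG"` produce identical ids.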
Pretraining
- Objective: Masked language modeling (BERT-style, 15% mask rate)
- Data: 36 million non-coding RNA sequences from multiple databases
- Source checkpoint: rinalmo_micro_pretrained.pt from Zenodo 15043668
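For readers unfamiliar with the objective, here is a rough sketch of BERT-style masking at a 15% rate. The 80/10/10 split (mask / random token / unchanged) is the standard BERT recipe and is an assumption here; this card does not confirm the exact corruption scheme used for RiNALMo. `bert_style_mask` and the constants are illustrative names only:

```python
import random

MASK_ID = 4                        # <mask> in the vocabulary listing
REGULAR_IDS = list(range(5, 22))   # ids 5..21: non-special tokens
IGNORE_INDEX = -100                # conventional "no loss at this position" label


def bert_style_mask(token_ids, mask_rate=0.15, rng=None):
    """Return (corrupted_inputs, labels) for one sequence of token ids."""
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Skip special tokens (ids 0..4) and positions not selected for prediction.
        if tok < 5 or rng.random() >= mask_rate:
            continue
        labels[i] = tok                           # the model must recover this token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID                   # assumed 80%: replace with <mask>
        elif roll < 0.9:
            inputs[i] = rng.choice(REGULAR_IDS)   # assumed 10%: random token
        # assumed remaining 10%: leave the token unchanged
    return inputs, labels
```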
Checkpoint selection
The micro variant is the smallest (35M params) and fastest to use. Choose mega or giga for stronger representations on challenging tasks.
Parity Verification
All 13 representation levels (embedding output plus 12 transformer layers) were verified to be bit-exact (max abs diff = 0.00) against a pure-PyTorch reference that loads the original weights. The weight mapping was verified for all 156 per-block tensors. The eager and SDPA implementations agree to within 4e-6 on padded batches.
Related Models
See the full RiNALMo collection.
| Model | Parameters | Notes |
|---|---|---|
| RiNALMo-micro | 35M | This model |
| RiNALMo-mega | 150M | Medium variant |
| RiNALMo-giga | 650M | Full model |
Usage
Embedding generation
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("Taykhoom/RiNALMo-micro", trust_remote_code=True)
model = AutoModel.from_pretrained("Taykhoom/RiNALMo-micro", trust_remote_code=True)
model.eval()
sequences = ["ACUUUGGCCA", "CCCGGU"]
enc = tokenizer(sequences, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**enc)
cls_emb = out.last_hidden_state[:, 0, :]  # (batch, 480) -- CLS token
token_emb = out.last_hidden_state         # (batch, seq_len, 480)
# Intermediate layers
out_all = model(**enc, output_hidden_states=True)
layer6_emb = out_all.hidden_states[6]     # after block 6
MLM logits
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("Taykhoom/RiNALMo-micro", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("Taykhoom/RiNALMo-micro", trust_remote_code=True)
model.eval()
enc = tokenizer(["ACU<mask>UGGCCA"], return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits  # (1, seq_len, 22)
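To turn those logits into a prediction, one option is to take the arg-max at each `<mask>` position. The helper below is a sketch, not part of the model's API; `predict_masked` is a hypothetical name, and the default `mask_token_id=4` comes from the vocabulary listing above:

```python
import torch


def predict_masked(logits: torch.Tensor, input_ids: torch.Tensor,
                   mask_token_id: int = 4) -> torch.Tensor:
    """Return the arg-max token id at every <mask> position.

    logits: (batch, seq_len, vocab); input_ids: (batch, seq_len).
    """
    mask_positions = input_ids == mask_token_id    # boolean, (batch, seq_len)
    return logits[mask_positions].argmax(dim=-1)   # (num_masked,)
```

With the snippet above this would be called as `predict_masked(logits, enc["input_ids"])`. Assuming the tokenizer exposes the usual `convert_ids_to_tokens` method, the resulting ids can be mapped back to nucleotide symbols.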
Faster attention backends
import torch
from transformers import AutoModel
# SDPA (PyTorch 2.0+)
model = AutoModel.from_pretrained("Taykhoom/RiNALMo-micro", trust_remote_code=True,
                                  attn_implementation="sdpa")
# Flash Attention 2 (requires flash-attn package)
model = AutoModel.from_pretrained("Taykhoom/RiNALMo-micro", trust_remote_code=True,
                                  attn_implementation="flash_attention_2",
                                  torch_dtype=torch.bfloat16)
Fine-tuning
Fine-tuning follows standard HF conventions. For sequence-level tasks, feed either the CLS token embedding or an embedding pooled over non-padding positions into a prediction head.
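Pooling over non-padding positions can be sketched as a masked mean. This is one common choice, not something prescribed by the original code; `masked_mean_pool` is a hypothetical helper. It assumes the tokenizer returns a standard `attention_mask` with 1 for real tokens and 0 for padding:

```python
import torch


def masked_mean_pool(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over positions where attention_mask == 1.

    hidden: (batch, seq_len, dim); attention_mask: (batch, seq_len).
    """
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # (batch, seq_len, 1)
    summed = (hidden * mask).sum(dim=1)                   # padding contributes zero
    counts = mask.sum(dim=1).clamp(min=1.0)               # avoid division by zero
    return summed / counts                                # (batch, dim)
```

Applied to the embedding example, this would be `masked_mean_pool(out.last_hidden_state, enc["attention_mask"])`. The average includes any special-token positions the tokenizer adds; exclude them from the mask if that is not wanted.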
Implementation Notes
The original RiNALMo uses flash_attn for attention during training. This HF port
implements eager (standard PyTorch), SDPA, and flash_attention_2 as separate backends
selectable via attn_implementation. The SDPA and flash_attention_2 backends were not
part of the original codebase.
The model uses a non-standard Pre-LN residual: the attention residual connection is
taken from the normalized input (i.e., x = attn_ln(x); x = x + attn(x)) rather
than from the un-normalized input. The FFN uses standard Pre-LN.
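The difference between the two residual styles can be illustrated with a toy example. This is a sketch under simplifying assumptions: a linear layer stands in for attention, and the function names are illustrative rather than taken from the port:

```python
import torch
from torch import nn


def standard_pre_ln(x, ln, sublayer):
    # Textbook Pre-LN: the skip connection carries the un-normalized input.
    return x + sublayer(ln(x))


def normalized_residual(x, ln, sublayer):
    # Variant described above: the skip connection carries the normalized input.
    h = ln(x)
    return h + sublayer(h)


torch.manual_seed(0)
x = torch.randn(2, 5, 8)
ln = nn.LayerNorm(8)
sublayer = nn.Linear(8, 8)  # stand-in for the attention sublayer

a = standard_pre_ln(x, ln, sublayer)
b = normalized_residual(x, ln, sublayer)
# The two outputs differ by exactly x - ln(x), the part the variant discards.
print(torch.allclose(a - b, x - ln(x), atol=1e-5))
```

Since the outputs differ, a port that silently used the textbook form would not reproduce the original activations.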
TokenDropout rescales embeddings by (1 - mask_ratio_train) / (1 - mask_ratio_observed)
even at inference, consistent with the original training code.
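A small worked example of that rescaling factor. The training ratio of 0.12 used below is an illustrative placeholder, not a value confirmed by this card, and `token_dropout_scale` is a hypothetical helper name:

```python
def token_dropout_scale(mask_ratio_train: float, mask_ratio_observed: float) -> float:
    """Scale factor (1 - mask_ratio_train) / (1 - mask_ratio_observed)."""
    return (1.0 - mask_ratio_train) / (1.0 - mask_ratio_observed)


ASSUMED_TRAIN_RATIO = 0.12  # placeholder value for illustration only

# No <mask> tokens in the input: embeddings are still scaled by 1 - train ratio.
print(token_dropout_scale(ASSUMED_TRAIN_RATIO, 0.0))

# Observed mask fraction equal to the training ratio: the factor is 1.
print(token_dropout_scale(ASSUMED_TRAIN_RATIO, ASSUMED_TRAIN_RATIO))
```

The first case is why the rescaling matters at inference: fully unmasked inputs still get a factor below 1.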
Citation
@article{penic2025_rinalmo,
title={RiNALMo: general-purpose RNA language models can generalize well on structure prediction tasks},
author={Penić, Rafael Josip and Vlašić, Tin and Huber, Roland G. and Wan, Yue and Šikić, Mile},
journal={Nature Communications},
volume={16},
pages={5671},
year={2025},
doi={10.1038/s41467-025-60872-5}
}
Credits
Original model and code by Penić et al. Source: GitHub lbcb-sci/RiNALMo. The HF conversion code was authored primarily by Claude Code and reviewed manually by Taykhoom Dalal.
License
Apache 2.0 (code) / CC BY 4.0 (model weights), following the original repository.