UTR-LM-MLMSISS

UTR-LM is a 5' UTR RNA language model based on ESM2, pretrained on endogenous 5' UTRs from five species and a large synthetic library. This checkpoint (UTR-LM-MLMSISS) was trained with masked language modeling (MLM) combined with two supervised auxiliary objectives: minimum free energy (MFE) regression and secondary structure prediction.

Architecture

| Parameter | Value |
|---|---|
| Layers | 6 |
| Attention heads | 16 |
| Embedding dimension | 128 |
| Vocabulary size | 10 |
| Positional encoding | Rotary (RoPE) |
| Architecture | ESM2-style pre-LN Transformer |

Vocabulary: <pad> (0), <eos> (1), <unk> (2), A (3), G (4), C (5), T (6), <cls> (7), <mask> (8), <sep> (9)
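
To make the ID mapping concrete, the following is a minimal sketch in plain Python of how a sequence could be turned into token IDs using the vocabulary listed above. It is not the model's tokenizer: the `encode` helper is hypothetical, and both the U-to-T conversion and the appended `<eos>` token are assumptions made for illustration (a `<cls>` token at position 0 is consistent with the usage examples in this card).

```python
# Vocabulary as listed in this model card (token -> ID).
VOCAB = {
    "<pad>": 0, "<eos>": 1, "<unk>": 2,
    "A": 3, "G": 4, "C": 5, "T": 6,
    "<cls>": 7, "<mask>": 8, "<sep>": 9,
}


def encode(sequence: str) -> list[int]:
    """Hypothetical helper: map a nucleotide string to token IDs.

    Assumptions (not confirmed by the model card): RNA 'U' is written as 'T',
    and the sequence is wrapped as <cls> ... <eos>.
    """
    normalized = sequence.upper().replace("U", "T")
    # Any character outside A/G/C/T falls back to <unk>.
    body = [VOCAB.get(base, VOCAB["<unk>"]) for base in normalized]
    return [VOCAB["<cls>"], *body, VOCAB["<eos>"]]


print(encode("AUGC"))  # [7, 3, 6, 4, 5, 1]
```

For real inputs, the tokenizer loaded with `AutoTokenizer.from_pretrained` remains the authoritative source of token IDs.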

Pretraining

  • Objective: Masked language modeling + MFE regression + per-token secondary structure prediction (3-class: unpaired, stem, loop)
  • Data: Endogenous 5' UTRs from five species (human, mouse, zebrafish, Drosophila, yeast) combined with the Cao et al. random 5' UTR synthetic library
  • Source checkpoint: ESM2SISS_FS4.1_fiveSpeciesCao_6layers_16heads_128embedsize_4096batchToks_lr1e-05_supervisedweight1.0_structureweight1.0_MLMLossMin_epoch93.pkl

Checkpoint selection

Multiple ESM2SISS checkpoints were available (FS4.1, FS4.4, FS4.7, FS4.10, FS4.13, FS4.16, FS4.19, FS4.22). The FS4.1 checkpoint at epoch 93 was selected because it is the version specified in the original UTR-LM paper for the mean ribosome load (MRL) downstream fine-tuning task (used in the MJ3_Finetune evaluation scripts with --prefix ESM2SISS_FS4.1.ep93).

Parity Verification

Hidden-state representations produced by this HF model are verified to be exactly identical (max absolute difference = 0.00) to the original ESM2-based implementation at all 7 representation levels (initial embedding + 6 transformer layers). Verified on GPU with PyTorch 2.8 / CUDA 12.6.
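
A check of this kind can be sketched as follows. This is not the verification script that was actually used: `max_abs_diff_per_level` is a hypothetical helper, and the random tensors merely stand in for the hidden states of the two implementations.

```python
import torch


def max_abs_diff_per_level(reference, candidate):
    """Return the max absolute difference for each pair of hidden-state tensors."""
    if len(reference) != len(candidate):
        raise ValueError("both models must expose the same number of levels")
    return [(r - c).abs().max().item() for r, c in zip(reference, candidate)]


# Stand-in data: 7 levels (embedding output + 6 layers), each (batch, seq_len, 128).
torch.manual_seed(0)
reference_states = [torch.randn(2, 14, 128) for _ in range(7)]
candidate_states = [t.clone() for t in reference_states]

diffs = max_abs_diff_per_level(reference_states, candidate_states)
print(diffs)  # one value per representation level
```

With real models, the candidate list would come from calling the HF model with `output_hidden_states=True`, and the reference list from the corresponding per-layer representations of the original implementation.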

Related Models

See the full UTR-LM collection.

| Model | Pretraining Objective | Notes |
|---|---|---|
| UTR-LM-MLM | MLM | Base model |
| UTR-LM-MLMSI | MLM + MFE regression | Recommended for TE / EL tasks |
| UTR-LM-MLMSS | MLM + secondary structure | - |
| UTR-LM-MLMSISS | MLM + MFE + secondary structure | This model; recommended for MRL tasks |

Usage

Embedding generation

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Taykhoom/UTR-LM-MLMSISS", trust_remote_code=True)
model = AutoModel.from_pretrained("Taykhoom/UTR-LM-MLMSISS", trust_remote_code=True)
model.eval()

sequences = ["ATGCATGCATGC", "GCTAGCTAGCTAGCTA"]
enc = tokenizer(sequences, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**enc)

# CLS token embedding (position 0) - recommended for sequence-level tasks
cls_emb = out.last_hidden_state[:, 0, :]   # (batch, 128)

# All-token embeddings
token_emb = out.last_hidden_state           # (batch, seq_len, 128)

# Intermediate layer representations (kept under no_grad to avoid building an autograd graph)
with torch.no_grad():
    out_all = model(**enc, output_hidden_states=True)
layer3_emb = out_all.hidden_states[3]       # after layer 3, shape (batch, seq_len, 128)

MLM logits

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Taykhoom/UTR-LM-MLMSISS", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("Taykhoom/UTR-LM-MLMSISS", trust_remote_code=True)
model.eval()

enc = tokenizer(["ATGC<mask>ATGC"], return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits   # (1, seq_len, 10)
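
The last dimension of `logits` is indexed by the vocabulary IDs listed in the Architecture section, so a prediction for a masked position can be read off by restricting one logit row to the four nucleotide IDs and applying a softmax. The sketch below does this in plain Python on a made-up logit row; `nucleotide_probs` is a hypothetical helper and the numbers are illustrative, not model output.

```python
import math

NUCLEOTIDE_IDS = {"A": 3, "G": 4, "C": 5, "T": 6}  # IDs from the model card vocabulary


def nucleotide_probs(logit_row):
    """Softmax over the A/G/C/T entries of one length-10 logit row."""
    scores = {base: logit_row[idx] for base, idx in NUCLEOTIDE_IDS.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {base: math.exp(s - top) for base, s in scores.items()}
    total = sum(exps.values())
    return {base: e / total for base, e in exps.items()}


# Made-up logits for a single <mask> position (one value per vocabulary ID).
example_row = [-5.0, -5.0, -5.0, 2.0, 0.5, 0.1, 1.0, -5.0, -5.0, -5.0]
probs = nucleotide_probs(example_row)
best = max(probs, key=probs.get)
print(best, probs)
```

With the real model, a row could be obtained as `logits[0, mask_index].tolist()`, where `mask_index` is the position at which `enc["input_ids"][0]` equals the mask token ID (8 in the vocabulary above).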

Fine-tuning

The model follows standard HF conventions and can be fine-tuned with any Trainer-compatible setup. For sequence regression tasks, use the CLS token embedding as input to a prediction head (as done in the original UTR-LM paper).
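
As a rough illustration of the CLS-based setup, the sketch below places a small regression head on top of position-0 embeddings. The class name `ClsRegressionHead`, the hidden width, and the dropout rate are assumptions made for this example, not the head used in the UTR-LM paper; a random tensor stands in for `last_hidden_state` so the snippet runs without downloading the model.

```python
import torch
from torch import nn


class ClsRegressionHead(nn.Module):
    """Hypothetical MLP head mapping the CLS embedding to one scalar per sequence."""

    def __init__(self, embed_dim: int = 128, hidden_dim: int = 64, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, last_hidden_state: torch.Tensor) -> torch.Tensor:
        cls_emb = last_hidden_state[:, 0, :]   # (batch, embed_dim)
        return self.net(cls_emb).squeeze(-1)   # (batch,)


# Stand-in for model(**enc).last_hidden_state with batch=2, seq_len=18.
hidden = torch.randn(2, 18, 128)
targets = torch.tensor([0.7, 1.3])             # made-up regression labels

head = ClsRegressionHead()
preds = head(hidden)
loss = nn.functional.mse_loss(preds, targets)
loss.backward()                                 # gradients flow into the head
print(preds.shape, loss.item())
```

In an actual fine-tuning run, `hidden` would be replaced by the backbone's output, and the backbone parameters would either be optimized together with the head or kept frozen beneath it.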

Citation

@article{chu2024utrlm,
  title   = {A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions},
  author  = {Chu, Yanyi and Yu, Dan and Li, Yupeng and Huang, Kaixuan and Shen, Yue and Cong, Le and Zhang, Jason and Wang, Mengdi},
  journal = {Nature Machine Intelligence},
  volume  = {6},
  number  = {4},
  pages   = {449--460},
  year    = {2024},
  doi     = {10.1038/s42256-024-00823-9}
}

Implementation Notes

The original UTR-LM implementation uses standard scaled dot-product attention. This HF port adds support for attn_implementation="sdpa" (PyTorch F.scaled_dot_product_attention) and attn_implementation="flash_attention_2" (requires pip install flash-attn --no-build-isolation), which were not part of the original codebase.

Credits

Original model and code by Yanyi Chu et al. (Stanford). Source code: UTR-LM GitHub repository. The HF conversion code was authored primarily by Claude Code and reviewed manually by Taykhoom Dalal.

License

GPL-3.0, following the original UTR-LM repository.

Model size: 1.21M parameters (Safetensors, tensor type F32)