UTR-LM-MLMSISS
UTR-LM is a 5' UTR RNA language model based on ESM2, pretrained on endogenous 5' UTRs from five species and a large synthetic library. This checkpoint (UTR-LM-MLMSISS) was trained with masked language modeling (MLM) plus two supervised auxiliary objectives: minimum free energy (MFE) regression and secondary structure prediction.
Architecture
| Parameter | Value |
|---|---|
| Layers | 6 |
| Attention heads | 16 |
| Embedding dimension | 128 |
| Vocabulary size | 10 |
| Positional encoding | Rotary (RoPE) |
| Architecture | ESM2-style pre-LN Transformer |
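
Two quantities follow from the table by arithmetic: the per-head dimension and a rough backbone parameter count. The sketch below computes both. The count is an estimate only. It assumes an ESM2-style feed-forward width of 4x the embedding dimension, biased linear layers, and two LayerNorms per block, none of which is stated in this card, and it leaves out the output heads. `head_dim` and `estimate_backbone_params` are hypothetical helpers, not part of the model's code.

```python
def head_dim(embed_dim=128, heads=16):
    """Per-head dimension implied by the table (embedding dim / heads)."""
    if embed_dim % heads:
        raise ValueError("embedding dimension must be divisible by the head count")
    return embed_dim // heads


def estimate_backbone_params(layers=6, embed_dim=128, vocab=10, ffn_mult=4):
    """Rough backbone parameter count.

    ASSUMPTIONS (not stated in the card): the feed-forward width is
    ffn_mult * embed_dim, every linear layer has a bias, and each block has
    two LayerNorms. Rotary position encoding adds no learned weights.
    Output heads (MLM, MFE, structure) are not counted.
    """
    d = embed_dim
    attention = 4 * (d * d + d)  # Q, K, V and output projections, with biases
    feed_forward = (d * ffn_mult * d + ffn_mult * d) + (ffn_mult * d * d + d)
    layer_norms = 2 * (2 * d)    # weight and bias, twice per block
    token_embedding = vocab * d
    return token_embedding + layers * (attention + feed_forward + layer_norms)


print(head_dim())                  # 8
print(estimate_backbone_params())  # 1190912 under the assumptions above
```

Under these assumptions the backbone is on the order of a million parameters, with almost all of them in the six transformer blocks rather than in the 10-entry embedding table.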
Vocabulary: <pad> (0), <eos> (1), <unk> (2), A (3), G (4), C (5), T (6), <cls> (7), <mask> (8), <sep> (9)
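
To make the ID mapping concrete, here is a minimal pure-Python sketch that turns a sequence into token IDs using the vocabulary above. It is not the model's tokenizer. It assumes character-level tokenization and ESM-style framing (`<cls>` first, `<eos>` last), which this card does not state, and `encode` is a hypothetical helper. For real inputs, use the tokenizer loaded with `AutoTokenizer`.

```python
# Token IDs copied from the vocabulary line above.
VOCAB = {
    "<pad>": 0, "<eos>": 1, "<unk>": 2,
    "A": 3, "G": 4, "C": 5, "T": 6,
    "<cls>": 7, "<mask>": 8, "<sep>": 9,
}


def encode(sequence, add_special_tokens=True):
    """Map each nucleotide to its ID; unknown characters become <unk>.

    ASSUMPTION: <cls> ... <eos> framing, as in ESM-style tokenizers.
    In this sketch a 'U' is not in the vocabulary and maps to <unk>.
    """
    ids = [VOCAB.get(ch, VOCAB["<unk>"]) for ch in sequence.upper()]
    if add_special_tokens:
        ids = [VOCAB["<cls>"]] + ids + [VOCAB["<eos>"]]
    return ids


print(encode("ATGC"))  # [7, 3, 6, 4, 5, 1]
```

The vocabulary uses T rather than U, so sequences are written in the DNA alphabet, as in the usage examples in this card.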
Pretraining
- Objective: Masked language modeling + MFE regression + per-token secondary structure prediction (3-class: unpaired, stem, loop)
- Data: Endogenous 5' UTRs from five species (human, mouse, zebrafish, Drosophila, yeast) combined with the Cao et al. random 5' UTR synthetic library
- Source checkpoint: ESM2SISS_FS4.1_fiveSpeciesCao_6layers_16heads_128embedsize_4096batchToks_lr1e-05_supervisedweight1.0_structureweight1.0_MLMLossMin_epoch93.pkl
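
The three objectives listed above combine into a single training loss. The sketch below shows one plausible form in plain Python: MLM cross-entropy, plus a weighted MFE regression term, plus a weighted per-token structure cross-entropy. The weights of 1.0 are read from the checkpoint name (`supervisedweight1.0`, `structureweight1.0`). The choice of squared error for MFE and of cross-entropy for structure is an assumption, and `combined_loss` is a hypothetical function, not the original training code.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one logit row against an integer class label."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]


def mean_cross_entropy(rows, targets):
    """Average cross-entropy over a list of logit rows."""
    return sum(cross_entropy(r, t) for r, t in zip(rows, targets)) / len(targets)


def combined_loss(mlm_logits, mlm_targets, mfe_pred, mfe_true,
                  ss_logits, ss_targets,
                  supervised_weight=1.0, structure_weight=1.0):
    """MLM loss + weighted MFE regression + weighted structure loss."""
    mlm = mean_cross_entropy(mlm_logits, mlm_targets)  # masked positions, 10 classes
    mfe = (mfe_pred - mfe_true) ** 2                   # ASSUMED squared error
    ss = mean_cross_entropy(ss_logits, ss_targets)     # per token, 3 classes
    return mlm + supervised_weight * mfe + structure_weight * ss


# Toy inputs: one masked position, a perfect MFE prediction, two structure labels.
loss = combined_loss(
    mlm_logits=[[0.0] * 10], mlm_targets=[3],
    mfe_pred=-12.5, mfe_true=-12.5,
    ss_logits=[[0.0] * 3, [0.0] * 3], ss_targets=[0, 1],
)
print(loss)
```

With uniform logits and a perfect MFE prediction, the toy loss reduces to ln 10 + ln 3, which is a quick sanity check on the weighting.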
Checkpoint selection
Multiple ESM2SISS checkpoints were available (FS4.1, FS4.4, FS4.7, FS4.10, FS4.13, FS4.16, FS4.19, FS4.22). The FS4.1 checkpoint at epoch 93 was selected because it is the version specified in the original UTR-LM paper for the mean ribosome load (MRL) downstream fine-tuning task (used in the MJ3_Finetune evaluation scripts with --prefix ESM2SISS_FS4.1.ep93).
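
The source checkpoint name listed under Pretraining encodes its hyperparameters, which helps when comparing the FS4.x variants. The sketch below pulls them out with regular expressions. The field names are my own reading of the name segments (for example, `batchToks` taken as tokens per batch), and `parse_checkpoint_name` is a hypothetical helper rather than anything from the UTR-LM repository.

```python
import re

CHECKPOINT = (
    "ESM2SISS_FS4.1_fiveSpeciesCao_6layers_16heads_128embedsize_"
    "4096batchToks_lr1e-05_supervisedweight1.0_structureweight1.0_"
    "MLMLossMin_epoch93.pkl"
)

# field name -> (pattern with one capture group, type to cast to)
FIELDS = {
    "tag": (r"_(FS[0-9.]+)_", str),
    "layers": (r"_(\d+)layers", int),
    "heads": (r"_(\d+)heads", int),
    "embed_dim": (r"_(\d+)embedsize", int),
    "batch_tokens": (r"_(\d+)batchToks", int),
    "lr": (r"_lr([0-9.eE+-]+)_", float),
    "supervised_weight": (r"_supervisedweight([0-9.]+)_", float),
    "structure_weight": (r"_structureweight([0-9.]+)_", float),
    "epoch": (r"_epoch(\d+)", int),
}


def parse_checkpoint_name(name):
    """Return the hyperparameters that can be recognised in a checkpoint name."""
    parsed = {}
    for key, (pattern, cast) in FIELDS.items():
        match = re.search(pattern, name)
        if match:
            parsed[key] = cast(match.group(1))
    return parsed


info = parse_checkpoint_name(CHECKPOINT)
print(info["tag"], info["layers"], info["heads"], info["embed_dim"], info["epoch"])
```

The parsed layer count, head count, and embedding dimension agree with the Architecture table.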
Parity Verification
Hidden-state representations produced by this HF model were verified to be identical (max absolute difference = 0.00) to those of the original ESM2-based implementation at all 7 representation levels (the initial embedding plus the 6 transformer layers). The check was run on GPU with PyTorch 2.8 / CUDA 12.6.
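
A check of this kind can be sketched as follows: compare the two implementations level by level and report the largest elementwise difference at each. This is an illustration, not the verification script that was actually used. `max_abs_diff` and `check_parity` are hypothetical helpers that work on nested Python lists, so tensors would first be converted with `.tolist()`.

```python
def max_abs_diff(a, b):
    """Largest elementwise |a - b| over two equally shaped nested lists."""
    if isinstance(a, (list, tuple)):
        if len(a) != len(b):
            raise ValueError("shape mismatch between the two representations")
        return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)
    return abs(a - b)


def check_parity(reference_states, ported_states, expected_levels=7):
    """Return the max absolute difference at each representation level."""
    if len(reference_states) != expected_levels or len(ported_states) != expected_levels:
        raise ValueError(f"expected {expected_levels} representation levels")
    return [max_abs_diff(r, p) for r, p in zip(reference_states, ported_states)]


# Toy data: 7 levels, each a (seq_len=2, dim=2) matrix.
reference = [[[0.1, 0.2], [0.3, 0.4]] for _ in range(7)]
ported = [[[0.1, 0.2], [0.3, 0.4]] for _ in range(7)]
diffs = check_parity(reference, ported)
print(max(diffs))  # 0.0
```

With the model loaded, the ported states would come from `output_hidden_states=True`, which returns one tensor per level.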
Related Models
See the full UTR-LM collection.
| Model | Pretraining Objective | Notes |
|---|---|---|
| UTR-LM-MLM | MLM | Base model |
| UTR-LM-MLMSI | MLM + MFE regression | Recommended for translation efficiency (TE) / expression level (EL) tasks |
| UTR-LM-MLMSS | MLM + secondary structure | — |
| UTR-LM-MLMSISS | MLM + MFE + secondary structure | This model — recommended for MRL tasks |
Usage
Embedding generation
```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Taykhoom/UTR-LM-MLMSISS", trust_remote_code=True)
model = AutoModel.from_pretrained("Taykhoom/UTR-LM-MLMSISS", trust_remote_code=True)
model.eval()  # inference mode: disables dropout

sequences = ["ATGCATGCATGC", "GCTAGCTAGCTAGCTA"]
# padding=True pads the shorter sequence so the batch is rectangular
enc = tokenizer(sequences, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**enc)

# CLS token embedding (position 0) - recommended for sequence-level tasks
cls_emb = out.last_hidden_state[:, 0, :]  # (batch, 128)

# All-token embeddings (padded positions included; see enc["attention_mask"])
token_emb = out.last_hidden_state  # (batch, seq_len, 128)

# Intermediate layer representations:
# hidden_states[0] is the initial embedding, hidden_states[i] the output of layer i
with torch.no_grad():
    out_all = model(**enc, output_hidden_states=True)
layer3_emb = out_all.hidden_states[3]  # after layer 3, shape (batch, seq_len, 128)
```
MLM logits
```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Taykhoom/UTR-LM-MLMSISS", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("Taykhoom/UTR-LM-MLMSISS", trust_remote_code=True)
model.eval()  # inference mode: disables dropout

# One position is masked; the model scores every vocabulary entry at each position
enc = tokenizer(["ATGC<mask>ATGC"], return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits  # (1, seq_len, 10)
```
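
To read a prediction off these logits, one option is to take the row at the masked position and renormalise over the four nucleotide IDs from the vocabulary (A=3, G=4, C=5, T=6). The sketch below does this in plain Python for a single row of 10 logits. `nucleotide_probs` is a hypothetical helper, the example row is made up, and restricting the softmax to nucleotides is a design choice here rather than something the original code is known to do.

```python
import math

# IDs copied from the vocabulary in this card.
NUCLEOTIDE_IDS = {"A": 3, "G": 4, "C": 5, "T": 6}


def nucleotide_probs(logit_row):
    """Softmax over the four nucleotide logits of one 10-entry logit row."""
    scores = {nt: logit_row[idx] for nt, idx in NUCLEOTIDE_IDS.items()}
    peak = max(scores.values())  # subtract the max for numerical stability
    weights = {nt: math.exp(s - peak) for nt, s in scores.items()}
    total = sum(weights.values())
    return {nt: w / total for nt, w in weights.items()}


# Made-up logits for one masked position (index = token ID).
row = [-5.0, -4.0, -6.0, 0.5, 2.0, 0.1, -0.3, -7.0, -8.0, -9.0]
probs = nucleotide_probs(row)
best = max(probs, key=probs.get)
print(best)  # G
```

With the model loaded, the row would be `logits[0, pos].tolist()`, where `pos` is the position in `enc["input_ids"]` that holds the `<mask>` ID (8).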
Fine-tuning
The model follows standard HF conventions and can be fine-tuned with any Trainer-compatible setup. For sequence regression tasks, use the CLS token embedding as input to a prediction head (as done in the original UTR-LM paper).
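
A minimal sketch of such a prediction head is shown below. It takes the `last_hidden_state` tensor, selects the CLS position, and maps the 128-dimensional vector to one scalar per sequence. `ClsRegressionHead` is hypothetical: the hidden width, dropout rate, and activation are illustrative assumptions, not the configuration used in the UTR-LM paper.

```python
import torch
from torch import nn


class ClsRegressionHead(nn.Module):
    """Small MLP on top of the CLS embedding (hypothetical architecture)."""

    def __init__(self, hidden_size=128, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 1),
        )

    def forward(self, last_hidden_state):
        cls = last_hidden_state[:, 0, :]     # (batch, hidden_size)
        return self.net(cls).squeeze(-1)     # (batch,)


# Stand-in for model(**enc).last_hidden_state: batch of 2, 14 tokens, 128 dims.
head = ClsRegressionHead().eval()
fake_hidden = torch.randn(2, 14, 128)
with torch.no_grad():
    prediction = head(fake_hidden)
print(prediction.shape)  # torch.Size([2])
```

During fine-tuning the head's output would be compared with the measured target (for example MRL) using a regression loss such as `nn.MSELoss`, with the backbone either frozen or trained jointly.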
Citation
```bibtex
@article{chu2024utrlm,
  title   = {A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions},
  author  = {Chu, Yanyi and Yu, Dan and Li, Yupeng and Huang, Kaixuan and Shen, Yue and Cong, Le and Zhang, Jason and Wang, Mengdi},
  journal = {Nature Machine Intelligence},
  volume  = {6},
  number  = {4},
  pages   = {449--460},
  year    = {2024},
  doi     = {10.1038/s42256-024-00823-9}
}
```
Implementation Notes
The original UTR-LM implementation uses standard scaled dot-product attention. This HF port adds support for attn_implementation="sdpa" (PyTorch F.scaled_dot_product_attention) and attn_implementation="flash_attention_2" (requires pip install flash-attn --no-build-isolation), which were not part of the original codebase.
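
Selecting one of these backends happens at load time through the `attn_implementation` argument of `from_pretrained`. The sketch below wraps that call in a small helper. `load_with_attention` and `FAST_ATTN_BACKENDS` are hypothetical names, and the list contains only the two values named in this card. Leaving the argument out altogether keeps the port's default.

```python
# The two optional backends named in this card.
FAST_ATTN_BACKENDS = ("sdpa", "flash_attention_2")


def load_with_attention(attn_implementation="sdpa"):
    """Load the model with one of the optional attention backends.

    flash_attention_2 needs the flash-attn package, a CUDA GPU, and
    half-precision (fp16 or bf16) weights.
    """
    if attn_implementation not in FAST_ATTN_BACKENDS:
        raise ValueError(
            f"expected one of {FAST_ATTN_BACKENDS}, got {attn_implementation!r}"
        )
    # Imported lazily so the argument check works without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(
        "Taykhoom/UTR-LM-MLMSISS",
        trust_remote_code=True,
        attn_implementation=attn_implementation,
    )
```

Calling `load_with_attention("sdpa")` downloads the checkpoint and returns the model. Outputs from the different backends are expected to agree up to floating-point tolerance, not necessarily bit for bit.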
Credits
Original model and code by Yanyi Chu et al. (Stanford). Source code: UTR-LM GitHub repository. The HF conversion code was authored primarily by Claude Code and reviewed manually by Taykhoom Dalal.
License
GPL-3.0, following the original UTR-LM repository.