Protenix-RNA

Protenix-RNA is a PyTorch checkpoint fine-tuned from Protenix for RNA structure prediction. The current checkpoint is the one that scored highest on the EMA validation lDDT-complex best metric, reached at training step 16,999. It is distributed as a native Protenix checkpoint for use with the Protenix codebase, not as a transformers.AutoModel package.

Files

File	Description
`checkpoints/best_ema_0.999.pt`	EMA checkpoint selected at step 16,999.
`config.yaml`	Resolved fine-tuning/evaluation config.
`validation_comparison.csv`	lDDT-only validation comparison against the base and previous fine-tuned checkpoints.
`eval/full_eval_base_vs_best_summary.csv`	Full validation aggregate metrics for Protenix-RNA vs base Protenix.
`eval/top100_base_vs_best_summary.csv`	Aggregate metrics on the top 100 targets selected by Protenix-RNA TM-score C1' best.
`eval/top100_base_vs_best_comparison.csv`	Per-target top-100 comparison CSV with base, Protenix-RNA, and delta columns.
`eval/selected_best_top100.csv`	The selected top-100 target rows from the Protenix-RNA full eval.
`checkpoint_info.json`	Source path, checkpoint step, and artifact metadata.
`figures/`	Validation, TM-score, pLDDT, and structure-collage plots.

The checkpoint file holds a Python dictionary with keys model, optimizer, scheduler, and step, and is loaded with torch.load(..., weights_only=False). The stored step is 16999.
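
A minimal sketch of a sanity check on that layout, run before handing the file to Protenix. The helper name `check_checkpoint_layout` is hypothetical and not part of Protenix; the four keys and the step value come from the description above.

```python
from typing import Any, Mapping

# Top-level keys described in this model card.
EXPECTED_KEYS = ("model", "optimizer", "scheduler", "step")


def check_checkpoint_layout(ckpt: Mapping[str, Any], expected_step: int = 16999) -> list[str]:
    """Return a list of problems; an empty list means the layout matches."""
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in ckpt]
    # int(...) also covers the case where the step is stored as a 0-dim tensor.
    if "step" in ckpt and int(ckpt["step"]) != expected_step:
        problems.append(f"unexpected step: {ckpt['step']} (expected {expected_step})")
    return problems
```

With the real file, pass the dictionary returned by torch.load(..., weights_only=False) to the helper. The helper itself has no torch dependency, so it can be tested on plain dictionaries.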

Training Summary

Base model: protenix_base_default_v1.0.0
Fine-tuning data: local RNA fine-tune split from outputs/rna_finetune_full
Validation split size: 478 PDB IDs
Training crop size: 384 tokens
Validation max tokens: 768
RNA MSA: enabled
Protein MSA and templates: disabled
EMA decay: 0.999
Optimizer for the current continuation: Aurora
Selection metric: rna_finetune_val/ema0.999_lddt/complex/best.avg, maximize
Current selection metric value: 0.772730 at step 16,999
Full eval settings: seed 42, bf16, N_sample=5, N_step=20, N_cycle=4, max_n_token=768
Full eval size after token filtering: 1,490 target rows from 195 PDB IDs
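
For readers unfamiliar with the EMA setting above, here is a minimal sketch of the standard exponential-moving-average update with decay 0.999, written on plain floats. It illustrates the general technique only: that Protenix's EMA wrapper follows exactly this form is an assumption, and the function name `ema_update` is hypothetical.

```python
def ema_update(ema: dict[str, float], params: dict[str, float], decay: float = 0.999) -> dict[str, float]:
    """Standard EMA step: ema <- decay * ema + (1 - decay) * param."""
    return {name: decay * ema[name] + (1.0 - decay) * params[name] for name in params}


# Toy example: the shadow weight starts at 0.0 while the live weight stays at 1.0.
shadow = {"w": 0.0}
for _ in range(1000):
    shadow = ema_update(shadow, {"w": 1.0})

# After n steps toward a constant target of 1.0, the shadow equals 1 - decay**n.
print(round(shadow["w"], 4))
```

With decay 0.999 the average has an effective horizon of roughly 1 / (1 - 0.999) = 1,000 steps, which is why the EMA weights change more smoothly than the raw weights.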

Full RNA Evaluation

Higher is better for lDDT, TM-score, and pLDDT. Lower is better for loss.

The full comparison table below was produced for the previous step-12,999 checkpoint. The current step-16,999 checkpoint has updated validation metrics in checkpoint_info.json; the long full comparison has not been rerun yet.

Metric	Base Protenix	Protenix-RNA	Delta
lDDT complex best	0.5565	0.7559	+0.1994
lDDT complex mean	0.5428	0.7434	+0.2005
lDDT complex rank1	0.5423	0.7424	+0.2001
TM-score complex best	0.8413	0.9272	+0.0859
TM-score complex rank1	0.8198	0.9140	+0.0942
TM-score C1' best	0.4611	0.6209	+0.1599
TM-score C1' rank1	0.4235	0.5916	+0.1681
pLDDT rank1	69.06	80.83	+11.77
Loss	1244.81	834.44	-410.36

These values come from a full comparison run against protenix_base_default_v1.0.0 using the same RNA validation setup and saved predictions for the step-12,999 checkpoint.
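
As a reading aid, the sketch below recomputes the Delta column and adds a relative change for a few rows of the table. The values are copied from the table; the helper name `delta_and_relative` is hypothetical, and the relative figure is an extra view that does not appear in the CSV files.

```python
# (base, fine-tuned) pairs copied from the full-eval table (step-12,999 comparison).
FULL_EVAL = {
    "lDDT complex best": (0.5565, 0.7559),
    "TM-score complex best": (0.8413, 0.9272),
    "TM-score C1' best": (0.4611, 0.6209),
    "pLDDT rank1": (69.06, 80.83),
}


def delta_and_relative(base: float, tuned: float) -> tuple[float, float]:
    """Return (absolute delta, relative change in percent of the base value)."""
    delta = tuned - base
    return delta, 100.0 * delta / base


for name, (base, tuned) in FULL_EVAL.items():
    delta, rel = delta_and_relative(base, tuned)
    print(f"{name}: {delta:+.4f} ({rel:+.1f}% relative to base)")
```

Because the table entries are already rounded, recomputed deltas can differ from the Delta column in the last digit.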

Top-100 Structure Comparison

The top-100 set is selected from the full Protenix-RNA eval by tm_score_c1prime_best, then matched against base-model predictions from the same validation set. This subset is useful for inspecting best-case RNA behavior; it is not an unbiased dataset average.

Top-100 metric	Base Protenix	Protenix-RNA	Delta
TM-score C1' best	0.8082	0.9904	+0.1822
TM-score C1' rank1	0.7768	0.9602	+0.1833
TM-score complex best	0.9332	0.9848	+0.0516
TM-score complex rank1	0.9297	0.9806	+0.0509
lDDT complex best	0.7749	0.9211	+0.1463
lDDT complex rank1	0.7717	0.9176	+0.1459
pLDDT rank1	86.88	91.85	+4.96
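
The selection and matching described above can be sketched as follows. The column name tm_score_c1prime_best is taken from the text; the `target_id` field, both function names, and the toy rows are hypothetical, and the real CSV columns may differ.

```python
def select_top_n(rows: list[dict], n: int = 100, key: str = "tm_score_c1prime_best") -> list[dict]:
    """Keep the n rows with the highest value of `key`."""
    return sorted(rows, key=lambda row: row[key], reverse=True)[:n]


def match_to_base(selected: list[dict], base_rows: list[dict], id_field: str = "target_id") -> list[dict]:
    """Pair each selected fine-tuned row with the base row sharing its id and add a delta."""
    base_by_id = {row[id_field]: row for row in base_rows}
    matched = []
    for row in selected:
        base = base_by_id.get(row[id_field])
        if base is None:
            continue  # no base prediction for this target
        matched.append({
            id_field: row[id_field],
            "base": base["tm_score_c1prime_best"],
            "finetuned": row["tm_score_c1prime_best"],
            "delta": row["tm_score_c1prime_best"] - base["tm_score_c1prime_best"],
        })
    return matched


# Made-up rows for illustration only; these are not values from the eval CSVs.
tuned_rows = [
    {"target_id": "a", "tm_score_c1prime_best": 0.9},
    {"target_id": "b", "tm_score_c1prime_best": 0.5},
    {"target_id": "c", "tm_score_c1prime_best": 0.7},
]
base_rows = [
    {"target_id": "a", "tm_score_c1prime_best": 0.6},
    {"target_id": "c", "tm_score_c1prime_best": 0.65},
]
comparison = match_to_base(select_top_n(tuned_rows, n=2), base_rows)
```

Selecting on the fine-tuned model's own score is what makes this subset a best-case view: the base model is compared on targets chosen because the fine-tuned model did well on them.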

The PyMOL-rendered structure collage in figures/ shows rank-1 predicted structures from representative top-100 targets, colored by atom pLDDT stored in the mmCIF B-factor field.
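
Because per-atom pLDDT is written to the B-factor field, it can be read back from a predicted mmCIF file without PyMOL. The sketch below is a deliberately simplified reader: `_atom_site.B_iso_or_equiv` is the standard mmCIF item for the B-factor, but the function name `read_atom_plddt` and the toy file are hypothetical, and the parsing assumes one whitespace-separated row per atom with no quoted values containing spaces.

```python
def read_atom_plddt(cif_text: str) -> list[float]:
    """Collect per-atom values from the _atom_site.B_iso_or_equiv column.

    Simplified on purpose; use a real mmCIF parser for robust work.
    """
    columns: list[str] = []
    values: list[float] = []
    in_atom_site = False
    for line in cif_text.splitlines():
        line = line.strip()
        if line.startswith("_atom_site."):
            columns.append(line.split()[0])  # column header of the atom table
            in_atom_site = True
        elif in_atom_site:
            if not line or line.startswith(("#", "loop_", "_", "data_")):
                break  # end of the _atom_site table
            fields = line.split()
            values.append(float(fields[columns.index("_atom_site.B_iso_or_equiv")]))
    return values


# Toy mmCIF fragment with made-up values, for illustration only.
TOY_CIF = """data_toy
loop_
_atom_site.group_PDB
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.B_iso_or_equiv
ATOM P G 91.5
ATOM "C1'" G 88.0
#
"""

plddt = read_atom_plddt(TOY_CIF)
mean_plddt = sum(plddt) / len(plddt)
```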

Checkpoint Selection Trace

This checkpoint was selected from the EMA validation loop by lDDT-complex best at step 16,999.

Metric	Base Protenix	Prior FT s9499	Previous s12999	Current s16999	Gain vs s12999
lDDT best	0.5558	0.7395	0.7587	0.7727	+0.0141
lDDT mean	0.5420	0.7261	0.7463	0.7613	+0.0150
lDDT rank1	0.5417	0.7254	0.7467	0.7614	+0.0146
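
The selection rule itself is a maximization over evaluated steps. A minimal sketch, using only the three fine-tuned steps listed in the table; the actual validation loop may track more steps, and the function name `select_best_step` is hypothetical.

```python
# EMA validation lDDT-complex best per fine-tuning step, copied from the table.
LDDT_BEST_BY_STEP = {9499: 0.7395, 12999: 0.7587, 16999: 0.7727}


def select_best_step(metric_by_step: dict[int, float], maximize: bool = True) -> int:
    """Return the step whose metric is best under the given direction."""
    pick = max if maximize else min
    return pick(metric_by_step, key=metric_by_step.get)


best_step = select_best_step(LDDT_BEST_BY_STEP)
print(best_step, LDDT_BEST_BY_STEP[best_step])
```

Passing maximize=False gives the same rule for a lower-is-better metric such as loss.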

Usage

Download the checkpoint first; when running Protenix, point it at the downloaded file with --load_params_only true:

hf download LiteFold/protenix-rna \
  checkpoints/best_ema_0.999.pt \
  --local-dir ./protenix-rna

Example evaluation invocation inside the Protenix checkout:

LOAD_CHECKPOINT_PATH=./protenix-rna/checkpoints/best_ema_0.999.pt \
VAL_MAX_N_TOKEN=768 \
VAL_LIMIT=-1 \
N_SAMPLE=5 \
N_STEP=20 \
N_CYCLE=4 \
./run_rna_latest_full_eval_tm_dump.sh

For direct loading:

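import torch

# Path assumes the download command above (--local-dir ./protenix-rna).
# weights_only=False unpickles arbitrary Python objects, so load only files you trust.
ckpt = torch.load("./protenix-rna/checkpoints/best_ema_0.999.pt", map_location="cpu", weights_only=False)
state_dict = ckpt["model"]  # model weights
step = ckpt["step"]  # 16999 for this checkpoint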

Limitations

This is a research checkpoint specialized for the RNA fine-tuning setup above. It has not been converted into a standalone Transformers model and should be evaluated with the same Protenix code/configuration family used for training.

Evaluation results

Self-reported metrics on the RNA fine-tune validation split:

Metric	Value
lDDT complex best	0.773
lDDT complex mean	0.761
lDDT complex rank1	0.761
TM-score complex best	0.937
TM-score complex rank1	0.926
TM-score C1' best	0.639
TM-score C1' rank1	0.599