Protenix-RNA

Protenix-RNA is a PyTorch checkpoint of Protenix fine-tuned for RNA structure prediction. The current checkpoint comes from training step 16,999, the step with the highest EMA validation lDDT-complex best score. It is distributed as a native Protenix checkpoint for use with the Protenix codebase, not as a transformers.AutoModel package.

Files

| File | Description |
| --- | --- |
| checkpoints/best_ema_0.999.pt | EMA checkpoint selected at step 16,999. |
| config.yaml | Resolved fine-tuning/evaluation config. |
| validation_comparison.csv | lDDT-only validation comparison against the base and previous fine-tuned checkpoints. |
| eval/full_eval_base_vs_best_summary.csv | Full validation aggregate metrics for Protenix-RNA vs base Protenix. |
| eval/top100_base_vs_best_summary.csv | Aggregate metrics on the top 100 targets selected by Protenix-RNA TM-score C1' best. |
| eval/top100_base_vs_best_comparison.csv | Per-target top-100 comparison CSV with base, Protenix-RNA, and delta columns. |
| eval/selected_best_top100.csv | The selected top-100 target rows from the Protenix-RNA full eval. |
| checkpoint_info.json | Source path, checkpoint step, and artifact metadata. |
| figures/ | Validation, TM-score, pLDDT, and structure-collage plots. |

Loading the checkpoint with torch.load(..., weights_only=False) returns a dictionary with the keys model, optimizer, scheduler, and step. The stored step is 16999.

Training Summary

  • Base model: protenix_base_default_v1.0.0
  • Fine-tuning data: local RNA fine-tune split from outputs/rna_finetune_full
  • Validation split size: 478 PDB IDs
  • Training crop size: 384 tokens
  • Validation max tokens: 768
  • RNA MSA: enabled
  • Protein MSA and templates: disabled
  • EMA decay: 0.999
  • Optimizer for the current continuation: Aurora
  • Selection metric: rna_finetune_val/ema0.999_lddt/complex/best.avg, maximize
  • Current selection metric value: 0.772730 at step 16,999
  • Full eval settings: seed 42, bf16, N_sample=5, N_step=20, N_cycle=4, max_n_token=768
  • Full eval size after token filtering: 1,490 target rows from 195 PDB IDs
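
The EMA decay of 0.999 listed above means the evaluated weights are an exponential moving average of the training weights. Below is a minimal sketch of that update rule in plain Python, assuming the standard formulation. The function name ema_update and the toy values are illustrative, not Protenix code, and the actual implementation may differ in details such as warm-up.

```python
def ema_update(ema_weights, weights, decay=0.999):
    """Return the EMA weights after one step: decay * ema + (1 - decay) * current."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


# Toy example: the raw weight jumps from 0.0 to 1.0 and stays there.
ema = [0.0]
for _ in range(1000):
    ema = ema_update(ema, [1.0])

# After n steps the EMA has covered a fraction 1 - decay**n of the jump.
print(round(ema[0], 4))
```

With a decay of 0.999 the average has an effective horizon of roughly 1 / (1 - 0.999) = 1,000 steps, so short-lived fluctuations in the raw weights are smoothed out of the evaluated checkpoint.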

Full RNA Evaluation

Higher is better for lDDT, TM-score, and pLDDT. Lower is better for loss.
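
The metric names in the tables distinguish best, mean, and rank1 aggregates over the N_sample=5 predictions per target. The sketch below shows one common reading of those names, which is an assumption rather than something this card confirms: best is the maximum over samples, mean is the average, and rank1 is the score of the sample that the model's own confidence ranks first. The function name, the confidence values, and the scores are invented for illustration.

```python
from statistics import mean


def aggregate_samples(scores, confidences):
    """Summarize per-sample scores for one target.

    scores: one quality score (e.g. lDDT) per sampled structure.
    confidences: the model's ranking score for each sample (hypothetical values).
    """
    if len(scores) != len(confidences) or not scores:
        raise ValueError("scores and confidences must be non-empty and equal length")
    top = max(range(len(scores)), key=lambda i: confidences[i])
    return {
        "best": max(scores),    # oracle pick: requires the ground truth
        "mean": mean(scores),   # average over all samples
        "rank1": scores[top],   # pick made by model confidence alone
    }


# Five samples for one illustrative target (values are invented).
summary = aggregate_samples(
    scores=[0.71, 0.78, 0.74, 0.69, 0.76],
    confidences=[0.80, 0.83, 0.88, 0.75, 0.81],
)
print(summary)
```

Under this reading rank1 can never exceed best, which is consistent with every row in the tables on this card.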

The full comparison table below was produced with the previous step-12,999 checkpoint. Updated validation metrics for the current step-16,999 checkpoint are recorded in checkpoint_info.json; the full comparison has not yet been rerun for it.

| Metric | Base Protenix | Protenix-RNA (step 12,999) | Delta |
| --- | --- | --- | --- |
| lDDT complex best | 0.5565 | 0.7559 | +0.1994 |
| lDDT complex mean | 0.5428 | 0.7434 | +0.2005 |
| lDDT complex rank1 | 0.5423 | 0.7424 | +0.2001 |
| TM-score complex best | 0.8413 | 0.9272 | +0.0859 |
| TM-score complex rank1 | 0.8198 | 0.9140 | +0.0942 |
| TM-score C1' best | 0.4611 | 0.6209 | +0.1599 |
| TM-score C1' rank1 | 0.4235 | 0.5916 | +0.1681 |
| pLDDT rank1 | 69.06 | 80.83 | +11.77 |
| Loss | 1244.81 | 834.44 | -410.36 |

These values come from a full comparison run against protenix_base_default_v1.0.0 using the same RNA validation setup and saved predictions for the step-12,999 checkpoint.
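
The Delta column is the Protenix-RNA value minus the base value. The small sketch below recomputes it for a few rows copied from the table and adds a relative change for context. The relative change is an extra illustration, not a column reported on this card, and the dictionary layout is an assumption.

```python
# (base, fine-tuned) pairs copied from the full-evaluation table.
rows = {
    "lDDT complex best": (0.5565, 0.7559),
    "TM-score complex best": (0.8413, 0.9272),
    "TM-score C1' best": (0.4611, 0.6209),
    "pLDDT rank1": (69.06, 80.83),
    "Loss": (1244.81, 834.44),
}

deltas = {}
for name, (base, tuned) in rows.items():
    delta = tuned - base
    relative = 100.0 * delta / base  # percent change relative to the base model
    deltas[name] = delta
    print(f"{name:24s} delta={delta:+10.4f}  relative={relative:+6.1f}%")
```

Recomputed deltas can differ from the table in the last digit, presumably because the table rounds values that were computed at higher precision.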

[Figure: Full RNA validation TM-score and lDDT comparison]

[Figure: Full RNA validation pLDDT comparison]

Top-100 Structure Comparison

The top-100 set is selected from the full Protenix-RNA eval by tm_score_c1prime_best, then matched against base-model predictions from the same validation set. This subset is useful for inspecting best-case RNA behavior; it is not an unbiased dataset average.

| Top-100 metric | Base Protenix | Protenix-RNA | Delta |
| --- | --- | --- | --- |
| TM-score C1' best | 0.8082 | 0.9904 | +0.1822 |
| TM-score C1' rank1 | 0.7768 | 0.9602 | +0.1833 |
| TM-score complex best | 0.9332 | 0.9848 | +0.0516 |
| TM-score complex rank1 | 0.9297 | 0.9806 | +0.0509 |
| lDDT complex best | 0.7749 | 0.9211 | +0.1463 |
| lDDT complex rank1 | 0.7717 | 0.9176 | +0.1459 |
| pLDDT rank1 | 86.88 | 91.85 | +4.96 |
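
The top-100 selection described above can be sketched with plain Python. Only the column name tm_score_c1prime_best comes from this card; the target_id key, the function name, and the toy rows are assumptions, and the released CSVs may use different identifiers.

```python
def select_top_n(tuned_rows, base_rows, n=100, key="tm_score_c1prime_best"):
    """Pick the n best fine-tuned rows by `key` and pair each with its base row."""
    base_by_id = {row["target_id"]: row for row in base_rows}
    ranked = sorted(tuned_rows, key=lambda row: float(row[key]), reverse=True)
    paired = []
    for row in ranked[:n]:
        base = base_by_id.get(row["target_id"])
        if base is None:
            continue  # no matching base prediction for this target
        tuned_score, base_score = float(row[key]), float(base[key])
        paired.append({
            "target_id": row["target_id"],
            "base": base_score,
            "tuned": tuned_score,
            "delta": tuned_score - base_score,
        })
    return paired


# Toy data standing in for the per-target eval rows (values are invented).
tuned = [
    {"target_id": "t1", "tm_score_c1prime_best": "0.99"},
    {"target_id": "t2", "tm_score_c1prime_best": "0.42"},
    {"target_id": "t3", "tm_score_c1prime_best": "0.97"},
]
base = [
    {"target_id": "t1", "tm_score_c1prime_best": "0.80"},
    {"target_id": "t2", "tm_score_c1prime_best": "0.40"},
    {"target_id": "t3", "tm_score_c1prime_best": "0.85"},
]
top2 = select_top_n(tuned, base, n=2)
print([row["target_id"] for row in top2])
```

Because the subset is ranked on the fine-tuned model's own score, the selected rows favor that model by construction, which is why the top-100 table should not be read as a dataset average.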

[Figure: Top-100 RNA target metric comparison]

The following PyMOL-rendered collage shows rank-1 predicted structures from representative top-100 targets, colored by atom pLDDT stored in the mmCIF B-factor field.

[Figure: RNA structure pLDDT collage]
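
Since the per-atom pLDDT is stored in the mmCIF B-factor field, a per-structure mean can be read back without PyMOL. The sketch below is a deliberately minimal parser. It assumes a single whitespace-separated _atom_site loop that contains a _atom_site.B_iso_or_equiv column and has no multi-line or space-containing values; for real files a full mmCIF parser is the safer choice. The function name and the pared-down inline example are illustrative.

```python
def mean_plddt_from_mmcif(text):
    """Average the _atom_site.B_iso_or_equiv column of a simple mmCIF string."""
    columns, values = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("_atom_site."):
            columns.append(line.split()[0])  # collect the loop's column headers
        elif columns:
            if not line or line.startswith(("#", "_", "loop_", "data_")):
                break  # the atom_site loop has ended
            fields = line.split()
            values.append(float(fields[columns.index("_atom_site.B_iso_or_equiv")]))
    if not values:
        raise ValueError("no atom_site rows found")
    return sum(values) / len(values)


# Pared-down example; real files carry many more atom_site columns.
example = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.B_iso_or_equiv
ATOM 1 P     G 92.5
ATOM 2 "C1'" G 88.0
ATOM 3 N9    G 95.5
#
"""
result = mean_plddt_from_mmcif(example)
print(result)
```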

Checkpoint Selection Trace

This checkpoint was selected from the EMA validation loop by lDDT-complex best at step 16,999.

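| Metric | Base Protenix | Prior FT (step 9,499) | Previous (step 12,999) | Current (step 16,999) | Gain vs step 12,999 |
| --- | --- | --- | --- | --- | --- |
| lDDT best | 0.5558 | 0.7395 | 0.7587 | 0.7727 | +0.0141 |
| lDDT mean | 0.5420 | 0.7261 | 0.7463 | 0.7613 | +0.0150 |
| lDDT rank1 | 0.5417 | 0.7254 | 0.7467 | 0.7614 | +0.0146 |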
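
Selecting by a maximized validation metric amounts to an argmax over the evaluated steps. Here is a minimal sketch using the lDDT best values from the table above. The dictionary and variable names are illustrative; the card names rna_finetune_val/ema0.999_lddt/complex/best.avg as the metric actually tracked.

```python
# EMA validation "lDDT best" per evaluated step, copied from the table.
lddt_best_by_step = {9499: 0.7395, 12999: 0.7587, 16999: 0.7727}

# The selected checkpoint is the step with the highest metric value.
best_step = max(lddt_best_by_step, key=lddt_best_by_step.get)

# Gain over the previously selected checkpoint.
previous_step = 12999
gain = lddt_best_by_step[best_step] - lddt_best_by_step[previous_step]

print(best_step, round(gain, 4))
```

The gain recomputed from the rounded table entries can differ from the reported +0.0141 in the last digit.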

[Figure: RNA validation lDDT during fine-tuning]

Usage

Download the checkpoint, then point Protenix at the downloaded file with --load_params_only true:

hf download LiteFold/protenix-rna \
  checkpoints/best_ema_0.999.pt \
  --local-dir ./protenix-rna
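
The same file can be fetched from Python. This sketch assumes the huggingface_hub package is installed; hf_hub_download is that library's single-file download function, while the wrapper name fetch_checkpoint is made up here.

```python
REPO_ID = "LiteFold/protenix-rna"
FILENAME = "checkpoints/best_ema_0.999.pt"


def fetch_checkpoint(local_dir="./protenix-rna"):
    """Download the EMA checkpoint and return its local file path."""
    # Imported lazily so this module can be loaded without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
```

Calling fetch_checkpoint() should return a path under ./protenix-rna, matching the layout produced by the command-line download.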

Example evaluation invocation inside the Protenix checkout:

LOAD_CHECKPOINT_PATH=./protenix-rna/checkpoints/best_ema_0.999.pt \
VAL_MAX_N_TOKEN=768 \
VAL_LIMIT=-1 \
N_SAMPLE=5 \
N_STEP=20 \
N_CYCLE=4 \
./run_rna_latest_full_eval_tm_dump.sh

For direct loading:

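import torch

# weights_only=False unpickles the full dictionary, so only load files you trust.
# The path is relative to the download directory (./protenix-rna above).
ckpt = torch.load("checkpoints/best_ema_0.999.pt", map_location="cpu", weights_only=False)
state_dict = ckpt["model"]  # model weights
step = ckpt["step"]         # training step, 16999 for this release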
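
Before loading the weights into a model it can help to sanity-check the dictionary. This is a rough sketch, assuming only that ckpt["model"] maps parameter names to tensors, meaning any object that exposes numel(). The helper name and the stand-in tensor class are illustrative so that the example runs without the real checkpoint.

```python
from collections import Counter


def summarize_state_dict(state_dict):
    """Return tensor count, parameter count, and the most common name prefixes."""
    total = sum(tensor.numel() for tensor in state_dict.values())
    prefixes = Counter(name.split(".", 1)[0] for name in state_dict)
    return {
        "tensors": len(state_dict),
        "parameters": total,
        "top_prefixes": prefixes.most_common(3),
    }


class FakeTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""

    def __init__(self, n):
        self._n = n

    def numel(self):
        return self._n


# Invented parameter names; the real checkpoint's names will differ.
demo = {
    "module.encoder.weight": FakeTensor(12),
    "module.encoder.bias": FakeTensor(4),
    "head.weight": FakeTensor(8),
}
info = summarize_state_dict(demo)
print(info)
```

With the real file, pass ckpt["model"] instead of demo. A shared leading prefix such as module. on every key would suggest the weights were saved from a wrapped model and may need renaming before load_state_dict; whether that applies to this checkpoint is not stated on this card.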

Limitations

This is a research checkpoint specialized for the RNA fine-tuning setup above. It has not been converted into a standalone Transformers model and should be evaluated with the same Protenix code/configuration family used for training.

Evaluation results

Self-reported on the RNA fine-tune validation split:

  • lDDT complex best: 0.773
  • lDDT complex mean: 0.761
  • lDDT complex rank1: 0.761
  • TM-score complex best: 0.937
  • TM-score complex rank1: 0.926
  • TM-score C1' best: 0.639
  • TM-score C1' rank1: 0.599