Protenix-RNA
Protenix-RNA is a PyTorch checkpoint of Protenix fine-tuned for RNA structure prediction. The current checkpoint was selected at training step 16,999 as the best EMA validation checkpoint by the lDDT-complex best metric. It is distributed as a native Protenix checkpoint for use with the Protenix codebase, not as a transformers.AutoModel package.
Files
| File | Description |
|---|---|
| checkpoints/best_ema_0.999.pt | EMA checkpoint selected at step 16,999. |
| config.yaml | Resolved fine-tuning/evaluation config. |
| validation_comparison.csv | lDDT-only validation comparison against the base and previous fine-tuned checkpoints. |
| eval/full_eval_base_vs_best_summary.csv | Full validation aggregate metrics for Protenix-RNA vs base Protenix. |
| eval/top100_base_vs_best_summary.csv | Aggregate metrics on the top 100 targets selected by Protenix-RNA TM-score C1' best. |
| eval/top100_base_vs_best_comparison.csv | Per-target top-100 comparison CSV with base, Protenix-RNA, and delta columns. |
| eval/selected_best_top100.csv | The selected top-100 target rows from the Protenix-RNA full eval. |
| checkpoint_info.json | Source path, checkpoint step, and artifact metadata. |
| figures/ | Validation, TM-score, pLDDT, and structure-collage plots. |
Loading the checkpoint with torch.load(..., weights_only=False) returns a dictionary with the keys model, optimizer, scheduler, and step. The stored step is 16999.
Training Summary
- Base model: protenix_base_default_v1.0.0
- Fine-tuning data: local RNA fine-tune split from outputs/rna_finetune_full
- Validation split size: 478 PDB IDs
- Training crop size: 384 tokens
- Validation max tokens: 768
- RNA MSA: enabled
- Protein MSA and templates: disabled
- EMA decay: 0.999
- Optimizer for the current continuation: Aurora
- Selection metric: rna_finetune_val/ema0.999_lddt/complex/best.avg, maximize
- Current selection metric value: 0.772730 at step 16,999
- Full eval settings: seed 42, bf16, N_sample=5, N_step=20, N_cycle=4, max_n_token=768
- Full eval size after token filtering: 1,490 target rows from 195 PDB IDs
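The summary above lists an EMA decay of 0.999. As a rough illustration of what that setting means, the sketch below applies the textbook exponential-moving-average update to a toy scalar parameter. It is not taken from the Protenix code, and the function name `ema_update` and the dictionary layout are assumptions made for the example.

```python
def ema_update(ema_params, params, decay=0.999):
    """Textbook EMA step: new_ema = decay * ema + (1 - decay) * param."""
    return {
        name: decay * ema_params[name] + (1.0 - decay) * params[name]
        for name in params
    }


# Toy example with a single scalar "weight"; real checkpoints hold tensors.
ema = {"w": 0.0}
for _ in range(1000):
    # The raw parameter is held at 1.0, so the EMA drifts slowly toward it.
    ema = ema_update(ema, {"w": 1.0})

print(f"EMA after 1000 steps: {ema['w']:.4f}")
```

With a decay of 0.999 the average lags the raw weights by roughly a thousand steps, which is why the EMA weights are evaluated separately from the raw ones.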
Full RNA Evaluation
Higher is better for lDDT, TM-score, and pLDDT. Lower is better for loss.
The full comparison table below was produced for the previous step-12,999 checkpoint. The current step-16,999 checkpoint has updated validation metrics in checkpoint_info.json; the long full comparison has not been rerun yet.
| Metric | Base Protenix | Protenix-RNA | Delta |
|---|---|---|---|
| lDDT complex best | 0.5565 | 0.7559 | +0.1994 |
| lDDT complex mean | 0.5428 | 0.7434 | +0.2005 |
| lDDT complex rank1 | 0.5423 | 0.7424 | +0.2001 |
| TM-score complex best | 0.8413 | 0.9272 | +0.0859 |
| TM-score complex rank1 | 0.8198 | 0.9140 | +0.0942 |
| TM-score C1' best | 0.4611 | 0.6209 | +0.1599 |
| TM-score C1' rank1 | 0.4235 | 0.5916 | +0.1681 |
| pLDDT rank1 | 69.06 | 80.83 | +11.77 |
| Loss | 1244.81 | 834.44 | -410.36 |
These values come from a full comparison run against protenix_base_default_v1.0.0 using the same RNA validation setup and saved predictions for the step-12,999 checkpoint.
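To read the Delta column alongside a relative change, the sketch below recomputes both from a few rows of the table above (the step-12,999 comparison). The dictionary keys are illustrative labels, not necessarily column names from the CSV files, and deltas recomputed from the rounded table values can differ from the published column in the last digit.

```python
# (base, Protenix-RNA) pairs copied from the full evaluation table.
# Keys are illustrative labels chosen for this example.
FULL_EVAL = {
    "lddt_complex_best": (0.5565, 0.7559),
    "tm_score_complex_best": (0.8413, 0.9272),
    "tm_score_c1prime_best": (0.4611, 0.6209),
    "plddt_rank1": (69.06, 80.83),
}


def delta_and_relative(base, tuned):
    """Return (absolute delta, relative change in percent of the base value)."""
    delta = tuned - base
    return delta, 100.0 * delta / base


for name, (base, tuned) in FULL_EVAL.items():
    delta, rel = delta_and_relative(base, tuned)
    print(f"{name}: delta={delta:+.4f} ({rel:+.1f}% relative to base)")
```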
Top-100 Structure Comparison
The top-100 set is selected from the full Protenix-RNA eval by tm_score_c1prime_best, then matched against base-model predictions from the same validation set. This subset is useful for inspecting best-case RNA behavior; it is not an unbiased dataset average.
| Top-100 metric | Base Protenix | Protenix-RNA | Delta |
|---|---|---|---|
| TM-score C1' best | 0.8082 | 0.9904 | +0.1822 |
| TM-score C1' rank1 | 0.7768 | 0.9602 | +0.1833 |
| TM-score complex best | 0.9332 | 0.9848 | +0.0516 |
| TM-score complex rank1 | 0.9297 | 0.9806 | +0.0509 |
| lDDT complex best | 0.7749 | 0.9211 | +0.1463 |
| lDDT complex rank1 | 0.7717 | 0.9176 | +0.1459 |
| pLDDT rank1 | 86.88 | 91.85 | +4.96 |
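The top-100 selection described above (rank the Protenix-RNA rows by tm_score_c1prime_best, then pair each with the base-model row for the same target) can be sketched roughly as follows. The `target_id` column name and the toy CSV contents are assumptions for illustration; the actual CSV schema may differ.

```python
import csv
import io


def select_top_n(rows, metric="tm_score_c1prime_best", n=100):
    """Return the n rows with the highest value of the given metric."""
    return sorted(rows, key=lambda r: float(r[metric]), reverse=True)[:n]


def match_base(selected, base_rows, key="target_id"):
    """Pair each selected row with the base-model row sharing the same key."""
    base_by_key = {r[key]: r for r in base_rows}
    return [(r, base_by_key[r[key]]) for r in selected if r[key] in base_by_key]


# Toy in-memory CSVs standing in for the fine-tuned and base eval outputs.
TUNED_CSV = "target_id,tm_score_c1prime_best\na,0.91\nb,0.42\nc,0.99\n"
BASE_CSV = "target_id,tm_score_c1prime_best\na,0.80\nc,0.75\nb,0.30\n"

tuned_rows = list(csv.DictReader(io.StringIO(TUNED_CSV)))
base_rows = list(csv.DictReader(io.StringIO(BASE_CSV)))

top2 = select_top_n(tuned_rows, n=2)
pairs = match_base(top2, base_rows)
for tuned, base in pairs:
    delta = float(tuned["tm_score_c1prime_best"]) - float(base["tm_score_c1prime_best"])
    print(f"{tuned['target_id']}: delta={delta:+.2f}")
```

Because the subset is chosen on the fine-tuned model's own score, the paired deltas favor the fine-tuned model by construction, which is the selection bias noted above.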
The following PyMOL-rendered collage shows rank-1 predicted structures from representative top-100 targets, colored by atom pLDDT stored in the mmCIF B-factor field.
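Since the per-atom pLDDT is stored in the mmCIF B-factor field, a mean pLDDT for one prediction can be recovered by averaging that column. The sketch below is a deliberately minimal reader, assuming a single `_atom_site` loop whose values contain no embedded whitespace; the toy file contents are invented for the example, and a dedicated mmCIF parser is the safer choice for real files.

```python
def mean_bfactor(cif_text):
    """Average the _atom_site.B_iso_or_equiv column of an mmCIF string."""
    columns, values, in_atom_site = [], [], False
    for line in cif_text.splitlines():
        line = line.strip()
        if line.startswith("_atom_site."):
            # Column names appear one per line ahead of the data rows.
            columns.append(line.split(".", 1)[1])
            in_atom_site = True
        elif in_atom_site:
            if not line or line.startswith(("#", "loop_", "_")):
                break  # end of the atom_site table
            fields = line.split()
            values.append(float(fields[columns.index("B_iso_or_equiv")]))
    return sum(values) / len(values)


# Toy mmCIF fragment with three atoms (values are made up).
TOY_CIF = """data_toy
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.B_iso_or_equiv
ATOM 1 P G 0.0 0.0 0.0 90.0
ATOM 2 "C1'" G 1.0 0.0 0.0 80.0
ATOM 3 N9 G 2.0 0.0 0.0 70.0
#
"""

print(f"mean pLDDT: {mean_bfactor(TOY_CIF):.2f}")
```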
Checkpoint Selection Trace
This checkpoint was selected from the EMA validation loop by lDDT-complex best at step 16,999.
| Metric | Base Protenix | Prior FT s9499 | Previous s12999 | Current s16999 | Gain vs s12999 |
|---|---|---|---|---|---|
| lDDT best | 0.5558 | 0.7395 | 0.7587 | 0.7727 | +0.0141 |
| lDDT mean | 0.5420 | 0.7261 | 0.7463 | 0.7613 | +0.0150 |
| lDDT rank1 | 0.5417 | 0.7254 | 0.7467 | 0.7614 | +0.0146 |
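The selection rule (maximize lDDT-complex best over the EMA validation loop) amounts to an argmax over evaluated steps. The sketch below applies it to the three fine-tuned steps listed in the table. The real validation loop presumably covers more steps than these, and the function name is hypothetical.

```python
# lDDT best per fine-tuned checkpoint step, copied from the table above.
VAL_LDDT_BEST = {9499: 0.7395, 12999: 0.7587, 16999: 0.7727}


def select_best_step(metric_by_step, maximize=True):
    """Return the step whose metric is highest (or lowest if maximize=False)."""
    chooser = max if maximize else min
    return chooser(metric_by_step, key=metric_by_step.get)


best_step = select_best_step(VAL_LDDT_BEST)
print(f"selected step: {best_step} (lDDT best {VAL_LDDT_BEST[best_step]:.4f})")
```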
Usage
Download the checkpoint and point Protenix at it with --load_params_only true:
hf download LiteFold/protenix-rna \
  checkpoints/best_ema_0.999.pt \
  --local-dir ./protenix-rna
Example evaluation invocation inside the Protenix checkout:
LOAD_CHECKPOINT_PATH=./protenix-rna/checkpoints/best_ema_0.999.pt \
VAL_MAX_N_TOKEN=768 \
VAL_LIMIT=-1 \
N_SAMPLE=5 \
N_STEP=20 \
N_CYCLE=4 \
./run_rna_latest_full_eval_tm_dump.sh
For direct loading:
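import torch
# weights_only=False unpickles arbitrary Python objects, so only load checkpoints from a trusted source.
ckpt = torch.load("checkpoints/best_ema_0.999.pt", map_location="cpu", weights_only=False)
state_dict = ckpt["model"]  # model weights
step = ckpt["step"]  # training step stored in the checkpoint (16999)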
Limitations
This is a research checkpoint specialized for the RNA fine-tuning setup above. It has not been converted into a standalone Transformers model and should be evaluated with the same Protenix code/configuration family used for training.
Evaluation results
- lDDT complex best on RNA fine-tune validation split (self-reported): 0.773
- lDDT complex mean on RNA fine-tune validation split (self-reported): 0.761
- lDDT complex rank1 on RNA fine-tune validation split (self-reported): 0.761
- TM-score complex best on RNA fine-tune validation split (self-reported): 0.937
- TM-score complex rank1 on RNA fine-tune validation split (self-reported): 0.926
- TM-score C1' best on RNA fine-tune validation split (self-reported): 0.639
- TM-score C1' rank1 on RNA fine-tune validation split (self-reported): 0.599




