Upload Aurora step 16999 best EMA checkpoint
- README.md +20 -21
- checkpoint_info.json +17 -5
- checkpoints/best_ema_0.999.pt +2 -2
- config.yaml +14 -4
README.md
CHANGED
|
@@ -22,41 +22,36 @@ model-index:
|
|
| 22 |
metrics:
|
| 23 |
- type: lddt_complex_best
|
| 24 |
name: lDDT complex best
|
| 25 |
-
value: 0.
|
| 26 |
- type: lddt_complex_mean
|
| 27 |
name: lDDT complex mean
|
| 28 |
-
value: 0.
|
| 29 |
- type: lddt_complex_rank1
|
| 30 |
name: lDDT complex rank1
|
| 31 |
-
value: 0.
|
| 32 |
- type: tm_score_complex_best
|
| 33 |
name: TM-score complex best
|
| 34 |
-
value: 0.
|
| 35 |
- type: tm_score_complex_rank1
|
| 36 |
name: TM-score complex rank1
|
| 37 |
-
value: 0.
|
| 38 |
- type: tm_score_c1prime_best
|
| 39 |
name: TM-score C1' best
|
| 40 |
-
value: 0.
|
| 41 |
- type: tm_score_c1prime_rank1
|
| 42 |
name: TM-score C1' rank1
|
| 43 |
-
value: 0.
|
| 44 |
-
- type: plddt_rank1
|
| 45 |
-
name: pLDDT rank1
|
| 46 |
-
value: 80.828687
|
| 47 |
---
|
| 48 |
|
| 49 |
# Protenix-RNA
|
| 50 |
|
| 51 |
-
Protenix-RNA is a Protenix fine-tuned PyTorch checkpoint optimized for RNA structure prediction.
|
| 52 |
-
|
| 53 |
-

|
| 54 |
|
| 55 |
## Files
|
| 56 |
|
| 57 |
| File | Description |
|
| 58 |
|---|---|
|
| 59 |
-
| `checkpoints/best_ema_0.999.pt` | EMA checkpoint selected at step
|
| 60 |
| `config.yaml` | Resolved fine-tuning/evaluation config. |
|
| 61 |
| `validation_comparison.csv` | lDDT-only validation comparison against the base and previous fine-tuned checkpoints. |
|
| 62 |
| `eval/full_eval_base_vs_best_summary.csv` | Full validation aggregate metrics for Protenix-RNA vs base Protenix. |
|
|
@@ -66,7 +61,7 @@ Protenix-RNA is a Protenix fine-tuned PyTorch checkpoint optimized for RNA struc
|
|
| 66 |
| `checkpoint_info.json` | Source path, checkpoint step, and artifact metadata. |
|
| 67 |
| `figures/` | Validation, TM-score, pLDDT, and structure-collage plots. |
|
| 68 |
|
| 69 |
-
The checkpoint is a `torch.load(..., weights_only=False)` dictionary with keys `model`, `optimizer`, `scheduler`, and `step`. The stored step is `
|
| 70 |
|
| 71 |
## Training Summary
|
| 72 |
|
|
@@ -78,7 +73,9 @@ The checkpoint is a `torch.load(..., weights_only=False)` dictionary with keys `
|
|
| 78 |
- RNA MSA: enabled
|
| 79 |
- Protein MSA and templates: disabled
|
| 80 |
- EMA decay: 0.999
|
|
|
|
| 81 |
- Selection metric: `rna_finetune_val/ema0.999_lddt/complex/best.avg`, maximize
|
|
|
|
| 82 |
- Full eval settings: seed 42, bf16, `N_sample=5`, `N_step=20`, `N_cycle=4`, `max_n_token=768`
|
| 83 |
- Full eval size after token filtering: 1,490 target rows from 195 PDB IDs
|
| 84 |
|
|
@@ -86,6 +83,8 @@ The checkpoint is a `torch.load(..., weights_only=False)` dictionary with keys `
|
|
| 86 |
|
| 87 |
Higher is better for lDDT, TM-score, and pLDDT. Lower is better for loss.
|
| 88 |
|
| 89 |
| Metric | Base Protenix | Protenix-RNA | Delta |
|
| 90 |
|---|---:|---:|---:|
|
| 91 |
| lDDT complex best | 0.5565 | 0.7559 | +0.1994 |
|
|
@@ -98,7 +97,7 @@ Higher is better for lDDT, TM-score, and pLDDT. Lower is better for loss.
|
|
| 98 |
| pLDDT rank1 | 69.06 | 80.83 | +11.77 |
|
| 99 |
| Loss | 1244.81 | 834.44 | -410.36 |
|
| 100 |
|
| 101 |
-
These values come from a full comparison run against `protenix_base_default_v1.0.0` using the same RNA validation setup and saved predictions for
|
| 102 |
|
| 103 |

|
| 104 |
|
|
@@ -126,13 +125,13 @@ The following PyMOL-rendered collage shows rank-1 predicted structures from repr
|
|
| 126 |
|
| 127 |
## Checkpoint Selection Trace
|
| 128 |
|
| 129 |
-
This checkpoint was
|
| 130 |
|
| 131 |
-
| Metric | Base Protenix | Prior FT s9499 |
|
| 132 |
|---|---:|---:|---:|---:|---:|
|
| 133 |
-
| lDDT best | 0.5558 | 0.7395 | 0.7587 |
|
| 134 |
-
| lDDT mean | 0.5420 | 0.7261 | 0.7463 |
|
| 135 |
-
| lDDT rank1 | 0.5417 | 0.7254 | 0.7467 |
|
| 136 |
|
| 137 |

|
| 138 |
|
|
|
|
| 22 |
metrics:
|
| 23 |
- type: lddt_complex_best
|
| 24 |
name: lDDT complex best
|
| 25 |
+
value: 0.772730
|
| 26 |
- type: lddt_complex_mean
|
| 27 |
name: lDDT complex mean
|
| 28 |
+
value: 0.761272
|
| 29 |
- type: lddt_complex_rank1
|
| 30 |
name: lDDT complex rank1
|
| 31 |
+
value: 0.761392
|
| 32 |
- type: tm_score_complex_best
|
| 33 |
name: TM-score complex best
|
| 34 |
+
value: 0.937089
|
| 35 |
- type: tm_score_complex_rank1
|
| 36 |
name: TM-score complex rank1
|
| 37 |
+
value: 0.926410
|
| 38 |
- type: tm_score_c1prime_best
|
| 39 |
name: TM-score C1' best
|
| 40 |
+
value: 0.639263
|
| 41 |
- type: tm_score_c1prime_rank1
|
| 42 |
name: TM-score C1' rank1
|
| 43 |
+
value: 0.599042
|
| 44 |
---
|
| 45 |
|
| 46 |
# Protenix-RNA
|
| 47 |
|
| 48 |
+
Protenix-RNA is a Protenix fine-tuned PyTorch checkpoint optimized for RNA structure prediction. The current checkpoint was selected by the EMA validation lDDT-complex best metric at training step 16,999 and is distributed as a native Protenix checkpoint for the Protenix codebase, not as a `transformers.AutoModel` package.
|
| 49 |
|
| 50 |
## Files
|
| 51 |
|
| 52 |
| File | Description |
|
| 53 |
|---|---|
|
| 54 |
+
| `checkpoints/best_ema_0.999.pt` | EMA checkpoint selected at step 16,999. |
|
| 55 |
| `config.yaml` | Resolved fine-tuning/evaluation config. |
|
| 56 |
| `validation_comparison.csv` | lDDT-only validation comparison against the base and previous fine-tuned checkpoints. |
|
| 57 |
| `eval/full_eval_base_vs_best_summary.csv` | Full validation aggregate metrics for Protenix-RNA vs base Protenix. |
|
|
|
|
| 61 |
| `checkpoint_info.json` | Source path, checkpoint step, and artifact metadata. |
|
| 62 |
| `figures/` | Validation, TM-score, pLDDT, and structure-collage plots. |
|
| 63 |
|
| 64 |
+
The checkpoint is a `torch.load(..., weights_only=False)` dictionary with keys `model`, `optimizer`, `scheduler`, and `step`. The stored step is `16999`.
|
| 65 |
|
| 66 |
## Training Summary
|
| 67 |
|
|
|
|
| 73 |
- RNA MSA: enabled
|
| 74 |
- Protein MSA and templates: disabled
|
| 75 |
- EMA decay: 0.999
|
| 76 |
+
- Optimizer for the current continuation: Aurora
|
| 77 |
- Selection metric: `rna_finetune_val/ema0.999_lddt/complex/best.avg`, maximize
|
| 78 |
+
- Current selection metric value: 0.772730 at step 16,999
|
| 79 |
- Full eval settings: seed 42, bf16, `N_sample=5`, `N_step=20`, `N_cycle=4`, `max_n_token=768`
|
| 80 |
- Full eval size after token filtering: 1,490 target rows from 195 PDB IDs
|
| 81 |
|
|
|
|
| 83 |
|
| 84 |
Higher is better for lDDT, TM-score, and pLDDT. Lower is better for loss.
|
| 85 |
|
| 86 |
+
The full comparison table below was produced for the previous step-12,999 checkpoint. The current step-16,999 checkpoint has updated validation metrics in `checkpoint_info.json`; the long full comparison has not been rerun yet.
|
| 87 |
+
|
| 88 |
| Metric | Base Protenix | Protenix-RNA | Delta |
|
| 89 |
|---|---:|---:|---:|
|
| 90 |
| lDDT complex best | 0.5565 | 0.7559 | +0.1994 |
|
|
|
|
| 97 |
| pLDDT rank1 | 69.06 | 80.83 | +11.77 |
|
| 98 |
| Loss | 1244.81 | 834.44 | -410.36 |
|
| 99 |
|
| 100 |
+
These values come from a full comparison run against `protenix_base_default_v1.0.0` using the same RNA validation setup and saved predictions for the step-12,999 checkpoint.
|
| 101 |
|
| 102 |

|
| 103 |
|
|
|
|
| 125 |
|
| 126 |
## Checkpoint Selection Trace
|
| 127 |
|
| 128 |
+
This checkpoint was selected from the EMA validation loop by lDDT-complex best at step 16,999.
|
| 129 |
|
| 130 |
+
| Metric | Base Protenix | Prior FT s9499 | Previous s12999 | Current s16999 | Gain vs s12999 |
|
| 131 |
|---|---:|---:|---:|---:|---:|
|
| 132 |
+
| lDDT best | 0.5558 | 0.7395 | 0.7587 | 0.7727 | +0.0141 |
|
| 133 |
+
| lDDT mean | 0.5420 | 0.7261 | 0.7463 | 0.7613 | +0.0150 |
|
| 134 |
+
| lDDT rank1 | 0.5417 | 0.7254 | 0.7467 | 0.7614 | +0.0146 |
|
| 135 |
|
| 136 |

|
| 137 |
|
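The "Gain vs s12999" column in the updated Checkpoint Selection Trace table can be sanity-checked against the rounded values printed next to it. The sketch below is a minimal consistency check, not part of the commit. It assumes the gains were computed from unrounded metrics and rounded afterwards, which would explain why two rows differ from the naive difference by 0.0001. The names `ROWS`, `TOLERANCE`, and `check_gains` are hypothetical.

```python
# Rounded lDDT values copied from the updated table:
# (previous s12999, current s16999, reported gain vs s12999).
ROWS = {
    "lDDT best": (0.7587, 0.7727, 0.0141),
    "lDDT mean": (0.7463, 0.7613, 0.0150),
    "lDDT rank1": (0.7467, 0.7614, 0.0146),
}

# Assumption: every number is rounded to 4 decimals, so each one can be off
# by up to 0.5e-4. Two inputs plus the reported gain give a worst case of 1.5e-4.
TOLERANCE = 1.5e-4


def check_gains(rows, tol=TOLERANCE):
    """Return {metric: (recomputed gain, within tolerance?)} for each table row."""
    report = {}
    for name, (previous, current, reported) in rows.items():
        recomputed = current - previous
        report[name] = (round(recomputed, 4), abs(recomputed - reported) <= tol)
    return report


if __name__ == "__main__":
    for name, (recomputed, ok) in check_gains(ROWS).items():
        print(f"{name}: recomputed {recomputed:+.4f}, consistent={ok}")
```

A row that fell outside the tolerance would suggest a copy error in the table rather than a rounding effect.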
checkpoint_info.json
CHANGED
|
@@ -1,16 +1,28 @@
|
|
| 1 |
{
|
| 2 |
"checkpoint_name": "best_ema_0.999.pt",
|
| 3 |
"repo_id": "LiteFold/protenix-rna",
|
| 4 |
-
"source_path": "output/
|
| 5 |
"path_in_repo": "checkpoints/best_ema_0.999.pt",
|
| 6 |
-
"size_bytes":
|
|
|
|
| 7 |
"checkpoint_keys": ["model", "optimizer", "scheduler", "step"],
|
| 8 |
-
"step":
|
| 9 |
"ema_decay": 0.999,
|
| 10 |
"base_model": "protenix_base_default_v1.0.0",
|
|
|
|
| 11 |
"selection_metric": "rna_finetune_val/ema0.999_lddt/complex/best.avg",
|
| 12 |
"selection_metric_mode": "max",
|
| 13 |
-
"selection_metric_value": 0.
|
| 14 |
-
"
|
| 15 |
"created_from_workspace": "/lambda/nfs/research/Protenix"
|
| 16 |
}
|
|
|
|
| 1 |
{
|
| 2 |
"checkpoint_name": "best_ema_0.999.pt",
|
| 3 |
"repo_id": "LiteFold/protenix-rna",
|
| 4 |
+
"source_path": "output/protenix_rna_resume_aurora_s13000_to_s28274_20260523_121652/checkpoints/best_ema_0.999.pt",
|
| 5 |
"path_in_repo": "checkpoints/best_ema_0.999.pt",
|
| 6 |
+
"size_bytes": 2954950735,
|
| 7 |
+
"sha256": "c5d4dbf2fc412ec06bc7763247278dde832f445448c5d5af8b64621942dacd8a",
|
| 8 |
"checkpoint_keys": ["model", "optimizer", "scheduler", "step"],
|
| 9 |
+
"step": 16999,
|
| 10 |
"ema_decay": 0.999,
|
| 11 |
"base_model": "protenix_base_default_v1.0.0",
|
| 12 |
+
"optimizer": "aurora",
|
| 13 |
"selection_metric": "rna_finetune_val/ema0.999_lddt/complex/best.avg",
|
| 14 |
"selection_metric_mode": "max",
|
| 15 |
+
"selection_metric_value": 0.7727303531689521,
|
| 16 |
+
"validation_metrics": {
|
| 17 |
+
"rna_finetune_val/ema0.999_lddt/complex/best.avg": 0.7727303531689521,
|
| 18 |
+
"rna_finetune_val/ema0.999_lddt/complex/mean.avg": 0.7612718707887378,
|
| 19 |
+
"rna_finetune_val/ema0.999_lddt/complex/plddt.rank1.avg": 0.7613916325381052,
|
| 20 |
+
"rna_finetune_val/ema0.999_tm_score/complex/best.avg": 0.9370890020501214,
|
| 21 |
+
"rna_finetune_val/ema0.999_tm_score/complex/plddt.rank1.avg": 0.9264103662097712,
|
| 22 |
+
"rna_finetune_val/ema0.999_tm_score/c1prime/best.avg": 0.6392630664815417,
|
| 23 |
+
"rna_finetune_val/ema0.999_tm_score/c1prime/plddt.rank1.avg": 0.5990416565579335,
|
| 24 |
+
"rna_finetune_val/ema0.999_loss.avg": 372.15375346051167
|
| 25 |
+
},
|
| 26 |
+
"local_run_dir": "output/protenix_rna_resume_aurora_s13000_to_s28274_20260523_121652",
|
| 27 |
"created_from_workspace": "/lambda/nfs/research/Protenix"
|
| 28 |
}
|
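The README front-matter values added in this commit appear to be the `validation_metrics` entries rounded to six decimals. The following sketch checks that correspondence using only the numbers visible in the diff. The mapping in `KEY_FOR_TYPE` is inferred from matching values (for example, `lddt_complex_rank1` against the `plddt.rank1.avg` key) and is an assumption, as are all helper names.

```python
import json

# Excerpt of the new checkpoint_info.json, copied from the diff.
CHECKPOINT_INFO = json.loads("""
{
  "step": 16999,
  "selection_metric": "rna_finetune_val/ema0.999_lddt/complex/best.avg",
  "selection_metric_value": 0.7727303531689521,
  "validation_metrics": {
    "rna_finetune_val/ema0.999_lddt/complex/best.avg": 0.7727303531689521,
    "rna_finetune_val/ema0.999_lddt/complex/mean.avg": 0.7612718707887378,
    "rna_finetune_val/ema0.999_lddt/complex/plddt.rank1.avg": 0.7613916325381052,
    "rna_finetune_val/ema0.999_tm_score/complex/best.avg": 0.9370890020501214,
    "rna_finetune_val/ema0.999_tm_score/complex/plddt.rank1.avg": 0.9264103662097712,
    "rna_finetune_val/ema0.999_tm_score/c1prime/best.avg": 0.6392630664815417,
    "rna_finetune_val/ema0.999_tm_score/c1prime/plddt.rank1.avg": 0.5990416565579335
  }
}
""")

# Values from the README model-index block, keyed by metric `type`.
README_METRICS = {
    "lddt_complex_best": 0.772730,
    "lddt_complex_mean": 0.761272,
    "lddt_complex_rank1": 0.761392,
    "tm_score_complex_best": 0.937089,
    "tm_score_complex_rank1": 0.926410,
    "tm_score_c1prime_best": 0.639263,
    "tm_score_c1prime_rank1": 0.599042,
}

# Inferred (not documented) mapping from README metric type to JSON key suffix.
PREFIX = "rna_finetune_val/ema0.999_"
KEY_FOR_TYPE = {
    "lddt_complex_best": "lddt/complex/best.avg",
    "lddt_complex_mean": "lddt/complex/mean.avg",
    "lddt_complex_rank1": "lddt/complex/plddt.rank1.avg",
    "tm_score_complex_best": "tm_score/complex/best.avg",
    "tm_score_complex_rank1": "tm_score/complex/plddt.rank1.avg",
    "tm_score_c1prime_best": "tm_score/c1prime/best.avg",
    "tm_score_c1prime_rank1": "tm_score/c1prime/plddt.rank1.avg",
}


def mismatches(info, readme, decimals=6):
    """List README metric types whose value differs from the rounded JSON value."""
    bad = []
    for metric_type, expected in readme.items():
        key = PREFIX + KEY_FOR_TYPE[metric_type]
        actual = round(info["validation_metrics"][key], decimals)
        if abs(actual - expected) > 10 ** -(decimals + 1):
            bad.append((metric_type, expected, actual))
    return bad


if __name__ == "__main__":
    print(mismatches(CHECKPOINT_INFO, README_METRICS) or "README and JSON agree")
```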
checkpoints/best_ema_0.999.pt
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:c5d4dbf2fc412ec06bc7763247278dde832f445448c5d5af8b64621942dacd8a
|
| 3 |
+
size 2954950735
|
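The new LFS pointer carries the same SHA-256 digest and byte size that `checkpoint_info.json` records, so a downloaded checkpoint can be checked against either. Below is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the local file path in the usage note is an assumption about where the checkpoint was downloaded.

```python
import hashlib
from pathlib import Path

# Pointer contents as added in this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:c5d4dbf2fc412ec06bc7763247278dde832f445448c5d5af8b64621942dacd8a
size 2954950735
"""


def parse_lfs_pointer(text):
    """Parse the `key value` lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a multi-gigabyte checkpoint is not read into memory at once."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_pointer(path, pointer):
    """Return True if the file's size and SHA-256 digest match the pointer."""
    path = Path(path)
    return (
        pointer["algo"] == "sha256"
        and path.stat().st_size == pointer["size"]
        and sha256_of(path) == pointer["digest"]
    )
```

With the checkpoint downloaded to, say, `checkpoints/best_ema_0.999.pt`, calling `matches_pointer("checkpoints/best_ema_0.999.pt", parse_lfs_pointer(POINTER_TEXT))` should return `True` for an intact file. The size comparison runs first, so a truncated download is rejected without hashing.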
config.yaml
CHANGED
|
@@ -17,6 +17,15 @@ atom_permutation:
|
|
| 17 |
train:
|
| 18 |
diffusion_sample: false
|
| 19 |
mini_rollout: true
|
| 20 |
base_dir: ./output
|
| 21 |
best_checkpoint_metric: rna_finetune_val/ema0.999_lddt/complex/best.avg
|
| 22 |
best_checkpoint_mode: max
|
|
@@ -605,8 +614,8 @@ inference_noise_scheduler:
|
|
| 605 |
sigma_data: 16.0
|
| 606 |
iters_to_accumulate: 1
|
| 607 |
latest_checkpoint_name: latest
|
| 608 |
-
load_checkpoint_path: output/
|
| 609 |
-
load_ema_checkpoint_path: output/
|
| 610 |
load_params_only: false
|
| 611 |
load_step_for_scheduler: false
|
| 612 |
load_strict: true
|
|
@@ -666,7 +675,7 @@ loss_metrics_sparse_enable: true
|
|
| 666 |
lr: 5.0e-05
|
| 667 |
lr_scheduler: af3
|
| 668 |
max_atoms_per_token: 24
|
| 669 |
-
max_steps:
|
| 670 |
mc_dropout_apply_rate: 0.4
|
| 671 |
mc_dropout_rate: 0.4
|
| 672 |
metrics:
|
|
@@ -785,9 +794,10 @@ model:
|
|
| 785 |
model_name: protenix_base_default_v1.0.0
|
| 786 |
n_blocks: 48
|
| 787 |
no_bins: 64
|
|
|
|
| 788 |
overwrite_checkpoints: true
|
| 789 |
project: protenix
|
| 790 |
-
run_name:
|
| 791 |
sample_diffusion:
|
| 792 |
N_sample: 5
|
| 793 |
N_sample_mini_rollout: 1
|
|
|
|
| 17 |
train:
|
| 18 |
diffusion_sample: false
|
| 19 |
mini_rollout: true
|
| 20 |
+
aurora:
|
| 21 |
+
adam_eps: 1.0e-08
|
| 22 |
+
check_finite: false
|
| 23 |
+
eps: 1.0e-07
|
| 24 |
+
momentum: 0.95
|
| 25 |
+
nesterov: true
|
| 26 |
+
pp_beta: 0.5
|
| 27 |
+
pp_iterations: 2
|
| 28 |
+
weight_decay: 0.025
|
| 29 |
base_dir: ./output
|
| 30 |
best_checkpoint_metric: rna_finetune_val/ema0.999_lddt/complex/best.avg
|
| 31 |
best_checkpoint_mode: max
|
|
|
|
| 614 |
sigma_data: 16.0
|
| 615 |
iters_to_accumulate: 1
|
| 616 |
latest_checkpoint_name: latest
|
| 617 |
+
load_checkpoint_path: output/protenix_rna_resume_opt_b32_lr5e5_s9500_to_s20000_20260522_231945/checkpoints/latest.pt
|
| 618 |
+
load_ema_checkpoint_path: output/protenix_rna_resume_opt_b32_lr5e5_s9500_to_s20000_20260522_231945/checkpoints/latest_ema_0.999.pt
|
| 619 |
load_params_only: false
|
| 620 |
load_step_for_scheduler: false
|
| 621 |
load_strict: true
|
|
|
|
| 675 |
lr: 5.0e-05
|
| 676 |
lr_scheduler: af3
|
| 677 |
max_atoms_per_token: 24
|
| 678 |
+
max_steps: 28274
|
| 679 |
mc_dropout_apply_rate: 0.4
|
| 680 |
mc_dropout_rate: 0.4
|
| 681 |
metrics:
|
|
|
|
| 794 |
model_name: protenix_base_default_v1.0.0
|
| 795 |
n_blocks: 48
|
| 796 |
no_bins: 64
|
| 797 |
+
optimizer: aurora
|
| 798 |
overwrite_checkpoints: true
|
| 799 |
project: protenix
|
| 800 |
+
run_name: protenix_rna_resume_aurora_s13000_to_s28274
|
| 801 |
sample_diffusion:
|
| 802 |
N_sample: 5
|
| 803 |
N_sample_mini_rollout: 1
|