anindya64 committed on
Commit 0c1f4e3 · verified · 1 Parent(s): 9f70d6b

Update Protenix-RNA model card and validation figures

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ figures/validation_lddt_curve.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -11,7 +11,7 @@ tags:
  datasets:
  - LiteFold/PDB
  model-index:
- - name: protenix-rna-finetune-ema-s12999
+ - name: protenix-rna
  results:
  - task:
  type: structure-prediction
@@ -23,26 +23,27 @@ model-index:
  - type: lddt_complex_best
  name: lDDT complex best
  value: 0.758663
+ - type: lddt_complex_mean
+ name: lDDT complex mean
+ value: 0.746286
  - type: lddt_complex_rank1
  name: lDDT complex rank1
  value: 0.746743
- - type: validation_loss
- name: Validation loss
- value: 411.541803
  ---
 
- # Protenix RNA Fine-Tune EMA S12999
+ # Protenix-RNA
 
- This repository contains a Protenix RNA fine-tuned PyTorch checkpoint selected from the EMA validation metric at training step 12,999. It is intended for use with the Protenix codebase rather than `transformers.AutoModel`.
+ Protenix-RNA is a Protenix fine-tuned PyTorch checkpoint optimized for RNA structure prediction. It was selected by the EMA validation lDDT-complex best metric at training step 12,999 and is distributed as a native Protenix checkpoint for the Protenix codebase, not as a `transformers.AutoModel` package.
 
  ## Files
 
  | File | Description |
  |---|---|
- | `checkpoints/best_ema_0.999.pt` | Uploaded checkpoint from `output/protenix_rna_resume_opt_b32_lr5e5_s9500_to_s20000_20260522_231945/checkpoints/best_ema_0.999.pt`. |
- | `config.yaml` | Resolved config for the fine-tuning/evaluation run. |
- | `validation_comparison.csv` | Base vs fine-tuned validation metrics. |
- | `checkpoint_info.json` | Local source path, checkpoint step, and artifact metadata. |
+ | `checkpoints/best_ema_0.999.pt` | EMA checkpoint selected at step 12,999. |
+ | `config.yaml` | Resolved fine-tuning/evaluation config. |
+ | `validation_comparison.csv` | lDDT-only validation comparison against the base and previous fine-tuned checkpoints. |
+ | `checkpoint_info.json` | Source path, checkpoint step, and artifact metadata. |
+ | `figures/` | Validation comparison and lDDT progression plots. |
 
  The checkpoint is a `torch.load(..., weights_only=False)` dictionary with keys `model`, `optimizer`, `scheduler`, and `step`. The stored step is `12999`.
 
@@ -58,38 +59,38 @@ The checkpoint is a `torch.load(..., weights_only=False)` dictionary with keys `
  - EMA decay: 0.999
  - Selection metric: `rna_finetune_val/ema0.999_lddt/complex/best.avg`, maximize
 
- ## Validation Comparison
+ ## Validation
 
- Higher is better for lDDT metrics. Lower is better for loss metrics.
+ Higher is better for all metrics shown here.
 
- | Metric | Base | Prior fine-tune s9499 | Uploaded EMA s12999 | Delta vs base | Delta vs s9499 |
+ | Metric | Base Protenix | Prior FT s9499 | Protenix-RNA s12999 | Gain vs base | Gain vs s9499 |
  |---|---:|---:|---:|---:|---:|
- | loss | 1249.9014 | 890.1429 | 411.5418 | -838.3596 | -478.6011 |
- | weighted_mse | 1247.7881 | 888.5646 | 410.0376 | -837.7506 | -478.5270 |
- | mse | 311.9470 | 222.1411 | 102.5094 | -209.4376 | -119.6318 |
- | smooth_lddt_loss | 0.5282 | 0.3945 | 0.3759 | -0.1522 | -0.0186 |
- | lddt_best | 0.5558 | 0.7395 | 0.7587 | +0.2029 | +0.0192 |
- | lddt_mean | 0.5420 | 0.7261 | 0.7463 | +0.2043 | +0.0202 |
- | lddt_rank1 | 0.5417 | 0.7254 | 0.7467 | +0.2050 | +0.0214 |
- | pde | 2.8927 | 1.9580 | 2.1069 | -0.7858 | +0.1489 |
- | pae | 2.8719 | 3.6604 | 3.8774 | +1.0055 | +0.2170 |
-
- Validation settings: RNA validation split, seed 42, bf16, `N_sample=5`, `N_step=20`, `N_cycle=4`, `max_n_token=768`, RNA MSA enabled. The uploaded step 12,999 values come from the EMA validation loop that produced the best checkpoint. The base and prior fine-tune rows come from the standalone full-validation runs in this workspace.
+ | lDDT best | 0.5558 | 0.7395 | 0.7587 | +0.2029 | +0.0192 |
+ | lDDT mean | 0.5420 | 0.7261 | 0.7463 | +0.2043 | +0.0202 |
+ | lDDT rank1 | 0.5417 | 0.7254 | 0.7467 | +0.2050 | +0.0214 |
+
+ Validation settings: RNA validation split, seed 42, bf16, `N_sample=5`, `N_step=20`, `N_cycle=4`, `max_n_token=768`, RNA MSA enabled. The step 12,999 values come from the EMA validation loop that produced the uploaded checkpoint.
+
+ ![RNA validation lDDT comparison](figures/lddt_comparison.png)
+
+ ![Uploaded checkpoint lDDT gain](figures/lddt_gain.png)
+
+ ![RNA validation lDDT during fine-tuning](figures/validation_lddt_curve.png)
 
  ## Usage
 
  Download the checkpoint and point Protenix at it with `--load_params_only true`:
 
  ```bash
- hf download LiteFold/protenix-rna-finetune-ema-s12999 \
+ hf download LiteFold/protenix-rna \
  checkpoints/best_ema_0.999.pt \
- --local-dir ./protenix-rna-finetune-ema-s12999
+ --local-dir ./protenix-rna
  ```
 
  Example evaluation invocation inside the Protenix checkout:
 
  ```bash
- LOAD_CHECKPOINT_PATH=./protenix-rna-finetune-ema-s12999/checkpoints/best_ema_0.999.pt \
+ LOAD_CHECKPOINT_PATH=./protenix-rna/checkpoints/best_ema_0.999.pt \
  VAL_MAX_N_TOKEN=768 \
  VAL_LIMIT=-1 \
  N_SAMPLE=5 \
@@ -110,4 +111,4 @@ step = ckpt["step"]
 
  ## Limitations
 
- This is a research checkpoint specialized for the local RNA fine-tuning setup. It has not been packaged as a standalone Transformers model and should be evaluated with the same Protenix code/configuration family used for training.
+ This is a research checkpoint specialized for the RNA fine-tuning setup above. It has not been converted into a standalone Transformers model and should be evaluated with the same Protenix code/configuration family used for training.
checkpoint_info.json CHANGED
@@ -1,5 +1,6 @@
  {
  "checkpoint_name": "best_ema_0.999.pt",
+ "repo_id": "LiteFold/protenix-rna",
  "source_path": "output/protenix_rna_resume_opt_b32_lr5e5_s9500_to_s20000_20260522_231945/checkpoints/best_ema_0.999.pt",
  "path_in_repo": "checkpoints/best_ema_0.999.pt",
  "size_bytes": 4427468333,
figures/lddt_comparison.png ADDED
figures/lddt_gain.png ADDED
figures/validation_lddt_curve.png ADDED

Git LFS Details

  • SHA256: c9bb808bd8cc4020115fdbded3c5af5888a5256996da3ab61596444e1c445a96
  • Pointer size: 131 Bytes
  • Size of remote file: 110 kB
validation_comparison.csv CHANGED
@@ -1,10 +1,4 @@
- metric,base_default_v1,prior_finetune_ema_s9499,uploaded_ema_s12999,delta_s12999_vs_base,delta_s12999_vs_s9499,higher_is_better
- loss,1249.901394,890.142949,411.541803,-838.359591,-478.601146,false
- weighted_mse,1247.788115,888.564566,410.037558,-837.750556,-478.527007,false
- mse,311.947029,222.141141,102.509390,-209.437639,-119.631752,false
- smooth_lddt_loss,0.528177,0.394495,0.375942,-0.152235,-0.018554,false
+ metric,base_default_v1,prior_finetune_ema_s9499,protenix_rna_ema_s12999,gain_s12999_vs_base,gain_s12999_vs_s9499,higher_is_better
  lddt_best,0.555753,0.739509,0.758663,0.202910,0.019154,true
  lddt_mean,0.541968,0.726095,0.746286,0.204318,0.020192,true
  lddt_rank1,0.541723,0.725381,0.746743,0.205021,0.021363,true
- pde,2.892729,1.958018,2.106942,-0.785787,0.148924,false
- pae,2.871937,3.660412,3.877398,1.005460,0.216986,false
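
Review note: the two gain columns in the updated `validation_comparison.csv` should equal the s12999 value minus the base and s9499 values respectively. A minimal sketch of that check follows, using only the three rows kept by this commit. The function name `check_gains` and the tolerance are illustrative assumptions, not part of the repository; the tolerance allows for the gains apparently having been computed from unrounded metrics and then rounded to six decimals.

```python
import csv
import io

# Rows copied from the updated validation_comparison.csv in this commit.
CSV_TEXT = """\
metric,base_default_v1,prior_finetune_ema_s9499,protenix_rna_ema_s12999,gain_s12999_vs_base,gain_s12999_vs_s9499,higher_is_better
lddt_best,0.555753,0.739509,0.758663,0.202910,0.019154,true
lddt_mean,0.541968,0.726095,0.746286,0.204318,0.020192,true
lddt_rank1,0.541723,0.725381,0.746743,0.205021,0.021363,true
"""

# Assumed tolerance: every stored number is rounded to 6 decimals, so a
# recomputed difference can be off by about 1e-6 from the stored gain.
TOLERANCE = 2e-6


def check_gains(csv_text: str, tol: float = TOLERANCE) -> dict[str, bool]:
    """Map each metric to True if both stored gains match the recomputed ones."""
    results = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        new = float(row["protenix_rna_ema_s12999"])
        gain_base = new - float(row["base_default_v1"])
        gain_prior = new - float(row["prior_finetune_ema_s9499"])
        results[row["metric"]] = (
            abs(gain_base - float(row["gain_s12999_vs_base"])) <= tol
            and abs(gain_prior - float(row["gain_s12999_vs_s9499"])) <= tol
        )
    return results


if __name__ == "__main__":
    for metric, ok in check_gains(CSV_TEXT).items():
        print(f"{metric}: {'consistent' if ok else 'MISMATCH'}")
```

With an exact comparison some rows differ in the last digit (for example `lddt_mean` vs s9499: 0.746286 - 0.726095 = 0.020191, stored as 0.020192), which is consistent with rounding after subtraction rather than an error in the table.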