# PFMBench

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)

> **PFMBench**: A comprehensive Protein Foundation Model Benchmark suite.

---

## πŸ” Overview

PFMBench is a unified benchmark suite for evaluating Protein Foundation Models (PFMs) across dozens of downstream tasks. It supports both fine-tuning on labeled data and zero-shot evaluation, and is built on top of Hydra + PyTorch Lightning for maximum flexibility and reproducibility.

---

## 🌟 Features

* **38 downstream tasks** covering structure, function, localization, mutagenesis, interaction, solubility, production, and zero-shot settings.
* **17 pre-trained models** spanning sequence-only, structure-augmented, function-aware, and multimodal PFMs.
* **PEFT support**: Adapter, LoRA, AdaLoRA, DoRA, IA3, etc.
* **Zero-shot recipes**: MSA-based scoring, protein language model scoring, and ProteinGym protocols.
* **Modular design**: Easily swap datasets, models, tuning methods, and evaluation metrics.
* **Logging & visualization** via Weights & Biases; built-in plotting in `output_model_plots/`.

---

## 📦 Installation

```bash
# Clone the repo
git clone https://github.com/biomap-research/PFMBench.git
cd PFMBench

# Create the conda environment with the Python dependencies
conda env create -f environment.yml

# Alternatively, pull the prebuilt Docker image:
# docker pull whwendell/pfmbench:latest
```
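If you go the Docker route, a small wrapper can save some typing. This is a sketch under assumptions: it presumes the NVIDIA Container Toolkit is installed (required by `docker run --gpus`), that the image already provides the project's dependencies, and it mounts your local checkout at `/workspace`, a mount point chosen here for illustration rather than one documented by the image. The helper name `pfmbench_docker` and the `DRY_RUN` switch (print the command instead of running it) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: start a container from the PFMBench image with GPU access
# and the current checkout mounted. ASSUMPTIONS: NVIDIA Container Toolkit is
# installed, the image ships the dependencies, and /workspace is an arbitrary
# mount point (not something the image documents).
pfmbench_docker() {
  # Default to an interactive shell when no command is given.
  if [ "$#" -eq 0 ]; then
    set -- bash
  fi
  local -a cmd
  cmd=(docker run --rm -it --gpus all
       -v "$PWD:/workspace" -w /workspace
       whwendell/pfmbench:latest "$@")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"   # print the command instead of executing it
  else
    "${cmd[@]}"
  fi
}
```

Run `pfmbench_docker` from the repository root for a shell, or pass a command directly, e.g. `pfmbench_docker python tasks/main.py --config_name binding_db --pretrain_model_name esm2_35m --offline 0` (assuming the image's default `python` has the dependencies). The `-it` flags need a terminal; drop `-t` for non-interactive use such as CI.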

---

## πŸ—‚οΈ Project Structure

```
PFMBench/
├── output_model_plots/      # Generated plots (scTM, diversity, etc.)
├── src/                     # Core library
│   ├── data/                # dataset loaders & preprocessors
│   ├── interface/           # generic task & model interface classes
│   ├── model/               # model wrappers & PEFT adapters
│   ├── utils/               # common utilities (metrics, logging, etc.)
│   └── __init__.py
├── tasks/                   # Fine-tuning experiments
│   ├── configs/             # Hydra config files
│   ├── results/             # Checkpoints & logs
│   ├── data_interface.py    # task-specific data loader
│   ├── model_interface.py   # task-specific model wrapper
│   ├── main.py              # entrypoint for training/eval
│   ├── tuner.py             # hyperparameter-search helper
│   └── __init__.py
├── wandb/                   # Weights & Biases scratch dir
├── zeroshot/                # Zero-shot pipelines
│   ├── msa/                 # MSA-based scoring
│   ├── pglm/                # protein-LM zero-shot
│   ├── saprot/              # ProteinGym protocol
│   ├── data_interface.py    # generic zero-shot data loader
│   ├── model_interface.py   # generic zero-shot model wrapper
│   ├── msa_kl_light.py      # light MSA KL-div zero-shot
│   ├── msa_kl_light copy.py # backup copy (can be removed)
│   └── proteingym_light.py  # light ProteinGym zero-shot
├── .gitignore
├── LICENSE
├── environment.yml
└── README.md
```
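The `tasks/configs/` directory holds the Hydra configs that `--config_name` selects. Assuming each task is a top-level `*.yaml` file whose file stem is the config name (the Quick Start's `binding_db` suggests this, but the layout is not documented here, and nested config groups would be missed), a hypothetical `list_configs` helper can enumerate candidate names:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print candidate --config_name values, one per line.
# ASSUMPTION: each task config is a top-level *.yaml file in the given
# directory (default: tasks/configs) and its file stem is the config name.
list_configs() {
  local dir="${1:-tasks/configs}"
  local f
  for f in "$dir"/*.yaml; do
    [ -e "$f" ] || continue   # the glob matched nothing
    basename "$f" .yaml       # strip the directory and the .yaml suffix
  done
}
```

Call it as `list_configs` from the repository root, or pass another directory as the first argument.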

---

## 🚀 Quick Start

### Fine-tuning a single task

```bash
# Example: run fine-tuning with specific GPU and configs
env CUDA_VISIBLE_DEVICES=0 \
    python tasks/main.py \
    --config_name binding_db \
    --pretrain_model_name esm2_35m \
    --offline 0
```
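To fine-tune several models on several tasks in one go, the single command above can be wrapped in a loop. The sketch below reuses only the three flags shown; `run_sweep`, the `MODELS`, `OFFLINE`, and `DRY_RUN` variables, and any model name other than `esm2_35m` are hypothetical, so substitute names that exist in your checkout. Runs execute one after another on a single GPU, and a failed run is reported without stopping the sweep.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fine-tune every (config, model) pair sequentially on one
# GPU, using only the flags from the Quick Start example.
# Usage: [MODELS="esm2_35m ..."] run_sweep GPU_ID CONFIG [CONFIG ...]
run_sweep() {
  local gpu="$1"; shift
  local cfg model
  local -a cmd
  for cfg in "$@"; do
    # MODELS is a space-separated list; it is left unquoted on purpose so it
    # splits into one word per model name.
    for model in ${MODELS:-esm2_35m}; do
      cmd=(env "CUDA_VISIBLE_DEVICES=$gpu" python tasks/main.py
           --config_name "$cfg" --pretrain_model_name "$model"
           --offline "${OFFLINE:-0}")
      if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "${cmd[*]}"   # print the command instead of executing it
      elif ! "${cmd[@]}"; then
        # Report the failure and continue with the remaining runs.
        echo "run failed: config=$cfg model=$model" >&2
      fi
    done
  done
}
```

For example, `MODELS="esm2_35m <another_model>" run_sweep 0 binding_db`, where `<another_model>` stands for any other model name your setup supports. Prefix the call with `DRY_RUN=1` to preview the commands first.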

### Zero-shot evaluation

```bash
# Example: run zero-shot MSA KL-div scoring
env CUDA_VISIBLE_DEVICES=0 \
    python zeroshot/msa_kl_light.py \
    --config_name zero_msa_kl \
    --pretrain_model_name esm2_35m \
    --offline 0
```

> Adjust the values passed to `--config_name`, `--pretrain_model_name`, and `--offline` as needed.
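On a multi-GPU machine, zero-shot scoring of several models can run side by side by pinning each process to its own device with `CUDA_VISIBLE_DEVICES`, as in the example above. This sketch assumes at least as many visible GPUs as models (model *i* goes to GPU *i*) and writes one log file per run; the function name, the log-file naming, and the `DRY_RUN`/`OFFLINE` switches are illustrative, not part of PFMBench.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run zeroshot/msa_kl_light.py for several models in
# parallel, one GPU each. ASSUMPTION: GPUs 0..N-1 exist for N models.
# Usage: run_zeroshot_parallel CONFIG MODEL [MODEL ...]
run_zeroshot_parallel() {
  local config="$1"; shift
  local gpu=0 model status=0 pid
  local -a pids=() cmd
  for model in "$@"; do
    cmd=(env "CUDA_VISIBLE_DEVICES=$gpu" python zeroshot/msa_kl_light.py
         --config_name "$config" --pretrain_model_name "$model"
         --offline "${OFFLINE:-0}")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"   # print the command instead of executing it
    else
      # Launch in the background; stdout and stderr go to a per-model log.
      "${cmd[@]}" > "zeroshot_${config}_${model}.log" 2>&1 &
      pids+=("$!")
    fi
    gpu=$((gpu + 1))
  done
  # Wait for every run; report failure if any of them exited non-zero.
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

For example, `run_zeroshot_parallel zero_msa_kl esm2_35m <another_model>` launches two runs on GPUs 0 and 1 and returns non-zero if either fails. Collecting each PID and waiting on it individually is what lets the function report per-run failures, which a bare `wait` would not.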

---

## πŸ–ΌοΈ Architecture Diagram
![PFMBench Framework](./fig/framework.png)

---

## 📖 Citation

If you use PFMBench in your work, please cite:

```bibtex
@article{gao2025pfmbench,
  title={PFMBench: Protein Foundation Model Benchmark},
  author={Gao, Zhangyang and Wang, Hao and Tan, Cheng and Xu, Chenrui and Liu, Mengdi and Hu, Bozhen and Chao, Linlin and Zhang, Xiaoming and Li, Stan Z},
  journal={arXiv preprint arXiv:2506.14796},
  year={2025}
}
```

---

## πŸ“ License

This project is licensed under the [Apache License 2.0](LICENSE).