Update README.md
README.md CHANGED
@@ -4,9 +4,9 @@ license: cc-by-nc-nd-4.0



-# PeptiVerse 🧬

-A collection of machine learning predictors for canonical and non-canonical peptide property prediction using sequence and SMILES representations. 🧬 PeptiVerse enables evaluation of key biophysical and therapeutic properties of peptides for property-optimized generation.

 ## Table of Contents

@@ -26,7 +26,7 @@ A collection of machine learning predictors for canonical and non-canonical pept
 - [Troubleshooting](#troubleshooting-)
 - [Citation](#citation-)

-## Quick

 ```bash
 # Clone repository
@@ -38,13 +38,13 @@ pip install -r requirements.txt
 # Run inference
 python inference.py
 ```
-## Installation
-### Minimal Setup
 - Easy start-up environment (using transformers, xgboost models)
 ```bash
 pip install -r requirements.txt
 ```
-### Full Setup
 - Access to the trained SVM and ElasticNet models additionally requires `RAPIDS cuML`; installation instructions are on the official [GitHub page](https://github.com/rapidsai/cuml) (**CUDA-capable GPU required**).
 - Optional: a pre-built Singularity/Apptainer image (7.52G) containing all dependencies is available on [Google Drive](https://drive.google.com/file/d/1RJQ9HK0_gsPOhRo5H5ZmH_MYcpJqQD7e/view?usp=sharing); a CUDA-capable GPU is still required to load the cuML models.
 ```
@@ -54,7 +54,7 @@ pip install -r requirements.txt
 # run inference (see below)
 apptainer exec peptiverse.sif python inference.py
 ```
-## Repository
 This repo contains important large files for [PeptiVerse](https://huggingface.co/spaces/ChatterjeeLab/PeptiVerse), an interactive app for peptide property prediction. [Paper link.](https://www.biorxiv.org/content/10.64898/2025.12.31.697180v1)

 ```
@@ -75,7 +75,7 @@ PeptiVerse/
 └── requirements.txt # Python dependencies
 ```

-## Training Data Collection

 <table>
 <caption><strong>Data distribution.</strong> Classification tasks report counts for class 0/1; regression tasks report total sample size (N).</caption>
@@ -158,7 +158,7 @@ PeptiVerse/
 </table>


-## Best Model List

 ### Full model set (cuML-enabled)
 | Property | Best Model (Sequence) | Best Model (SMILES) | Task Type | Threshold (Sequence) | Threshold (SMILES) |
@@ -190,7 +190,7 @@ PeptiVerse/
 >Note: Models marked as SVM or ENET are replaced with XGB, because these models are not currently supported in the deployment environment without a cuML setup. *xgb_wt_log* indicates a log-scaled transformation of time during training.


-## Usage

 ### Local Application Hosting
 - Host the [PeptiVerse UI](https://huggingface.co/spaces/ChatterjeeLab/PeptiVerse) locally with your own resources.
@@ -263,7 +263,7 @@ print("Downloaded to:", local_dir)
 length: `int` (=L);


-### Quick
 ```python
 from inference import PeptiVersePredictor

@@ -352,7 +352,7 @@ print(out)

 ```

-## Interpretation

 You can also find the same description in the paper or in the PeptiVerse app `Documentation` tab.

@@ -414,7 +414,7 @@ Predicts peptide-protein binding affinity. Requires both peptide and target prot
 - A difference of 1 unit in score corresponds to an approximately tenfold change in binding affinity.<br>


-## Model Architecture

 - **Sequence Embeddings:** [ESM-2 650M model](https://huggingface.co/facebook/esm2_t33_650M_UR50D) / [PeptideCLM model](https://huggingface.co/aaronfeller/PeptideCLM-23M-all). Foundational embeddings are frozen.
 - **XGBoost Model:** Gradient boosting on pooled embedding features for efficient, high-performance prediction.
@@ -423,7 +423,7 @@ Predicts peptide-protein binding affinity. Requires both peptide and target prot
 - **SVR Model:** Support Vector Regression applied to pooled embeddings, providing a kernel-based, nonparametric regression baseline that is robust on smaller or noisy datasets.
 - **Others:** SVM and Elastic Nets were trained with [RAPIDS cuML](https://github.com/rapidsai/cuml), which requires a CUDA environment and is therefore not supported in the web app. Model checkpoints remain available in the Hugging Face repository.


-## Troubleshooting

 ### LFS Download Issues

@@ -438,7 +438,7 @@ huggingface-cli download ChatterjeeLab/PeptiVerse \
 ### TODOs
 Known issue: the transformer half-life model currently fails to load; a fix is planned.

-## Citation

 If you find this repository helpful for your publications, please consider citing our paper:

@@ -4,9 +4,9 @@ license: cc-by-nc-nd-4.0



+# PeptiVerse: A Unified Platform for Therapeutic Peptide Property Prediction 🧬

+This is the repository for [PeptiVerse: A Unified Platform for Therapeutic Peptide Property Prediction](https://www.biorxiv.org/content/10.64898/2025.12.31.697180), a collection of machine learning predictors for canonical and non-canonical peptide property prediction using sequence and SMILES representations. 🧬 PeptiVerse enables evaluation of key biophysical and therapeutic properties of peptides for property-optimized generation.

 ## Table of Contents

@@ -26,7 +26,7 @@ A collection of machine learning predictors for canonical and non-canonical pept
 - [Troubleshooting](#troubleshooting-)
 - [Citation](#citation-)

+## Quick Start

 ```bash
 # Clone repository
@@ -38,13 +38,13 @@ pip install -r requirements.txt
 # Run inference
 python inference.py
 ```
+## Installation
+### Minimal Setup
 - Easy start-up environment (using transformers, xgboost models)
 ```bash
 pip install -r requirements.txt
 ```
+### Full Setup
 - Access to the trained SVM and ElasticNet models additionally requires `RAPIDS cuML`; installation instructions are on the official [GitHub page](https://github.com/rapidsai/cuml) (**CUDA-capable GPU required**).
 - Optional: a pre-built Singularity/Apptainer image (7.52G) containing all dependencies is available on [Google Drive](https://drive.google.com/file/d/1RJQ9HK0_gsPOhRo5H5ZmH_MYcpJqQD7e/view?usp=sharing); a CUDA-capable GPU is still required to load the cuML models.
 ```
@@ -54,7 +54,7 @@ pip install -r requirements.txt
 # run inference (see below)
 apptainer exec peptiverse.sif python inference.py
 ```
+## Repository Structure
 This repo contains important large files for [PeptiVerse](https://huggingface.co/spaces/ChatterjeeLab/PeptiVerse), an interactive app for peptide property prediction. [Paper link.](https://www.biorxiv.org/content/10.64898/2025.12.31.697180v1)

 ```
@@ -75,7 +75,7 @@ PeptiVerse/
 └── requirements.txt # Python dependencies
 ```

+## Training Data Collection

 <table>
 <caption><strong>Data distribution.</strong> Classification tasks report counts for class 0/1; regression tasks report total sample size (N).</caption>
@@ -158,7 +158,7 @@ PeptiVerse/
 </table>


+## Best Model List

 ### Full model set (cuML-enabled)
 | Property | Best Model (Sequence) | Best Model (SMILES) | Task Type | Threshold (Sequence) | Threshold (SMILES) |
@@ -190,7 +190,7 @@ PeptiVerse/
 >Note: Models marked as SVM or ENET are replaced with XGB, because these models are not currently supported in the deployment environment without a cuML setup. *xgb_wt_log* indicates a log-scaled transformation of time during training.
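
A model trained on a log-scaled target predicts in log space, so its output has to be mapped back before it can be read as a time. The round trip might look like the sketch below. It assumes a natural-log `log1p` transform, since the README does not state the base or offset, and both helper names are hypothetical rather than part of the PeptiVerse code.

```python
import math


def to_log_scale(value: float) -> float:
    """Forward transform applied to the time target before training (assumed log1p)."""
    return math.log1p(value)


def from_log_scale(prediction: float) -> float:
    """Inverse transform that maps a log-space prediction back to the original scale."""
    return math.expm1(prediction)


# Round trip on an arbitrary example value.
original = 12.0
recovered = from_log_scale(to_log_scale(original))
print(abs(recovered - original) < 1e-9)
```

If the actual training code used a different transform (for example `log10`), the inverse would need to change to match.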


+## Usage

 ### Local Application Hosting
 - Host the [PeptiVerse UI](https://huggingface.co/spaces/ChatterjeeLab/PeptiVerse) locally with your own resources.
@@ -263,7 +263,7 @@ print("Downloaded to:", local_dir)
 length: `int` (=L);


+### Quick Inference By Property Per Model
 ```python
 from inference import PeptiVersePredictor

@@ -352,7 +352,7 @@ print(out)

 ```

+## Interpretation

 You can also find the same description in the paper or in the PeptiVerse app `Documentation` tab.

@@ -414,7 +414,7 @@ Predicts peptide-protein binding affinity. Requires both peptide and target prot
 - A difference of 1 unit in score corresponds to an approximately tenfold change in binding affinity.<br>
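
The tenfold rule implies that the score behaves like a log10 quantity, so a score gap converts to an approximate affinity ratio by exponentiation. The sketch below shows that arithmetic. It assumes that a higher score means stronger binding, and `fold_change` is a hypothetical helper, not part of the PeptiVerse API.

```python
def fold_change(score_a: float, score_b: float) -> float:
    """Approximate affinity ratio implied by two predicted scores.

    Assumes the score is on a log10 scale, so each 1-unit difference is
    roughly a tenfold change. Illustrative helper, not part of PeptiVerse.
    """
    return 10.0 ** (score_a - score_b)


# A 1-unit gap is about tenfold; a 2-unit gap is about a hundredfold.
print(fold_change(7.0, 6.0))
print(fold_change(8.5, 6.5))
```

A ratio below 1 means the first peptide is predicted to bind more weakly than the second under the same assumption.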


+## Model Architecture

 - **Sequence Embeddings:** [ESM-2 650M model](https://huggingface.co/facebook/esm2_t33_650M_UR50D) / [PeptideCLM model](https://huggingface.co/aaronfeller/PeptideCLM-23M-all). Foundational embeddings are frozen.
 - **XGBoost Model:** Gradient boosting on pooled embedding features for efficient, high-performance prediction.
@@ -423,7 +423,7 @@ Predicts peptide-protein binding affinity. Requires both peptide and target prot
 - **SVR Model:** Support Vector Regression applied to pooled embeddings, providing a kernel-based, nonparametric regression baseline that is robust on smaller or noisy datasets.
 - **Others:** SVM and Elastic Nets were trained with [RAPIDS cuML](https://github.com/rapidsai/cuml), which requires a CUDA environment and is therefore not supported in the web app. Model checkpoints remain available in the Hugging Face repository.
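
The bullets above describe a two-stage design: a frozen encoder yields per-token embeddings, which are pooled into one fixed-length vector per peptide and handed to a lightweight model such as XGBoost or SVR. The sketch below illustrates only the pooling step in plain Python. Mean pooling over non-padding tokens is an assumption, since the README does not say which pooling is used, and `mean_pool` is a hypothetical helper rather than a PeptiVerse function.

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the embeddings of non-padding tokens into a single vector."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    # Keep only the tokens whose mask entry is non-zero.
    kept = [emb for emb, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    return [sum(emb[i] for emb in kept) / len(kept) for i in range(dim)]


# Toy example: three tokens with 2-dimensional embeddings; the last is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

In a pipeline of this shape, each pooled vector would become one feature row for the downstream regressor or classifier.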


+## Troubleshooting

 ### LFS Download Issues

@@ -438,7 +438,7 @@ huggingface-cli download ChatterjeeLab/PeptiVerse \
 ### TODOs
 Known issue: the transformer half-life model currently fails to load; a fix is planned.

+## Citation

 If you find this repository helpful for your publications, please consider citing our paper:
