sophtang committed · Commit ce6c777 · verified · 1 Parent(s): d5f4408

Update README.md

Files changed (1)
  1. README.md +15 -15
README.md CHANGED
@@ -4,9 +4,9 @@ license: cc-by-nc-nd-4.0
 
  ![Overview of PeptiVerse](peptiverse-cover.png)
 
- # PeptiVerse 🧬🌌
+ # PeptiVerse: A Unified Platform for Therapeutic Peptide Property Prediction 🧬🌌
 
- A collection of machine learning predictors for canonical and non-canonical peptide property prediction using sequence and SMILES representations. 🧬 PeptiVerse 🌌 enables evaluation of key biophysical and therapeutic properties of peptides for property-optimized generation.
+ This is the repository for [PeptiVerse: A Unified Platform for Therapeutic Peptide Property Prediction](https://www.biorxiv.org/content/10.64898/2025.12.31.697180), a collection of machine learning predictors for canonical and non-canonical peptide property prediction using sequence and SMILES representations. 🧬 PeptiVerse 🌌 enables evaluation of key biophysical and therapeutic properties of peptides for property-optimized generation.
 
  ## Table of Contents
 
@@ -26,7 +26,7 @@ A collection of machine learning predictors for canonical and non-canonical pept
  - [Troubleshooting](#troubleshooting-)
  - [Citation](#citation-)
 
- ## Quick start 🌟
+ ## Quick Start
 
  ```bash
  # Clone repository
@@ -38,13 +38,13 @@ pip install -r requirements.txt
  # Run inference
  python inference.py
  ```
- ## Installation 🌟
- ### Minimal Setup 🚀
+ ## Installation
+ ### Minimal Setup
  - Easy start-up environment (using transformers, xgboost models)
  ```bash
  pip install -r requirements.txt
  ```
- ### Full Setup 🚀
+ ### Full Setup
  - Additional access to trained SVM and ElastNet models requires installation of `RAPIDS cuML`, with instructions available from their official [github page](https://github.com/rapidsai/cuml) (**CUDA-capable GPU required**).
  - Optional: pre-compiled Singularity/Apptainer environment (7.52G) is available at [Google drive](https://drive.google.com/file/d/1RJQ9HK0_gsPOhRo5H5ZmH_MYcpJqQD7e/view?usp=sharing) with everything you need (still need CUDA/GPU to load cuML models).
  ```
@@ -54,7 +54,7 @@ pip install -r requirements.txt
  # run inference (see below)
  apptainer exec peptiverse.sif python inference.py
  ```
- ## Repository structure 🌟
+ ## Repository Structure
  This repo contains important large files for [PeptiVerse](https://huggingface.co/spaces/ChatterjeeLab/PeptiVerse), an interactive app for peptide property prediction. [Paper link.](https://www.biorxiv.org/content/10.64898/2025.12.31.697180v1)
 
  ```
@@ -75,7 +75,7 @@ PeptiVerse/
  └── requirements.txt # Python dependencies
  ```
 
- ## Training Data Collection 🌟
+ ## Training Data Collection
 
  <table>
  <caption><strong>Data distribution.</strong> Classification tasks report counts for class 0/1; regression tasks report total sample size (N).</caption>
@@ -158,7 +158,7 @@ PeptiVerse/
  </table>
 
 
- ## Best Model List 🌟
+ ## Best Model List
 
  ### Full model set (cuML-enabled)
  | Property | Best Model (Sequence) | Best Model (SMILES) | Task Type | Threshold (Sequence) | Threshold (SMILES) |
@@ -190,7 +190,7 @@ PeptiVerse/
  >Note: Models marked as SVM or ENET are replaced with XGB as these models are not currently supported in the deployment environment without cuML setups. *xgb_wt_log* indicated log-scaled transformation of time during training.
 
 
- ## Usage 🌟
+ ## Usage
 
  ### Local Application Hosting
  - Host the [PeptiVerse UI](https://huggingface.co/spaces/ChatterjeeLab/PeptiVerse) locally with your own resources.
@@ -263,7 +263,7 @@ print("Downloaded to:", local_dir)
  length: `int` (=L);
 
 
- ### Quick inference by property per model
+ ### Quick Inference By Property Per Model
  ```python
  from inference import PeptiVersePredictor
 
@@ -352,7 +352,7 @@ print(out)
 
  ```
 
- ## Interpretation 🌟
+ ## Interpretation
 
  You can also find the same description in the paper or in the PeptiVerse app `Documentation` tab.
 
@@ -414,7 +414,7 @@ Predicts peptide-protein binding affinity. Requires both peptide and target prot
  - A difference of 1 unit in score corresponds to an approximately tenfold change in binding affinity.<br>
 
 
- ## Model Architecture 🌟
+ ## Model Architecture
 
  - **Sequence Embeddings:** [ESM-2 650M model](https://huggingface.co/facebook/esm2_t33_650M_UR50D) / [PeptideCLM model](https://huggingface.co/aaronfeller/PeptideCLM-23M-all). Foundational embeddings are frozen.
  - **XGBoost Model:** Gradient boosting on pooled embedding features for efficient, high-performance prediction.
@@ -423,7 +423,7 @@ Predicts peptide-protein binding affinity. Requires both peptide and target prot
  - **SVR Model:** Support Vector Regression applied to pooled embeddings, providing a kernel-based, nonparametric regression baseline that is robust on smaller or noisy datasets.
  - **Others:** SVM and Elastic Nets were trained with [RAPIDS cuML](https://github.com/rapidsai/cuml), which requires a CUDA environment and is therefore not supported in the web app. Model checkpoints remain available in the Hugging Face repository.
 
- ## Troubleshooting 🌟
+ ## Troubleshooting
 
  ### LFS Download Issues
 
@@ -438,7 +438,7 @@ huggingface-cli download ChatterjeeLab/PeptiVerse \
  ### TODOs
  Bug loading transformer half-life model now, will fix soon.
 
- ## Citation 🌟
+ ## Citation
 
  If you find this repository helpful for your publications, please consider citing our paper:
 
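
Review note: the Table of Contents context lines in this diff still link to `#troubleshooting-` and `#citation-`, while the commit removes the trailing emoji from those headings. The sketch below assumes a GitHub-style anchor rule (lowercase, drop every character that is not a word character, hyphen, or space, then turn spaces into hyphens). The renderer actually used for this README may behave differently, and `slugify` is a hypothetical helper, not part of the repository. Under that assumption the old and new headings yield different anchors, so the unchanged links may no longer resolve.

```python
import re


def slugify(heading: str) -> str:
    """Approximate a GitHub-style heading anchor (assumed rule, not the renderer's code)."""
    text = heading.strip().lower()
    # Drop anything that is not a word character, a hyphen, or a space (emoji included).
    text = re.sub(r"[^\w\- ]", "", text)
    # Each remaining space becomes a hyphen; a space left before a removed emoji
    # therefore turns into a trailing hyphen.
    return text.replace(" ", "-")


old_anchor = slugify("Troubleshooting 🌟")  # heading before this commit
new_anchor = slugify("Troubleshooting")     # heading after this commit

print(old_anchor)  # troubleshooting-
print(new_anchor)  # troubleshooting
```

If the anchors do differ on the target renderer, updating the Table of Contents links to drop the trailing hyphen would be the matching follow-up change.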