Spaces:
Running
Running
ynuozhang
committed on
Commit
·
063d2f7
1
Parent(s):
728610a
update link
Browse files- best_models.txt +1 -1
- description.md +13 -3
- inference.py +14 -0
best_models.txt
CHANGED
|
@@ -7,4 +7,4 @@ Toxicity, -, Transformer, Classifier, -, 0.3401,
|
|
| 7 |
Binding_affinity, unpooled, unpooled, Regression, -, -,
|
| 8 |
Permeability_PAMPA, -, CNN, Regression, -, -,
|
| 9 |
Permeability_CACO2, -, SVR, Regression, -, -,
|
| 10 |
-
Halflife,
|
|
|
|
| 7 |
Binding_affinity, unpooled, unpooled, Regression, -, -,
|
| 8 |
Permeability_PAMPA, -, CNN, Regression, -, -,
|
| 9 |
Permeability_CACO2, -, SVR, Regression, -, -,
|
| 10 |
+
Halflife, transformer_wt_log, xgb_smiles, Regression, -, -,
|
description.md
CHANGED
|
@@ -30,7 +30,7 @@
|
|
| 30 |
| Binding Affinity | 1436 | 1597 |
|
| 31 |
|
| 32 |
|
| 33 |
-
Our models are trained on curated datasets from multiple sources. For detailed cleanup procedures please refer to our [paper]().
|
| 34 |
|
| 35 |
#### 🩸 Hemolysis Dataset
|
| 36 |
- **Primary Source:** [the Database of Antimicrobial Activity and Structure of Peptides (DBAASPv3)](https://academic.oup.com/nar/article-abstract/49/D1/D288/5957160)
|
|
@@ -86,7 +86,7 @@ Higher scores indicate stronger non-fouling behavior, desirable for circulation
|
|
| 86 |
- **CNN/Transformer Model:** One-dimensional convolutional/self-attention transformer networks operating on unpooled embeddings to capture local sequence patterns.
|
| 87 |
- **Binding Model:** Transformer-based architecture with cross-attention between protein and peptide representations.
|
| 88 |
- **SVR Model:** Support Vector Regression applied to pooled embeddings, providing a kernel-based, nonparametric regression baseline that is robust on smaller or noisy datasets.
|
| 89 |
-
- **Others:** SVM and Elastic Nets were trained with [
|
| 90 |
|
| 91 |
### Model Training and Weight Hosting
|
| 92 |
- More instructions can be found here at [PeptiVerse](https://huggingface.co/ChatterjeeLab/PeptiVerse)
|
|
@@ -111,7 +111,17 @@ Higher scores indicate stronger non-fouling behavior, desirable for circulation
|
|
| 111 |
|
| 112 |
If you use this tool, please cite:
|
| 113 |
```
|
| 114 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 115 |
```
|
| 116 |
|
| 117 |
### Contact
|
|
|
|
| 30 |
| Binding Affinity | 1436 | 1597 |
|
| 31 |
|
| 32 |
|
| 33 |
+
Our models are trained on curated datasets from multiple sources. For detailed cleanup procedures please refer to our [paper](https://www.biorxiv.org/content/10.64898/2025.12.31.697180v1).
|
| 34 |
|
| 35 |
#### 🩸 Hemolysis Dataset
|
| 36 |
- **Primary Source:** [the Database of Antimicrobial Activity and Structure of Peptides (DBAASPv3)](https://academic.oup.com/nar/article-abstract/49/D1/D288/5957160)
|
|
|
|
| 86 |
- **CNN/Transformer Model:** One-dimensional convolutional/self-attention transformer networks operating on unpooled embeddings to capture local sequence patterns.
|
| 87 |
- **Binding Model:** Transformer-based architecture with cross-attention between protein and peptide representations.
|
| 88 |
- **SVR Model:** Support Vector Regression applied to pooled embeddings, providing a kernel-based, nonparametric regression baseline that is robust on smaller or noisy datasets.
|
| 89 |
+
- **Others:** SVM and Elastic Nets were trained with [RAPIDS cuML](https://github.com/rapidsai/cuml), which requires a CUDA environment and is therefore not supported in the web app. Model checkpoints remain available in the Hugging Face repository.
|
| 90 |
|
| 91 |
### Model Training and Weight Hosting
|
| 92 |
- More instructions can be found here at [PeptiVerse](https://huggingface.co/ChatterjeeLab/PeptiVerse)
|
|
|
|
| 111 |
|
| 112 |
If you use this tool, please cite:
|
| 113 |
```
|
| 114 |
+
@article {Zhang2025.12.31.697180,
|
| 115 |
+
author = {Zhang, Yinuo and Tang, Sophia and Chen, Tong and Mahood, Elizabeth and Vincoff, Sophia and Chatterjee, Pranam},
|
| 116 |
+
title = {PeptiVerse: A Unified Platform for Therapeutic Peptide Property Prediction},
|
| 117 |
+
elocation-id = {2025.12.31.697180},
|
| 118 |
+
year = {2026},
|
| 119 |
+
doi = {10.64898/2025.12.31.697180},
|
| 120 |
+
publisher = {Cold Spring Harbor Laboratory},
|
| 121 |
+
URL = {https://www.biorxiv.org/content/early/2026/01/03/2025.12.31.697180},
|
| 122 |
+
eprint = {https://www.biorxiv.org/content/early/2026/01/03/2025.12.31.697180.full.pdf},
|
| 123 |
+
journal = {bioRxiv}
|
| 124 |
+
}
|
| 125 |
```
|
| 126 |
|
| 127 |
### Contact
|
inference.py
CHANGED
|
@@ -630,6 +630,16 @@ class WTEmbedder:
|
|
| 630 |
self._cache_unpooled[s] = (X, M)
|
| 631 |
return X, M
|
| 632 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 633 |
|
| 634 |
|
| 635 |
# -----------------------------
|
|
@@ -744,6 +754,10 @@ class PeptiVersePredictor:
|
|
| 744 |
arch = "mlp"
|
| 745 |
elif arch.startswith("cnn"):
|
| 746 |
arch = "cnn"
|
|
|
|
|
|
|
|
|
|
|
|
|
| 747 |
|
| 748 |
self.models[(prop_key, mode)] = build_torch_model_from_ckpt(arch, obj, self.device)
|
| 749 |
|
|
|
|
| 630 |
self._cache_unpooled[s] = (X, M)
|
| 631 |
return X, M
|
| 632 |
|
| 633 |
+
def _clean_state_dict(sd: dict) -> dict:
|
| 634 |
+
# just for wt halflife transformer predictor
|
| 635 |
+
out = {}
|
| 636 |
+
for k, v in sd.items():
|
| 637 |
+
if k.startswith("module."):
|
| 638 |
+
k = k[len("module."):]
|
| 639 |
+
if k.startswith("model."):
|
| 640 |
+
k = k[len("model."):]
|
| 641 |
+
out[k] = v
|
| 642 |
+
return out
|
| 643 |
|
| 644 |
|
| 645 |
# -----------------------------
|
|
|
|
| 754 |
arch = "mlp"
|
| 755 |
elif arch.startswith("cnn"):
|
| 756 |
arch = "cnn"
|
| 757 |
+
if prop_key == "halflife" and mode == "wt" and m == "transformer_wt_log":
|
| 758 |
+
if isinstance(obj, dict) and "state_dict" in obj:
|
| 759 |
+
obj = dict(obj)
|
| 760 |
+
obj["state_dict"] = _clean_state_dict(obj["state_dict"])
|
| 761 |
|
| 762 |
self.models[(prop_key, mode)] = build_torch_model_from_ckpt(arch, obj, self.device)
|
| 763 |
|