Update README.md
README.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 license: cc-by-nc-nd-4.0
 datasets:
-- SaeedLab/
+- SaeedLab/BindScreen
 tags:
 - proteins
 - molecules
@@ -15,24 +15,24 @@ base_model: facebook/esm2_t36_3B_UR50D
 library_name: peft
 ---
 
-#
+# BindScreen - ESM2 LoRA Adapter
 
-This repository contains the LoRA adapter weights for the protein encoder used in
+This repository contains the LoRA adapter weights for the protein encoder used in BindScreen, trained on filtered ChEMBL. BindScreen is a sequence-based virtual screening method built on a dual-encoder contrastive architecture. The adapter fine-tunes [ESM2 T36](https://huggingface.co/facebook/esm2_t36_3B_UR50D) on protein-molecule interaction task.
 
-The projection layers are available separately at [SaeedLab/
+The projection layers are available separately at [SaeedLab/BindScreen-Finetuning](https://huggingface.co/SaeedLab/BindScreen-Finetuning), which also contains the full model description, architecture diagram, and usage examples.
 
-\[[Github Repo](https://github.com/pcdslab/
+\[[Github Repo](https://github.com/pcdslab/BindScreen)\] | \[[Dataset on HuggingFace](https://huggingface.co/datasets/SaeedLab/BindScreen)\] | \[[Model Collection](https://huggingface.co/collections/SaeedLab/bindscreen)\] | \[[Cite](#citation)\]
 
 ## Abstract
 
-Virtual screening aims to identify candidate molecules that bind to a target protein, playing a central role in computational drug discovery. Sequence-based deep learning methods offer
+Virtual screening aims to identify candidate molecules that bind to a target protein, playing a central role in computational drug discovery. Sequence-based deep learning methods offer a more broadly applicable alternative to structure-based approaches, since they do not require 3D structural information. However, they typically require a separate forward pass per protein-molecule pair, limiting their scalability to large molecular libraries. Contrastive learning methods inspired by CLIP address this by encoding proteins and molecules independently, allowing similarity analysis via simple comparisons rather than a forward pass per pair. However, standard CLIP training was designed for symmetric tasks and does not account for the asymmetric and one-to-many nature of protein-molecule binding. In this paper, we introduce *BindScreen*, a sequence-based virtual screening method built on a dual-encoder contrastive architecture. BindScreen introduces a protein-centric batch construction strategy and an asymmetric multi-positive InfoNCE loss to cope with the protein-centric nature of virtual screening. We conducted a systematic evaluation of 8 protein language models and 3 molecular language model variants against BindScreen. The proposed protein-centric batch construction consistently outperforms standard CLIP training across all evaluated encoders while substantially improving computational efficiency, reducing training cost by up to 32 times. In addition, our experiments demonstrate that BindScreen requires 7 times fewer inference computations than pairwise virtual screening approaches. On the LIT-PCBA dataset, BindScreen outperforms all sequence-based baselines, achieving a relative improvement of up to 39% in EF at 0.5 over the best competing method, while remaining competitive with traditional docking approaches without requiring 3D structural information.
 
 ## Model Details
 
-This adapter corresponds to the **
+This adapter corresponds to the **BindScreen-Finetuning** configuration, in which ESM2 T36 is fine-tuned via LoRA alongside the projection layers. Two configurations are available in this collection:
 
-- [
-- [
+- [BindScreen-Frozen](https://huggingface.co/SaeedLab/BindScreen-Frozen): only the projection layers are trained, both encoders are frozen.
+- [BindScreen-Finetuning](https://huggingface.co/SaeedLab/BindScreen-Finetuning): the projection layers and ESM2 T36 are trained via LoRA, MolDeBERTa MLC is frozen.
 
 | Field | Value |
 |---|---|
@@ -46,7 +46,7 @@ This adapter corresponds to the **SeqScreen-Finetuning** configuration, in which
 
 ## Usage
 
-This adapter must be used together with [SaeedLab/
+This adapter must be used together with [SaeedLab/BindScreen-Finetuning](https://huggingface.co/SaeedLab/BindScreen-Finetuning), which provides the projection layers. The full usage example, including molecule encoding and similarity computation, is available in that repository.
 
 ### Dependencies
 
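The Usage text in this change says the adapter is paired with separately published projection layers, and the abstract describes screening as a comparison of independently computed protein and molecule embeddings. The sketch below illustrates those two steps under stated assumptions and is not the authors' reference code. `load_protein_encoder` and `rank_molecules` are hypothetical helper names. Attaching the adapter with `PeftModel.from_pretrained` assumes it was saved against the plain ESM2 encoder layout. Cosine similarity is assumed as the comparison, since the README only says "simple comparisons". Loading the projection layers is left out because their format is documented in the SaeedLab/BindScreen-Finetuning repository rather than here.

```python
import math


def load_protein_encoder(adapter_id="SaeedLab/BindScreen-lora",
                         base_id="facebook/esm2_t36_3B_UR50D"):
    """Attach the LoRA adapter to the ESM2 base model (multi-gigabyte download)."""
    # Imported lazily so the ranking helpers below run without these packages.
    from transformers import AutoModel, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModel.from_pretrained(base_id)
    # Assumption: the adapter keys match the plain ESM2 encoder module names.
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model.eval()


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_molecules(protein_vec, molecule_vecs):
    """Return (library index, score) pairs, most similar molecule first."""
    scores = [(i, cosine(protein_vec, m)) for i, m in enumerate(molecule_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for projected protein and molecule embeddings.
    protein = [1.0, 0.0]
    library = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
    for index, score in rank_molecules(protein, library):
        print(index, round(score, 4))
```

Because molecule embeddings do not depend on the target, a library can be embedded once and re-ranked for each new protein with `rank_molecules`, which is the scalability argument the abstract makes against pairwise models.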