Commit 41076ac · Parent: 2a27b70 — update readme

README.md (CHANGED)
# TITAN-BBB

The paper is under review.

\[[Github Repo](https://github.com/pcdslab/TITAN-BBB)\] | \[[Dataset on HuggingFace](https://huggingface.co/datasets/SaeedLab/BBB)\] | \[[Cite](#citation)\]

## Abstract

The blood-brain barrier (BBB) restricts most compounds from entering the brain, making BBB permeability prediction crucial for drug discovery. Experimental assays are costly and limited, motivating computational approaches. While machine learning has shown promise, combining chemical descriptors with deep learning embeddings remains underexplored. Here, we introduce TITAN-BBB, a multi-modal architecture that combines tabular, image, and text-based features via an attention mechanism. To evaluate it, we aggregated multiple literature sources to create the largest BBB permeability dataset to date, enabling robust training for both classification and regression tasks. Our results demonstrate that TITAN-BBB achieves 86.5% balanced accuracy on classification and a mean absolute error of 0.436 on regression. Our approach also outperforms state-of-the-art models on both tasks, demonstrating the benefits of combining deep and domain-specific representations.

## Model Details

TITAN-BBB is a multi-modal method designed for molecular property (BBB) prediction. The architecture combines three sources of information: embeddings from a pre-trained language model ([ChemBERTa-100M-MLM](https://huggingface.co/DeepChem/ChemBERTa-100M-MLM)), image representations extracted from a convolutional neural network ([ResNet50](https://docs.pytorch.org/vision/main/models/generated/torchvision.models.resnet50.html)), and classical molecular descriptors ([RDKit](https://www.rdkit.org/)).
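For intuition, the descriptor modality can be reproduced with RDKit directly. A minimal sketch using the two example molecules from this README; the particular descriptors shown here are illustrative choices, not necessarily the exact set TITAN-BBB uses:

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# Example molecules from this README.
smiles = ["NCCc1nc(-c2ccccc2)cs1", "CC(=O)OCC(C)C"]

for s in smiles:
    mol = Chem.MolFromSmiles(s)
    # A few classical descriptors commonly used in permeability modeling:
    # molecular weight, lipophilicity (cLogP), and topological polar surface area.
    print(s, Descriptors.MolWt(mol), Descriptors.MolLogP(mol), Descriptors.TPSA(mol))
```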

TITAN-BBB consists of three stages: multi-modal feature projection, attention-based fusion, and prediction.
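The three stages can be sketched as follows. This is a minimal illustration with made-up feature sizes (`dims` and `hidden` are hypothetical), not the repository's actual implementation:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Sketch: project per-modality features to a shared size, fuse them
    with learned attention weights, then predict from the fused vector."""
    def __init__(self, dims, hidden=64):
        super().__init__()
        # Stage 1: one projection per modality (tabular, image, text).
        self.proj = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
        # Stage 2: score each projected modality to get attention weights.
        self.score = nn.Linear(hidden, 1)
        # Stage 3: prediction head on the fused representation.
        self.head = nn.Linear(hidden, 1)

    def forward(self, feats):
        projected = torch.stack([p(f) for p, f in zip(self.proj, feats)], dim=1)  # (B, 3, H)
        attn = torch.softmax(self.score(projected).squeeze(-1), dim=1)            # (B, 3)
        fused = (attn.unsqueeze(-1) * projected).sum(dim=1)                       # (B, H)
        return self.head(fused), attn

# Hypothetical feature sizes for RDKit descriptors, ResNet50 features, ChemBERTa embeddings.
model = AttentionFusion(dims=[200, 2048, 384])
logits, attn = model([torch.randn(2, 200), torch.randn(2, 2048), torch.randn(2, 384)])
print(logits.shape, attn.shape)  # torch.Size([2, 1]) torch.Size([2, 3])
```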

![TITAN-BBB architecture](model_architecture.png)

## Model Usage

**Note**: The model is only available via `AutoModelForSequenceClassification`.

**Note:** This model uses a custom architecture (Transformer + CNN + RDKit) defined in the source repository. Therefore, you must set `trust_remote_code=True` when loading both the model and the tokenizer.

### Classification

Use the code below to score (between 0 and 1) whether a molecule can cross the BBB.

```py
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained('SaeedLab/TITAN-BBB', subfolder='classification', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('SaeedLab/TITAN-BBB', subfolder='classification', trust_remote_code=True)

model.eval()

smiles = ["NCCc1nc(-c2ccccc2)cs1", "CC(=O)OCC(C)C"]
inputs = tokenizer(smiles)

with torch.no_grad():
    outputs = model(**inputs)

print(torch.sigmoid(outputs.logits))
```

### Regression

Use the code below to predict a molecule's logBB value (a continuous measure of blood-brain barrier permeability).

```py
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained('SaeedLab/TITAN-BBB', subfolder='regression', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('SaeedLab/TITAN-BBB', subfolder='regression', trust_remote_code=True)

model.eval()

smiles = ["NCCc1nc(-c2ccccc2)cs1", "CC(=O)OCC(C)C"]
inputs = tokenizer(smiles)

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.logits)
```

### Model Output

Both the classification and regression models return, for each input:

* `logits`: the raw output scores. For classification, apply a sigmoid to obtain a score between 0 and 1. For regression, use them directly as the prediction.
* `hidden_states`: the attention-based aggregation of the tabular, image, and text representations.
* `attentions`: the attention weights over the tabular, image, and text features for each input.
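As a post-processing sketch, the outputs can be turned into class labels and a per-molecule "dominant modality". The tensors below are made-up stand-ins for real model outputs, and the assumption that `attentions` carries one weight per modality in the order (tabular, image, text) is illustrative:

```python
import torch

# Hypothetical outputs for two molecules.
logits = torch.tensor([[1.2], [-0.7]])          # raw classification scores
attentions = torch.tensor([[0.5, 0.2, 0.3],
                           [0.1, 0.6, 0.3]])    # assumed: one weight per modality

probs = torch.sigmoid(logits)                   # BBB-crossing scores in (0, 1)
labels = (probs > 0.5).long()                   # 1 = predicted to cross the BBB

modalities = ["tabular", "image", "text"]
dominant = [modalities[i] for i in attentions.argmax(dim=1)]
print(probs.squeeze(-1).tolist(), labels.squeeze(-1).tolist(), dominant)
```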

### Requirements

```
huggingface_hub
rdkit
torch
torchvision
```

## Citation