---
license: cc-by-nc-nd-4.0
datasets:
- SaeedLab/BBB
tags:
- chemistry
- bioinformatics
- drug-discovery
- blood-brain-barrier
---

# TITAN-BBB
The paper is under review.

\[[Github Repo](https://github.com/pcdslab/BBBP-Hybrid)\] | \[[Classification Model](https://huggingface.co/SaeedLab/BBBP-Classification)\] | \[[Dataset on HuggingFace](https://huggingface.co/datasets/SaeedLab/BBBP)\] | \[[Cite](#citation)\]

## Abstract
The blood-brain barrier (BBB) is a critical interface of the central nervous system, preventing most compounds from entering the brain. Predicting BBB permeability is essential for drug discovery targeting neurological diseases. Experimental in vitro and in vivo assays are costly and limited, motivating the use of computational approaches. While machine learning has shown promising results, combining handcrafted chemical descriptors with deep learning embeddings remains underexplored. In this work, we propose a model that integrates atom-level embeddings derived from SMILES representations with descriptors from cheminformatics libraries. We also introduce a curated dataset aggregated from multiple literature sources, which, to the best of our knowledge, is the largest available for this task. Results demonstrate that our approach outperforms state-of-the-art methods in classification and achieves competitive performance in regression, highlighting the benefits of combining deep representations with domain-specific features.

## Model Details

This model is a hybrid deep learning method designed for molecular property prediction. The architecture combines three sources of information: embeddings from a pre-trained language model ([ChemBERTa-100M-MLM](https://huggingface.co/DeepChem/ChemBERTa-100M-MLM)), a graph neural network (GNN), and classical molecular descriptors ([RDKit](https://www.rdkit.org/)).

The model pipeline consists of four stages: embedding extraction, message passing, feature aggregation, and prediction. The model was trained using L1 (mean absolute error) loss and the AdamW optimizer.

![Model](pipeline.jpg)
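
The feature-aggregation stage can be sketched in plain Python. This is a toy illustration only: the vector dimensions, fusion logic, and prediction head are made up here, and the real implementation lives in the source repository.

```python
# Toy sketch of the feature-aggregation stage: the three feature sources
# (language-model embedding, GNN readout, RDKit descriptors) are concatenated
# into a single vector before the prediction head. Dimensions are illustrative.

def aggregate_features(lm_embedding, gnn_readout, rdkit_descriptors):
    """Concatenate the three feature sources into one flat feature vector."""
    return list(lm_embedding) + list(gnn_readout) + list(rdkit_descriptors)

def l1_loss(predictions, targets):
    """Mean absolute error, the training objective mentioned above."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)

# Dummy feature vectors standing in for real model outputs.
features = aggregate_features([0.1, 0.2], [0.3], [0.5, 0.7, 0.9])
print(len(features))                     # 6
print(l1_loss([0.0, 1.0], [0.5, 0.5]))   # 0.5
```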

## Model Usage

Use the code below to predict a molecule's logBB value (blood-brain barrier permeability).

**Note:** The model is only available through `AutoModelForSequenceClassification`.

**Note:** This model uses a custom hybrid architecture (Transformer + GNN + RDKit) defined in the source repository. Therefore, you must set `trust_remote_code=True` when loading both the model and the tokenizer.

```py
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModelForSequenceClassification.from_pretrained('SaeedLab/BBBP-Regression', trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained('SaeedLab/BBBP-Regression', trust_remote_code=True)

model.eval()

smiles = ["NCCc1nc(-c2ccccc2)cs1", "CC(=O)OCC(C)C"]
inputs = tokenizer(smiles)
# Move tokenized tensors to the same device as the model.
inputs = {k: (v.to(device) if hasattr(v, "to") else v) for k, v in inputs.items()}
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits)
```
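
The predicted logBB is the log of the brain-to-plasma concentration ratio. A common rule of thumb in the BBB literature (a convention for reading the number, not part of this model) treats logBB ≥ 0.3 as permeable (BBB+) and logBB ≤ -1 as non-permeable (BBB-); the helper below only encodes that convention.

```python
# Rule-of-thumb interpretation of a predicted logBB value. The cutoffs 0.3
# and -1.0 are conventional literature thresholds, not outputs of this model.
def interpret_logbb(logbb: float) -> str:
    if logbb >= 0.3:
        return "likely BBB-permeable (BBB+)"
    if logbb <= -1.0:
        return "likely non-permeable (BBB-)"
    return "ambiguous"

print(interpret_logbb(0.85))   # likely BBB-permeable (BBB+)
print(interpret_logbb(-1.4))   # likely non-permeable (BBB-)
```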

### Requirements

```
huggingface_hub
rdkit
torch
torch_geometric
```

## Citation

The paper is under review. As soon as it is accepted, we will update this section.

## License

This model and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of this model and its derivatives, which include models trained on outputs from the model or datasets created from the model, is prohibited and requires prior approval. Downloading the model requires prior registration on Hugging Face and agreeing to the terms of use. By downloading this model, you agree not to distribute, publish, or reproduce a copy of the model. If another user within your organization wishes to use the model, they must register as an individual user and agree to comply with the terms of use. Users may not attempt to re-identify the deidentified data used to develop the underlying model. If you are a commercial entity, please contact the corresponding author.

## Contact

For any additional questions or comments, contact Fahad Saeed (fsaeed@fiu.edu).